Nonconvex optimization with complexity guarantees via Newton + conjugate gradient


1 Nonconvex optimization with complexity guarantees via Newton + conjugate gradient Clément Royer (University of Wisconsin-Madison, USA) Toulouse, January 8, 2019 Nonconvex optimization via Newton-CG 1

2 Where are you at? Nonconvex optimization via Newton-CG 2

3 Where are you at? Wisconsin Institute for Discovery (WID) Part of UW-Madison; Multi-disciplinary institute created in 2010; Currently organized around hubs. Me and WID Optimization theme, group of Stephen Wright; Affiliated with the Data Science hub and institute. Nonconvex optimization via Newton-CG 2

4 Where are you at? IFDS: Institute for Foundations of Data Science Hosted at WID, funded by NSF (13 centers US-wide, 3-4 larger institutes selected in 2020); Led by Stephen Wright; Gathering Math, Stat and CS expertise in Data Science. Nonconvex optimization via Newton-CG 3

5 Context Nonconvex optimization Many data science problems are convex: linear classification, logistic regression,... Yet there is a shift of focus from convex to nonconvex: Because of deep learning; But also in many other problems: matrix/tensor completion, robust statistics, etc. Nonconvex optimization via Newton-CG 4

6 Context Nonconvex optimization Many data science problems are convex: linear classification, logistic regression,... Yet there is a shift of focus from convex to nonconvex: Because of deep learning; But also in many other problems: matrix/tensor completion, robust statistics, etc. Example: Nonconvex formulation of low-rank matrix completion: $\min_{X \in \mathbb{R}^{n \times m},\, \mathrm{rank}(X)=r} \|P_\Omega(X - M)\|_F^2$, with $M \in \mathbb{R}^{n \times m}$, $\Omega \subseteq [n] \times [m]$. Factored reformulation (Burer and Monteiro, 2003): $\min_{U \in \mathbb{R}^{n \times r},\, V \in \mathbb{R}^{m \times r}} \|P_\Omega(U V^\top - M)\|_F^2$, nonconvex in U and V! Nonconvex optimization via Newton-CG 4
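To make the factored reformulation concrete, here is a minimal numpy sketch (not code from the talk) of the Burer-Monteiro objective $\|P_\Omega(UV^\top - M)\|_F^2$ and its gradients with respect to U and V; the function name, the boolean-mask encoding of $\Omega$, and the synthetic data are illustrative assumptions.

```python
# Minimal numpy sketch (not code from the talk) of the Burer-Monteiro factored
# objective f(U, V) = ||P_Omega(U V^T - M)||_F^2 and its gradients.
import numpy as np

def factored_objective(U, V, M, mask):
    """Objective and gradients for min_{U,V} ||P_Omega(U V^T - M)||_F^2.

    mask is a boolean array with the shape of M encoding the observed set Omega.
    """
    R = mask * (U @ V.T - M)      # residual restricted to observed entries
    f = np.sum(R ** 2)            # squared Frobenius norm over Omega
    gU = 2.0 * R @ V              # gradient with respect to U
    gV = 2.0 * R.T @ U            # gradient with respect to V
    return f, gU, gV

# Tiny usage example on synthetic data (illustrative sizes).
rng = np.random.default_rng(0)
n, m, r = 20, 15, 2
M = rng.standard_normal((n, r)) @ rng.standard_normal((r, m))
mask = rng.random((n, m)) < 0.3   # roughly 30% observed entries
U0, V0 = rng.standard_normal((n, r)), rng.standard_normal((m, r))
f0, gU0, gV0 = factored_objective(U0, V0, M, mask)
```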

7 Nonconvex smooth optimization We consider the smooth unconstrained problem: $\min_{x \in \mathbb{R}^n} f(x)$. Assumptions on f: $f \in C^2(\mathbb{R}^n)$, bounded below, nonconvex. Nonconvex optimization via Newton-CG 5

8 Nonconvex smooth optimization We consider the smooth unconstrained problem: $\min_{x \in \mathbb{R}^n} f(x)$. Assumptions on f: $f \in C^2(\mathbb{R}^n)$, bounded below, nonconvex. Definitions in smooth nonconvex minimization First-order stationary point: $\nabla f(x) = 0$; Second-order stationary point: $\nabla f(x) = 0$, $\nabla^2 f(x) \succeq 0$. If x does not satisfy these conditions, there exists d such that (1) $d^\top \nabla f(x) < 0$: gradient-related direction, and/or (2) $d^\top \nabla^2 f(x)\, d < 0$: negative curvature direction, specific to nonconvex problems. Nonconvex optimization via Newton-CG 5
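These definitions translate directly into a numerical test. The following sketch (illustrative, using dense linear algebra) checks approximate first- and second-order stationarity and returns a gradient-related or negative curvature direction when one exists; the tolerances and function name are assumptions.

```python
# Illustrative check of approximate first- and second-order stationarity using
# dense linear algebra; tolerances and names are assumptions for the sketch.
import numpy as np

def stationarity_report(grad, hess, eps_g=1e-5, eps_H=1e-3):
    """grad: gradient vector at x; hess: dense Hessian matrix at x."""
    gnorm = np.linalg.norm(grad)
    lam, vecs = np.linalg.eigh(hess)       # eigenvalues in ascending order
    lam_min, v_min = lam[0], vecs[:, 0]
    first_order = gnorm <= eps_g
    second_order = first_order and lam_min >= -eps_H
    # A gradient-related or negative curvature direction when x is not stationary:
    if gnorm > eps_g:
        d = -grad
    elif lam_min < -eps_H:
        d = v_min                          # d^T hess d = lam_min * ||d||^2 < 0
    else:
        d = None
    return {"grad_norm": gnorm, "lambda_min": lam_min,
            "first_order": first_order, "second_order": second_order,
            "direction": d}
```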

9 Examples Nonconvex formulations for low-rank matrix problems (Bhojanapalli et al. 2016, Ge et al. 2017): $\min_{U \in \mathbb{R}^{n \times r},\, V \in \mathbb{R}^{m \times r}} f(U V^\top)$. Points that satisfy second-order necessary conditions are global minima (or are close in function value); Strict saddle property: any first-order stationary point that is not a local minimum possesses negative curvature. Our goal: develop efficient algorithms to obtain second-order necessary points; We measure efficiency based on complexity. Nonconvex optimization via Newton-CG 6

10 Worst-case complexity analysis Complexity bounds Bound the cost of an algorithm in the worst case; Ubiquitous in theoretical computer science; Major impact in convex optimization (Nemirovski and Yudin 1983). The accelerated gradient case Consider $f^* = \min_{x \in \mathbb{R}^n} f(x)$ where f is convex and let $\epsilon \in (0, 1)$. Then, to find $\bar{x}$ such that $f(\bar{x}) - f^* \le \epsilon$: Gradient descent needs at most $O(\epsilon^{-1})$ iterations; Accelerated methods (Nesterov's method, heavy ball, etc.) require at most $O(\epsilon^{-1/2})$ iterations. Nonconvex optimization via Newton-CG 7

11 Accelerated methods: illustration Several methods applied to $\min_{x \in \mathbb{R}^{100}} x^\top A x$. Nonconvex optimization via Newton-CG 8
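The figure itself is not reproduced in this transcription; the sketch below sets up the kind of experiment it illustrates, comparing plain gradient descent with Nesterov's accelerated method on a strongly convex quadratic. The dimensions, conditioning, and step-size choices are illustrative assumptions.

```python
# Illustrative reconstruction of the kind of experiment behind this figure:
# gradient descent versus Nesterov's accelerated method on a strongly convex
# quadratic f(x) = 0.5 x^T A x in dimension 100 (conditioning and step sizes
# are assumptions).
import numpy as np

rng = np.random.default_rng(1)
n = 100
Q = np.linalg.qr(rng.standard_normal((n, n)))[0]
A = Q @ np.diag(np.linspace(1.0, 100.0, n)) @ Q.T   # eigenvalues in [1, 100]
L, mu = 100.0, 1.0
beta = (np.sqrt(L) - np.sqrt(mu)) / (np.sqrt(L) + np.sqrt(mu))  # momentum

x_gd = x_prev = x_nes = rng.standard_normal(n)
hist_gd, hist_nes = [], []
for k in range(200):
    # Gradient descent with step 1/L.
    x_gd = x_gd - (1.0 / L) * (A @ x_gd)
    # Nesterov's method for strongly convex quadratics (constant momentum).
    y = x_nes + beta * (x_nes - x_prev)
    x_prev, x_nes = x_nes, y - (1.0 / L) * (A @ y)
    hist_gd.append(0.5 * x_gd @ A @ x_gd)
    hist_nes.append(0.5 * x_nes @ A @ x_nes)
# hist_nes decreases markedly faster than hist_gd, in line with the
# O(eps^{-1/2}) versus O(eps^{-1}) iteration bounds quoted above.
```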

12 Complexity in nonconvex optimization For an algorithm applied to $\min_{x \in \mathbb{R}^n} f(x)$: Definition (first order) Let $\{x_k\}$ be a sequence of iterates generated by the algorithm and $\epsilon \in (0, 1)$. Worst-case cost to obtain an $\epsilon$-point $x_K$ such that $\|\nabla f(x_K)\| \le \epsilon$. Focus: dependency on $\epsilon$. Definition (second order) Let $\{x_k\}$ be the iterate sequence generated by the algorithm, with two tolerances $\epsilon_g, \epsilon_H \in (0, 1)$: Worst-case cost to obtain an $(\epsilon_g, \epsilon_H)$-point $x_K$ such that $\|\nabla f(x_K)\| \le \epsilon_g$, $\lambda_{\min}(\nabla^2 f(x_K)) \ge -\epsilon_H$. Focus: dependencies on $\epsilon_g, \epsilon_H$. Nonconvex optimization via Newton-CG 9

13 Complexity results From nonconvex optimization (2006-) Cost measure: number of iterations (but those may be expensive); Two types of guarantees: (1) $\|\nabla f(x)\| \le \epsilon_g$; (2) $\|\nabla f(x)\| \le \epsilon_g$ and $\nabla^2 f(x) \succeq -\epsilon_H I$. Best methods: second-order methods, deterministic variations on Newton's iteration involving Hessians. Nonconvex optimization via Newton-CG 10

14 Complexity results From nonconvex optimization (2006-) Cost measure: number of iterations (but those may be expensive); Two types of guarantees: (1) $\|\nabla f(x)\| \le \epsilon_g$; (2) $\|\nabla f(x)\| \le \epsilon_g$ and $\nabla^2 f(x) \succeq -\epsilon_H I$. Best methods: second-order methods, deterministic variations on Newton's iteration involving Hessians. Trust region: (1) $O(\epsilon_g^{-2})$, (2) $O(\max\{\epsilon_g^{-2}\epsilon_H^{-1}, \epsilon_H^{-3}\})$. Gradient descent + negative curvature: (1) $O(\epsilon_g^{-2})$, (2) $O(\max\{\epsilon_g^{-2}, \epsilon_H^{-3}\})$. Cubic regularization: (1) $O(\epsilon_g^{-3/2})$, (2) $O(\max\{\epsilon_g^{-3/2}\epsilon_H^{-1}, \epsilon_H^{-3}\})$. Nonconvex optimization via Newton-CG 10

15 Complexity results (2) Influenced by convex optimization (e.g. learning) Cost measure: gradient evaluations + Hessian-vector products (the main iteration cost). Two types of guarantees: (1) $\|\nabla f(x)\| \le \epsilon$; (2) $\|\nabla f(x)\| \le \epsilon$ and $\nabla^2 f(x) \succeq -\epsilon^{1/2} I$. Best methods: developed from accelerated gradient, assume knowledge of Lipschitz constants. Nonconvex optimization via Newton-CG 11

16 Complexity results (2) Influenced by convex optimization (e.g. learning) Cost measure: gradient evaluations + Hessian-vector products (the main iteration cost). Two types of guarantees: (1) $\|\nabla f(x)\| \le \epsilon$; (2) $\|\nabla f(x)\| \le \epsilon$ and $\nabla^2 f(x) \succeq -\epsilon^{1/2} I$. Best methods: developed from accelerated gradient, assume knowledge of Lipschitz constants. Gradient descent + random perturbation: (1), (2) $\tilde{O}(\epsilon^{-2})$ (high probability). Accelerated gradient descent + random perturbation: (1), (2) $\tilde{O}(\epsilon^{-7/4})$ (high probability). Accelerated gradient descent with nonconvexity detection: (1) $\tilde{O}(\epsilon^{-7/4})$ (deterministic). Nonconvex optimization via Newton-CG 11

17 In this talk Cover the range of complexity results... Iterations, evaluations, computation; Different choices for ɛ g, ɛ H ; Deterministic, high probability. Nonconvex optimization via Newton-CG 12

18 In this talk Cover the range of complexity results... Iterations, evaluations, computation; Different choices for ɛ g, ɛ H ; Deterministic, high probability....through a single framework... Newton-type iterations, with line search; Main cost: gradient/hessian-vector product; Nonconvex optimization via Newton-CG 12

19 In this talk Cover the range of complexity results... Iterations, evaluations, computation; Different choices for ɛ g, ɛ H ; Deterministic, high probability....through a single framework... Newton-type iterations, with line search; Main cost: gradient/hessian-vector product;...with best complexity guarantees Revisit the Conjugate Gradient algorithm; Exploit its relationship with accelerated gradient methods. Nonconvex optimization via Newton-CG 12

20 Outline 1 Newton-type methods with negative curvature General framework Inexact variants 2 Newton-Capped Conjugate Gradient Conjugate gradient and nonconvex quadratics Newton-Capped CG algorithms 3 Numerical results Nonconvex optimization via Newton-CG 13

21 Outline 1 Newton-type methods with negative curvature General framework Inexact variants 2 Newton-Capped Conjugate Gradient 3 Numerical results Nonconvex optimization via Newton-CG 14

22 Line-search framework Inputs: $x_0 \in \mathbb{R}^n$, $\theta \in (0, 1)$, $\eta > 0$, $\epsilon_g \in (0, 1)$, $\epsilon_H \in (0, 1)$. For k = 0, 1, 2,...: (1) Compute a direction $d_k = d_k(\epsilon_g, \epsilon_H)$. (2) Backtracking line search: compute the largest $\alpha_k \in \{\theta^j\}_{j \in \mathbb{N}}$ such that $f(x_k + \alpha_k d_k) < f(x_k) - \frac{\eta}{6} \alpha_k^3 \|d_k\|^3$. (3) Set $x_{k+1} = x_k + \alpha_k d_k$. Nonconvex optimization via Newton-CG 15

23 Line-search framework Inputs: $x_0 \in \mathbb{R}^n$, $\theta \in (0, 1)$, $\eta > 0$, $\epsilon_g \in (0, 1)$, $\epsilon_H \in (0, 1)$. For k = 0, 1, 2,...: (1) Compute a direction $d_k = d_k(\epsilon_g, \epsilon_H)$. (2) Backtracking line search: compute the largest $\alpha_k \in \{\theta^j\}_{j \in \mathbb{N}}$ such that $f(x_k + \alpha_k d_k) < f(x_k) - \frac{\eta}{6} \alpha_k^3 \|d_k\|^3$. (3) Set $x_{k+1} = x_k + \alpha_k d_k$. About the line search Guarantee of cubic decrease; Simplest one giving complexity guarantees. Nonconvex optimization via Newton-CG 15
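A minimal sketch of the backtracking step of this framework, assuming access to a function handle f; the cap on the number of backtracks is an added safeguard, not part of the framework as stated.

```python
# Minimal sketch of the backtracking line search above: take the largest
# alpha in {theta^j} satisfying f(x + alpha d) < f(x) - (eta/6) alpha^3 ||d||^3.
# The cap on the number of backtracks is an added safeguard.
import numpy as np

def cubic_backtracking(f, x, d, theta=0.5, eta=0.1, max_backtracks=60):
    fx, dnorm3 = f(x), np.linalg.norm(d) ** 3
    alpha = 1.0
    for _ in range(max_backtracks):
        if f(x + alpha * d) < fx - (eta / 6.0) * alpha ** 3 * dnorm3:
            return alpha
        alpha *= theta                 # try the next power of theta
    return 0.0                         # no acceptable step found (safeguard)
```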

24 Newton's iteration Basics Iteration k: compute $d_k$ by solving the linear system $\nabla^2 f(x_k) d_k = -\nabla f(x_k)$ and set $x_{k+1} = x_k + d_k$; Unique direction when $\nabla^2 f(x_k) \succ 0$; Can guarantee global convergence with (e.g.) line search. For nonconvex problems Use a threshold $\epsilon_H$ for $\lambda_{\min}(\nabla^2 f(x_k))$; Regularize to ensure $\nabla^2 f(x_k) + \alpha I \succeq \epsilon_H I$; Second order: leverage negative curvature directions d such that $d^\top \nabla^2 f(x_k)\, d \le -\epsilon_H \|d\|^2$. Nonconvex optimization via Newton-CG 16

25 Second-order Newton method with line search Inputs: $x_0 \in \mathbb{R}^n$, $\theta \in (0, 1)$, $\eta > 0$, $\epsilon_H \in (0, 1)$. For k = 0, 1, 2,...: (1) Computation of a search direction $d_k$: Compute $\lambda = \lambda_{\min}(\nabla^2 f(x_k))$; If $\lambda < -\epsilon_H$, choose $d_k$ as a negative curvature direction such that $d_k^\top \nabla f(x_k) \le 0$, $d_k^\top \nabla^2 f(x_k) d_k = \lambda \|d_k\|^2$; Otherwise, choose $d_k$ as a Newton direction (possibly regularized) by solving $(\nabla^2 f(x_k) + 2\epsilon_H I)\, d_k = -\nabla f(x_k)$. (2) Backtracking line search (unchanged). (3) Set $x_{k+1} = x_k + \alpha_k d_k$. Nonconvex optimization via Newton-CG 17
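For reference, here is a hedged sketch of this exact (dense linear algebra) variant: eigenvalue test, negative curvature or regularized Newton direction, then the cubic backtracking line search. The default tolerances and the scaling of the negative curvature direction are illustrative choices, not necessarily those of the paper.

```python
# Hedged sketch of this exact variant with dense linear algebra: eigenvalue
# test, negative curvature or regularized Newton direction, then the cubic
# backtracking line search. Tolerances and the curvature-direction scaling
# are illustrative choices, not necessarily the paper's.
import numpy as np

def newton_ls_second_order(f, grad, hess, x0, eps_g=1e-5, eps_H=1e-3,
                           theta=0.5, eta=0.1, max_iter=500):
    x = x0.copy()
    for _ in range(max_iter):
        g, H = grad(x), hess(x)
        lam, V = np.linalg.eigh(H)
        if np.linalg.norm(g) <= eps_g and lam[0] >= -eps_H:
            return x                                  # (eps_g, eps_H)-point
        if lam[0] < -eps_H:
            d = V[:, 0].copy()                        # unit negative curvature direction
            if d @ g > 0:
                d = -d                                # ensure d^T grad f <= 0
            d *= abs(lam[0])                          # scale by |lambda| (assumed scaling)
        else:
            d = np.linalg.solve(H + 2.0 * eps_H * np.eye(x.size), -g)
        # Backtracking with the cubic decrease condition of the framework.
        fx, alpha, dn3 = f(x), 1.0, np.linalg.norm(d) ** 3
        while f(x + alpha * d) >= fx - (eta / 6.0) * alpha ** 3 * dn3:
            alpha *= theta
            if alpha < 1e-12:                         # safeguard
                break
        x = x + alpha * d
    return x
```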

26 Complexity of the second-order Newton method Theorem (Royer and Wright 2018) The method returns $x_k$ such that $\|\nabla f(x_k)\| \le \epsilon_g$ and $\nabla^2 f(x_k) \succeq -\epsilon_H I$ in at most $O(\max\{\epsilon_g^{-3}\epsilon_H^3, \epsilon_H^{-3}\})$ iterations. Nonconvex optimization via Newton-CG 18

27 Complexity of the second-order Newton method Theorem (Royer and Wright 2018) The method returns $x_k$ such that $\|\nabla f(x_k)\| \le \epsilon_g$ and $\nabla^2 f(x_k) \succeq -\epsilon_H I$ in at most $O(\max\{\epsilon_g^{-3}\epsilon_H^3, \epsilon_H^{-3}\})$ iterations. With $\epsilon_H = \epsilon_g^{1/2}$: bound in $O(\max\{\epsilon_g^{-3/2}, \epsilon_H^{-3}\}) = O(\epsilon_g^{-3/2})$; Optimal over a class of second-order methods (Cartis, Gould and Toint 2018). Nonconvex optimization via Newton-CG 18

28 Outline 1 Newton-type methods with negative curvature General framework Inexact variants 2 Newton-Capped Conjugate Gradient 3 Numerical results Nonconvex optimization via Newton-CG 19

29 Introducing inexactness We are concerned with inexactness in the step computation; Inexactness in the function/derivatives requires a different treatment, e.g. Bergou, Diouane, Kungurtsev and Royer (2018) when $f(x) = \sum_i f_i(x)$. Nonconvex optimization via Newton-CG 20

30 Introducing inexactness We are concerned with inexactness in the step computation; Inexactness in the function/derivatives requires a different treatment, e.g. Bergou, Diouane, Kungurtsev and Royer (2018) when $f(x) = \sum_i f_i(x)$. Our framework uses matrix operations to compute a search direction: Eigenvalue/eigenvector calculation; Linear system solve. Inexact strategies Iterative linear algebra (with/without randomness) for matrix operations. Main cost: matrix-vector products. Nonconvex optimization via Newton-CG 20

31 From optimization to linear algebra Two types of direction Depending on $\epsilon_H > 0$ (minimum eigenvalue estimate): Regularized Newton direction: $(\nabla^2 f(x_k) + 2\epsilon_H I)\, d = -\nabla f(x_k)$, with $\nabla^2 f(x_k) + 2\epsilon_H I \succeq \epsilon_H I$. Sufficient negative curvature direction: $d^\top \nabla f(x_k) \le 0$, $d^\top \nabla^2 f(x_k)\, d \le -\epsilon_H \|d\|^2$. Nonconvex optimization via Newton-CG 21

32 From optimization to linear algebra Two types of direction Depending on $\epsilon_H > 0$ (minimum eigenvalue estimate): Regularized Newton direction: $(\nabla^2 f(x_k) + 2\epsilon_H I)\, d = -\nabla f(x_k)$, with $\nabla^2 f(x_k) + 2\epsilon_H I \succeq \epsilon_H I$. Sufficient negative curvature direction: $d^\top \nabla f(x_k) \le 0$, $d^\top \nabla^2 f(x_k)\, d \le -\epsilon_H \|d\|^2$. Related linear algebra problems Let $H = H^\top \in \mathbb{R}^{n \times n}$, $g \in \mathbb{R}^n$ and $\epsilon_H > 0$: Solve $(H + 2\epsilon_H I)\, d = -g$ where $\lambda_{\min}(H) > -\epsilon_H$; Compute d such that $d^\top H d \le -\epsilon_H \|d\|^2$ otherwise. Nonconvex optimization via Newton-CG 21

33 From optimization to linear algebra Two types of direction Depending on $\epsilon_H > 0$ (minimum eigenvalue estimate): Approximate regularized Newton direction: $(\nabla^2 f(x_k) + 2\epsilon_H I)\, d \approx -\nabla f(x_k)$, with $\nabla^2 f(x_k) + 2\epsilon_H I \succeq \epsilon_H I$. Sufficient negative curvature direction: $d^\top \nabla f(x_k) \le 0$, $d^\top \nabla^2 f(x_k)\, d \le -\epsilon_H \|d\|^2$. Related linear algebra problems Let $H = H^\top \in \mathbb{R}^{n \times n}$, $g \in \mathbb{R}^n$ and $\epsilon_H > 0$: Approximate the solution of $(H + 2\epsilon_H I)\, d = -g$ where $\lambda > -\epsilon_H$, $\lambda \approx \lambda_{\min}(H)$; Compute d such that $d^\top H d \le -\epsilon_H \|d\|^2$ otherwise. Nonconvex optimization via Newton-CG 21

34 Conjugate gradient (CG) for symmetric linear systems Problem: $Hd = -g$, where $H = H^\top \succeq \epsilon_H I$. Nonconvex optimization via Newton-CG 22

35 Conjugate gradient (CG) for symmetric linear systems Problem: $Hd = -g$, where $H = H^\top \succeq \epsilon_H I$. Conjugate Gradient (CG) properties Applied with the stopping criterion: $\|Hd + g\| \le \frac{\xi}{2} \min\{\|g\|, \epsilon_H \|d\|\}$, $\xi \in (0, 1)$. If $\kappa = \lambda_{\max}(H)/\lambda_{\min}(H)$, CG terminates in at most $\min\{n, \frac{1}{2}\sqrt{\kappa} \log(4\kappa^{5/2}/\xi)\} = \min\{n, O(\sqrt{\kappa} \log(\kappa/\xi))\}$ iterations/matrix-vector products. CG does not explicitly use eigenvalues of H! Nonconvex optimization via Newton-CG 22
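The sketch below shows CG applied to $Hd = -g$ using only matrix-vector products, with the inexactness test quoted above as stopping criterion; it assumes H is positive definite (as on this slide), and the handle name hess_vec is an illustrative assumption.

```python
# Sketch of CG applied to H d = -g using only Hessian-vector products, with
# the inexactness test ||H d + g|| <= (xi/2) min(||g||, eps_H ||d||) quoted
# above; H is assumed positive definite, and hess_vec is an illustrative
# handle returning products v -> H v.
import numpy as np

def cg_inexact_newton(hess_vec, g, eps_H, xi=0.5, max_iter=None):
    n = g.shape[0]
    max_iter = n if max_iter is None else max_iter
    d = np.zeros(n)
    r = g.copy()                  # residual H d + g at d = 0
    p = -g.copy()
    for _ in range(max_iter):
        if np.linalg.norm(r) <= 0.5 * xi * min(np.linalg.norm(g),
                                               eps_H * np.linalg.norm(d)):
            break
        Hp = hess_vec(p)
        alpha = (r @ r) / (p @ Hp)
        d = d + alpha * p
        r_new = r + alpha * Hp
        beta = (r_new @ r_new) / (r @ r)
        p, r = -r_new + beta * p, r_new
    return d
```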

36 Lanczos for minimum eigenvalue estimation Key idea (Kuczyński and Woźniakowski 1992): use a random starting vector uniformly distributed on the unit sphere. Nonconvex optimization via Newton-CG 23

37 Lanczos for minimum eigenvalue estimation Key idea (Kuczyński and Woźniakowski 1992): use a random starting vector uniformly distributed on the unit sphere. Lanczos with a random start Let $H \in \mathbb{R}^{n \times n}$ be symmetric with $\|H\| \le M$, $\epsilon_H > 0$, $\delta \in (0, 1)$. With probability at least $1 - \delta$, the Lanczos process returns a unit vector v such that $v^\top H v \le \lambda_{\min}(H) + \frac{\epsilon_H}{2}$ in at most $\min\{n, \frac{\ln(3n/\delta^2)}{2}\sqrt{\frac{M}{\epsilon_H}}\}$ iterations/matrix-vector products. Corollary: if $\lambda_{\min}(H) \le -\epsilon_H$, then $v^\top H v \le -\frac{\epsilon_H}{2}$. Nonconvex optimization via Newton-CG 23
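A hand-rolled sketch of the Lanczos procedure with a random start, returning a Ritz estimate of $\lambda_{\min}(H)$ and an approximate eigenvector from Hessian-vector products only; the iteration budget is passed in directly rather than derived from the bound above, no reorthogonalization is performed, and all names are illustrative.

```python
# Hand-rolled sketch of Lanczos with a random start, returning a Ritz estimate
# of lambda_min(H) and an approximate eigenvector from Hessian-vector products
# only. The iteration budget is a plain argument, not the bound quoted above.
import numpy as np

def lanczos_min_eig(hess_vec, n, num_iter, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    q = rng.standard_normal(n)
    q /= np.linalg.norm(q)                       # random start on the unit sphere
    Q, alphas, betas = [q], [], []
    beta, q_prev = 0.0, np.zeros(n)
    for _ in range(min(num_iter, n)):
        w = hess_vec(q) - beta * q_prev          # three-term Lanczos recurrence
        alpha = q @ w
        w = w - alpha * q
        alphas.append(alpha)
        beta = np.linalg.norm(w)
        if beta < 1e-12:                         # Krylov subspace exhausted
            break
        q_prev, q = q, w / beta
        betas.append(beta)
        Q.append(q)
    k = len(alphas)
    T = np.diag(alphas) + np.diag(betas[:k - 1], 1) + np.diag(betas[:k - 1], -1)
    vals, vecs = np.linalg.eigh(T)               # spectrum of the tridiagonal T_k
    theta, u = vals[0], vecs[:, 0]
    v = np.column_stack(Q[:k]) @ u               # Ritz vector approximating the eigenvector
    v /= np.linalg.norm(v)
    return theta, v
```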

38 Conjugate gradient for minimum eigenvalue estimation? Conjugate gradient and Lanczos work on the same Krylov subspaces (invariant by translation) when started from the same point; If they detect negative curvature, it will be at the same iteration. Nonconvex optimization via Newton-CG 24

39 Conjugate gradient for minimum eigenvalue estimation? Conjugate gradient and Lanczos work on the same Krylov subspaces (invariant by translation) when started from the same point; If they detect negative curvature, it will be at the same iteration. Theorem (Royer, O'Neill, Wright 2018) Let $H \in \mathbb{R}^{n \times n}$ be symmetric with $\|H\| \le M$, $\delta \in [0, 1)$, and let CG be applied to $(H + \frac{\epsilon_H}{2} I)\, d = b$ with $b \sim U(S^{n-1})$. Then, if $\lambda_{\min}(H) < -\epsilon_H$, CG outputs a direction of (negative) curvature at most $-\frac{\epsilon_H}{2}$ in at most $J = \min\{n, \frac{\ln(3n/\delta^2)}{2}\sqrt{\frac{M}{\epsilon_H}}\}$ iterations, with probability at least $1 - \delta$. Nonconvex optimization via Newton-CG 24

40 Minimum eigenvalue oracles Corollary For the matrix $\nabla^2 f(x_k)$, consider: Either CG applied to $(\nabla^2 f(x_k) + \frac{\epsilon_H}{2} I)\, d = b$, with $b \in S^{n-1}$; Or Lanczos applied to $\nabla^2 f(x_k)$, starting from $b \in S^{n-1}$. Then, for every $\delta \in [0, 1)$, we obtain one of the two outcomes below: (1) a direction of negative curvature at most $-\epsilon_H/2$, (2) a certificate that $\nabla^2 f(x_k) \succeq -\epsilon_H I$, using at most $\tilde{O}(\min\{n, \epsilon_H^{-1/2}\})$ gradients/Hessian-vector products, with probability at least $1 - \delta$. Nonconvex optimization via Newton-CG 25

41 Minimum eigenvalue oracles Corollary For the matrix $\nabla^2 f(x_k)$, consider: Either CG applied to $(\nabla^2 f(x_k) + \frac{\epsilon_H}{2} I)\, d = b$, with $b \in S^{n-1}$; Or Lanczos applied to $\nabla^2 f(x_k)$, starting from $b \in S^{n-1}$. Then, for every $\delta \in [0, 1)$, we obtain one of the two outcomes below: (1) a direction of negative curvature at most $-\epsilon_H/2$, (2) a certificate that $\nabla^2 f(x_k) \succeq -\epsilon_H I$, using at most $\tilde{O}(\min\{n, \epsilon_H^{-1/2}\})$ gradients/Hessian-vector products, with probability at least $1 - \delta$. We say that those methods are (minimum) eigenvalue oracles. Nonconvex optimization via Newton-CG 25

42 Inexact Newton-type variants Inputs: $x_0 \in \mathbb{R}^n$, $\theta \in (0, 1)$, $\eta > 0$, $\epsilon_H \in (0, 1)$. For k = 0, 1, 2,...: (1) Computation of a search direction $d_k$: Compute $\lambda \approx \lambda_{\min}(\nabla^2 f(x_k))$ using an eigenvalue oracle; If $\lambda < -\epsilon_H$, choose $d_k$ as a negative curvature direction such that $d_k^\top \nabla f(x_k) \le 0$, $d_k^\top \nabla^2 f(x_k) d_k = \lambda \|d_k\|^2$; Otherwise, choose $d_k$ as a Newton direction (possibly regularized) by CG, such that $\|(\nabla^2 f(x_k) + 2\epsilon_H I)\, d_k + \nabla f(x_k)\| \le \frac{\xi}{2} \min\{\|\nabla f(x_k)\|, \epsilon_H \|d_k\|\}$. (2) Backtracking line search (unchanged). (3) Set $x_{k+1} = x_k + \alpha_k d_k$. Nonconvex optimization via Newton-CG 26

43 Complexity result for inexact variants Set $\epsilon_g = \epsilon$, $\epsilon_H = \sqrt{\epsilon}$. The method returns an $(\epsilon, \sqrt{\epsilon})$-point using at most $O(\epsilon^{-3/2})$ outer iterations and $\tilde{O}(\min\{n\epsilon^{-3/2}, \epsilon^{-7/4}\})$ gradients/Hessian-vector products, with probability at least $1 - O(\epsilon^{-3/2}\delta)$. Nonconvex optimization via Newton-CG 27

44 Complexity result for inexact variants Set $\epsilon_g = \epsilon$, $\epsilon_H = \sqrt{\epsilon}$. The method returns an $(\epsilon, \sqrt{\epsilon})$-point using at most $O(\epsilon^{-3/2})$ outer iterations and $\tilde{O}(\min\{n\epsilon^{-3/2}, \epsilon^{-7/4}\})$ gradients/Hessian-vector products, with probability at least $1 - O(\epsilon^{-3/2}\delta)$. Setting $\delta = 0$ and assuming that $n \gg \epsilon^{-1/2}$ yields almost-sure results: Iterations: $O(\epsilon^{-3/2})$. Gradients/Hessian-vector products: $O(\epsilon^{-7/4})$. Nonconvex optimization via Newton-CG 27

45 Outline 1 Newton-type methods with negative curvature 2 Newton-Capped Conjugate Gradient Conjugate gradient and nonconvex quadratics Newton-Capped CG algorithms 3 Numerical results Nonconvex optimization via Newton-CG 28

46 Revisiting conjugate gradient Idea Consider applying CG to a linear system $Hd = -g$, where H may not be positive definite. Equivalently, apply CG to the quadratic $\min_d q(d) = \frac{1}{2} d^\top H d + g^\top d$ without knowing if q is (strongly) convex. Motivation Rich convergence theory for CG in the positive definite/strongly convex case; When applied to an indefinite system: May break down... but then reveals negative curvature. Nonconvex optimization via Newton-CG 29

47 Conjugate gradient for $Hy = -g$ Algorithm Init: set $y_0 = 0 \in \mathbb{R}^n$, $r_0 = g$, $p_0 = -g$, $j = 0$. While $p_j^\top H p_j > 0$: compute $y_{j+1} = y_j + \alpha_j p_j$, $r_{j+1} = H y_{j+1} + g$ and $p_{j+1}$; set $j = j + 1$; terminate if $\|r_j\| \le \zeta \|r_0\|$. Nonconvex optimization via Newton-CG 30

48 Conjugate gradient for $Hy = -g$ Algorithm Init: set $y_0 = 0 \in \mathbb{R}^n$, $r_0 = g$, $p_0 = -g$, $j = 0$. While $p_j^\top H p_j > 0$: compute $y_{j+1} = y_j + \alpha_j p_j$, $r_{j+1} = H y_{j+1} + g$ and $p_{j+1}$; set $j = j + 1$; terminate if $\|r_j\| \le \zeta \|r_0\|$. If $H \succ 0$, $r_n = 0$; If $\epsilon_H I \preceq H \preceq M I$, $\|r_j\|^2 \le 4\kappa \left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^{2j} \|r_0\|^2$, $\kappa = \frac{M}{\epsilon_H}$. Nonconvex optimization via Newton-CG 30

49 Conjugate gradient for $Hy = -g$ Algorithm assuming $\epsilon_H I \preceq H \preceq MI$ Init: set $y_0 = 0 \in \mathbb{R}^n$, $r_0 = g$, $p_0 = -g$, $j = 0$. While $p_j^\top H p_j > \epsilon_H \|p_j\|^2$ and $\|r_j\|^2 \le 4\kappa \left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^{2j} \|r_0\|^2$: compute $y_{j+1} = y_j + \alpha_j p_j$, $r_{j+1} = H y_{j+1} + g$ and $p_{j+1}$; set $j = j + 1$; terminate if $\|r_j\| \le \zeta \|r_0\|$. If $H \succ 0$, $r_n = 0$; If $\epsilon_H I \preceq H \preceq M I$, $\|r_j\|^2 \le 4\kappa \left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^{2j} \|r_0\|^2$, $\kappa = \frac{M}{\epsilon_H}$. Nonconvex optimization via Newton-CG 30

50 Conjugate gradient for $Hy = -g$ Algorithm assuming $\epsilon_H I \preceq H \preceq MI$ Init: set $y_0 = 0 \in \mathbb{R}^n$, $r_0 = g$, $p_0 = -g$, $j = 0$. While $p_j^\top H p_j > \epsilon_H \|p_j\|^2$ and $\|r_j\|^2 \le 4\kappa \left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^{2j} \|r_0\|^2$: compute $y_{j+1} = y_j + \alpha_j p_j$, $r_{j+1} = H y_{j+1} + g$ and $p_{j+1}$; set $j = j + 1$; terminate if $\|r_j\| \le \zeta \|r_0\|$. If $H \succ 0$, $r_n = 0$; If $\epsilon_H I \preceq H \preceq M I$, $\|r_j\|^2 \le 4\kappa \left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^{2j} \|r_0\|^2$, $\kappa = \frac{M}{\epsilon_H}$. What if $H \not\succ 0$? Nonconvex optimization via Newton-CG 30

51 Conjugate gradient for possibly indefinite systems Capped Conjugate Gradient Init: set $y_0 = 0 \in \mathbb{R}^n$, $r_0 = g$, $p_0 = -g$, $j = 0$. While $p_j^\top H p_j > \epsilon_H \|p_j\|^2$ and $\|r_j\|^2 \le T \tau^j \|r_0\|^2$: compute $y_{j+1}$, $r_{j+1} = H y_{j+1} + g$ and $p_{j+1}$; set $j = j + 1$; terminate if $\|r_j\| \le \zeta \|r_0\|$. Nonconvex optimization via Newton-CG 31

52 Conjugate gradient for possibly indefinite systems Capped Conjugate Gradient Init: set $y_0 = 0 \in \mathbb{R}^n$, $r_0 = g$, $p_0 = -g$, $j = 0$. While $p_j^\top H p_j > \epsilon_H \|p_j\|^2$ and $\|r_j\|^2 \le T \tau^j \|r_0\|^2$: compute $y_{j+1}$, $r_{j+1} = H y_{j+1} + g$ and $p_{j+1}$; set $j = j + 1$; terminate if $\|r_j\| \le \zeta \|r_0\|$. Properties of Capped CG For any matrix $H \preceq MI$: As long as $r_j$ is computed, $\|r_j\|^2 \le T \tau^j \|r_0\|^2$, with $T = 16\kappa^5$, $\tau = \frac{\sqrt{\kappa}}{\sqrt{\kappa}+1}$, $\kappa = \frac{M}{\epsilon_H}$. The method runs at most $\min\{n, \tilde{O}(\sqrt{M/\epsilon_H})\}$ iterations (the "cap") before terminating or violating one condition. Nonconvex optimization via Newton-CG 31
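The following is a simplified sketch of the capped-CG idea, not the paper's full algorithm: CG is run on the shifted system $(H + 2\epsilon I)d = -g$ and the curvature of each search direction is monitored; the residual-growth test against $T\tau^j$ and the explicit iteration cap are omitted here for brevity.

```python
# Simplified sketch of the capped-CG idea (not the paper's full algorithm):
# run CG on the shifted system (H + 2 eps I) d = -g and monitor the curvature
# of the search directions. A direction p with p^T(H + 2 eps I)p <= eps ||p||^2
# certifies p^T H p <= -eps ||p||^2, i.e. negative curvature for H. The
# residual-growth test against T tau^j and the explicit cap are omitted here.
import numpy as np

def capped_cg_sketch(hess_vec, g, eps, zeta=0.5, max_iter=None):
    n = g.shape[0]
    max_iter = n if max_iter is None else max_iter
    Hbar = lambda v: hess_vec(v) + 2.0 * eps * v   # products with H + 2 eps I
    y = np.zeros(n)
    r = g.copy()                                   # residual Hbar y + g at y = 0
    p = -g.copy()
    for _ in range(max_iter):
        Hp = Hbar(p)
        if p @ Hp <= eps * (p @ p):                # insufficient positive curvature
            return "negative_curvature", p
        alpha = (r @ r) / (p @ Hp)
        y = y + alpha * p
        r_new = r + alpha * Hp
        if np.linalg.norm(r_new) <= zeta * np.linalg.norm(g):
            return "newton_step", y                # inexact regularized Newton step
        beta = (r_new @ r_new) / (r @ r)
        p, r = -r_new + beta * p, r_new
    return "newton_step", y                        # budget exhausted (simplified handling)
```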

53 Main result - Violating conditions in Capped CG Theorem (Royer, O'Neill, Wright 2018) If Capped CG applied to $Hd = -g$ runs for J iterations and $\|r_J\| > \zeta \|r_0\|$, then either $p_J^\top H p_J \le -\epsilon_H \|p_J\|^2$, or $\|r_J\|^2 > T \tau^J \|r_0\|^2$, $y_{J+1}$ can be computed, and there exists $j \in \{0,\dots,J-1\}$ such that $(y_{J+1} - y_j)^\top H (y_{J+1} - y_j) \le -\epsilon_H \|y_{J+1} - y_j\|^2$. Nonconvex optimization via Newton-CG 32

54 Main result - Violating conditions in Capped CG Theorem (Royer, O'Neill, Wright 2018) If Capped CG applied to $(H + 2\epsilon_H I)\, d = -g$ runs for J iterations and $\|r_J\| > \zeta \|r_0\|$, then either $p_J^\top H p_J \le -\epsilon_H \|p_J\|^2$, or $\|r_J\|^2 > T \tau^J \|r_0\|^2$, $y_{J+1}$ can be computed, and there exists $j \in \{0,\dots,J-1\}$ such that $(y_{J+1} - y_j)^\top H (y_{J+1} - y_j) \le -\epsilon_H \|y_{J+1} - y_j\|^2$. Nonconvex optimization via Newton-CG 32

55 Main result - Violating conditions in Capped CG Theorem (Royer, O'Neill, Wright 2018) If Capped CG applied to $(H + 2\epsilon_H I)\, d = -g$ runs for J iterations and $\|r_J\| > \zeta \|r_0\|$, then either $p_J^\top H p_J \le -\epsilon_H \|p_J\|^2$, or $\|r_J\|^2 > T \tau^J \|r_0\|^2$, $y_{J+1}$ can be computed, and there exists $j \in \{0,\dots,J-1\}$ such that $(y_{J+1} - y_j)^\top H (y_{J+1} - y_j) \le -\epsilon_H \|y_{J+1} - y_j\|^2$. Proof: follows a proof of accelerated methods from Bubeck (2014) and its variant for nonconvex accelerated gradient (Carmon et al 2017), but applied to quadratic functions. But in our case, we only use intrinsic properties of CG and look at quadratics, so we directly obtain negative curvature directions! Nonconvex optimization via Newton-CG 32

56 Capped Conjugate Gradient - summary Running Capped CG Applying Capped CG to $(\nabla^2 f(x_k) + 2\epsilon_H I)\, d = -\nabla f(x_k)$ yields one of the two following outcomes: (1) a regularized Newton step $d_k$ with $\|(\nabla^2 f(x_k) + 2\epsilon_H I)\, d_k + \nabla f(x_k)\| \le \zeta \|r_0\|$; (2) a direction of negative curvature at most $-\epsilon_H$; in at most $\tilde{O}(\min\{n, \epsilon_H^{-1/2}\})$ iterations/Hessian-vector products. Nonconvex optimization via Newton-CG 33

57 Outline 1 Newton-type methods with negative curvature 2 Newton-Capped Conjugate Gradient Conjugate gradient and nonconvex quadratics Newton-Capped CG algorithms 3 Numerical results Nonconvex optimization via Newton-CG 34

58 Building on Capped CG Two new instances of our generic method Phase One: when the gradient norm is large, use Capped CG only to compute search directions; Phase Two: when the gradient norm is small, use standard CG to estimate the smallest eigenvalue. We no longer compute $\lambda_{\min}(\nabla^2 f)$ at the beginning of the iteration. Nonconvex optimization via Newton-CG 35

59 Newton-Capped CG Inputs: $x_0 \in \mathbb{R}^n$, $\theta \in (0, 1)$, $\eta > 0$, $\epsilon_g \in (0, 1)$, $\epsilon_H \in (0, 1)$, $\delta \in (0, 1]$. For k = 0, 1, 2,...: (1) If $\|\nabla f(x_k)\| > \epsilon_g$, compute $d_k$ via Capped CG. (2) Otherwise, apply CG as an eigenvalue oracle. If this oracle returns a certificate that $\nabla^2 f(x_k) \succeq -\epsilon_H I$, terminate; otherwise use its output as $d_k$. (3) Backtracking line search (unchanged). (4) Set $x_{k+1} = x_k + \alpha_k d_k$. Probabilistic analysis We may terminate at a non-stationary point... yet the method is always well defined. Nonconvex optimization via Newton-CG 36
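A hedged sketch of this two-phase outer loop, reusing the capped_cg_sketch and lanczos_min_eig helpers from the sketches above; using Lanczos rather than CG as the Phase Two eigenvalue oracle, and the default parameters, are illustrative substitutions rather than the paper's exact choices.

```python
# Hedged sketch of this two-phase outer loop, reusing capped_cg_sketch and
# lanczos_min_eig from the sketches above. Using Lanczos instead of CG as the
# Phase Two eigenvalue oracle, and the default parameters, are illustrative
# substitutions rather than the paper's exact choices.
import numpy as np

def newton_capped_cg(f, grad, hess_vec, x0, eps_g=1e-5, eps_H=1e-3,
                     theta=0.5, eta=0.1, max_iter=500, oracle_iters=50):
    """hess_vec(x, v) should return the Hessian-vector product at x."""
    x = x0.copy()
    n = x.size
    for _ in range(max_iter):
        g = grad(x)
        Hv = lambda v: hess_vec(x, v)
        if np.linalg.norm(g) > eps_g:
            # Phase One: large gradient, Capped CG supplies the step.
            kind, d = capped_cg_sketch(Hv, g, eps_H)
            if kind == "negative_curvature" and d @ g > 0:
                d = -d                              # keep the curvature direction non-ascent
        else:
            # Phase Two: small gradient, call the eigenvalue oracle.
            lam_est, v = lanczos_min_eig(Hv, n, oracle_iters)
            if lam_est >= -eps_H / 2.0:
                return x                            # simplified certificate: stop here
            d = -v if v @ g > 0 else v              # negative curvature, non-ascent
        # Cubic backtracking line search, as in the framework above.
        fx, alpha, dn3 = f(x), 1.0, np.linalg.norm(d) ** 3
        while f(x + alpha * d) >= fx - (eta / 6.0) * alpha ** 3 * dn3:
            alpha *= theta
            if alpha < 1e-12:                       # safeguard
                break
        x = x + alpha * d
    return x
```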

60 Complexity results (order one) Theorem - Number of iterations The method finds $x_k$ such that $\|\nabla f(x_k)\| \le \epsilon_g$ (an $\epsilon_g$-point) in at most $O(\max\{\epsilon_g^{-3}\epsilon_H^3, \epsilon_H^{-3}\})$ iterations; Each iteration corresponds to at most $\tilde{O}(\min\{n, \epsilon_H^{-1/2}\})$ Capped CG iterations. Theorem - Computational complexity The method reaches an $\epsilon_g$-point using at most $\tilde{O}(\min\{n, \epsilon_H^{-1/2}\} \max\{\epsilon_g^{-3}\epsilon_H^3, \epsilon_H^{-3}\})$ gradients/Hessian-vector products. With $\epsilon_H = \epsilon_g^{1/2}$: $\tilde{O}(\min\{n\epsilon_g^{-3/2}, \epsilon_g^{-7/4}\})$. Best known bound without direct Hessian calculation. Nonconvex optimization via Newton-CG 37

61 Second-order complexity results (general) Theorem Goal: reach an $(\epsilon_g, \epsilon_H)$-point $x_k$ such that $\|\nabla f(x_k)\| \le \epsilon_g$, $\lambda_k = \lambda_{\min}(\nabla^2 f(x_k)) \ge -\epsilon_H$. Use CG as eigenvalue oracle with $\delta \in [0, 1)$. An $(\epsilon_g, \epsilon_H)$-point is reached using at most $O(\max\{\epsilon_g^{-3}\epsilon_H^3, \epsilon_H^{-3}\})$ iterations and $\tilde{O}(\min\{n, \epsilon_H^{-1/2}\} \max\{\epsilon_g^{-3}\epsilon_H^3, \epsilon_H^{-3}\})$ gradients/Hessian-vector products, with probability at least $(1 - \delta)^{O(\min\{\epsilon_g^{-3}\epsilon_H^3, \epsilon_H^{-3}\})}$. Nonconvex optimization via Newton-CG 38

62 Second-order complexity results (specific) Goal: reach an $(\epsilon, \sqrt{\epsilon})$-point $x_k$ such that $\|\nabla f(x_k)\| \le \epsilon$, $\lambda_k = \lambda_{\min}(\nabla^2 f(x_k)) \ge -\sqrt{\epsilon}$. Use CG as eigenvalue oracle with $\delta \in [0, 1)$. Theorem An $(\epsilon, \sqrt{\epsilon})$-point is reached using at most $O(\epsilon^{-3/2})$ iterations and $\tilde{O}(\min\{n\epsilon^{-3/2}, \epsilon^{-7/4}\})$ gradients/Hessian-vector products, with probability at least $(1 - \delta)^{O(\epsilon^{-3/2})}$. Nonconvex optimization via Newton-CG 39

63 Outline 1 Newton-type methods with negative curvature 2 Newton-Capped Conjugate Gradient 3 Numerical results Nonconvex optimization via Newton-CG 40

64 Testing framework Part of an ongoing numerical study; Focus on Newton+Capped CG (best performance among our variants); Comparison with other methods popular in: Large-scale optimization: Nonlinear CG, L-BFGS; Data science: Variants of (accelerated) gradient descent. Nonconvex optimization via Newton-CG 41

65 A classical optimization benchmark Setup 61 nonconvex problems from CUTEst, dimensions from 2 to 500; $\epsilon_g = 10^{-5}$, $\epsilon_H = \sqrt{\epsilon_g}$; Algorithms Newton-Capped CG; Nonlinear CG (Polak-Ribière); Gradient descent + negative curvature (2 versions). Nonconvex optimization via Newton-CG 42

66 A classical optimization benchmark Setup 61 nonconvex problems from CUTEst, dimensions from 2 to 500; $\epsilon_g = 10^{-5}$, $\epsilon_H = \sqrt{\epsilon_g}$; Nonconvex optimization via Newton-CG 42

67 A nonconvex estimation problem Tukey loss function (Carmon et al., ICML 2017): $\min_{x \in \mathbb{R}^n} f(x) = \sum_{i=1}^{30} h(a_i^\top x - b_i)$ where $h(\theta) = \theta^2/(1 + \theta^2)$, with $a_i \sim \mathcal{N}(0, I_n)$ and $b_i = a_i^\top x + $ non-Gaussian noise. Stopping criterion: $\|\nabla f(x)\| \le \epsilon_g$. Four algorithms Newton+Capped CG; Nonlinear CG (Polak-Ribière); L-BFGS; Heavy ball. Nonconvex optimization via Newton-CG 43
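For concreteness, a minimal numpy sketch of this Tukey-type objective and its gradient, together with an illustrative synthetic instance; the noise model and dimensions are assumptions, not the exact setup of the experiment.

```python
# Minimal numpy sketch of this Tukey-type objective and its gradient; the
# synthetic instance (noise model, dimensions) is an illustrative assumption,
# not the exact setup of the experiment.
import numpy as np

def tukey_objective(x, A, b):
    """Objective and gradient of sum_i h(a_i^T x - b_i), h(t) = t^2/(1+t^2)."""
    t = A @ x - b
    f = np.sum(t ** 2 / (1.0 + t ** 2))
    grad = A.T @ (2.0 * t / (1.0 + t ** 2) ** 2)   # h'(t) = 2t / (1 + t^2)^2
    return f, grad

# Synthetic instance loosely following the slide: 30 Gaussian rows, heavy-tailed noise.
rng = np.random.default_rng(0)
n, m = 50, 30
A = rng.standard_normal((m, n))
x_true = rng.standard_normal(n)
b = A @ x_true + rng.standard_t(df=1, size=m)      # non-Gaussian (Cauchy-like) noise
f0, g0 = tukey_objective(np.zeros(n), A, b)
```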

68 Nonconvex estimation problem: results Nonconvex optimization via Newton-CG 44

69 A matrix optimization problem Matrix problem $\min_{U,V} \frac{1}{2} \|P_\Omega(UV^\top - M)\|_F^2$, with $M \in \mathbb{R}^{m \times n}$, $U \in \mathbb{R}^{m \times r}$, $V \in \mathbb{R}^{n \times r}$, $|\Omega| = 15\%\, mn$. Fraction of the MNIST dataset (0-1 digits): find the first principal component (r = 1). Comparison General-purpose optimization methods: Newton+Capped CG; Nonlinear CG (Polak-Ribière); Dedicated solvers: Alternating gradient descent (Tanner and Wei 2016); LMaFit (Wen et al. 2012). Nonconvex optimization via Newton-CG 45

70 A matrix completion problem: results Nonconvex optimization via Newton-CG 46

71 Conclusion Newton-CG: standard wisdom Useful for large-scale optimization; No specific complexity guarantees... nor justification for its ability to detect negative curvature through CG. Newton-CG: our point of view Conjugate gradient: Analysis for indefinite quadratics; Can be used as an eigenvalue oracle; Newton + Capped CG: Best known complexity bounds; First order: deterministic $\tilde{O}(\epsilon_g^{-7/4})$ complexity; Probabilistic results for second order. Nonconvex optimization via Newton-CG 47

72 To be continued For more information... Complexity analysis of second-order line-search algorithms for smooth nonconvex optimization, C. W. Royer and S. J. Wright, SIAM J. Optim. 28(2). A Newton-CG algorithm with complexity guarantees for smooth unconstrained optimization, C. W. Royer, M. O'Neill and S. J. Wright, arXiv preprint; accepted in Math. Prog. Ongoing work Numerical study; Trust-region framework; Extension to constrained problems. Nonconvex optimization via Newton-CG 48

73 To be continued For more information... Complexity analysis of second-order line-search algorithms for smooth nonconvex optimization, C. W. Royer and S. J. Wright, SIAM J. Optim. 28(2). A Newton-CG algorithm with complexity guarantees for smooth unconstrained optimization, C. W. Royer, M. O'Neill and S. J. Wright, arXiv preprint; accepted in Math. Prog. Ongoing work Numerical study; Trust-region framework; Extension to constrained problems. Thank you for your attention! croyer2@wisc.edu Nonconvex optimization via Newton-CG 48
