Nonconvex optimization with complexity guarantees via Newton + conjugate gradient
1 Nonconvex optimization with complexity guarantees via Newton + conjugate gradient (Optimisation non convexe avec garanties de complexité via Newton+gradient conjugué)
Clément Royer (University of Wisconsin-Madison, USA)
Toulouse, January 8, 2019
Nonconvex optimization via Newton-CG 1
3 Where are you at?
Wisconsin Institute for Discovery (WID): part of UW-Madison; multi-disciplinary institute created in 2010; currently organized around hubs.
Me and WID: optimization theme, in the group of Stephen Wright; affiliated with the Data Science hub and institute.
4 Where are you at?
IFDS: Institute for Foundations of Data Science. Hosted at WID, funded by NSF (13 centers US-wide, 3-4 larger institutes selected in 2020); led by Stephen Wright; gathering Math, Stat and CS expertise in Data Science.
6 Context: nonconvex optimization
Many data science problems are convex: linear classification, logistic regression, ... Yet there is a shift of focus from convex to nonconvex: because of deep learning, but also in many other problems: matrix/tensor completion, robust statistics, etc.
Example: nonconvex formulation of low-rank matrix completion:
min_{X ∈ R^{n×m}, rank(X)=r} ||P_Ω(X − M)||_F^2, with M ∈ R^{n×m}, Ω ⊂ [n]×[m].
Factored reformulation (Burer and Monteiro, 2003):
min_{U ∈ R^{n×r}, V ∈ R^{m×r}} ||P_Ω(U V^T − M)||_F^2, nonconvex in U and V!
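To make the factored reformulation concrete, here is a minimal NumPy sketch of the Burer-Monteiro objective. The 0/1-mask encoding of Ω, the helper name `factored_objective`, and the small random problem sizes are my own illustrative choices, not from the talk.

```python
import numpy as np

# Sketch of the factored (Burer-Monteiro) objective for matrix completion.
# Omega is encoded as a 0/1 mask W: P_Omega(X) keeps the observed entries only.
def factored_objective(U, V, M, W):
    """f(U, V) = ||P_Omega(U V^T - M)||_F^2 -- nonconvex in (U, V) jointly."""
    R = W * (U @ V.T - M)          # residual on observed entries only
    return float(np.sum(R ** 2))

rng = np.random.default_rng(0)
n, m, r = 6, 5, 2
U_star, V_star = rng.standard_normal((n, r)), rng.standard_normal((m, r))
M = U_star @ V_star.T              # rank-r ground truth
W = (rng.random((n, m)) < 0.7).astype(float)   # ~70% observed entries

# At the ground-truth factors the objective is exactly zero ...
f_opt = factored_objective(U_star, V_star, M, W)
# ... while a generic random point is not.
f_rand = factored_objective(rng.standard_normal((n, r)),
                            rng.standard_normal((m, r)), M, W)
```

The factored variable has nr + mr entries instead of nm, which is the computational appeal of the reformulation despite the loss of convexity.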
8 Nonconvex smooth optimization
We consider the smooth unconstrained problem min_{x ∈ R^n} f(x).
Assumptions on f: f ∈ C^2(R^n), bounded below, nonconvex.
Definitions in smooth nonconvex minimization:
First-order stationary point: ∇f(x) = 0;
Second-order stationary point: ∇f(x) = 0 and ∇^2 f(x) ⪰ 0.
If x does not satisfy these conditions, there exists d such that
1. d^T ∇f(x) < 0 (gradient-related direction), and/or
2. d^T ∇^2 f(x) d < 0 (negative curvature direction, specific to nonconvex problems).
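As a sanity check on these definitions, one can classify a candidate point numerically from its gradient and Hessian. The toy function f(x, y) = x^2 − y^2 (whose origin is a saddle: zero gradient, negative curvature) and the tolerance names are my own illustration.

```python
import numpy as np

# Checking the two stationarity conditions at a point, up to tolerances
# eps_g and eps_H (names mirror the talk's epsilon_g, epsilon_H).
def classify(grad, hess, eps_g=1e-8, eps_H=1e-8):
    lam_min = float(np.linalg.eigvalsh(hess)[0])  # smallest Hessian eigenvalue
    first = bool(np.linalg.norm(grad) <= eps_g)   # first-order stationary?
    second = first and lam_min >= -eps_H          # second-order stationary?
    return first, second, lam_min

# f(x, y) = x^2 - y^2 at the origin: gradient (0, 0), Hessian diag(2, -2).
g = np.array([0.0, 0.0])
H = np.diag([2.0, -2.0])
first, second, lam_min = classify(g, H)
# The eigenvector attached to lam_min = -2 is a direction d with
# d^T H d < 0: exactly the negative curvature mechanism the talk exploits.
```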
9 Examples
Nonconvex formulations for low-rank matrix problems (Bhojanapalli et al. 2016, Ge et al. 2017): min_{U ∈ R^{n×r}, V ∈ R^{m×r}} f(U V^T).
Points that satisfy second-order necessary conditions are global minima (or are close in function value); strict saddle property: any first-order stationary point that is not a local minimum possesses negative curvature.
Our goal: develop efficient algorithms to obtain second-order necessary points; we measure efficiency based on complexity.
10 Worst-case complexity analysis
Complexity bounds: bound the cost of an algorithm in the worst case; ubiquitous in theoretical computer science; major impact in convex optimization (Nemirovski and Yudin 1983).
The accelerated gradient case: let f* = min_{x ∈ R^n} f(x) where f is convex, and let ε ∈ (0, 1). Then, to find x̄ such that f(x̄) − f* ≤ ε: gradient descent needs at most O(ε^{-1}) iterations; accelerated methods (Nesterov's method, heavy ball, etc.) require at most O(ε^{-1/2}) iterations.
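The gap between the two rates is easy to observe on a strongly convex quadratic. The following is a hedged sketch: the test matrix, tolerances, and the textbook constant-momentum form of Nesterov's recursion are my own choices, not the talk's exact experiment.

```python
import numpy as np

# Gradient descent vs Nesterov's accelerated method on f(x) = 0.5 x^T A x,
# with strong convexity mu = 1 and Lipschitz constant L = 100 built in.
rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((50, 50)))
A = Q @ np.diag(np.linspace(1.0, 100.0, 50)) @ Q.T
L_const, mu = 100.0, 1.0
x0 = rng.standard_normal(50)

def iters_to_tol(method, tol=1e-6):
    x, y, k = x0.copy(), x0.copy(), 0
    beta = (np.sqrt(L_const) - np.sqrt(mu)) / (np.sqrt(L_const) + np.sqrt(mu))
    while 0.5 * x @ A @ x > tol and k < 10000:
        if method == "gd":
            x = x - (1.0 / L_const) * (A @ x)
        else:                                  # Nesterov, constant momentum
            x_new = y - (1.0 / L_const) * (A @ y)
            y = x_new + beta * (x_new - x)
            x = x_new
        k += 1
    return k

k_gd, k_agd = iters_to_tol("gd"), iters_to_tol("agd")
```

With condition number 100, the accelerated method reaches the tolerance in roughly a tenth of the gradient-descent iteration count, matching the O(ε^{-1}) vs O(ε^{-1/2}) picture above.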
11 Accelerated methods: illustration
Several methods applied to min_{x ∈ R^100} x^T A x.
12 Complexity in nonconvex optimization
For an algorithm applied to min_{x ∈ R^n} f(x):
Definition (first order): let {x_k} be a sequence of iterates generated by the algorithm and ε ∈ (0, 1). Worst-case cost to obtain an ε-point x_K such that ||∇f(x_K)|| ≤ ε. Focus: dependency on ε.
Definition (second order): let {x_k} be the iterate sequence generated by the algorithm, and ε_g, ε_H ∈ (0, 1) two tolerances. Worst-case cost to obtain an (ε_g, ε_H)-point x_K such that ||∇f(x_K)|| ≤ ε_g and λ_min(∇^2 f(x_K)) ≥ −ε_H. Focus: dependencies on ε_g, ε_H.
14 Complexity results
From nonconvex optimization (2006-): cost measure: number of iterations (but those may be expensive); two types of guarantees: (1) ||∇f(x)|| ≤ ε_g; (2) ||∇f(x)|| ≤ ε_g and ∇^2 f(x) ⪰ −ε_H I. Best methods: second-order methods, deterministic variations on Newton's iteration involving Hessians.
Trust region: (1) O(ε_g^{-2}); (2) O(max{ε_g^{-2} ε_H^{-1}, ε_H^{-3}}).
Gradient descent + negative curvature: (1) O(ε_g^{-2}); (2) O(max{ε_g^{-2}, ε_H^{-3}}).
Cubic regularization: (1) O(ε_g^{-3/2}); (2) O(max{ε_g^{-3/2} ε_H^{-1}, ε_H^{-3}}).
16 Complexity results (2)
Influenced by convex optimization (e.g. learning): cost measure: gradient evaluations + Hessian-vector products ≈ main iteration cost; two types of guarantees: (1) ||∇f(x)|| ≤ ε; (2) ||∇f(x)|| ≤ ε and ∇^2 f(x) ⪰ −ε^{1/2} I. Best methods: developed from accelerated gradient, assume knowledge of Lipschitz constants.
Gradient descent + random perturbation: (1), (2) Õ(ε^{-2}) (high probability).
Accelerated gradient descent + random perturbation: (1), (2) Õ(ε^{-7/4}) (high probability).
Accelerated gradient descent with nonconvexity detection: (1) Õ(ε^{-7/4}) (deterministic).
19 In this talk
Cover the range of complexity results... iterations, evaluations, computation; different choices for ε_g, ε_H; deterministic, high probability.
...through a single framework... Newton-type iterations, with line search; main cost: gradients/Hessian-vector products.
...with best complexity guarantees: revisit the conjugate gradient algorithm; exploit its relationship with accelerated gradient methods.
20 Outline
1 Newton-type methods with negative curvature: general framework; inexact variants.
2 Newton-Capped Conjugate Gradient: conjugate gradient and nonconvex quadratics; Newton-Capped CG algorithms.
3 Numerical results
21 Outline
1 Newton-type methods with negative curvature: general framework; inexact variants.
2 Newton-Capped Conjugate Gradient
3 Numerical results
23 Line-search framework
Inputs: x_0 ∈ R^n, θ ∈ (0, 1), η > 0, ε_g ∈ (0, 1), ε_H ∈ (0, 1). For k = 0, 1, 2, ...
1. Compute a direction d_k = d_k(ε_g, ε_H).
2. Backtracking line search: compute the largest α_k ∈ {θ^j}_{j ∈ N} such that f(x_k + α_k d_k) < f(x_k) − (η/6) α_k^3 ||d_k||^3.
3. Set x_{k+1} = x_k + α_k d_k.
About the line search: guarantee of cubic decrease; the simplest rule giving complexity guarantees.
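Step 2 can be sketched in a few lines. This is a minimal illustration of the cubic-decrease backtracking rule; the toy function, point, and direction below are my own choices.

```python
import numpy as np

# Backtracking rule from the framework: take the largest alpha in
# {theta^j, j = 0, 1, 2, ...} satisfying the cubic decrease condition
# f(x + alpha*d) < f(x) - (eta/6) * alpha^3 * ||d||^3.
def backtracking(f, x, d, theta=0.5, eta=0.1, max_j=60):
    alpha = 1.0
    for _ in range(max_j):
        if f(x + alpha * d) < f(x) - (eta / 6.0) * alpha**3 * np.linalg.norm(d)**3:
            return alpha
        alpha *= theta             # move to the next power theta^j
    return alpha

f = lambda x: float(x @ x)         # f(x) = ||x||^2
x = np.array([2.0, 0.0])
d = -np.array([4.0, 0.0])          # steepest descent direction -grad f(x)
alpha = backtracking(f, x, d)      # alpha = 1 fails, alpha = 0.5 succeeds
decrease = f(x) - f(x + alpha * d)
```

Note that the required decrease is cubic in the step length, not linear as in the Armijo rule; this is what eventually yields the O(ε^{-3/2})-type complexity bounds.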
24 Newton's iteration
Basics: at iteration k, compute d_k by solving the linear system ∇^2 f(x_k) d_k = −∇f(x_k), and set x_{k+1} = x_k + d_k. The direction is unique when ∇^2 f(x_k) ≻ 0; global convergence can be guaranteed with (e.g.) a line search.
For nonconvex problems: use a threshold ε_H for λ_min(∇^2 f(x_k)); regularize to ensure ∇^2 f(x_k) + αI ⪰ ε_H I; second order: leverage negative curvature directions d such that d^T ∇^2 f(x_k) d ≤ −ε_H ||d||^2.
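The regularized variant of the Newton system is a one-line solve. A hedged sketch with a small indefinite Hessian of my own; the shift 2ε_H follows the regularization used throughout the talk.

```python
import numpy as np

# Regularized Newton step: solve (H + 2*eps_H*I) d = -g, which is
# well-posed whenever lambda_min(H) > -eps_H (then H + 2*eps_H*I >= eps_H*I).
def regularized_newton_step(H, g, eps_H):
    return np.linalg.solve(H + 2.0 * eps_H * np.eye(len(g)), -g)

H = np.array([[2.0, 0.0], [0.0, -0.05]])   # slightly indefinite Hessian
g = np.array([1.0, 1.0])
eps_H = 0.1                                 # lambda_min(H) = -0.05 > -eps_H
d = regularized_newton_step(H, g, eps_H)    # solves diag(2.2, 0.15) d = -g

# d is a descent direction: g^T d < 0 since the shifted matrix is
# positive definite.
slope = float(g @ d)
```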
25 Second-order Newton method with line search
Inputs: x_0 ∈ R^n, θ ∈ (0, 1), η > 0, ε_H ∈ (0, 1). For k = 0, 1, 2, ...
1. Computation of a search direction d_k: compute λ = λ_min(∇^2 f(x_k)). If λ < −ε_H, choose d_k as a negative curvature direction such that d_k^T ∇f(x_k) ≤ 0 and d_k^T ∇^2 f(x_k) d_k = λ ||d_k||^2; otherwise, choose d_k as a (possibly regularized) Newton direction by solving (∇^2 f(x_k) + 2ε_H I) d_k = −∇f(x_k).
2. Backtracking line search (unchanged).
3. Set x_{k+1} = x_k + α_k d_k.
27 Complexity of the second-order Newton method
Theorem (Royer and Wright 2018): the method returns x_k such that ||∇f(x_k)|| ≤ ε_g and ∇^2 f(x_k) ⪰ −ε_H I in at most O(max{ε_g^{-3} ε_H^3, ε_H^{-3}}) iterations.
With ε_H = ε_g^{1/2}: bound in O(max{ε_g^{-3/2}, ε_H^{-3}}) = O(ε_g^{-3/2}); optimal over a class of second-order methods (Cartis, Gould and Toint 2018).
28 Outline
1 Newton-type methods with negative curvature: general framework; inexact variants.
2 Newton-Capped Conjugate Gradient
3 Numerical results
30 Introducing inexactness
We are concerned with inexactness in the step computation; inexactness in the function/derivatives requires a different treatment, e.g. Bergou, Diouane, Kungurtsev and Royer (2018) when f(x) = Σ_i f_i(x).
Our framework uses matrix operations to compute a search direction: eigenvalue/eigenvector calculation; linear system solve.
Inexact strategies: iterative linear algebra (with/without randomness) for the matrix operations. Main cost: matrix-vector products.
32 From optimization to linear algebra
Two types of direction, depending on ε_H > 0 (minimum eigenvalue estimate):
Regularized Newton direction: (∇^2 f(x_k) + 2ε_H I) d = −∇f(x_k), with ∇^2 f(x_k) + 2ε_H I ⪰ ε_H I;
Sufficient negative curvature direction: d^T ∇f(x_k) ≤ 0, d^T ∇^2 f(x_k) d ≤ −ε_H ||d||^2.
Related linear algebra problems: let H = H^T ∈ R^{n×n}, g ∈ R^n and ε_H > 0. Solve (H + 2ε_H I) d = −g when λ_min(H) > −ε_H; compute d such that d^T H d ≤ −ε_H ||d||^2 otherwise.
33 From optimization to linear algebra
Two types of direction, depending on ε_H > 0 (minimum eigenvalue estimate):
Approximate regularized Newton direction: (∇^2 f(x_k) + 2ε_H I) d ≈ −∇f(x_k), with ∇^2 f(x_k) + 2ε_H I ⪰ ε_H I;
Sufficient negative curvature direction: d^T ∇f(x_k) ≤ 0, d^T ∇^2 f(x_k) d ≤ −ε_H ||d||^2.
Related linear algebra problems: let H = H^T ∈ R^{n×n}, g ∈ R^n and ε_H > 0. Approximate the solution of (H + 2ε_H I) d = −g when λ > −ε_H, where λ ≈ λ_min(H); compute d such that d^T H d ≤ −ε_H ||d||^2 otherwise.
35 Conjugate gradient (CG) for symmetric linear systems
Problem: H d = −g, where H = H^T ⪰ ε_H I.
Conjugate Gradient (CG) properties: applied with the stopping criterion ||H d + g|| ≤ (ξ/2) min{||g||, ε_H ||d||}, ξ ∈ (0, 1). If κ = λ_max(H)/λ_min(H), CG terminates in at most min{n, (1/2) √κ log(4κ^{5/2}/ξ)} = min{n, O(√κ log(κ/ξ))} iterations/matrix-vector products.
CG does not explicitly use eigenvalues of H!
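The standard CG recursion with this relative stopping test fits in a short function. A hedged sketch: the well-conditioned test matrix and parameter values are mine; only the stopping criterion is the one displayed above.

```python
import numpy as np

# CG for H d = -g with the relative stopping criterion
# ||H d + g|| <= (xi/2) * min(||g||, eps_H * ||d||).
def cg(H, g, eps_H, xi=0.5, max_iter=None):
    n = len(g)
    d, r, p = np.zeros(n), g.copy(), -g.copy()   # r = H d + g, r_0 = g
    for _ in range(max_iter or n):
        Hp = H @ p
        alpha = (r @ r) / (p @ Hp)
        d = d + alpha * p
        r_new = r + alpha * Hp                   # updated residual
        if np.linalg.norm(r_new) <= 0.5 * xi * min(np.linalg.norm(g),
                                                   eps_H * np.linalg.norm(d)):
            return d
        p = -r_new + ((r_new @ r_new) / (r @ r)) * p
        r = r_new
    return d

rng = np.random.default_rng(2)
B = rng.standard_normal((8, 8))
H = B @ B.T + 8 * np.eye(8)        # symmetric, comfortably positive definite
g = rng.standard_normal(8)
d = cg(H, g, eps_H=0.1)

res = float(np.linalg.norm(H @ d + g))
gnorm = float(np.linalg.norm(g))
thr = 0.25 * min(gnorm, 0.1 * np.linalg.norm(d))
ok = res <= thr + 1e-10            # the returned step satisfies the criterion
```

Everything here uses only matrix-vector products with H, which is why Hessian-vector products (rather than Hessian factorizations) dominate the cost in the inexact variants.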
37 Lanczos for minimum eigenvalue estimation
Key idea (Kuczyński and Woźniakowski 1992): use a random starting vector uniformly distributed on the unit sphere.
Lanczos with a random start: let H ∈ R^{n×n} be symmetric with ||H|| ≤ M, ε_H > 0, δ ∈ (0, 1). With probability at least 1 − δ, the Lanczos process returns a unit vector v such that v^T H v ≤ λ_min(H) + ε_H/2 in at most min{n, (ln(3n/δ^2)/2) √(M/ε_H)} iterations/matrix-vector products.
Corollary: if λ_min(H) ≤ −ε_H, then v^T H v ≤ −ε_H/2.
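A bare-bones Lanczos sketch, assuming exact arithmetic niceties (no reorthogonalization); the small diagonal test matrix is my own, and running the full n steps recovers λ_min exactly up to rounding.

```python
import numpy as np

# Lanczos with a random start for estimating lambda_min(H): build the
# tridiagonal matrix T_k over the Krylov subspace span{b, Hb, ..., H^{k-1}b}
# and return the smallest eigenvalue of T_k as the estimate.
def lanczos_min_eig(H, k, rng):
    n = H.shape[0]
    q = rng.standard_normal(n)
    q /= np.linalg.norm(q)                  # random unit starting vector
    alphas, betas = [], []
    beta, q_prev = 0.0, np.zeros(n)
    for _ in range(k):
        w = H @ q - beta * q_prev           # three-term recurrence
        alpha = float(q @ w)
        w = w - alpha * q
        alphas.append(alpha)
        beta = float(np.linalg.norm(w))
        if beta < 1e-12:                    # Krylov subspace exhausted
            break
        betas.append(beta)
        q_prev, q = q, w / beta
    T = (np.diag(alphas)
         + np.diag(betas[:len(alphas) - 1], 1)
         + np.diag(betas[:len(alphas) - 1], -1))
    return float(np.linalg.eigvalsh(T)[0])

rng = np.random.default_rng(3)
H = np.diag(np.arange(-2.0, 8.0))           # lambda_min = -2, n = 10
est = lanczos_min_eig(H, k=10, rng=rng)     # full n steps: exact in theory
```

The point of the random start is the dimension factor: the ln(3n/δ^2) term in the bound above comes from the probability that the starting vector has a nontrivial component along the bottom eigenvector.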
39 Conjugate gradient for minimum eigenvalue estimation?
Conjugate gradient and Lanczos work on the same Krylov subspaces (invariant under translation) when started from the same point; if they detect negative curvature, it will be at the same iteration.
Theorem (Royer, O'Neill, Wright 2018): let H ∈ R^{n×n} be symmetric with ||H|| ≤ M, δ ∈ [0, 1), and let CG be applied to (H + (ε_H/2) I) d = −b with b ~ U(S^{n−1}). Then, if λ_min(H) < −ε_H, CG outputs a direction of (negative) curvature at most −ε_H/2 in at most J = min{n, (ln(3n/δ^2)/2) √(M/ε_H)} iterations, with probability at least 1 − δ.
41 Minimum eigenvalue oracles
Corollary: for the matrix ∇^2 f(x_k), consider either CG applied to (∇^2 f(x_k) + (ε_H/2) I) d = −b with b ∈ S^{n−1}, or Lanczos applied to ∇^2 f(x_k), starting from b ∈ S^{n−1}. Then, for every δ ∈ [0, 1), we obtain one of the two outcomes below:
1. a direction of negative curvature at most −ε_H/2,
2. a certificate that ∇^2 f(x_k) ⪰ −ε_H I,
using at most Õ(min{n, ε_H^{-1/2}}) gradients/Hessian-vector products, with probability at least 1 − δ.
We say that those methods are (minimum) eigenvalue oracles.
42 Inexact Newton-type variants
Inputs: x_0 ∈ R^n, θ ∈ (0, 1), η > 0, ε_H ∈ (0, 1). For k = 0, 1, 2, ...
1. Computation of a search direction d_k: compute λ ≈ λ_min(∇^2 f(x_k)) using an eigenvalue oracle. If λ < −ε_H, choose d_k as a negative curvature direction such that d_k^T ∇f(x_k) ≤ 0 and d_k^T ∇^2 f(x_k) d_k = λ ||d_k||^2; otherwise, choose d_k as a (possibly regularized) Newton direction computed by CG, such that ||(∇^2 f(x_k) + 2ε_H I) d_k + ∇f(x_k)|| ≤ (ξ/2) min{||∇f(x_k)||, ε_H ||d_k||}.
2. Backtracking line search (unchanged).
3. Set x_{k+1} = x_k + α_k d_k.
44 Complexity result for inexact variants
Set ε_g = ε, ε_H = ε^{1/2}. The methods return an (ε, ε^{1/2})-point using at most O(ε^{-3/2}) outer iterations and Õ(min{n ε^{-3/2}, ε^{-7/4}}) gradients/Hessian-vector products, with probability at least 1 − O(ε^{-3/2} δ).
Setting δ = 0 and assuming that n >> ε^{-1/2} yields almost sure results: iterations: O(ε^{-3/2}); gradients/Hessian-vector products: O(ε^{-7/4}).
45 Outline
1 Newton-type methods with negative curvature
2 Newton-Capped Conjugate Gradient: conjugate gradient and nonconvex quadratics; Newton-Capped CG algorithms.
3 Numerical results
46 Revisiting conjugate gradient
Idea: consider applying CG to a linear system H d = −g, where H may not be positive definite. Equivalently, apply CG to the quadratic min_d q(d) = (1/2) d^T H d + g^T d without knowing whether q is (strongly) convex.
Motivation: rich convergence theory for CG in the positive definite/strongly convex case; when applied to an indefinite system, CG may break down... but then reveals negative curvature.
48 Conjugate gradient for H y = −g
Algorithm: Init: set y_0 = 0 ∈ R^n, r_0 = g, p_0 = −g, j = 0.
While p_j^T H p_j > 0: compute y_{j+1} = y_j + α_j p_j, r_{j+1} = H y_{j+1} + g and p_{j+1}; set j = j + 1; terminate if ||r_j|| ≤ ζ ||r_0||.
If H ≻ 0, r_n = 0. If ε_H I ⪯ H ⪯ M I, then ||r_j||^2 ≤ 4κ ((√κ − 1)/(√κ + 1))^{2j} ||r_0||^2, with κ = M/ε_H.
50 Conjugate gradient for H y = −g
Algorithm, assuming ε_H I ⪯ H ⪯ M I: Init: set y_0 = 0 ∈ R^n, r_0 = g, p_0 = −g, j = 0.
While p_j^T H p_j > ε_H ||p_j||^2 and ||r_j||^2 ≤ 4κ ((√κ − 1)/(√κ + 1))^{2j} ||r_0||^2: compute y_{j+1} = y_j + α_j p_j, r_{j+1} = H y_{j+1} + g and p_{j+1}; set j = j + 1; terminate if ||r_j|| ≤ ζ ||r_0||.
If H ≻ 0, r_n = 0. If ε_H I ⪯ H ⪯ M I, then ||r_j||^2 ≤ 4κ ((√κ − 1)/(√κ + 1))^{2j} ||r_0||^2, with κ = M/ε_H.
What if H ⊁ 0?
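The curvature test in the loop condition is exactly what turns CG's "breakdown" on an indefinite matrix into useful output. Below is a minimal sketch of that mechanism only: it monitors p_j^T H p_j and returns the offending direction. The residual-decrease safeguard and the 2ε_H shift of the full Capped CG are omitted, and the small indefinite matrix is my own.

```python
import numpy as np

# Run CG on H d = -g and stop as soon as some direction p_j exposes
# curvature at most eps_H * ||p_j||^2; on an indefinite H this typically
# produces a negative curvature direction.
def cg_until_curvature(H, g, eps_H, max_iter=None):
    n = len(g)
    y, r, p = np.zeros(n), g.copy(), -g.copy()
    for _ in range(max_iter or 2 * n):
        curv = float(p @ H @ p)
        if curv <= eps_H * float(p @ p):
            return p, curv          # small or negative curvature detected
        alpha = (r @ r) / curv      # safe: curv > 0 here
        y = y + alpha * p
        r_new = r + alpha * (H @ p)
        p = -r_new + ((r_new @ r_new) / (r @ r)) * p
        r = r_new
    return None, None

H = np.diag([4.0, 1.0, -0.5])       # indefinite: lambda_min = -0.5
g = np.array([1.0, 1.0, 1.0])
d, curv = cg_until_curvature(H, g, eps_H=0.1)
# On this example the second CG direction already has d^T H d < 0.
```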
52 Conjugate gradient for possibly indefinite systems
Capped Conjugate Gradient: Init: set y_0 = 0 ∈ R^n, r_0 = g, p_0 = −g, j = 0.
While p_j^T H p_j > ε_H ||p_j||^2 and ||r_j||^2 ≤ T τ^j ||r_0||^2: compute y_{j+1}, r_{j+1} = H y_{j+1} + g and p_{j+1}; set j = j + 1; terminate if ||r_j|| ≤ ζ ||r_0||.
Properties of Capped CG, for any matrix H ⪯ M I: as long as r_j is computed, ||r_j||^2 ≤ T τ^j ||r_0||^2, with T = 16κ^5, τ = √κ/(√κ + 1), κ = M/ε_H. The method runs for at most min{n, Õ(√(M/ε_H))} iterations (the "cap") before terminating or violating one of the conditions.
55 Main result - Violating conditions in Capped CG
Theorem (Royer, O'Neill, Wright 2018): if Capped CG applied to (H + 2ε_H I) d = −g runs for J iterations and ||r_J|| > ζ ||r_0||, then:
Either p_J^T H p_J ≤ −ε_H ||p_J||^2;
Or ||r_J||^2 > T τ^J ||r_0||^2, in which case y_{J+1} can be computed and there exists j ∈ {0, ..., J − 1} such that (y_{J+1} − y_j)^T H (y_{J+1} − y_j) ≤ −ε_H ||y_{J+1} − y_j||^2.
Proof: follows a proof of accelerated methods from Bubeck (2014) and its variant for nonconvex accelerated gradient (Carmon et al. 2017), applied to quadratic functions. In our case, we only use intrinsic properties of CG and look at quadratics, so we directly obtain negative curvature directions!
56 Capped Conjugate Gradient - summary
Running Capped CG: applying Capped CG to (∇^2 f(x_k) + 2ε_H I) d = −∇f(x_k) yields one of the two following outcomes:
1. a regularized Newton step d_k with ||(∇^2 f(x_k) + 2ε_H I) d_k + ∇f(x_k)|| ≤ ζ ||r_0||;
2. a direction of curvature at most −ε_H,
in at most Õ(min{n, ε_H^{-1/2}}) iterations/Hessian-vector products.
57 Outline
1 Newton-type methods with negative curvature
2 Newton-Capped Conjugate Gradient: conjugate gradient and nonconvex quadratics; Newton-Capped CG algorithms.
3 Numerical results
58 Building on Capped CG
Two new instances of our generic method:
Phase one: when the gradient norm is large, use Capped CG only to compute search directions;
Phase two: when the gradient norm is small, use standard CG to estimate the smallest eigenvalue.
We no longer compute λ_min(∇^2 f) at the beginning of the iteration.
59 Newton-Capped CG
Inputs: x_0 ∈ R^n, θ ∈ (0, 1), η > 0, ε_g ∈ (0, 1), ε_H ∈ (0, 1), δ ∈ (0, 1]. For k = 0, 1, 2, ...
1. If ||∇f(x_k)|| > ε_g, compute d_k via Capped CG.
2. Otherwise, apply CG as an eigenvalue oracle. If this oracle returns a certificate that ∇^2 f(x_k) ⪰ −ε_H I, terminate; otherwise use its output as d_k.
3. Backtracking line search (unchanged).
4. Set x_{k+1} = x_k + α_k d_k.
Probabilistic analysis: we may terminate at a non-stationary point... yet the method is always well defined.
60 Complexity results (order one)
Theorem - Number of iterations: the method finds x_k such that ||∇f(x_k)|| ≤ ε_g (an ε_g-point) in at most O(max{ε_g^{-3} ε_H^3, ε_H^{-3}}) iterations; each iteration requires at most Õ(min{n, ε_H^{-1/2}}) Capped CG iterations.
Theorem - Computational complexity: the method reaches an ε_g-point using at most Õ(min{n, ε_H^{-1/2}} max{ε_g^{-3} ε_H^3, ε_H^{-3}}) gradients/Hessian-vector products. With ε_H = ε_g^{1/2}: Õ(min{n ε_g^{-3/2}, ε_g^{-7/4}}). Best known bound without direct Hessian calculation.
61 Second-order complexity results (general)
Theorem. Goal: reach an (ε_g, ε_H)-point x_k such that ||∇f(x_k)|| ≤ ε_g and λ_k = λ_min(∇^2 f(x_k)) ≥ −ε_H. Use CG as an eigenvalue oracle with δ ∈ [0, 1). An (ε_g, ε_H)-point is reached using at most O(max{ε_g^{-3} ε_H^3, ε_H^{-3}}) iterations and Õ(min{n, ε_H^{-1/2}} max{ε_g^{-3} ε_H^3, ε_H^{-3}}) gradients/Hessian-vector products, with probability at least (1 − δ)^{O(min{ε_g^{-3} ε_H^3, ε_H^{-3}})}.
62 Second-order complexity results (specific)
Goal: reach an (ε, ε^{1/2})-point x_k such that ||∇f(x_k)|| ≤ ε and λ_k = λ_min(∇^2 f(x_k)) ≥ −ε^{1/2}. Use CG as an eigenvalue oracle with δ ∈ [0, 1).
Theorem: an (ε, ε^{1/2})-point is reached using at most O(ε^{-3/2}) iterations and Õ(min{n ε^{-3/2}, ε^{-7/4}}) gradients/Hessian-vector products, with probability at least (1 − δ)^{O(ε^{-3/2})}.
63 Outline
1 Newton-type methods with negative curvature
2 Newton-Capped Conjugate Gradient
3 Numerical results
64 Testing framework
Part of an ongoing numerical study; focus on Newton + Capped CG (best performance among our variants); comparison with other methods popular in: large-scale optimization: nonlinear CG, L-BFGS; data science: variants of (accelerated) gradient descent.
65 A classical optimization benchmark
Setup: 61 nonconvex problems from CUTEst, dimensions from 2 to 500; ε_g = 10^{-5}, ε_H = ε_g^{1/2}.
Algorithms: Newton-Capped CG; nonlinear CG (Polak-Ribière); gradient descent + negative curvature (2 versions).
67 A nonconvex estimation problem
Tukey loss function (Carmon et al, ICML 2017): min_{x ∈ R^n} f(x) = Σ_{i=1}^{30} h(a_i^T x − b_i), where h(θ) = θ^2/(1 + θ^2), with a_i ~ N(0, I_n) and b_i = a_i^T x + non-Gaussian noise. Stopping criterion: ||∇f(x)|| ≤ ε_g.
Four algorithms: Newton + Capped CG; nonlinear CG (Polak-Ribière); L-BFGS; heavy ball.
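The loss is simple to evaluate; here is a short sketch of it in NumPy. The data sizes, the noiseless sanity check, and the helper names are my own; the talk's experiment uses 30 terms with non-Gaussian noise.

```python
import numpy as np

# Robust loss from the experiment: f(x) = sum_i h(a_i^T x - b_i),
# with h(theta) = theta^2 / (1 + theta^2), bounded by 1 per term.
def h(theta):
    return theta ** 2 / (1.0 + theta ** 2)

def tukey_type_loss(x, A, b):
    return float(np.sum(h(A @ x - b)))

rng = np.random.default_rng(4)
n, m = 5, 30
A = rng.standard_normal((m, n))             # rows a_i ~ N(0, I_n)
x_true = rng.standard_normal(n)
b = A @ x_true                              # noiseless for the sanity check

f_true = tukey_type_loss(x_true, A, b)      # exactly 0 at the ground truth
f_far = tukey_type_loss(x_true + 10.0, A, b)

# Since h < 1, f stays below m = 30 no matter how bad the fit: outliers
# have bounded influence, which is the point of this nonconvex loss.
```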
68 Nonconvex estimation problem: results
69 A matrix optimization problem
Matrix problem: min_{U,V} (1/2) ||P_Ω(U V^T − M)||_F^2, with M ∈ R^{m×n}, U ∈ R^{m×r}, V ∈ R^{n×r}, |Ω| = 15% of mn.
Fraction of the MNIST dataset (0-1 digits): find the first principal component (r = 1).
Comparison: general-purpose optimization methods: Newton + Capped CG; nonlinear CG (Polak-Ribière); dedicated solvers: alternating gradient descent (Tanner and Wei 2016); LMaFit (Wen et al. 2012).
70 A matrix completion problem: results
71 Conclusion
Newton-CG, standard wisdom: useful for large-scale optimization; no specific complexity guarantees... nor justification for its ability to detect negative curvature through CG.
Newton-CG, our point of view: conjugate gradient: analysis for indefinite quadratics; can be used as an eigenvalue oracle. Newton + Capped CG: best known complexity bounds; first order: deterministic Õ(ε_g^{-7/4}) complexity; probabilistic results for second order.
73 To be continued
For more information:
Complexity analysis of second-order line-search algorithms for smooth nonconvex optimization, C. W. Royer and S. J. Wright, SIAM J. Optim. 28(2), 2018.
A Newton-CG algorithm with complexity guarantees for smooth unconstrained optimization, C. W. Royer, M. O'Neill and S. J. Wright, arXiv preprint. Accepted in Math. Prog.
Ongoing work: numerical study; trust-region framework; extension to constrained problems.
Thank you for your attention! croyer2@wisc.edu
Methods for Unconstrained Optimization Numerical Optimization Lectures 1-2 Coralia Cartis, University of Oxford INFOMM CDT: Modelling, Analysis and Computation of Continuous Real-World Problems Methods
More informationWorst-Case Complexity Guarantees and Nonconvex Smooth Optimization
Worst-Case Complexity Guarantees and Nonconvex Smooth Optimization Frank E. Curtis, Lehigh University Beyond Convexity Workshop, Oaxaca, Mexico 26 October 2017 Worst-Case Complexity Guarantees and Nonconvex
More informationAn Inexact Sequential Quadratic Optimization Method for Nonlinear Optimization
An Inexact Sequential Quadratic Optimization Method for Nonlinear Optimization Frank E. Curtis, Lehigh University involving joint work with Travis Johnson, Northwestern University Daniel P. Robinson, Johns
More informationConvex Optimization Lecture 16
Convex Optimization Lecture 16 Today: Projected Gradient Descent Conditional Gradient Descent Stochastic Gradient Descent Random Coordinate Descent Recall: Gradient Descent (Steepest Descent w.r.t Euclidean
More informationLecture 15 Newton Method and Self-Concordance. October 23, 2008
Newton Method and Self-Concordance October 23, 2008 Outline Lecture 15 Self-concordance Notion Self-concordant Functions Operations Preserving Self-concordance Properties of Self-concordant Functions Implications
More informationNon-convex optimization. Issam Laradji
Non-convex optimization Issam Laradji Strongly Convex Objective function f(x) x Strongly Convex Objective function Assumptions Gradient Lipschitz continuous f(x) Strongly convex x Strongly Convex Objective
More informationComplexity Analysis of Interior Point Algorithms for Non-Lipschitz and Nonconvex Minimization
Mathematical Programming manuscript No. (will be inserted by the editor) Complexity Analysis of Interior Point Algorithms for Non-Lipschitz and Nonconvex Minimization Wei Bian Xiaojun Chen Yinyu Ye July
More informationOptimization Methods for Machine Learning
Optimization Methods for Machine Learning Sathiya Keerthi Microsoft Talks given at UC Santa Cruz February 21-23, 2017 The slides for the talks will be made available at: http://www.keerthis.com/ Introduction
More informationA line-search algorithm inspired by the adaptive cubic regularization framework, with a worst-case complexity O(ɛ 3/2 ).
A line-search algorithm inspired by the adaptive cubic regularization framewor, with a worst-case complexity Oɛ 3/. E. Bergou Y. Diouane S. Gratton June 16, 017 Abstract Adaptive regularized framewor using
More informationSecond-Order Methods for Stochastic Optimization
Second-Order Methods for Stochastic Optimization Frank E. Curtis, Lehigh University involving joint work with Léon Bottou, Facebook AI Research Jorge Nocedal, Northwestern University Optimization Methods
More informationConvex Optimization. Newton s method. ENSAE: Optimisation 1/44
Convex Optimization Newton s method ENSAE: Optimisation 1/44 Unconstrained minimization minimize f(x) f convex, twice continuously differentiable (hence dom f open) we assume optimal value p = inf x f(x)
More informationA Trust Funnel Algorithm for Nonconvex Equality Constrained Optimization with O(ɛ 3/2 ) Complexity
A Trust Funnel Algorithm for Nonconvex Equality Constrained Optimization with O(ɛ 3/2 ) Complexity Mohammadreza Samadi, Lehigh University joint work with Frank E. Curtis (stand-in presenter), Lehigh University
More informationHow to Escape Saddle Points Efficiently? Praneeth Netrapalli Microsoft Research India
How to Escape Saddle Points Efficiently? Praneeth Netrapalli Microsoft Research India Chi Jin UC Berkeley Michael I. Jordan UC Berkeley Rong Ge Duke Univ. Sham M. Kakade U Washington Nonconvex optimization
More informationOn the complexity of an Inexact Restoration method for constrained optimization
On the complexity of an Inexact Restoration method for constrained optimization L. F. Bueno J. M. Martínez September 18, 2018 Abstract Recent papers indicate that some algorithms for constrained optimization
More informationA Line search Multigrid Method for Large-Scale Nonlinear Optimization
A Line search Multigrid Method for Large-Scale Nonlinear Optimization Zaiwen Wen Donald Goldfarb Department of Industrial Engineering and Operations Research Columbia University 2008 Siam Conference on
More informationA Second-Order Method for Strongly Convex l 1 -Regularization Problems
Noname manuscript No. (will be inserted by the editor) A Second-Order Method for Strongly Convex l 1 -Regularization Problems Kimon Fountoulakis and Jacek Gondzio Technical Report ERGO-13-11 June, 13 Abstract
More informationChapter 4. Unconstrained optimization
Chapter 4. Unconstrained optimization Version: 28-10-2012 Material: (for details see) Chapter 11 in [FKS] (pp.251-276) A reference e.g. L.11.2 refers to the corresponding Lemma in the book [FKS] PDF-file
More informationOPER 627: Nonlinear Optimization Lecture 14: Mid-term Review
OPER 627: Nonlinear Optimization Lecture 14: Mid-term Review Department of Statistical Sciences and Operations Research Virginia Commonwealth University Oct 16, 2013 (Lecture 14) Nonlinear Optimization
More informationMini-Course 1: SGD Escapes Saddle Points
Mini-Course 1: SGD Escapes Saddle Points Yang Yuan Computer Science Department Cornell University Gradient Descent (GD) Task: min x f (x) GD does iterative updates x t+1 = x t η t f (x t ) Gradient Descent
More informationOptimal Newton-type methods for nonconvex smooth optimization problems
Optimal Newton-type methods for nonconvex smooth optimization problems Coralia Cartis, Nicholas I. M. Gould and Philippe L. Toint June 9, 20 Abstract We consider a general class of second-order iterations
More informationNonlinear Optimization: What s important?
Nonlinear Optimization: What s important? Julian Hall 10th May 2012 Convexity: convex problems A local minimizer is a global minimizer A solution of f (x) = 0 (stationary point) is a minimizer A global
More informationNumerical Optimization Professor Horst Cerjak, Horst Bischof, Thomas Pock Mat Vis-Gra SS09
Numerical Optimization 1 Working Horse in Computer Vision Variational Methods Shape Analysis Machine Learning Markov Random Fields Geometry Common denominator: optimization problems 2 Overview of Methods
More informationNumerical Optimization: Basic Concepts and Algorithms
May 27th 2015 Numerical Optimization: Basic Concepts and Algorithms R. Duvigneau R. Duvigneau - Numerical Optimization: Basic Concepts and Algorithms 1 Outline Some basic concepts in optimization Some
More informationA globally and R-linearly convergent hybrid HS and PRP method and its inexact version with applications
A globally and R-linearly convergent hybrid HS and PRP method and its inexact version with applications Weijun Zhou 28 October 20 Abstract A hybrid HS and PRP type conjugate gradient method for smooth
More informationProximal Newton Method. Zico Kolter (notes by Ryan Tibshirani) Convex Optimization
Proximal Newton Method Zico Kolter (notes by Ryan Tibshirani) Convex Optimization 10-725 Consider the problem Last time: quasi-newton methods min x f(x) with f convex, twice differentiable, dom(f) = R
More informationLecture 5: September 12
10-725/36-725: Convex Optimization Fall 2015 Lecture 5: September 12 Lecturer: Lecturer: Ryan Tibshirani Scribes: Scribes: Barun Patra and Tyler Vuong Note: LaTeX template courtesy of UC Berkeley EECS
More informationNewton s Method. Ryan Tibshirani Convex Optimization /36-725
Newton s Method Ryan Tibshirani Convex Optimization 10-725/36-725 1 Last time: dual correspondences Given a function f : R n R, we define its conjugate f : R n R, Properties and examples: f (y) = max x
More informationNonlinear Programming
Nonlinear Programming Kees Roos e-mail: C.Roos@ewi.tudelft.nl URL: http://www.isa.ewi.tudelft.nl/ roos LNMB Course De Uithof, Utrecht February 6 - May 8, A.D. 2006 Optimization Group 1 Outline for week
More informationNonlinear Optimization for Optimal Control
Nonlinear Optimization for Optimal Control Pieter Abbeel UC Berkeley EECS Many slides and figures adapted from Stephen Boyd [optional] Boyd and Vandenberghe, Convex Optimization, Chapters 9 11 [optional]
More informationNumerical Optimization of Partial Differential Equations
Numerical Optimization of Partial Differential Equations Part I: basic optimization concepts in R n Bartosz Protas Department of Mathematics & Statistics McMaster University, Hamilton, Ontario, Canada
More informationInfeasibility Detection and an Inexact Active-Set Method for Large-Scale Nonlinear Optimization
Infeasibility Detection and an Inexact Active-Set Method for Large-Scale Nonlinear Optimization Frank E. Curtis, Lehigh University involving joint work with James V. Burke, University of Washington Daniel
More informationOn the convergence properties of the modified Polak Ribiére Polyak method with the standard Armijo line search
ANZIAM J. 55 (E) pp.e79 E89, 2014 E79 On the convergence properties of the modified Polak Ribiére Polyak method with the standard Armijo line search Lijun Li 1 Weijun Zhou 2 (Received 21 May 2013; revised
More informationAn Inexact Newton Method for Nonlinear Constrained Optimization
An Inexact Newton Method for Nonlinear Constrained Optimization Frank E. Curtis Numerical Analysis Seminar, January 23, 2009 Outline Motivation and background Algorithm development and theoretical results
More informationTaylor-like models in nonsmooth optimization
Taylor-like models in nonsmooth optimization Dmitriy Drusvyatskiy Mathematics, University of Washington Joint work with Ioffe (Technion), Lewis (Cornell), and Paquette (UW) SIAM Optimization 2017 AFOSR,
More informationLinear algebra issues in Interior Point methods for bound-constrained least-squares problems
Linear algebra issues in Interior Point methods for bound-constrained least-squares problems Stefania Bellavia Dipartimento di Energetica S. Stecco Università degli Studi di Firenze Joint work with Jacek
More informationECS289: Scalable Machine Learning
ECS289: Scalable Machine Learning Cho-Jui Hsieh UC Davis Sept 29, 2016 Outline Convex vs Nonconvex Functions Coordinate Descent Gradient Descent Newton s method Stochastic Gradient Descent Numerical Optimization
More informationSelected Topics in Optimization. Some slides borrowed from
Selected Topics in Optimization Some slides borrowed from http://www.stat.cmu.edu/~ryantibs/convexopt/ Overview Optimization problems are almost everywhere in statistics and machine learning. Input Model
More informationStatic unconstrained optimization
Static unconstrained optimization 2 In unconstrained optimization an objective function is minimized without any additional restriction on the decision variables, i.e. min f(x) x X ad (2.) with X ad R
More informationORIE 6326: Convex Optimization. Quasi-Newton Methods
ORIE 6326: Convex Optimization Quasi-Newton Methods Professor Udell Operations Research and Information Engineering Cornell April 10, 2017 Slides on steepest descent and analysis of Newton s method adapted
More informationGradient Descent. Dr. Xiaowei Huang
Gradient Descent Dr. Xiaowei Huang https://cgi.csc.liv.ac.uk/~xiaowei/ Up to now, Three machine learning algorithms: decision tree learning k-nn linear regression only optimization objectives are discussed,
More informationAdaptive Negative Curvature Descent with Applications in Non-convex Optimization
Adaptive Negative Curvature Descent with Applications in Non-convex Optimization Mingrui Liu, Zhe Li, Xiaoyu Wang, Jinfeng Yi, Tianbao Yang Department of Computer Science, The University of Iowa, Iowa
More informationOptimization for Machine Learning
Optimization for Machine Learning (Problems; Algorithms - A) SUVRIT SRA Massachusetts Institute of Technology PKU Summer School on Data Science (July 2017) Course materials http://suvrit.de/teaching.html
More informationmin f(x). (2.1) Objectives consisting of a smooth convex term plus a nonconvex regularization term;
Chapter 2 Gradient Methods The gradient method forms the foundation of all of the schemes studied in this book. We will provide several complementary perspectives on this algorithm that highlight the many
More informationConvex Optimization. Problem set 2. Due Monday April 26th
Convex Optimization Problem set 2 Due Monday April 26th 1 Gradient Decent without Line-search In this problem we will consider gradient descent with predetermined step sizes. That is, instead of determining
More informationLarge-scale Stochastic Optimization
Large-scale Stochastic Optimization 11-741/641/441 (Spring 2016) Hanxiao Liu hanxiaol@cs.cmu.edu March 24, 2016 1 / 22 Outline 1. Gradient Descent (GD) 2. Stochastic Gradient Descent (SGD) Formulation
More informationThe Randomized Newton Method for Convex Optimization
The Randomized Newton Method for Convex Optimization Vaden Masrani UBC MLRG April 3rd, 2018 Introduction We have some unconstrained, twice-differentiable convex function f : R d R that we want to minimize:
More informationProgramming, numerics and optimization
Programming, numerics and optimization Lecture C-3: Unconstrained optimization II Łukasz Jankowski ljank@ippt.pan.pl Institute of Fundamental Technological Research Room 4.32, Phone +22.8261281 ext. 428
More informationNewton s Method. Javier Peña Convex Optimization /36-725
Newton s Method Javier Peña Convex Optimization 10-725/36-725 1 Last time: dual correspondences Given a function f : R n R, we define its conjugate f : R n R, f ( (y) = max y T x f(x) ) x Properties and
More informationComplexity of gradient descent for multiobjective optimization
Complexity of gradient descent for multiobjective optimization J. Fliege A. I. F. Vaz L. N. Vicente July 18, 2018 Abstract A number of first-order methods have been proposed for smooth multiobjective optimization
More informationIntroduction. New Nonsmooth Trust Region Method for Unconstraint Locally Lipschitz Optimization Problems
New Nonsmooth Trust Region Method for Unconstraint Locally Lipschitz Optimization Problems Z. Akbari 1, R. Yousefpour 2, M. R. Peyghami 3 1 Department of Mathematics, K.N. Toosi University of Technology,
More informationVasil Khalidov & Miles Hansard. C.M. Bishop s PRML: Chapter 5; Neural Networks
C.M. Bishop s PRML: Chapter 5; Neural Networks Introduction The aim is, as before, to find useful decompositions of the target variable; t(x) = y(x, w) + ɛ(x) (3.7) t(x n ) and x n are the observations,
More informationA Study on Trust Region Update Rules in Newton Methods for Large-scale Linear Classification
JMLR: Workshop and Conference Proceedings 1 16 A Study on Trust Region Update Rules in Newton Methods for Large-scale Linear Classification Chih-Yang Hsia r04922021@ntu.edu.tw Dept. of Computer Science,
More informationarxiv: v2 [math.oc] 1 Nov 2017
Stochastic Non-convex Optimization with Strong High Probability Second-order Convergence arxiv:1710.09447v [math.oc] 1 Nov 017 Mingrui Liu, Tianbao Yang Department of Computer Science The University of
More informationOptimization for neural networks
0 - : Optimization for neural networks Prof. J.C. Kao, UCLA Optimization for neural networks We previously introduced the principle of gradient descent. Now we will discuss specific modifications we make
More informationNeural Network Training
Neural Network Training Sargur Srihari Topics in Network Training 0. Neural network parameters Probabilistic problem formulation Specifying the activation and error functions for Regression Binary classification
More information1 Numerical optimization
Contents 1 Numerical optimization 5 1.1 Optimization of single-variable functions............ 5 1.1.1 Golden Section Search................... 6 1.1. Fibonacci Search...................... 8 1. Algorithms
More informationProximal Newton Method. Ryan Tibshirani Convex Optimization /36-725
Proximal Newton Method Ryan Tibshirani Convex Optimization 10-725/36-725 1 Last time: primal-dual interior-point method Given the problem min x subject to f(x) h i (x) 0, i = 1,... m Ax = b where f, h
More information8 Numerical methods for unconstrained problems
8 Numerical methods for unconstrained problems Optimization is one of the important fields in numerical computation, beside solving differential equations and linear systems. We can see that these fields
More informationA Conservation Law Method in Optimization
A Conservation Law Method in Optimization Bin Shi Florida International University Tao Li Florida International University Sundaraja S. Iyengar Florida International University Abstract bshi1@cs.fiu.edu
More informationNumerical Methods for PDE-Constrained Optimization
Numerical Methods for PDE-Constrained Optimization Richard H. Byrd 1 Frank E. Curtis 2 Jorge Nocedal 2 1 University of Colorado at Boulder 2 Northwestern University Courant Institute of Mathematical Sciences,
More informationCheng Soon Ong & Christian Walder. Canberra February June 2018
Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 Outlines Overview Introduction Linear Algebra Probability Linear Regression
More informationOn Lagrange multipliers of trust region subproblems
On Lagrange multipliers of trust region subproblems Ladislav Lukšan, Ctirad Matonoha, Jan Vlček Institute of Computer Science AS CR, Prague Applied Linear Algebra April 28-30, 2008 Novi Sad, Serbia Outline
More informationOn Lagrange multipliers of trust-region subproblems
On Lagrange multipliers of trust-region subproblems Ladislav Lukšan, Ctirad Matonoha, Jan Vlček Institute of Computer Science AS CR, Prague Programy a algoritmy numerické matematiky 14 1.- 6. června 2008
More informationWorst Case Complexity of Direct Search
Worst Case Complexity of Direct Search L. N. Vicente May 3, 200 Abstract In this paper we prove that direct search of directional type shares the worst case complexity bound of steepest descent when sufficient
More informationMATH 4211/6211 Optimization Basics of Optimization Problems
MATH 4211/6211 Optimization Basics of Optimization Problems Xiaojing Ye Department of Mathematics & Statistics Georgia State University Xiaojing Ye, Math & Stat, Georgia State University 0 A standard minimization
More information1. Introduction. We consider the numerical solution of the unconstrained (possibly nonconvex) optimization problem
SIAM J. OPTIM. Vol. 2, No. 6, pp. 2833 2852 c 2 Society for Industrial and Applied Mathematics ON THE COMPLEXITY OF STEEPEST DESCENT, NEWTON S AND REGULARIZED NEWTON S METHODS FOR NONCONVEX UNCONSTRAINED
More informationOn Nesterov s Random Coordinate Descent Algorithms - Continued
On Nesterov s Random Coordinate Descent Algorithms - Continued Zheng Xu University of Texas At Arlington February 20, 2015 1 Revisit Random Coordinate Descent The Random Coordinate Descent Upper and Lower
More informationCPSC 540: Machine Learning
CPSC 540: Machine Learning First-Order Methods, L1-Regularization, Coordinate Descent Winter 2016 Some images from this lecture are taken from Google Image Search. Admin Room: We ll count final numbers
More informationOptimization methods
Lecture notes 3 February 8, 016 1 Introduction Optimization methods In these notes we provide an overview of a selection of optimization methods. We focus on methods which rely on first-order information,
More informationPart 5: Penalty and augmented Lagrangian methods for equality constrained optimization. Nick Gould (RAL)
Part 5: Penalty and augmented Lagrangian methods for equality constrained optimization Nick Gould (RAL) x IR n f(x) subject to c(x) = Part C course on continuoue optimization CONSTRAINED MINIMIZATION x
More informationEvaluation complexity for nonlinear constrained optimization using unscaled KKT conditions and high-order models by E. G. Birgin, J. L. Gardenghi, J. M. Martínez, S. A. Santos and Ph. L. Toint Report NAXYS-08-2015
More informationSTA141C: Big Data & High Performance Statistical Computing
STA141C: Big Data & High Performance Statistical Computing Lecture 8: Optimization Cho-Jui Hsieh UC Davis May 9, 2017 Optimization Numerical Optimization Numerical Optimization: min X f (X ) Can be applied
More informationQuasi-Newton Methods. Zico Kolter (notes by Ryan Tibshirani, Javier Peña, Zico Kolter) Convex Optimization
Quasi-Newton Methods Zico Kolter (notes by Ryan Tibshirani, Javier Peña, Zico Kolter) Convex Optimization 10-725 Last time: primal-dual interior-point methods Given the problem min x f(x) subject to h(x)
More informationRanking from Crowdsourced Pairwise Comparisons via Matrix Manifold Optimization
Ranking from Crowdsourced Pairwise Comparisons via Matrix Manifold Optimization Jialin Dong ShanghaiTech University 1 Outline Introduction FourVignettes: System Model and Problem Formulation Problem Analysis
More information10. Unconstrained minimization
Convex Optimization Boyd & Vandenberghe 10. Unconstrained minimization terminology and assumptions gradient descent method steepest descent method Newton s method self-concordant functions implementation
More informationA multistart multisplit direct search methodology for global optimization
1/69 A multistart multisplit direct search methodology for global optimization Ismael Vaz (Univ. Minho) Luis Nunes Vicente (Univ. Coimbra) IPAM, Optimization and Optimal Control for Complex Energy and
More information