Complexity analysis of second-order algorithms based on line search for smooth nonconvex optimization
1 Complexity analysis of second-order algorithms based on line search for smooth nonconvex optimization
Clément Royer - University of Wisconsin-Madison
Joint work with Stephen J. Wright
MOPTA, Bethlehem, Pennsylvania, USA - August 17, 2017
Complexity of second order line search 1
2 Smooth nonconvex optimization
We consider an unconstrained smooth problem:
    min_{x ∈ ℝ^n} f(x).
Assumptions on f:
- f bounded from below.
- f twice continuously differentiable.
- f is not convex.
3 Optimality conditions
Second-order necessary point: x* satisfies the second-order necessary conditions if
    ∇f(x*) = 0,  ∇²f(x*) ⪰ 0.
Basic paradigm: if x is not a second-order necessary point, there exists d such that
1. dᵀ∇f(x) < 0: gradient-type direction, and/or
2. dᵀ∇²f(x)d < 0: negative curvature direction, specific to nonconvex problems.
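The paradigm above is easy to check numerically. Here is a small illustrative sketch (not from the talk; the function names and tolerance are mine) that, given gradient and Hessian callables, either accepts x as an approximate second-order point or returns one of the two descent-direction types, demonstrated on the saddle point of f(x) = x₁² − x₂²:

```python
# Illustrative check of the second-order necessary conditions; names and
# the tolerance are assumptions, not from the talk.
import numpy as np

def check_second_order(grad, hess, x, tol=1e-8):
    """Return (is_second_order_point, descent_direction_or_None)."""
    g = grad(x)
    H = hess(x)
    if np.linalg.norm(g) > tol:
        return False, -g                      # gradient-type direction
    eigvals, eigvecs = np.linalg.eigh(H)
    lam_min, v = eigvals[0], eigvecs[:, 0]
    if lam_min < -tol:
        return False, v                       # negative curvature direction
    return True, None

# Saddle of f(x) = x0^2 - x1^2: the gradient vanishes at the origin,
# but the Hessian has a negative eigenvalue.
grad = lambda x: np.array([2.0 * x[0], -2.0 * x[1]])
hess = lambda x: np.array([[2.0, 0.0], [0.0, -2.0]])
ok, d = check_second_order(grad, hess, np.zeros(2))
```

On the saddle, `ok` is False and `d` is a direction of negative curvature, exactly the situation the slide identifies as specific to nonconvex problems.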
5 Motivation
Example: nonconvex formulation of low-rank matrix problems. For common classes of problems:
    min_{U ∈ ℝ^{n×r}, V ∈ ℝ^{m×r}} f(UVᵀ).
- Second-order necessary points are global minimizers (or close).
- Saddle points have negative curvature.
Renewed interest: second-order necessary points of nonconvex problems.
Needed: efficient algorithms.
7 Second-order complexity
Principle: for a given method and two tolerances ε_g, ε_H ∈ (0, 1):
Objective: bound the worst-case cost of reaching x_k such that
    ∥∇f(x_k)∥ ≤ ε_g,  λ_k := λ_min(∇²f(x_k)) ≥ −ε_H.
Focus: bound the dependencies on ε_g, ε_H.
Definition of cost? Best rates?
9 Existing complexity results
Nonconvex optimization literature:
- Classical cost: number of (expensive) iterations.
- Best methods: Newton-type frameworks.
Algorithms and iteration bounds:
- Classical trust region: O(max{ε_g^{-2} ε_H^{-1}, ε_H^{-3}}).
- Cubic regularization, TRACE trust region: O(max{ε_g^{-3/2}, ε_H^{-3}}).
11 Existing complexity results (2)
Learning/statistics community, specific setting ε_g = ε, ε_H = O(√ε):
- Best Newton-type bound: O(ε^{-3/2}).
- Gradient-based methods have cheaper iterations.
- Cost measure: Hessian-vector products / gradient evaluations.
Algorithms and bounds:
- Gradient descent methods with random noise: Õ(ε^{-2}).
- Accelerated gradient methods for nonconvex problems: Õ(ε^{-7/4}).
Õ(·) hides logarithmic factors; results hold with high probability.
13 Our objective
Illustrate all the possible complexities...
- In terms of iterations, evaluations, etc.
- For arbitrary ε_g, ε_H.
- Deterministic and high-probability results.
...in a single framework:
- Based on line search.
- Matrix-free: only requires Hessian-vector products.
- Good complexity guarantees.
14 Outline
1 Our algorithm
2 Complexity analysis
3 Inexact variants
16 Basic framework
Parameters: x_0 ∈ ℝ^n, θ ∈ (0, 1), η > 0, ε_g ∈ (0, 1), ε_H ∈ (0, 1).
For k = 0, 1, 2, ...
1. Compute a search direction d_k.
2. Perform a backtracking line search to compute α_k = θ^{j_k} such that
       f(x_k + α_k d_k) < f(x_k) − (η/6) α_k³ ∥d_k∥³.
3. Set x_{k+1} = x_k + α_k d_k.
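The backtracking step can be written in a few lines. This is an illustrative sketch (variable names and the demo function are mine, not the authors' code); `theta` and `eta` play the roles of θ and η above:

```python
# A minimal sketch of the backtracking line search with the cubic decrease
# condition f(x + alpha*d) < f(x) - (eta/6) * alpha^3 * ||d||^3.
import numpy as np

def backtrack(f, x, d, theta=0.5, eta=0.1, max_backtracks=50):
    fx = f(x)
    nd3 = np.linalg.norm(d) ** 3
    alpha = 1.0
    for _ in range(max_backtracks):
        if f(x + alpha * d) < fx - (eta / 6.0) * alpha ** 3 * nd3:
            return alpha  # first accepted step size theta^j
        alpha *= theta
    raise RuntimeError("line search did not terminate")

# Demo: one step on f(x) = 0.5 ||x||^2 along d = -grad f(x) = -x.
f = lambda x: 0.5 * float(x @ x)
x = np.array([1.0, -2.0])
alpha = backtrack(f, x, -x)
```

Note the cubic (rather than the usual Armijo) decrease condition: it is what ties the accepted step size to the third-order Taylor model used throughout the complexity analysis.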
17 Selecting the search direction d_k
Step 1: use gradient-related information. Compute
    g_k = ∇f(x_k),  R_k = (g_kᵀ ∇²f(x_k) g_k) / ∥g_k∥².
- If R_k < −ε_H, set d_k = (R_k / ∥g_k∥) g_k.
- Else if R_k ∈ [−ε_H, ε_H] and ∥g_k∥ > ε_g, set d_k = −g_k / ∥g_k∥^{1/2}.
- Otherwise perform Step 2.
18 Selecting the search direction d_k (2)
Step 2: use eigenvalue information. Compute an eigenpair (v_k, λ_k) such that
    λ_k = λ_min(∇²f(x_k)),  ∇²f(x_k) v_k = λ_k v_k,  v_kᵀ g_k ≤ 0,  ∥v_k∥ = 1.
- Case λ_k < −ε_H: d_k = −λ_k v_k;
- Case λ_k > ε_H (Newton step): d_k = d_k^n, where ∇²f(x_k) d_k^n = −g_k;
- Case λ_k ∈ [−ε_H, ε_H] (regularized Newton step): d_k = d_k^r, where (∇²f(x_k) + 2ε_H I) d_k^r = −g_k.
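The five cases above (two gradient scalings, negative eigenvector, Newton, regularized Newton) can be sketched as a single selector. This is an illustrative reconstruction using dense linear algebra for clarity, not the matrix-free implementation from the paper:

```python
# Illustrative direction selector for Steps 1 and 2; case order and
# scalings follow the slides, implementation details are assumptions.
import numpy as np

def choose_direction(g, H, eps_g, eps_H):
    ng = np.linalg.norm(g)
    if ng > 0:
        R = (g @ H @ g) / ng ** 2          # curvature along the gradient
        if R < -eps_H:
            return (R / ng) * g            # gradient scaled by its negative curvature
        if -eps_H <= R <= eps_H and ng > eps_g:
            return -g / np.sqrt(ng)        # gradient scaled by its norm
    lam, V = np.linalg.eigh(H)
    lam_min, v = lam[0], V[:, 0]
    if v @ g > 0:
        v = -v                             # enforce v^T g <= 0
    if lam_min < -eps_H:
        return -lam_min * v                # negative eigenvector step, ||d|| = |lam_min|
    if lam_min > eps_H:
        return np.linalg.solve(H, -g)      # Newton step
    return np.linalg.solve(H + 2.0 * eps_H * np.eye(len(g)), -g)  # regularized Newton
```

The scalings matter: each direction has a norm matched to the curvature information it exploits, which is what makes the single cubic decrease condition cover all five cases in the analysis.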
20 Assumptions and notations
Assumptions:
- L_f(x_0) = {x : f(x) ≤ f(x_0)} is compact.
- f twice continuously differentiable on an open set containing L_f(x_0), with Lipschitz continuous Hessian.
Notations:
- L_H: Lipschitz constant for ∇²f.
- f_low: lower bound on {f(x_k)}.
- U_H: upper bound on ∥∇²f(x_k)∥.
22 Criterion
Approximate solution: x_k is an (ε_g, ε_H)-point if
    min{∥g_k∥, ∥g_{k+1}∥} ≤ ε_g,  λ_k ≥ −ε_H.
Other possibilities:
- Remove the gradient directions and use ∥g_{k+1}∥ only: no cheap gradient steps.
- Add a stopping criterion and use ∥g_k∥ only: no global/local convergence theory.
25 Analysis of the method
Key principle: bound the decrease produced at every step while an (ε_g, ε_H)-point has not been reached.
Five possible directions:
- Gradient scaled by its (negative) curvature;
- Gradient scaled by its norm;
- Negative eigenvector;
- Newton step;
- Regularized Newton step.
One proof technique, typical of backtracking line search:
- If the unit step is accepted, guaranteed decrease;
- Otherwise, lower bound on the accepted step size.
27 Example: when d_k = −g_k / ∥g_k∥^{1/2}
In that case:
    (g_kᵀ ∇²f(x_k) g_k) / ∥g_k∥² ∈ [−ε_H, ε_H],  ∥g_k∥ > ε_g.
Unit step accepted:
    f(x_k) − f(x_{k+1}) ≥ (η/6) ∥d_k∥³ ≥ (η/6) ε_g^{3/2}.
Unit step rejected: by Taylor expansion, there exists an accepted step α_k = θ^{j_k} with
    θ^{j_k} ≥ θ min{ (3/(L_H + η))^{1/2}, ε_g^{1/2} ε_H^{-1} }.
So the line search terminates and
    f(x_k) − f(x_{k+1}) ≥ (η/6) α_k³ ∥d_k∥³ ≥ c ε_g³ ε_H^{-3}.
Final decrease:
    f(x_k) − f(x_{k+1}) ≥ c_g min{ ε_g³ ε_H^{-3}, ε_g^{3/2} }.
28 Decrease bound
General decrease lemma: if at the k-th iteration an (ε_g, ε_H)-point has not been reached, then
    f(x_k) − f(x_{k+1}) ≥ c min{ ε_g^{3/2}, ε_H³, ε_g³ ε_H^{-3}, φ(ε_g, ε_H)³ },
where
    φ(ε_g, ε_H) = L_H^{-1} ε_H ( −2 + (4 + 2 L_H ε_g / ε_H²)^{1/2} ),
and c depends on L_H, η, θ.
29 Iteration complexity
Iteration complexity bound: the method reaches an (ε_g, ε_H)-point in at most
    ((f_0 − f_low)/c) max{ ε_g^{-3/2}, ε_H^{-3}, ε_g^{-3} ε_H³, φ(ε_g, ε_H)^{-3} }
iterations. Specific rates:
- ε_g = ε, ε_H = √ε: O(ε^{-3/2}).
- ε_g = ε_H = ε: O(ε^{-3}).
These are the optimal bounds for Newton-type methods.
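As a sanity check on the first specific rate (a small numerical experiment of mine, using a reconstruction of the slide's φ and an arbitrary L_H = 1): φ(ε, √ε) is a constant multiple of √ε, so φ(ε, √ε)^{-3} = O(ε^{-3/2}), matching the other dominant terms in that regime.

```python
# Illustrative check that phi(eps, sqrt(eps)) scales like sqrt(eps), using
# the reconstructed formula for phi and an assumed L_H = 1.0.
import math

L_H = 1.0

def phi(eps_g, eps_H):
    return (eps_H / L_H) * (-2.0 + math.sqrt(4.0 + 2.0 * L_H * eps_g / eps_H ** 2))

# The ratio phi(eps, sqrt(eps)) / sqrt(eps) is the same constant for all eps,
# here sqrt(6) - 2, so phi^3 scales exactly like eps^(3/2).
ratios = [phi(eps, math.sqrt(eps)) / math.sqrt(eps) for eps in (1e-2, 1e-4, 1e-6)]
```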
31 Function evaluation complexity
- #Iterations = #gradient evaluations = #Hessian evaluations.
- #Iterations ≤ #function evaluations: the line search may evaluate f several times per iteration.
Line-search iterations: if x_k is not an (ε_g, ε_H)-point, the line search takes at most
    O( | log_θ( min{ ε_g^{1/2} ε_H^{-1}, ε_H² } ) | )
iterations.
Evaluation complexity bound: the method reaches an (ε_g, ε_H)-point in at most
    Õ( max{ ε_g^{-3/2}, ε_H^{-3}, ε_g^{-3} ε_H³, φ(ε_g, ε_H)^{-3} } )
function evaluations.
33 Motivation
Algorithmic cost: the method should be matrix-free, yet we use matrix-related operations:
- Linear system solves;
- Eigenvalue/eigenvector computations.
Inexactness: perform these matrix operations inexactly.
Main cost unit: matrix-vector product / gradient evaluation.
35 Conjugate gradient for linear systems
We solve systems of the form Hd = −g, with H ⪰ ε_H I.
Conjugate Gradient (CG): we apply the conjugate gradient algorithm with stopping criterion
    ∥Hd + g∥ ≤ (ξ/2) min{ ∥g∥, ε_H ∥d∥ },  ξ ∈ (0, 1).
If κ = λ_max(H)/λ_min(H), the CG method finds such a vector in at most
    min{ n, (1/2) √κ log(4 κ^{5/2}/ξ) } = min{ n, O(√κ log(κ/ξ)) }
matrix-vector products.
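A plain textbook CG with the stated stopping test might look as follows. This is an illustrative sketch under the slide's assumption H ⪰ ε_H I (which guarantees p'Hp > 0 in the step-size computation), not the paper's implementation:

```python
# Textbook conjugate gradient for H d = -g with the relative stopping test
# ||H d + g|| <= (xi/2) * min(||g||, eps_H * ||d||); names are illustrative.
import numpy as np

def cg_inexact(H, g, eps_H, xi=0.5, max_iter=None):
    n = len(g)
    max_iter = max_iter or 10 * n
    d = np.zeros(n)
    r = -g - H @ d                      # residual of H d = -g
    p = r.copy()
    ng = np.linalg.norm(g)
    for _ in range(max_iter):
        if np.linalg.norm(H @ d + g) <= 0.5 * xi * min(ng, eps_H * np.linalg.norm(d)):
            break
        Hp = H @ p
        alpha = (r @ r) / (p @ Hp)      # p'Hp > 0 since H is positive definite
        d = d + alpha * p
        r_new = r - alpha * Hp
        beta = (r_new @ r_new) / (r @ r)
        p = r_new + beta * p
        r = r_new
    return d
```

Because the test is relative to min{∥g∥, ε_H∥d∥}, the returned d only needs a residual small compared to the gradient and the step, which is what keeps the per-iteration cost at O(√κ log(κ/ξ)) matrix-vector products.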
37 Lanczos for eigenvalue computation
Lanczos method to compute a minimum eigenvector.
- Can fail if deterministic: use a random start.
- Results available for matrices A ⪰ 0: shift the Hessian.
Lanczos iterations: let H ∈ ℝ^{n×n} be symmetric with ∥H∥ ≤ U_H, and let ε > 0, δ ∈ (0, 1). With probability at least 1 − δ, the Lanczos procedure applied to U_H I − H outputs a unit vector v such that
    vᵀ H v ≤ λ_min(H) + ε
in at most
    min{ n, (ln(n/δ²)/(2√2)) (U_H/ε)^{1/2} }
iterations/matrix-vector products.
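The two tricks above (random start, shifting so the target becomes the largest eigenvalue of a positive semidefinite matrix) can be seen in a minimal Lanczos sketch. This is an illustrative implementation of mine, with no reorthogonalization, not the one analyzed in the paper:

```python
# Minimal Lanczos estimate of a minimum eigenpair of H via the shifted
# matrix M = U_H*I - H, whose largest eigenvalue is U_H - lambda_min(H).
# Random start as the slides suggest; names are illustrative.
import numpy as np

def lanczos_min_eig(H, U_H, m=30, seed=0):
    n = H.shape[0]
    rng = np.random.default_rng(seed)
    M = U_H * np.eye(n) - H                # shift: lambda_min(H) -> lambda_max(M)
    Q = np.zeros((n, m))
    alpha, beta = np.zeros(m), np.zeros(m)
    q = rng.standard_normal(n)
    q /= np.linalg.norm(q)                 # random unit start vector
    q_prev, b_prev = np.zeros(n), 0.0
    k = m
    for j in range(m):
        Q[:, j] = q
        w = M @ q - b_prev * q_prev        # three-term Lanczos recurrence
        alpha[j] = q @ w
        w -= alpha[j] * q
        beta[j] = np.linalg.norm(w)
        if beta[j] < 1e-8:                 # Krylov subspace exhausted
            k = j + 1
            break
        q_prev, q, b_prev = q, w / beta[j], beta[j]
    T = np.diag(alpha[:k]) + np.diag(beta[:k - 1], 1) + np.diag(beta[:k - 1], -1)
    theta, Y = np.linalg.eigh(T)           # Ritz values of the tridiagonal T
    v = Q[:, :k] @ Y[:, -1]                # Ritz vector for the largest Ritz value
    return U_H - theta[-1], v / np.linalg.norm(v)
```

The random start is what makes the failure probability δ appear in the iteration bound: a deterministic start can be orthogonal to the eigenvector sought, while a random one is not, with high probability.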
38 Selecting the search direction d_k - Inexact version
Step 1: use gradient-related information. Compute
    g_k = ∇f(x_k),  R_k = (g_kᵀ ∇²f(x_k) g_k) / ∥g_k∥².
- If R_k < −ε_H, set d_k = (R_k / ∥g_k∥) g_k.
- Else if R_k ∈ [−ε_H, ε_H] and ∥g_k∥ > ε_g, set d_k = −g_k / ∥g_k∥^{1/2}.
- Otherwise perform the Inexact Step 2.
39 Selecting the direction d_k - Inexact version (2)
Inexact Step 2: use (inexact) eigenvalue information. Compute an approximate eigenpair (v_k^i, λ_k^i) such that, with probability 1 − δ,
    λ_k^i = [v_k^i]ᵀ ∇²f(x_k) v_k^i ≤ λ_k + ε_H/2,  [v_k^i]ᵀ g_k ≤ 0,  ∥v_k^i∥ = 1.
- Case λ_k^i < −(1/2) ε_H: d_k = −λ_k^i v_k^i;
- Case λ_k^i > (3/2) ε_H (inexact Newton): use CG to obtain d_k = d_k^{in} with
      ∥∇²f(x_k) d_k^{in} + g_k∥ ≤ (ξ/2) min{ ∥g_k∥, ε_H ∥d_k^{in}∥ };
- Case λ_k^i ∈ [−(1/2) ε_H, (3/2) ε_H] (inexact regularized Newton): use CG to obtain d_k = d_k^{ir} with
      ∥(∇²f(x_k) + 2ε_H I) d_k^{ir} + g_k∥ ≤ (ξ/2) min{ ∥g_k∥, ε_H ∥d_k^{ir}∥ }.
40 Complexity analysis of the inexact method
Identical reasoning: 5 steps, 1 proof.
- Using Lanczos with a random start, the negative curvature decrease only holds with probability 1 − δ.
- With CG, the inexact Newton and regularized Newton steps give slightly different formulas.
Decrease lemma: for any iteration k, if x_k is not an (ε_g, ε_H)-point,
    f(x_k) − f(x_{k+1}) ≥ ĉ min{ ε_g³ ε_H^{-3}, ε_g^{3/2}, ε_H³, φ(ε_g, (ξ/2) ε_H)³, φ(ε_g, ((4+ξ)/2) ε_H)³ }
with probability at least 1 − δ, where ĉ only depends on L_H, η, θ.
41 Complexity results
Iteration complexity: an (ε_g, ε_H)-point is reached in at most
    K̂ := ((f_0 − f_low)/ĉ) max{ ε_g^{-3} ε_H³, ε_g^{-3/2}, ε_H^{-3}, φ(ε_g, (ξ/2) ε_H)^{-3}, φ(ε_g, ((4+ξ)/2) ε_H)^{-3} }
iterations, with probability at least 1 − K̂ δ.
Cost complexity: the number of Hessian-vector products or gradient evaluations needed to reach an (ε_g, ε_H)-point is at most
    min{ n, O( U_H^{1/2} ε_H^{-1/2} log(ε_H^{-1}/ξ) ), O( U_H^{1/2} ε_H^{-1/2} log(n/δ²) ) } × K̂,
with probability at least 1 − K̂ δ.
43 Complexity results (simplified)
Setting ε_g = ε, ε_H = √ε: an (ε, √ε)-point is reached in at most
- O(ε^{-3/2}) iterations,
- Õ(ε^{-7/4}) Hessian-vector products / gradient evaluations,
with probability 1 − O(ε^{-3/2} δ).
Setting δ = 0 gives results with probability 1:
- Iterations: O(ε^{-3/2}).
- Hessian-vector products / gradients: O(n ε^{-3/2}).
44 Summary
Our proposal: a class of second-order line-search methods.
- Best known complexity guarantees.
- Features gradient steps and inexactness.
- Can be implemented matrix-free.
For more details: "Complexity analysis of second-order line-search algorithms for smooth nonconvex optimization", C. W. Royer and S. J. Wright, arXiv preprint. Also contains local convergence results.
46 Follow-up
Perspectives:
- Numerical testing of our class of methods.
- Extension to constrained problems.
Thank you for your attention!
More information10. Unconstrained minimization
Convex Optimization Boyd & Vandenberghe 10. Unconstrained minimization terminology and assumptions gradient descent method steepest descent method Newton s method self-concordant functions implementation
More informationMath 164: Optimization Barzilai-Borwein Method
Math 164: Optimization Barzilai-Borwein Method Instructor: Wotao Yin Department of Mathematics, UCLA Spring 2015 online discussions on piazza.com Main features of the Barzilai-Borwein (BB) method The BB
More informationTowards stability and optimality in stochastic gradient descent
Towards stability and optimality in stochastic gradient descent Panos Toulis, Dustin Tran and Edoardo M. Airoldi August 26, 2016 Discussion by Ikenna Odinaka Duke University Outline Introduction 1 Introduction
More information1 Newton s Method. Suppose we want to solve: x R. At x = x, f (x) can be approximated by:
Newton s Method Suppose we want to solve: (P:) min f (x) At x = x, f (x) can be approximated by: n x R. f (x) h(x) := f ( x)+ f ( x) T (x x)+ (x x) t H ( x)(x x), 2 which is the quadratic Taylor expansion
More informationNonlinear Optimization Methods for Machine Learning
Nonlinear Optimization Methods for Machine Learning Jorge Nocedal Northwestern University University of California, Davis, Sept 2018 1 Introduction We don t really know, do we? a) Deep neural networks
More informationNOTES ON FIRST-ORDER METHODS FOR MINIMIZING SMOOTH FUNCTIONS. 1. Introduction. We consider first-order methods for smooth, unconstrained
NOTES ON FIRST-ORDER METHODS FOR MINIMIZING SMOOTH FUNCTIONS 1. Introduction. We consider first-order methods for smooth, unconstrained optimization: (1.1) minimize f(x), x R n where f : R n R. We assume
More informationNon-convex optimization. Issam Laradji
Non-convex optimization Issam Laradji Strongly Convex Objective function f(x) x Strongly Convex Objective function Assumptions Gradient Lipschitz continuous f(x) Strongly convex x Strongly Convex Objective
More informationAccelerating Nesterov s Method for Strongly Convex Functions
Accelerating Nesterov s Method for Strongly Convex Functions Hao Chen Xiangrui Meng MATH301, 2011 Outline The Gap 1 The Gap 2 3 Outline The Gap 1 The Gap 2 3 Our talk begins with a tiny gap For any x 0
More informationAdaptive Negative Curvature Descent with Applications in Non-convex Optimization
Adaptive Negative Curvature Descent with Applications in Non-convex Optimization Mingrui Liu, Zhe Li, Xiaoyu Wang, Jinfeng Yi, Tianbao Yang Department of Computer Science, The University of Iowa, Iowa
More informationNumerical optimization
Numerical optimization Lecture 4 Alexander & Michael Bronstein tosca.cs.technion.ac.il/book Numerical geometry of non-rigid shapes Stanford University, Winter 2009 2 Longest Slowest Shortest Minimal Maximal
More informationAn Inexact Newton Method for Nonlinear Constrained Optimization
An Inexact Newton Method for Nonlinear Constrained Optimization Frank E. Curtis Numerical Analysis Seminar, January 23, 2009 Outline Motivation and background Algorithm development and theoretical results
More information5 Handling Constraints
5 Handling Constraints Engineering design optimization problems are very rarely unconstrained. Moreover, the constraints that appear in these problems are typically nonlinear. This motivates our interest
More informationMini-Course 1: SGD Escapes Saddle Points
Mini-Course 1: SGD Escapes Saddle Points Yang Yuan Computer Science Department Cornell University Gradient Descent (GD) Task: min x f (x) GD does iterative updates x t+1 = x t η t f (x t ) Gradient Descent
More informationSuppose that the approximate solutions of Eq. (1) satisfy the condition (3). Then (1) if η = 0 in the algorithm Trust Region, then lim inf.
Maria Cameron 1. Trust Region Methods At every iteration the trust region methods generate a model m k (p), choose a trust region, and solve the constraint optimization problem of finding the minimum of
More informationApplied Mathematics 205. Unit V: Eigenvalue Problems. Lecturer: Dr. David Knezevic
Applied Mathematics 205 Unit V: Eigenvalue Problems Lecturer: Dr. David Knezevic Unit V: Eigenvalue Problems Chapter V.4: Krylov Subspace Methods 2 / 51 Krylov Subspace Methods In this chapter we give
More informationE5295/5B5749 Convex optimization with engineering applications. Lecture 8. Smooth convex unconstrained and equality-constrained minimization
E5295/5B5749 Convex optimization with engineering applications Lecture 8 Smooth convex unconstrained and equality-constrained minimization A. Forsgren, KTH 1 Lecture 8 Convex optimization 2006/2007 Unconstrained
More informationNonlinear Programming
Nonlinear Programming Kees Roos e-mail: C.Roos@ewi.tudelft.nl URL: http://www.isa.ewi.tudelft.nl/ roos LNMB Course De Uithof, Utrecht February 6 - May 8, A.D. 2006 Optimization Group 1 Outline for week
More informationThe Randomized Newton Method for Convex Optimization
The Randomized Newton Method for Convex Optimization Vaden Masrani UBC MLRG April 3rd, 2018 Introduction We have some unconstrained, twice-differentiable convex function f : R d R that we want to minimize:
More informationNewton-MR: Newton s Method Without Smoothness or Convexity
Newton-MR: Newton s Method Without Smoothness or Convexity arxiv:1810.00303v1 [math.oc] 30 Sep 018 Fred (Farbod) Roosta Yang Liu Peng Xu Michael W. Mahoney October, 018 Abstract Establishing global convergence
More informationNumerical optimization. Numerical optimization. Longest Shortest where Maximal Minimal. Fastest. Largest. Optimization problems
1 Numerical optimization Alexander & Michael Bronstein, 2006-2009 Michael Bronstein, 2010 tosca.cs.technion.ac.il/book Numerical optimization 048921 Advanced topics in vision Processing and Analysis of
More informationConvex Optimization Algorithms for Machine Learning in 10 Slides
Convex Optimization Algorithms for Machine Learning in 10 Slides Presenter: Jul. 15. 2015 Outline 1 Quadratic Problem Linear System 2 Smooth Problem Newton-CG 3 Composite Problem Proximal-Newton-CD 4 Non-smooth,
More informationORIE 6326: Convex Optimization. Quasi-Newton Methods
ORIE 6326: Convex Optimization Quasi-Newton Methods Professor Udell Operations Research and Information Engineering Cornell April 10, 2017 Slides on steepest descent and analysis of Newton s method adapted
More informationCPSC 540: Machine Learning
CPSC 540: Machine Learning Proximal-Gradient Mark Schmidt University of British Columbia Winter 2018 Admin Auditting/registration forms: Pick up after class today. Assignment 1: 2 late days to hand in
More informationOn the complexity of an Inexact Restoration method for constrained optimization
On the complexity of an Inexact Restoration method for constrained optimization L. F. Bueno J. M. Martínez September 18, 2018 Abstract Recent papers indicate that some algorithms for constrained optimization
More informationEvaluation complexity for nonlinear constrained optimization using unscaled KKT conditions and high-order models by E. G. Birgin, J. L. Gardenghi, J. M. Martínez, S. A. Santos and Ph. L. Toint Report NAXYS-08-2015
More informationarxiv: v2 [math.oc] 1 Nov 2017
Stochastic Non-convex Optimization with Strong High Probability Second-order Convergence arxiv:1710.09447v [math.oc] 1 Nov 017 Mingrui Liu, Tianbao Yang Department of Computer Science The University of
More informationOn Lagrange multipliers of trust-region subproblems
On Lagrange multipliers of trust-region subproblems Ladislav Lukšan, Ctirad Matonoha, Jan Vlček Institute of Computer Science AS CR, Prague Programy a algoritmy numerické matematiky 14 1.- 6. června 2008
More informationarxiv: v1 [math.oc] 9 Oct 2018
Cubic Regularization with Momentum for Nonconvex Optimization Zhe Wang Yi Zhou Yingbin Liang Guanghui Lan Ohio State University Ohio State University zhou.117@osu.edu liang.889@osu.edu Ohio State University
More informationComplexity of gradient descent for multiobjective optimization
Complexity of gradient descent for multiobjective optimization J. Fliege A. I. F. Vaz L. N. Vicente July 18, 2018 Abstract A number of first-order methods have been proposed for smooth multiobjective optimization
More informationGeometry optimization
Geometry optimization Trygve Helgaker Centre for Theoretical and Computational Chemistry Department of Chemistry, University of Oslo, Norway European Summer School in Quantum Chemistry (ESQC) 211 Torre
More information