Zero-Order Methods for the Optimization of Noisy Functions. Jorge Nocedal
|
|
- Norma Morris
- 5 years ago
- Views:
Transcription
1 Zero-Order Methods for the Optimization of Noisy Functions Jorge Nocedal Northwestern University Simons Institute, October
2 Collaborators Albert Berahas Northwestern University Richard Byrd University of Colorado 2
3 Motivation Problem 1: min f (x) f smooth but derivatives not available 1. Scalability 2. Parallelism 3. Beyond linear models 4. But should not aim for fully quadratic model 5. Spread function evaluations effectively Problem 2: min f (x;ξ) f ( ;ξ) smooth 1. Use noise estimation techniques (Hamming 1960s) 2. Estimate good finite-difference interval h 3. Classical quasi-newton updating using finite-difference gradients 4. Deal with noise adaptively 5. Can solve problems with thousands of variables 6. Convergence to a neighborhood of solution 3
4 Nonsmooth Optimization Lewis and Overton BFGS method with the right line search is more effective in practice than bundle methods or any other approach they tried (Curtis) The Wolfe line search ensures that a convex model can be created. Only assume function is bounded below x k+1 = x k α k H k f (x k ) f (x) = x Gradient exists almost everywhere: f (x k + αd) f (x k ) + αc 1 f (x k ) T d Armijo f (x k + αd) T d c 2 f (x k ) T d Wolfe 0 < c 1 < c 2 < 1 4
5 Nonsmooth Optimization Lewis and Overton: The BFGS matrix captures the U-V structure of the objective x k+1 = x k α k H k f (x k ) Hessian approximation blows up (good thing) Never observed failures Very limited convergence results Where do we go from here? Power of Armijo-Wolfe line search not appreciated by the convex analysis community Rather than constructing a majorizing function, one constructs a convex model along the search direction 5
6 Discussion 1. The BFGS method continues to surprise 2. One of the leading algorithms for nonsmooth optimization 3. Leading approach for (deterministic) derivative-free optimization 4. This talk: Leading method for the minimization of noisy functions These observations do not apply to: 1. Structured nonsmooth optimization (e.g. lasso) 2. Stochastic objectives with cheap gradient, as in machine learning 3. Nonlinear least squares objectives; Gauss-Newton is the right approach We had not fully recognized the power and generality of quasi-newton updating 6
7 Derivative free deterministic optimization (no noise) min f (x) f is smooth Interpolation based models with trust regions (Katya) min m(x) = xt Bx + g T x s.t. x 2 Δ 1. Need (n+1)(n+2)/2 function values to define quadratic model by pure interpolation 2. Can use n points and assume minimum norm change in the Hessian 3. Arithmetic costs high: n 4 4. Placement of interpolation points is important 5. Trust region constraint needed and natural 6. Parallelizable? 7
8 BFGS with finite difference gradients: deterministic case x k+1 = x k α k H k f (x k ) f (x) x i f (x + he i ) f (x) h Invest significant effort in estimation of gradient Delegate construction of model to BFGS Interpolating gradients Modest linear algebra costs O(n) Placement of sample points on an orthogonal set BFGS is an overwriting process: no inconsistencies or ill conditioning with Armijo-Wolfe line search Gradient evaluation parallelizes easily Why now? Perception that n function evaluations per step is too high Derivative-free literature rarely compares with FD quasi-newton Already used extensively: fminunc MATLAB 8
9 Comparison: function decrease vs total # of function evaluations Schittkowski: s201 Stochastic Additive, Noise Level = 0 DFOtr FDLBFGS (FD) FDLBFGS (CD) 10 0 Schittkowski: s207 Stochastic Additive, Noise Level = 0 DFOtr FDLBFGS (FD) FDLBFGS (CD) Schittkowski: s287 Stochastic Additive, Noise Level = 0 DFOtr FDLBFGS (FD) FDLBFGS (CD) F(x)-F * F(x)-F * F(x)-F * Number of function evaluations Number of function evaluations Number of function evaluations
10 Optimization of Noisy Functions min f (x;ξ) min f (x) =φ(x) + ε(x) where f ( ;ξ) is smooth f (x) = φ(x)(1+ ε(x)) Additive and multiplicative noise. Focus on additive Outline of adaptive finite-difference BFGS method 1. Estimate noise e(x) at every iteration, 2. Possibly change h 3. Corrective Procedure in case line search fails 4. (need to modify line search) f(x) f(x) = sin(x) + cos(x) U(0,2sqrt(3)) Smooth Noisy x 10
11 Noise estimation More -Wild (2011) Noise level: σ = [var(ε(x))] 1/2 Noise estimate: ε f min f (x) =φ(x) + ε(x) At x choose a random direction v evaluate f at q +1 equally spaced points x + iβv, i = 0,...,q Compute function differences: Δ 0 f (x) = f (x) Δ j+1 f (x) = Δ j [Δf (x)] = Δ j [ f (x + β)] Δ j [ f (x)]] Compute finite diverence table: T ij = Δ j f (x + iβv) 1 < j < q 0 < i < j q σ j = γ j q 1 j q j i=0 2 T i, j γ j = ( j!)2 (2 j)! 11
12 min f (x) =sin(x) + cos(x) U(0,2 3 ) q = 6 β = 10 2 x f f 2 f 3 f 4 f 5 f 6 f e e e e e e e e e e e e e e e e e e e e e k 6.65e e e e e e 4 High order differences of a smooth function tend to zero rapidly, while differences in noise are bounded away from zero. Changes in sign, useful. Procedure is scale invariant!
13 Finite difference itervals Once noise estimate ε f has been chosen: Forward difference: h = 8 1/4 ( ε f µ 2 ) 1/2 Central difference: h = 3 1/3 ( ε f µ 3 ) 1/3 µ 2 = max x I f (x) µ 3 f (x) Bad estimates of second and third derivatives can make cause problems (not often) 13
14 Adaptive Finite Difference L-BFGS Method Estimate noise ε f Compute h by forward or central differences [(4-8) function evaluations] Compute g k While convergence test not satisfied: d = H k g k [L-BFGS procedure] (x +, f +, flag) = LineSearch(x k, f k,g k,d k, f s ) IF flag=1 endif [line search failed] (x +, f +,h) = Recovery(x k, f k,g k,d k,max iter ) x k+1 = x +, f k+1 = f + Compute g k+1 [finite differences using h] s k = x k+1 x k, y k = g k+1 g k Discard (s k, y k ) if s k T y k 0 k = k +1 endwhile 14
15 Adaptive Finite Difference L-BFGS Method Estimate noise ε f Compute h by forward or central differences [(4-8) function evaluations] Compute g k While convergence test not satisfied: d = H k g k [L-BFGS procedure] (x +, f +, flag) = LineSearch(x k, f k,g k,d k, f s ) IF flag=1 endif [line search failed] (x +, f +,h) = Recovery(x k, f k,g k,d k,max iter ) x k+1 = x +, f k+1 = f + Compute g k+1 [finite differences using h] s k = x k+1 x k, y k = g k+1 g k Discard (s k, y k ) if s k T y k 0 k = k +1 endwhile 15
16 Corrective Procedure Compute new noise estimate ε f along search direction d k ; Compute corresponding h IF h [0.7h,1.5h] h = h, x + = x k, f + = f k [update h; do not move] ELSE x + = x k + hd k / d k, f + = f (x + ) [perturbation] If x + satisfies the relaxed Armijo condition return x +,h else Finite difference Stencil (Kelley) x s if f + f s and f + f k accept x + else if f k > f s and f + > f s x + = x s, f + = f s else x + = x k f + = f k compute new ε f, h [random v] end if end if ENDIF 16
17 Line Search BFGS method requires Armijo-Wolfe line search f (x k + αd) f (x k ) + αc 1 f (x k )d f (x k + αd) T d c 2 f (x k ) T d Wolfe Armijo Deterministic case: always possible if f is bounded below Can be problematic in the noisy case. Direction d may not be a descent direction for smooth underlying function Strategy: try to satisfy both but limit the number of attempts If first trial point (unit steplength) is not acceptable relax: f (x k + αd) f (x k ) + αc 1 f (x k )d + 2ε f relaxed Armijo Three outcomes: a) both satisfied; b) only Armijo; c) none 17
18 10 3 Schittkowski: s286 Stochastic Additive, Noise Level = 1e-02 DFOtr FDLBFGS (FD) FDLBFGS (CD) 10 5 Schittkowski: s287 Stochastic Additive, Noise Level = 1e-02 DFOtr FDLBFGS (FD) FDLBFGS (CD) F(x)-F * 10 1 F(x)-F * Number of function evaluations Number of function evaluations
19 A simple convergence result: constant steplength Assumptions: 1. f (x) = φ(x) + ε(x). Function φ is twice differentiable 2. Strong convexity. µi 2 φ(x) LI x R n 3. H k has bounded eigenvalues 4. Bounded noise. ε(x) ε x R n Theorem. If α < Then for all k 1 β (1 β)l + β 2 L for any β (0,1) φ(x k ) φ * (1 ρ) k (φ(x 0 ) φ * η / ρ) + η / ρ ρ = 1 αµ(1 β) η = α[ 1 α L β + α L 2 ]ε 2 19
20 END 20
Derivative-Free Optimization of Noisy Functions via Quasi-Newton Methods. Jorge Nocedal
Derivative-Free Optimization of Noisy Functions via Quasi-Newton Methods Jorge Nocedal Northwestern University Huatulco, Jan 2018 1 Collaborators Albert Berahas Northwestern University Richard Byrd University
More informationDerivative-Free Optimization of Noisy Functions via Quasi-Newton Methods: Experiments and Theory. Richard Byrd
Derivative-Free Optimization of Noisy Functions via Quasi-Newton Methods: Experiments and Theory Richard Byrd University of Colorado, Boulder Albert Berahas Northwestern University Jorge Nocedal Northwestern
More informationAn Evolving Gradient Resampling Method for Machine Learning. Jorge Nocedal
An Evolving Gradient Resampling Method for Machine Learning Jorge Nocedal Northwestern University NIPS, Montreal 2015 1 Collaborators Figen Oztoprak Stefan Solntsev Richard Byrd 2 Outline 1. How to improve
More informationIPAM Summer School Optimization methods for machine learning. Jorge Nocedal
IPAM Summer School 2012 Tutorial on Optimization methods for machine learning Jorge Nocedal Northwestern University Overview 1. We discuss some characteristics of optimization problems arising in deep
More informationNonlinear Optimization Methods for Machine Learning
Nonlinear Optimization Methods for Machine Learning Jorge Nocedal Northwestern University University of California, Davis, Sept 2018 1 Introduction We don t really know, do we? a) Deep neural networks
More informationIntroduction. New Nonsmooth Trust Region Method for Unconstraint Locally Lipschitz Optimization Problems
New Nonsmooth Trust Region Method for Unconstraint Locally Lipschitz Optimization Problems Z. Akbari 1, R. Yousefpour 2, M. R. Peyghami 3 1 Department of Mathematics, K.N. Toosi University of Technology,
More informationStochastic Optimization Algorithms Beyond SG
Stochastic Optimization Algorithms Beyond SG Frank E. Curtis 1, Lehigh University involving joint work with Léon Bottou, Facebook AI Research Jorge Nocedal, Northwestern University Optimization Methods
More informationConvex Optimization Algorithms for Machine Learning in 10 Slides
Convex Optimization Algorithms for Machine Learning in 10 Slides Presenter: Jul. 15. 2015 Outline 1 Quadratic Problem Linear System 2 Smooth Problem Newton-CG 3 Composite Problem Proximal-Newton-CD 4 Non-smooth,
More informationSub-Sampled Newton Methods for Machine Learning. Jorge Nocedal
Sub-Sampled Newton Methods for Machine Learning Jorge Nocedal Northwestern University Goldman Lecture, Sept 2016 1 Collaborators Raghu Bollapragada Northwestern University Richard Byrd University of Colorado
More informationmin f(x). (2.1) Objectives consisting of a smooth convex term plus a nonconvex regularization term;
Chapter 2 Gradient Methods The gradient method forms the foundation of all of the schemes studied in this book. We will provide several complementary perspectives on this algorithm that highlight the many
More informationNumerical Methods I Solving Nonlinear Equations
Numerical Methods I Solving Nonlinear Equations Aleksandar Donev Courant Institute, NYU 1 donev@courant.nyu.edu 1 MATH-GA 2011.003 / CSCI-GA 2945.003, Fall 2014 October 16th, 2014 A. Donev (Courant Institute)
More informationProgramming, numerics and optimization
Programming, numerics and optimization Lecture C-3: Unconstrained optimization II Łukasz Jankowski ljank@ippt.pan.pl Institute of Fundamental Technological Research Room 4.32, Phone +22.8261281 ext. 428
More informationIntroduction. A Modified Steepest Descent Method Based on BFGS Method for Locally Lipschitz Functions. R. Yousefpour 1
A Modified Steepest Descent Method Based on BFGS Method for Locally Lipschitz Functions R. Yousefpour 1 1 Department Mathematical Sciences, University of Mazandaran, Babolsar, Iran; yousefpour@umz.ac.ir
More informationUnconstrained Optimization
1 / 36 Unconstrained Optimization ME598/494 Lecture Max Yi Ren Department of Mechanical Engineering, Arizona State University February 2, 2015 2 / 36 3 / 36 4 / 36 5 / 36 1. preliminaries 1.1 local approximation
More informationHigher-Order Methods
Higher-Order Methods Stephen J. Wright 1 2 Computer Sciences Department, University of Wisconsin-Madison. PCMI, July 2016 Stephen Wright (UW-Madison) Higher-Order Methods PCMI, July 2016 1 / 25 Smooth
More informationCS 542G: Robustifying Newton, Constraints, Nonlinear Least Squares
CS 542G: Robustifying Newton, Constraints, Nonlinear Least Squares Robert Bridson October 29, 2008 1 Hessian Problems in Newton Last time we fixed one of plain Newton s problems by introducing line search
More informationStochastic Optimization Methods for Machine Learning. Jorge Nocedal
Stochastic Optimization Methods for Machine Learning Jorge Nocedal Northwestern University SIAM CSE, March 2017 1 Collaborators Richard Byrd R. Bollagragada N. Keskar University of Colorado Northwestern
More information5 Handling Constraints
5 Handling Constraints Engineering design optimization problems are very rarely unconstrained. Moreover, the constraints that appear in these problems are typically nonlinear. This motivates our interest
More informationStep lengths in BFGS method for monotone gradients
Noname manuscript No. (will be inserted by the editor) Step lengths in BFGS method for monotone gradients Yunda Dong Received: date / Accepted: date Abstract In this paper, we consider how to directly
More information1 Newton s Method. Suppose we want to solve: x R. At x = x, f (x) can be approximated by:
Newton s Method Suppose we want to solve: (P:) min f (x) At x = x, f (x) can be approximated by: n x R. f (x) h(x) := f ( x)+ f ( x) T (x x)+ (x x) t H ( x)(x x), 2 which is the quadratic Taylor expansion
More informationAlgorithms for Nonsmooth Optimization
Algorithms for Nonsmooth Optimization Frank E. Curtis, Lehigh University presented at Center for Optimization and Statistical Learning, Northwestern University 2 March 2018 Algorithms for Nonsmooth Optimization
More informationSub-Sampled Newton Methods
Sub-Sampled Newton Methods F. Roosta-Khorasani and M. W. Mahoney ICSI and Dept of Statistics, UC Berkeley February 2016 F. Roosta-Khorasani and M. W. Mahoney (UCB) Sub-Sampled Newton Methods Feb 2016 1
More informationAlgorithms for Constrained Optimization
1 / 42 Algorithms for Constrained Optimization ME598/494 Lecture Max Yi Ren Department of Mechanical Engineering, Arizona State University April 19, 2015 2 / 42 Outline 1. Convergence 2. Sequential quadratic
More informationNumerical Optimization: Basic Concepts and Algorithms
May 27th 2015 Numerical Optimization: Basic Concepts and Algorithms R. Duvigneau R. Duvigneau - Numerical Optimization: Basic Concepts and Algorithms 1 Outline Some basic concepts in optimization Some
More informationSecond-Order Methods for Stochastic Optimization
Second-Order Methods for Stochastic Optimization Frank E. Curtis, Lehigh University involving joint work with Léon Bottou, Facebook AI Research Jorge Nocedal, Northwestern University Optimization Methods
More informationAn Inexact Newton Method for Nonlinear Constrained Optimization
An Inexact Newton Method for Nonlinear Constrained Optimization Frank E. Curtis Numerical Analysis Seminar, January 23, 2009 Outline Motivation and background Algorithm development and theoretical results
More informationAn Inexact Newton Method for Optimization
New York University Brown Applied Mathematics Seminar, February 10, 2009 Brief biography New York State College of William and Mary (B.S.) Northwestern University (M.S. & Ph.D.) Courant Institute (Postdoc)
More informationAn Inexact Newton Method for Nonconvex Equality Constrained Optimization
Noname manuscript No. (will be inserted by the editor) An Inexact Newton Method for Nonconvex Equality Constrained Optimization Richard H. Byrd Frank E. Curtis Jorge Nocedal Received: / Accepted: Abstract
More informationOPER 627: Nonlinear Optimization Lecture 9: Trust-region methods
OPER 627: Nonlinear Optimization Lecture 9: Trust-region methods Department of Statistical Sciences and Operations Research Virginia Commonwealth University Sept 25, 2013 (Lecture 9) Nonlinear Optimization
More informationEAD 115. Numerical Solution of Engineering and Scientific Problems. David M. Rocke Department of Applied Science
EAD 115 Numerical Solution of Engineering and Scientific Problems David M. Rocke Department of Applied Science Multidimensional Unconstrained Optimization Suppose we have a function f() of more than one
More informationInfeasibility Detection and an Inexact Active-Set Method for Large-Scale Nonlinear Optimization
Infeasibility Detection and an Inexact Active-Set Method for Large-Scale Nonlinear Optimization Frank E. Curtis, Lehigh University involving joint work with James V. Burke, University of Washington Daniel
More informationNonlinear Optimization: What s important?
Nonlinear Optimization: What s important? Julian Hall 10th May 2012 Convexity: convex problems A local minimizer is a global minimizer A solution of f (x) = 0 (stationary point) is a minimizer A global
More informationUnconstrained minimization of smooth functions
Unconstrained minimization of smooth functions We want to solve min x R N f(x), where f is convex. In this section, we will assume that f is differentiable (so its gradient exists at every point), and
More informationPDE-Constrained and Nonsmooth Optimization
Frank E. Curtis October 1, 2009 Outline PDE-Constrained Optimization Introduction Newton s method Inexactness Results Summary and future work Nonsmooth Optimization Sequential quadratic programming (SQP)
More informationOptimization 2. CS5240 Theoretical Foundations in Multimedia. Leow Wee Kheng
Optimization 2 CS5240 Theoretical Foundations in Multimedia Leow Wee Kheng Department of Computer Science School of Computing National University of Singapore Leow Wee Kheng (NUS) Optimization 2 1 / 38
More informationWorst-Case Complexity Guarantees and Nonconvex Smooth Optimization
Worst-Case Complexity Guarantees and Nonconvex Smooth Optimization Frank E. Curtis, Lehigh University Beyond Convexity Workshop, Oaxaca, Mexico 26 October 2017 Worst-Case Complexity Guarantees and Nonconvex
More informationGradient Estimation for Attractor Networks
Gradient Estimation for Attractor Networks Thomas Flynn Department of Computer Science Graduate Center of CUNY July 2017 1 Outline Motivations Deterministic attractor networks Stochastic attractor networks
More informationE5295/5B5749 Convex optimization with engineering applications. Lecture 8. Smooth convex unconstrained and equality-constrained minimization
E5295/5B5749 Convex optimization with engineering applications Lecture 8 Smooth convex unconstrained and equality-constrained minimization A. Forsgren, KTH 1 Lecture 8 Convex optimization 2006/2007 Unconstrained
More informationQuasi-Newton Methods. Zico Kolter (notes by Ryan Tibshirani, Javier Peña, Zico Kolter) Convex Optimization
Quasi-Newton Methods Zico Kolter (notes by Ryan Tibshirani, Javier Peña, Zico Kolter) Convex Optimization 10-725 Last time: primal-dual interior-point methods Given the problem min x f(x) subject to h(x)
More informationProximal Newton Method. Ryan Tibshirani Convex Optimization /36-725
Proximal Newton Method Ryan Tibshirani Convex Optimization 10-725/36-725 1 Last time: primal-dual interior-point method Given the problem min x subject to f(x) h i (x) 0, i = 1,... m Ax = b where f, h
More informationImproving the Convergence of Back-Propogation Learning with Second Order Methods
the of Back-Propogation Learning with Second Order Methods Sue Becker and Yann le Cun, Sept 1988 Kasey Bray, October 2017 Table of Contents 1 with Back-Propagation 2 the of BP 3 A Computationally Feasible
More information5 Quasi-Newton Methods
Unconstrained Convex Optimization 26 5 Quasi-Newton Methods If the Hessian is unavailable... Notation: H = Hessian matrix. B is the approximation of H. C is the approximation of H 1. Problem: Solve min
More informationAn Inexact Sequential Quadratic Optimization Method for Nonlinear Optimization
An Inexact Sequential Quadratic Optimization Method for Nonlinear Optimization Frank E. Curtis, Lehigh University involving joint work with Travis Johnson, Northwestern University Daniel P. Robinson, Johns
More informationEE 367 / CS 448I Computational Imaging and Display Notes: Image Deconvolution (lecture 6)
EE 367 / CS 448I Computational Imaging and Display Notes: Image Deconvolution (lecture 6) Gordon Wetzstein gordon.wetzstein@stanford.edu This document serves as a supplement to the material discussed in
More informationMA/OR/ST 706: Nonlinear Programming Midterm Exam Instructor: Dr. Kartik Sivaramakrishnan INSTRUCTIONS
MA/OR/ST 706: Nonlinear Programming Midterm Exam Instructor: Dr. Kartik Sivaramakrishnan INSTRUCTIONS 1. Please write your name and student number clearly on the front page of the exam. 2. The exam is
More information1 Numerical optimization
Contents 1 Numerical optimization 5 1.1 Optimization of single-variable functions............ 5 1.1.1 Golden Section Search................... 6 1.1. Fibonacci Search...................... 8 1. Algorithms
More informationQuasi-Newton Methods. Javier Peña Convex Optimization /36-725
Quasi-Newton Methods Javier Peña Convex Optimization 10-725/36-725 Last time: primal-dual interior-point methods Consider the problem min x subject to f(x) Ax = b h(x) 0 Assume f, h 1,..., h m are convex
More informationMotivation: We have already seen an example of a system of nonlinear equations when we studied Gaussian integration (p.8 of integration notes)
AMSC/CMSC 460 Computational Methods, Fall 2007 UNIT 5: Nonlinear Equations Dianne P. O Leary c 2001, 2002, 2007 Solving Nonlinear Equations and Optimization Problems Read Chapter 8. Skip Section 8.1.1.
More informationMethods that avoid calculating the Hessian. Nonlinear Optimization; Steepest Descent, Quasi-Newton. Steepest Descent
Nonlinear Optimization Steepest Descent and Niclas Börlin Department of Computing Science Umeå University niclas.borlin@cs.umu.se A disadvantage with the Newton method is that the Hessian has to be derived
More informationChapter 3 Numerical Methods
Chapter 3 Numerical Methods Part 2 3.2 Systems of Equations 3.3 Nonlinear and Constrained Optimization 1 Outline 3.2 Systems of Equations 3.3 Nonlinear and Constrained Optimization Summary 2 Outline 3.2
More informationIntroduction to Nonlinear Optimization Paul J. Atzberger
Introduction to Nonlinear Optimization Paul J. Atzberger Comments should be sent to: atzberg@math.ucsb.edu Introduction We shall discuss in these notes a brief introduction to nonlinear optimization concepts,
More informationBindel, Fall 2011 Intro to Scientific Computing (CS 3220) Week 6: Monday, Mar 7. e k+1 = 1 f (ξ k ) 2 f (x k ) e2 k.
Problem du jour Week 6: Monday, Mar 7 Show that for any initial guess x 0 > 0, Newton iteration on f(x) = x 2 a produces a decreasing sequence x 1 x 2... x n a. What is the rate of convergence if a = 0?
More informationProximal Newton Method. Zico Kolter (notes by Ryan Tibshirani) Convex Optimization
Proximal Newton Method Zico Kolter (notes by Ryan Tibshirani) Convex Optimization 10-725 Consider the problem Last time: quasi-newton methods min x f(x) with f convex, twice differentiable, dom(f) = R
More informationNumerical Optimization Professor Horst Cerjak, Horst Bischof, Thomas Pock Mat Vis-Gra SS09
Numerical Optimization 1 Working Horse in Computer Vision Variational Methods Shape Analysis Machine Learning Markov Random Fields Geometry Common denominator: optimization problems 2 Overview of Methods
More informationImproving L-BFGS Initialization for Trust-Region Methods in Deep Learning
Improving L-BFGS Initialization for Trust-Region Methods in Deep Learning Jacob Rafati http://rafati.net jrafatiheravi@ucmerced.edu Ph.D. Candidate, Electrical Engineering and Computer Science University
More informationLinear Regression. CSL603 - Fall 2017 Narayanan C Krishnan
Linear Regression CSL603 - Fall 2017 Narayanan C Krishnan ckn@iitrpr.ac.in Outline Univariate regression Multivariate regression Probabilistic view of regression Loss functions Bias-Variance analysis Regularization
More informationLinear Regression. CSL465/603 - Fall 2016 Narayanan C Krishnan
Linear Regression CSL465/603 - Fall 2016 Narayanan C Krishnan ckn@iitrpr.ac.in Outline Univariate regression Multivariate regression Probabilistic view of regression Loss functions Bias-Variance analysis
More informationNeural Network Training
Neural Network Training Sargur Srihari Topics in Network Training 0. Neural network parameters Probabilistic problem formulation Specifying the activation and error functions for Regression Binary classification
More informationECS289: Scalable Machine Learning
ECS289: Scalable Machine Learning Cho-Jui Hsieh UC Davis Sept 29, 2016 Outline Convex vs Nonconvex Functions Coordinate Descent Gradient Descent Newton s method Stochastic Gradient Descent Numerical Optimization
More informationAdvanced computational methods X Selected Topics: SGD
Advanced computational methods X071521-Selected Topics: SGD. In this lecture, we look at the stochastic gradient descent (SGD) method 1 An illustrating example The MNIST is a simple dataset of variety
More informationLine Search Methods for Unconstrained Optimisation
Line Search Methods for Unconstrained Optimisation Lecture 8, Numerical Linear Algebra and Optimisation Oxford University Computing Laboratory, MT 2007 Dr Raphael Hauser (hauser@comlab.ox.ac.uk) The Generic
More informationLecture 17: October 27
0-725/36-725: Convex Optimiation Fall 205 Lecturer: Ryan Tibshirani Lecture 7: October 27 Scribes: Brandon Amos, Gines Hidalgo Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer: These
More informationScientific Computing: An Introductory Survey
Scientific Computing: An Introductory Survey Chapter 6 Optimization Prof. Michael T. Heath Department of Computer Science University of Illinois at Urbana-Champaign Copyright c 2002. Reproduction permitted
More informationScientific Computing: An Introductory Survey
Scientific Computing: An Introductory Survey Chapter 6 Optimization Prof. Michael T. Heath Department of Computer Science University of Illinois at Urbana-Champaign Copyright c 2002. Reproduction permitted
More informationInterpolation-Based Trust-Region Methods for DFO
Interpolation-Based Trust-Region Methods for DFO Luis Nunes Vicente University of Coimbra (joint work with A. Bandeira, A. R. Conn, S. Gratton, and K. Scheinberg) July 27, 2010 ICCOPT, Santiago http//www.mat.uc.pt/~lnv
More informationNumerisches Rechnen. (für Informatiker) M. Grepl P. Esser & G. Welper & L. Zhang. Institut für Geometrie und Praktische Mathematik RWTH Aachen
Numerisches Rechnen (für Informatiker) M. Grepl P. Esser & G. Welper & L. Zhang Institut für Geometrie und Praktische Mathematik RWTH Aachen Wintersemester 2011/12 IGPM, RWTH Aachen Numerisches Rechnen
More informationNumerical solutions of nonlinear systems of equations
Numerical solutions of nonlinear systems of equations Tsung-Ming Huang Department of Mathematics National Taiwan Normal University, Taiwan E-mail: min@math.ntnu.edu.tw August 28, 2011 Outline 1 Fixed points
More information14. Nonlinear equations
L. Vandenberghe ECE133A (Winter 2018) 14. Nonlinear equations Newton method for nonlinear equations damped Newton method for unconstrained minimization Newton method for nonlinear least squares 14-1 Set
More informationStochastic Quasi-Newton Methods
Stochastic Quasi-Newton Methods Donald Goldfarb Department of IEOR Columbia University UCLA Distinguished Lecture Series May 17-19, 2016 1 / 35 Outline Stochastic Approximation Stochastic Gradient Descent
More informationHomework 5. Convex Optimization /36-725
Homework 5 Convex Optimization 10-725/36-725 Due Tuesday November 22 at 5:30pm submitted to Christoph Dann in Gates 8013 (Remember to a submit separate writeup for each problem, with your name at the top)
More informationMath 409/509 (Spring 2011)
Math 409/509 (Spring 2011) Instructor: Emre Mengi Study Guide for Homework 2 This homework concerns the root-finding problem and line-search algorithms for unconstrained optimization. Please don t hesitate
More information1 Numerical optimization
Contents Numerical optimization 5. Optimization of single-variable functions.............................. 5.. Golden Section Search..................................... 6.. Fibonacci Search........................................
More informationA globally and R-linearly convergent hybrid HS and PRP method and its inexact version with applications
A globally and R-linearly convergent hybrid HS and PRP method and its inexact version with applications Weijun Zhou 28 October 20 Abstract A hybrid HS and PRP type conjugate gradient method for smooth
More informationOPER 627: Nonlinear Optimization Lecture 14: Mid-term Review
OPER 627: Nonlinear Optimization Lecture 14: Mid-term Review Department of Statistical Sciences and Operations Research Virginia Commonwealth University Oct 16, 2013 (Lecture 14) Nonlinear Optimization
More informationOptimization. Escuela de Ingeniería Informática de Oviedo. (Dpto. de Matemáticas-UniOvi) Numerical Computation Optimization 1 / 30
Optimization Escuela de Ingeniería Informática de Oviedo (Dpto. de Matemáticas-UniOvi) Numerical Computation Optimization 1 / 30 Unconstrained optimization Outline 1 Unconstrained optimization 2 Constrained
More informationCheng Soon Ong & Christian Walder. Canberra February June 2018
Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 Outlines Overview Introduction Linear Algebra Probability Linear Regression
More informationNumerical Optimization of Partial Differential Equations
Numerical Optimization of Partial Differential Equations Part I: basic optimization concepts in R n Bartosz Protas Department of Mathematics & Statistics McMaster University, Hamilton, Ontario, Canada
More informationPenalty and Barrier Methods. So we again build on our unconstrained algorithms, but in a different way.
AMSC 607 / CMSC 878o Advanced Numerical Optimization Fall 2008 UNIT 3: Constrained Optimization PART 3: Penalty and Barrier Methods Dianne P. O Leary c 2008 Reference: N&S Chapter 16 Penalty and Barrier
More informationGradient Descent. Dr. Xiaowei Huang
Gradient Descent Dr. Xiaowei Huang https://cgi.csc.liv.ac.uk/~xiaowei/ Up to now, Three machine learning algorithms: decision tree learning k-nn linear regression only optimization objectives are discussed,
More informationUnconstrained optimization
Chapter 4 Unconstrained optimization An unconstrained optimization problem takes the form min x Rnf(x) (4.1) for a target functional (also called objective function) f : R n R. In this chapter and throughout
More informationStatic unconstrained optimization
Static unconstrained optimization 2 In unconstrained optimization an objective function is minimized without any additional restriction on the decision variables, i.e. min f(x) x X ad (2.) with X ad R
More informationImproved Damped Quasi-Newton Methods for Unconstrained Optimization
Improved Damped Quasi-Newton Methods for Unconstrained Optimization Mehiddin Al-Baali and Lucio Grandinetti August 2015 Abstract Recently, Al-Baali (2014) has extended the damped-technique in the modified
More informationNumerical Methods for PDE-Constrained Optimization
Numerical Methods for PDE-Constrained Optimization Richard H. Byrd 1 Frank E. Curtis 2 Jorge Nocedal 2 1 University of Colorado at Boulder 2 Northwestern University Courant Institute of Mathematical Sciences,
More informationThe Conjugate Gradient Method
The Conjugate Gradient Method Lecture 5, Continuous Optimisation Oxford University Computing Laboratory, HT 2006 Notes by Dr Raphael Hauser (hauser@comlab.ox.ac.uk) The notion of complexity (per iteration)
More informationLecture 14: October 17
1-725/36-725: Convex Optimization Fall 218 Lecture 14: October 17 Lecturer: Lecturer: Ryan Tibshirani Scribes: Pengsheng Guo, Xian Zhou Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer:
More informationSTA141C: Big Data & High Performance Statistical Computing
STA141C: Big Data & High Performance Statistical Computing Lecture 8: Optimization Cho-Jui Hsieh UC Davis May 9, 2017 Optimization Numerical Optimization Numerical Optimization: min X f (X ) Can be applied
More informationA New Penalty-SQP Method
Background and Motivation Illustration of Numerical Results Final Remarks Frank E. Curtis Informs Annual Meeting, October 2008 Background and Motivation Illustration of Numerical Results Final Remarks
More informationNonlinear Programming
Nonlinear Programming Kees Roos e-mail: C.Roos@ewi.tudelft.nl URL: http://www.isa.ewi.tudelft.nl/ roos LNMB Course De Uithof, Utrecht February 6 - May 8, A.D. 2006 Optimization Group 1 Outline for week
More informationLine search methods with variable sample size for unconstrained optimization
Line search methods with variable sample size for unconstrained optimization Nataša Krejić Nataša Krklec June 27, 2011 Abstract Minimization of unconstrained objective function in the form of mathematical
More informationBenchmarking Derivative-Free Optimization Algorithms
ARGONNE NATIONAL LABORATORY 9700 South Cass Avenue Argonne, Illinois 60439 Benchmarking Derivative-Free Optimization Algorithms Jorge J. Moré and Stefan M. Wild Mathematics and Computer Science Division
More informationAM 205: lecture 19. Last time: Conditions for optimality Today: Newton s method for optimization, survey of optimization methods
AM 205: lecture 19 Last time: Conditions for optimality Today: Newton s method for optimization, survey of optimization methods Optimality Conditions: Equality Constrained Case As another example of equality
More informationOutline. Scientific Computing: An Introductory Survey. Optimization. Optimization Problems. Examples: Optimization Problems
Outline Scientific Computing: An Introductory Survey Chapter 6 Optimization 1 Prof. Michael. Heath Department of Computer Science University of Illinois at Urbana-Champaign Copyright c 2002. Reproduction
More informationConditional Gradient (Frank-Wolfe) Method
Conditional Gradient (Frank-Wolfe) Method Lecturer: Aarti Singh Co-instructor: Pradeep Ravikumar Convex Optimization 10-725/36-725 1 Outline Today: Conditional gradient method Convergence analysis Properties
More informationOptimization and Gradient Descent
Optimization and Gradient Descent INFO-4604, Applied Machine Learning University of Colorado Boulder September 12, 2017 Prof. Michael Paul Prediction Functions Remember: a prediction function is the function
More informationReduced-Hessian Methods for Constrained Optimization
Reduced-Hessian Methods for Constrained Optimization Philip E. Gill University of California, San Diego Joint work with: Michael Ferry & Elizabeth Wong 11th US & Mexico Workshop on Optimization and its
More informationStochastic Gradient Methods for Large-Scale Machine Learning
Stochastic Gradient Methods for Large-Scale Machine Learning Leon Bottou Facebook A Research Frank E. Curtis Lehigh University Jorge Nocedal Northwestern University 1 Our Goal 1. This is a tutorial about
More informationLine Search Algorithms
Lab 1 Line Search Algorithms Investigate various Line-Search algorithms for numerical opti- Lab Objective: mization. Overview of Line Search Algorithms Imagine you are out hiking on a mountain, and you
More informationNewton s Method. Ryan Tibshirani Convex Optimization /36-725
Newton s Method Ryan Tibshirani Convex Optimization 10-725/36-725 1 Last time: dual correspondences Given a function f : R n R, we define its conjugate f : R n R, Properties and examples: f (y) = max x
More informationSecond order machine learning
Second order machine learning Michael W. Mahoney ICSI and Department of Statistics UC Berkeley Michael W. Mahoney (UC Berkeley) Second order machine learning 1 / 88 Outline Machine Learning s Inverse Problem
More informationComplexity analysis of second-order algorithms based on line search for smooth nonconvex optimization
Complexity analysis of second-order algorithms based on line search for smooth nonconvex optimization Clément Royer - University of Wisconsin-Madison Joint work with Stephen J. Wright MOPTA, Bethlehem,
More information