LBFGS. John Langford, Large Scale Machine Learning Class, February 5. (post presentation version)
2 We are still doing Linear Learning
Features: a vector $x \in \mathbb{R}^n$
Label: $y \in \mathbb{R}$
Goal: Learn $w \in \mathbb{R}^n$ such that $\hat{y}_w(x) = \sum_i w_i x_i$ is close to $y$.
3 But, this time in a batch fashion
Initialize $w$
Repeatedly:
1 Let $\hat{y}_w(x) = \sum_i w_i x_i$
2 Let $g_i = \sum_{(x,y)} \frac{\partial L(\hat{y}_w(x), y)}{\partial w_i}$
3 Compute update direction $d(g)$
4 Update weights $w_i \leftarrow w_i + d_i(g)$
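The batch loop above can be sketched in plain Python. This is only a minimal sketch, assuming squared loss $L = \frac{1}{2}(\hat{y} - y)^2$ and a fixed-step steepest-descent placeholder for $d(g)$; the names `predict`, `batch_gradient`, `batch_step` and the toy data are illustrative and do not come from the slides.

```python
def predict(w, x):
    """yhat_w(x) = sum_i w_i x_i."""
    return sum(wi * xi for wi, xi in zip(w, x))


def batch_gradient(w, data):
    """g_i = sum over (x, y) of dL/dw_i, assuming L = 0.5 * (yhat - y)^2."""
    g = [0.0] * len(w)
    for x, y in data:
        err = predict(w, x) - y
        for i, xi in enumerate(x):
            g[i] += err * xi
    return g


def batch_step(w, data, direction):
    """One pass: compute g, compute d(g), update w_i <- w_i + d_i(g)."""
    g = batch_gradient(w, data)
    d = direction(g)
    return [wi + di for wi, di in zip(w, d)]


# Hypothetical toy data that is fit exactly by w = (2, -1).
data = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 1.0)]


def steepest(g):
    # Placeholder direction (assumption): a small step against the gradient.
    return [-0.1 * gi for gi in g]


w = [0.0, 0.0]
for _ in range(200):
    w = batch_step(w, data, steepest)
```

Any other rule for $d(g)$ can be passed in as `direction` without changing the loop.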
5 The BFGS Update
$d(g) = Dg$ for some direction matrix $D$. What is $D$?
$D$ is defined purely in terms of two empirical observations:
$\Delta g = g_{\text{new}} - g_{\text{prev}}$
$\Delta w = w_{\text{new}} - w_{\text{prev}}$
6 Assertion 1
$\sum_i \Delta g_i \Delta w_i = \langle \Delta g, \Delta w \rangle$ should be positive for convex functions.
[Figure: "Change in weight*gradient". A convex function with a gradient and another gradient marked; x-axis: parameter.]
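Assertion 1 can be checked numerically. The sketch below assumes a toy convex loss $L(w) = w_0^2 + w_0 w_1 + w_1^2$ and two arbitrary weight vectors; neither is taken from the slides.

```python
def dot(a, b):
    """Inner product <a, b>."""
    return sum(ai * bi for ai, bi in zip(a, b))


def grad(w):
    # Gradient of the assumed convex toy loss L(w) = w0^2 + w0*w1 + w1^2.
    return [2.0 * w[0] + w[1], w[0] + 2.0 * w[1]]


w_prev = [1.0, -2.0]
w_new = [0.5, 0.3]

# The two empirical observations: change in weights and change in gradient.
delta_w = [n - p for n, p in zip(w_new, w_prev)]
delta_g = [n - p for n, p in zip(grad(w_new), grad(w_prev))]

# <delta_g, delta_w> is positive here because the toy loss is strictly convex.
curvature = dot(delta_g, delta_w)
```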
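8 Assertion 2
$T_{kj} = \frac{\Delta g_k \Delta w_j}{\sum_i \Delta g_i \Delta w_i} = \frac{\Delta g \, \Delta w^\top}{\langle \Delta g, \Delta w \rangle}$
Transforms direction $\Delta g$ to direction $\Delta w$ and vice versa.
A matrix is a linear function which transforms one vector into another.
$\sum_j T_{kj} v_j = \sum_j \frac{\Delta g_k \Delta w_j v_j}{\sum_i \Delta g_i \Delta w_i} = \Delta g_k \frac{\sum_j v_j \Delta w_j}{\sum_i \Delta g_i \Delta w_i}$
$\sum_k v_k T_{kj} = \sum_k \frac{v_k \Delta g_k \Delta w_j}{\sum_i \Delta g_i \Delta w_i} = \Delta w_j \frac{\sum_k \Delta g_k v_k}{\sum_i \Delta g_i \Delta w_i}$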
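The two products in Assertion 2 can be verified on a small example. This is a sketch with made-up vectors; `make_T`, `mat_vec` and `vec_mat` are illustrative helper names.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def make_T(dg, dw):
    """T[k][j] = dg[k] * dw[j] / <dg, dw>."""
    c = dot(dg, dw)
    return [[gk * wj / c for wj in dw] for gk in dg]


def mat_vec(M, v):
    """sum_j M[k][j] * v[j] for each k."""
    return [dot(row, v) for row in M]


def vec_mat(v, M):
    """sum_k v[k] * M[k][j] for each j."""
    return [sum(v[k] * M[k][j] for k in range(len(v))) for j in range(len(M[0]))]


dg = [1.3, 4.1]    # hypothetical change in gradient
dw = [-0.5, 2.3]   # hypothetical change in weights
v = [2.0, 1.0]     # arbitrary test vector

T = make_T(dg, dw)
right = mat_vec(T, v)   # a multiple of dg: dg * <v, dw> / <dg, dw>
left = vec_mat(v, T)    # a multiple of dw: dw * <dg, v> / <dg, dw>
```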
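11 3 vectors, $v$, $\Delta w$, $\Delta g$
[Figure: the vectors $v$, $\Delta w$ and $\Delta g$, with the inner product $\langle v, \Delta w \rangle$ and the scaled vector $\Delta g \langle v, \Delta w \rangle$ marked.]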
13 Assertion 3
Let $\delta_{kj} = I(k = j)$ (1 if $k = j$ and 0 otherwise).
$S_{kj} = \delta_{kj} - T_{kj}$
Subtracts transform $T_{kj}$ while keeping everything else.
$\sum_j S_{kj} v_j = v_k - \Delta g_k \frac{\sum_j v_j \Delta w_j}{\sum_i \Delta g_i \Delta w_i}$
$\sum_k v_k S_{kj} = v_j - \Delta w_j \frac{\sum_k \Delta g_k v_k}{\sum_i \Delta g_i \Delta w_i}$
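A small numerical check of Assertion 3, again with made-up vectors: applying $S$ to $\Delta g$ from the right, or to $\Delta w$ from the left, leaves nothing, which is one way to see what "subtracts the transform" means. The helper name `make_S` is illustrative.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def make_S(dg, dw):
    """S[k][j] = delta_kj - dg[k] * dw[j] / <dg, dw>."""
    c = dot(dg, dw)
    n = len(dg)
    return [[(1.0 if k == j else 0.0) - dg[k] * dw[j] / c for j in range(n)]
            for k in range(n)]


dg = [1.3, 4.1]    # hypothetical change in gradient
dw = [-0.5, 2.3]   # hypothetical change in weights
S = make_S(dg, dw)

# sum_j S[k][j] * dg[j]: the dg component is removed entirely.
S_dg = [dot(row, dg) for row in S]
# sum_k dw[k] * S[k][j]: the dw component is removed entirely.
dw_S = [sum(dw[k] * S[k][j] for k in range(2)) for j in range(2)]
```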
17 Assertion 4
$F_{kj} = \frac{\Delta w_k \Delta w_j}{\sum_i \Delta g_i \Delta w_i} = \frac{\Delta w \, \Delta w^\top}{\langle \Delta g, \Delta w \rangle}$ is an estimate of the inverse Hessian.
$H_{kj} = \frac{\partial^2 L}{\partial w_k \partial w_j} \approx \frac{\Delta g_k}{\Delta w_j}$
So, $H \Delta w \approx \Delta g$.
So an inverse should satisfy $F \Delta g \approx \Delta w$.
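The relation $F \Delta g \approx \Delta w$ can be checked directly. The sketch assumes a quadratic toy loss with a fixed Hessian `H`, so that $H \Delta w = \Delta g$ holds exactly; the matrix and the step are made up for illustration.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Assumed Hessian of a toy quadratic loss (not from the slides).
H = [[2.0, 1.0], [1.0, 2.0]]

dw = [-0.5, 2.3]                      # hypothetical weight change
dg = [dot(row, dw) for row in H]      # for a quadratic, H dw = dg exactly

# F[k][j] = dw[k] * dw[j] / <dg, dw>
c = dot(dg, dw)
F = [[wk * wj / c for wj in dw] for wk in dw]

# F maps the observed gradient change back to the observed weight change.
F_dg = [dot(row, dg) for row in F]
```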
20 The BFGS direction
$D_{kj} \leftarrow \sum_{il} S_{ik} D_{il} S_{lj} + F_{kj}$
Or in recursive matrix form: $D_t = S_t^\top D_{t-1} S_t + F_t$
Unwinding, we get:
$D_t = S_t^\top S_{t-1}^\top \cdots S_1^\top D_0 S_1 S_2 \cdots S_t + S_t^\top \cdots S_2^\top F_1 S_2 \cdots S_t + \cdots + S_t^\top F_{t-1} S_t + F_t$
LBFGS is the low rank approximation.
$L_t = S_t^\top \cdots S_{t-m}^\top D_0 S_{t-m} \cdots S_t + S_t^\top \cdots S_{t-m+1}^\top F_{t-m} S_{t-m+1} \cdots S_t + \cdots + S_t^\top F_{t-1} S_t + F_t$
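The recursive update can be written out with dense matrices for a small problem. This is a sketch under stated assumptions: the index form $\sum_{il} S_{ik} D_{il} S_{lj}$ is read as $S^\top D S$, $D_0$ is taken to be the identity, and the vectors are made up. A practical implementation would not form these matrices explicitly.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


def bfgs_update(D, dg, dw):
    """Return S^T D S + F, i.e. D_kj <- sum_il S_ik D_il S_lj + F_kj."""
    n = len(dg)
    c = dot(dg, dw)
    S = [[(1.0 if k == j else 0.0) - dg[k] * dw[j] / c for j in range(n)]
         for k in range(n)]
    F = [[dw[k] * dw[j] / c for j in range(n)] for k in range(n)]
    StDS = matmul(matmul(transpose(S), D), S)
    return [[StDS[k][j] + F[k][j] for j in range(n)] for k in range(n)]


D0 = [[1.0, 0.0], [0.0, 1.0]]   # assumed starting matrix
dg = [1.3, 4.1]                 # hypothetical change in gradient
dw = [-0.5, 2.3]                # hypothetical change in weights

D1 = bfgs_update(D0, dg, dw)
# After the update, D1 maps the latest gradient change to the latest weight change.
D1_dg = [dot(row, dg) for row in D1]
```

Because $S \Delta g = 0$, the first term contributes nothing to `D1_dg`, and the $F$ term supplies $\Delta w$.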
26 Questions
What is $D_0$? $\frac{\delta_{jk}}{\partial^2 L / \partial w_j \partial w_j}$ is a reasonable choice.
How do you make it fast? All operations decompose into dense vector products.
How do you start? Seed $w$ with an online pass first. Initially, step size may be crazy. Make a second pass computing the second derivative in the chosen direction.
What if loss goes up? Backstep along previous direction.
How do you regularize? Regularized loss has the form: $\tilde{L}(\hat{y}, y) = L(\hat{y}, y) + \frac{c}{2} \sum_i w_i^2$. Imposing regularization is a once-per-pass dense operation.
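One common way to realize "all operations decompose into dense vector products" is the standard two-loop recursion for limited-memory BFGS. The slides do not spell this procedure out, so the sketch below is an assumption about how the product $L_t g$ might be computed; `lbfgs_product`, `regularized_gradient`, and the example pairs are illustrative names and values.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def lbfgs_product(g, pairs, d0_diag):
    """Return L_t g using only vector operations.

    pairs: list of (delta_w, delta_g) for the last m updates, oldest first.
    d0_diag: the diagonal of D_0.
    Sign and step size are left to the caller.
    """
    q = list(g)
    alphas = []
    for dw, dg in reversed(pairs):                 # newest to oldest
        alpha = dot(dw, q) / dot(dg, dw)
        alphas.append(alpha)
        q = [qi - alpha * gi for qi, gi in zip(q, dg)]       # q <- S q
    r = [di * qi for di, qi in zip(d0_diag, q)]              # r <- D_0 q
    for (dw, dg), alpha in zip(pairs, reversed(alphas)):     # oldest to newest
        beta = dot(dg, r) / dot(dg, dw)
        # r <- S^T r + F q_saved, where F q_saved = alpha * dw
        r = [ri + (alpha - beta) * wi for ri, wi in zip(r, dw)]
    return r


def regularized_gradient(g, w, c):
    """Gradient of L + (c/2) * sum_i w_i^2: one dense pass over w."""
    return [gi + c * wi for gi, wi in zip(g, w)]


# Hypothetical history of two (delta_w, delta_g) pairs, oldest first.
pairs = [([1.0, 0.0], [2.0, 1.0]), ([-0.5, 2.3], [1.3, 4.1])]
# Feeding in the newest delta_g should return the newest delta_w.
result = lbfgs_product([1.3, 4.1], pairs, [1.0, 1.0])
```

Each stored pair costs a constant number of dot products and vector updates, so memory and time grow with $m$ times the number of weights rather than with its square.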
28 How do you restart with new data?
[Figure: "Curvature at solution". Loss around solution, $f(x)$ plotted against $x$.]
Compute and store: $r_i = \frac{\partial^2 L}{\partial w_i \partial w_i}$
On resumption, regularize by $\sum_i r_i (w_i - o_i)^2$ where $o_i$ is the old weight value.
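The restart penalty is straightforward to compute. This is a minimal sketch; the function names and the example numbers are hypothetical, and how the penalty is weighted against the new data's loss is not specified in the slides.

```python
def restart_penalty(w, old_w, curvature):
    """sum_i r_i * (w_i - o_i)^2, with r_i the stored second derivatives."""
    return sum(r * (wi - oi) ** 2 for wi, oi, r in zip(w, old_w, curvature))


def restart_penalty_gradient(w, old_w, curvature):
    """Derivative of the penalty with respect to each w_i: 2 * r_i * (w_i - o_i)."""
    return [2.0 * r * (wi - oi) for wi, oi, r in zip(w, old_w, curvature)]


old_w = [0.5, 2.0]       # o_i: weights at the previous solution
curvature = [4.0, 1.0]   # r_i: stored diagonal second derivatives
w = [1.0, 2.0]           # current weights after seeing new data

penalty = restart_penalty(w, old_w, curvature)
penalty_grad = restart_penalty_gradient(w, old_w, curvature)
```

Weights with large stored curvature are pulled back toward their old values more strongly than weights the old data barely constrained.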
30 Why LBFGS?
Theorem: If $L$ is quadratic and an exact line search was done for the step size, a variant satisfies e t C 2 2t for some C.
Of course, it's rarely quadratic and you never perform exact line search.
32 What happens here?
[Figure: "Absolute Value". $f(x) = |x - 1|$ plotted against $x$.]
What happens to a true Newton step here?
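One way to read the question, assuming the plotted function is $|x - 1|$ and that a one-dimensional Newton step means $x - f'(x)/f''(x)$:

```latex
f(x) = |x - 1|, \qquad
f'(x) = \operatorname{sign}(x - 1), \qquad
f''(x) = 0 \quad (x \neq 1)

x_{\text{new}} = x - \frac{f'(x)}{f''(x)}
\quad \text{is undefined for } x \neq 1, \text{ since } f''(x) = 0 .
```

Away from the kink the second derivative vanishes, so the step divides by zero; at $x = 1$ the derivative does not exist.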
33 References
[L] Nocedal, J., Updating quasi-Newton matrices with limited storage, Math. of Comp., 35,
[B] Broyden, C., The convergence of a class of double-rank minimization algorithms, Journal of the Inst. of Math. and Its Applications, 6:
[F] Fletcher, R., A New Approach to Variable Metric Algorithms, Computer Journal 13 (3):
[G] Goldfarb, D., A Family of Variable Metric Updates Derived by Variational Means, Math. of Comp. 24 (109):
[S] Shanno, D., Conditioning of quasi-Newton methods for function minimization, Math. of Comp. 24 (111):
34 More References
Incremental LBFGS, Olivier Chapelle