Shiqian Ma, MAT-258A: Numerical Optimization. Chapter 3: Gradient Method


Gradient method
Classical gradient method: to minimize a differentiable convex function $f$,
$$\min_{x \in \mathbb{R}^n} f(x).$$
The algorithm: choose $x_0$ and repeat
$$x_{k+1} = x_k - t_k \nabla f(x_k), \quad k = 0, 1, 2, \ldots$$
Note that $p_k := -\nabla f(x_k)$ is a descent direction (we call $p_k$ a descent direction if $p_k^\top \nabla f(x_k) < 0$).
Question: how to choose the step size $t_k$?
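
As an aside (not part of the original slides), here is a minimal Python sketch of the gradient method with a fixed step size; the quadratic test problem and the choice $t = 1/L$ are illustrative assumptions.

```python
import numpy as np

def gradient_method(grad, x0, t, num_iters=100):
    """Fixed-step gradient method: x_{k+1} = x_k - t * grad(x_k)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(num_iters):
        x = x - t * grad(x)
    return x

# Illustrative example: f(x) = 0.5 * x^T A x, whose gradient is A x and whose
# gradient Lipschitz constant is L = lambda_max(A) = 10.
A = np.diag([1.0, 10.0])
x_final = gradient_method(lambda x: A @ x, x0=[10.0, 1.0], t=1.0 / 10.0)
print(x_final)  # approaches the minimizer (0, 0)
```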

Step size rules
exact line search: $t_k = \operatorname{argmin}_t f(x_k - t \nabla f(x_k))$
fixed: $t_k$ constant
backtracking line search (most practical)
Backtracking line search: initialize $t_k$ at some $\hat{t} > 0$ (for example, $\hat{t} = 1$), repeat $t_k := \beta t_k$ until
$$f(x - t_k \nabla f(x)) < f(x) - \alpha t_k \|\nabla f(x)\|_2^2.$$
This is called the Armijo condition. Two parameters: $0 < \beta < 1$ and $0 < \alpha \le 0.5$.
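
A minimal Python sketch of this backtracking rule for the gradient step (an illustration, not the lecture's code; the function names are assumptions):

```python
import numpy as np

def backtracking_step(f, g, x, alpha=0.25, beta=0.5, t_hat=1.0):
    """Backtracking line search for the gradient step x - t*g, where g = grad f(x).

    Shrinks t by the factor beta until the Armijo condition above holds:
        f(x - t*g) < f(x) - alpha * t * ||g||_2^2
    """
    fx, g2 = f(x), g @ g
    if g2 == 0.0:                 # already at a stationary point
        return 0.0
    t = t_hat
    while f(x - t * g) >= fx - alpha * t * g2:
        t *= beta
    return t
```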

General line search
At $x_k$, compute a descent direction $p_k$; what step size $t_k$ should we take to move $x_{k+1} = x_k + t_k p_k$?
exact line search: $\min_t f(x_k + t p_k)$
Armijo condition: $f(x_k + t p_k) \le f(x_k) + c_1 t \nabla f(x_k)^\top p_k$
Wolfe conditions:
$$f(x_k + t p_k) \le f(x_k) + c_1 t \nabla f(x_k)^\top p_k$$
$$\nabla f(x_k + t p_k)^\top p_k \ge c_2 \nabla f(x_k)^\top p_k$$
with $0 < c_1 < c_2 < 1$.
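
For a general direction $p_k$, these conditions can be checked directly; a small helper (illustrative only, using the commonly used defaults $c_1 = 10^{-4}$ and $c_2 = 0.9$):

```python
import numpy as np

def satisfies_wolfe(f, grad, x, p, t, c1=1e-4, c2=0.9):
    """Check the Armijo (sufficient decrease) and Wolfe (curvature) conditions
    for step size t along a descent direction p, with 0 < c1 < c2 < 1."""
    slope = grad(x) @ p                                   # negative for a descent direction
    armijo = f(x + t * p) <= f(x) + c1 * t * slope
    curvature = grad(x + t * p) @ p >= c2 * slope
    return armijo and curvature
```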

Analysis of gradient method
$$x_{k+1} = x_k - t_k \nabla f(x_k), \quad k = 0, 1, 2, \ldots$$
with fixed step size or backtracking line search.
Assumptions:
$f$ is convex and differentiable with $\operatorname{dom} f = \mathbb{R}^n$
$\nabla f(x)$ is Lipschitz continuous with parameter $L > 0$
the optimal value $f^\star = \inf_x f(x)$ is finite and attained at $x^\star$

Lipschitz continuity
A function $h$ is called Lipschitz continuous with Lipschitz constant $L$ if
$$\|h(x) - h(y)\| \le L \|x - y\|, \quad \forall x, y \in \operatorname{dom} h.$$
If $\nabla f(x)$ is Lipschitz continuous with parameter $L > 0$, then (quadratic upper bound)
$$f(y) \le f(x) + \nabla f(x)^\top (y - x) + \frac{L}{2} \|x - y\|_2^2, \quad \forall x, y \in \operatorname{dom} f.$$
If $\operatorname{dom} f = \mathbb{R}^n$ and $f$ has a minimizer $x^\star$, then
$$\frac{1}{2L} \|\nabla f(x)\|_2^2 \le f(x) - f(x^\star) \le \frac{L}{2} \|x - x^\star\|_2^2.$$

Strongly convex functions
$f$ is strongly convex with parameter $\mu > 0$ if $f(x) - \frac{\mu}{2} \|x\|_2^2$ is convex.
First-order condition:
$$f(y) \ge f(x) + \nabla f(x)^\top (y - x) + \frac{\mu}{2} \|x - y\|_2^2, \quad \forall x, y \in \operatorname{dom} f.$$
Second-order condition:
$$\nabla^2 f(x) \succeq \mu I, \quad \forall x \in \operatorname{dom} f.$$
If $\operatorname{dom} f = \mathbb{R}^n$, then $f$ has a minimizer $x^\star$, and
$$\frac{\mu}{2} \|x - x^\star\|_2^2 \le f(x) - f(x^\star) \le \frac{1}{2\mu} \|\nabla f(x)\|_2^2.$$

Analysis of constant step size
Recall the quadratic upper bound:
$$f(y) \le f(x) + \langle \nabla f(x), y - x \rangle + \frac{L}{2} \|y - x\|_2^2.$$
Plug in $y = x - t \nabla f(x)$ to obtain
$$f(x - t \nabla f(x)) \le f(x) - t \left(1 - \frac{Lt}{2}\right) \|\nabla f(x)\|_2^2.$$
Let $x^+ = x - t \nabla f(x)$ and assume $0 < t \le 1/L$; then
$$f(x^+) \le f(x) - \frac{t}{2} \|\nabla f(x)\|_2^2
\le f^\star + \langle \nabla f(x), x - x^\star \rangle - \frac{t}{2} \|\nabla f(x)\|_2^2
= f^\star + \frac{1}{2t} \left( \|x - x^\star\|_2^2 - \|x - x^\star - t \nabla f(x)\|_2^2 \right)
= f^\star + \frac{1}{2t} \left( \|x - x^\star\|_2^2 - \|x^+ - x^\star\|_2^2 \right).$$

Take $x = x_{i-1}$, $x^+ = x_i$, $t_i = t$, and sum the bounds for $i = 1, \ldots, k$:
$$\sum_{i=1}^{k} \left( f(x_i) - f^\star \right) \le \frac{1}{2t} \sum_{i=1}^{k} \left( \|x_{i-1} - x^\star\|_2^2 - \|x_i - x^\star\|_2^2 \right) = \frac{1}{2t} \left( \|x_0 - x^\star\|_2^2 - \|x_k - x^\star\|_2^2 \right) \le \frac{1}{2t} \|x_0 - x^\star\|_2^2.$$
Since $f(x_i)$ is non-increasing,
$$f(x_k) - f^\star \le \frac{1}{k} \sum_{i=1}^{k} \left( f(x_i) - f^\star \right) \le \frac{1}{2kt} \|x_0 - x^\star\|_2^2.$$
Conclusion: the number of iterations to reach $f(x_k) - f^\star \le \epsilon$ is $O(1/\epsilon)$.

Analysis for strongly convex functions
Faster convergence rate under the additional assumption of strong convexity. Analysis for exact line search: recall from the quadratic upper bound
$$f(x - t \nabla f(x)) \le f(x) - t \left(1 - \frac{Lt}{2}\right) \|\nabla f(x)\|_2^2.$$
Use $x^+ = \operatorname{argmin}_t f(x - t \nabla f(x))$ (i.e., $x^+$ is obtained by exact line search) to obtain
$$f(x^+) \le f\!\left(x - \tfrac{1}{L} \nabla f(x)\right) \le f(x) - \frac{1}{2L} \|\nabla f(x)\|_2^2.$$
Subtract $f^\star$ from both sides:
$$f(x^+) - f^\star \le f(x) - f^\star - \frac{1}{2L} \|\nabla f(x)\|_2^2.$$
Now use strong convexity, $f(x) - f^\star \le \frac{1}{2\mu} \|\nabla f(x)\|_2^2$, to get
$$f(x^+) - f^\star \le \left(1 - \frac{\mu}{L}\right) \left( f(x) - f^\star \right).$$

Therefore
$$f(x_k) - f^\star \le \left(1 - \frac{\mu}{L}\right)^k \left( f(x_0) - f^\star \right).$$
Conclusion: the number of iterations to reach $f(x_k) - f^\star \le \epsilon$ is
$$\frac{\log\!\left( (f(x_0) - f^\star)/\epsilon \right)}{-\log(1 - \mu/L)} \le \frac{L}{\mu} \log\!\left( \frac{f(x_0) - f^\star}{\epsilon} \right),$$
roughly proportional to the condition number $L/\mu$ when it is large.

A quadratic example
$$f(x) = \frac{1}{2} \left( x_1^2 + \gamma x_2^2 \right), \quad \gamma > 1.$$
With exact line search, starting at $x_0 = (\gamma, 1)$:
$$f(x_k) = \left( \frac{\gamma - 1}{\gamma + 1} \right)^{2k} f(x_0), \qquad \|x_k - x^\star\|_2 = \left( \frac{\gamma - 1}{\gamma + 1} \right)^k \|x_0 - x^\star\|_2.$$
If $\gamma = 10^4$ and $k = 100$, then $\left( \frac{\gamma - 1}{\gamma + 1} \right)^k \approx 0.98$.
The gradient method can be very slow, and is very much dependent on scaling.
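
A short numerical check of this rate (not from the slides): for the quadratic $f(x) = \frac{1}{2} x^\top A x$ with $A = \operatorname{diag}(1, \gamma)$, exact line search has the closed-form step $t = (g^\top g)/(g^\top A g)$, and from this starting point $f(x_k)$ shrinks by the factor $((\gamma-1)/(\gamma+1))^2$ per iteration.

```python
import numpy as np

gamma = 10.0
A = np.diag([1.0, gamma])
f = lambda x: 0.5 * x @ A @ x

x = np.array([gamma, 1.0])                 # x_0 = (gamma, 1), as on the slide
c = (gamma - 1.0) / (gamma + 1.0)          # predicted contraction factor
f0 = f(x)
for k in range(1, 6):
    g = A @ x                              # gradient of the quadratic
    t = (g @ g) / (g @ A @ g)              # exact line search step
    x = x - t * g
    print(k, f(x), c ** (2 * k) * f0)      # the two values agree
```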

(Figure on this slide.) The contours are more eccentric, and the gradient method is slow.

A non-quadratic example
$$f(x_1, x_2) = e^{x_1 + 3x_2 - 0.1} + e^{x_1 - 3x_2 - 0.1} + e^{-x_1 - 0.1}.$$
Its gradient is
$$\nabla f(x_1, x_2) = \begin{pmatrix} e^{x_1 + 3x_2 - 0.1} + e^{x_1 - 3x_2 - 0.1} - e^{-x_1 - 0.1} \\ 3 e^{x_1 + 3x_2 - 0.1} - 3 e^{x_1 - 3x_2 - 0.1} \end{pmatrix}.$$

Run the gradient method with backtracking line search ($\alpha = 0.1$, $\beta = 0.7$) on this example. (Figure: the upper plot shows $f(x_k) - f^\star$, the lower plot shows $\|\nabla f(x_k)\|_2$, both versus the iteration $k$.)
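
A sketch reproducing this experiment in Python (the starting point $x_0 = (-1, 1)$ is an assumption; the slides do not state it):

```python
import numpy as np

def f(x):
    return (np.exp(x[0] + 3 * x[1] - 0.1) + np.exp(x[0] - 3 * x[1] - 0.1)
            + np.exp(-x[0] - 0.1))

def grad_f(x):
    a = np.exp(x[0] + 3 * x[1] - 0.1)
    b = np.exp(x[0] - 3 * x[1] - 0.1)
    c = np.exp(-x[0] - 0.1)
    return np.array([a + b - c, 3 * a - 3 * b])

alpha, beta = 0.1, 0.7                     # line search parameters from the slide
x = np.array([-1.0, 1.0])                  # assumed starting point
for k in range(50):
    g = grad_f(x)
    t = 1.0
    while f(x - t * g) >= f(x) - alpha * t * (g @ g):   # Armijo backtracking
        t *= beta
    x = x - t * g
print(x, f(x), np.linalg.norm(grad_f(x)))  # gradient norm decreases toward 0
```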

Convergence rate
sublinear rate: $r_k \le c / k^p$
linear rate: $r_k \le c (1 - q)^k$
quadratic rate: $r_{k+1} \le c\, r_k^2$
Here $r_k$ can be $f(x_k) - f^\star$, $\|x_k - x^\star\|_2$, or $\|\nabla f(x_k)\|_2$; $c$ is some constant.

Newton's Method
Assume $f(x)$ is twice continuously differentiable and convex. Given $x_k$, use a quadratic function to approximate $f(x)$ locally via Taylor's expansion:
$$f(x_k + p_k) \approx f(x_k) + \nabla f(x_k)^\top p_k + \frac{1}{2} p_k^\top \nabla^2 f(x_k) p_k.$$
Choose $p_k$ such that this quadratic approximation is minimized:
$$\nabla f(x_k) + \nabla^2 f(x_k) p_k = 0.$$
(pure) Newton method:
$$x_{k+1} = x_k - \nabla^2 f(x_k)^{-1} \nabla f(x_k)$$
damped Newton method:
$$x_{k+1} = x_k - t_k \nabla^2 f(x_k)^{-1} \nabla f(x_k)$$
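
A minimal sketch of the damped Newton method with Armijo backtracking (an illustration under assumed default parameters, not the lecture's code):

```python
import numpy as np

def damped_newton(f, grad, hess, x0, alpha=0.25, beta=0.5, tol=1e-8, max_iter=50):
    """Damped Newton: x_{k+1} = x_k - t_k * hess(x_k)^{-1} grad(x_k),
    with t_k chosen by Armijo backtracking."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        p = np.linalg.solve(hess(x), -g)       # Newton direction
        t = 1.0
        while f(x + t * p) > f(x) + alpha * t * (g @ p):
            t *= beta
        x = x + t * p
    return x

# Illustrative use on a convex quadratic (Newton converges in one step).
A = np.array([[2.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, -1.0])
x_star = damped_newton(lambda x: 0.5 * x @ A @ x - b @ x,
                       lambda x: A @ x - b, lambda x: A, x0=[5.0, 5.0])
print(x_star, np.linalg.solve(A, b))           # the two should agree
```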

Advantages: fast convergence, affine invariance.
Affine invariant means independent of linear changes of coordinates: for example, the Newton iterates for $\tilde{f}(y) = f(Ty)$ with starting point $y_0 = T^{-1} x_0$ are $y_k = T^{-1} x_k$.
Disadvantages: requires second derivatives; solving the linear system can be too expensive for large-scale applications.

Quasi-Newton Method
It is too expensive to compute $\nabla^2 f(x_k)$, so use some matrix $B_k$ to replace it. Use a quadratic model to approximate $f(x)$ locally at $x_k$:
$$m_k(p) = f(x_k) + \nabla f(x_k)^\top p + \frac{1}{2} p^\top B_k p.$$
Here $B_k$ is an $n \times n$ symmetric positive definite matrix. Note that the function value and gradient of this model at $p = 0$ match $f(x_k)$ and $\nabla f(x_k)$, respectively.
By minimizing this quadratic approximation, we obtain
$$p_k = -B_k^{-1} \nabla f(x_k),$$
and then we update the iterate via
$$x_{k+1} = x_k + \alpha_k p_k.$$

How to compute $B_k$? When we are at $x_{k+1}$, we want to construct
$$m_{k+1}(p) = f(x_{k+1}) + \nabla f(x_{k+1})^\top p + \frac{1}{2} p^\top B_{k+1} p.$$
We want the gradient of $m_{k+1}$ to match the gradient of $f$ at $x_k$ and $x_{k+1}$. The first requirement, $\nabla m_{k+1}(0) = \nabla f(x_{k+1})$, holds by construction. The second requirement is
$$\nabla m_{k+1}(-\alpha_k p_k) = \nabla f(x_{k+1}) - \alpha_k B_{k+1} p_k = \nabla f(x_k),$$
and so we have
$$B_{k+1} \alpha_k p_k = \nabla f(x_{k+1}) - \nabla f(x_k).$$
To simplify the notation, define
$$s_k = x_{k+1} - x_k = \alpha_k p_k, \qquad y_k = \nabla f(x_{k+1}) - \nabla f(x_k).$$

Then we get
$$B_{k+1} s_k = y_k,$$
which is called the secant equation.

DFP and BFGS
To compute $B_{k+1}$, we solve
$$\min_B \|B - B_k\| \quad \text{s.t.} \quad B = B^\top, \; B s_k = y_k.$$
This gives the following DFP updating formula (originally given by Davidon in 1959, and subsequently studied by Fletcher and Powell):
$$B_{k+1} = (I - \rho_k y_k s_k^\top) B_k (I - \rho_k s_k y_k^\top) + \rho_k y_k y_k^\top, \quad \text{with } \rho_k = 1 / (y_k^\top s_k).$$
The other way to compute $B_{k+1}$: denote its inverse by $H_{k+1}$ and solve the following problem:
$$\min_H \|H - H_k\| \quad \text{s.t.} \quad H = H^\top, \; H y_k = s_k.$$

This gives the following BFGS updating formula (proposed by Broyden, Fletcher, Goldfarb and Shanno, independently):
$$H_{k+1} = (I - \rho_k s_k y_k^\top) H_k (I - \rho_k y_k s_k^\top) + \rho_k s_k s_k^\top,$$
or, by the Sherman-Morrison-Woodbury formula,
$$B_{k+1} = B_k - \frac{B_k s_k s_k^\top B_k}{s_k^\top B_k s_k} + \frac{y_k y_k^\top}{y_k^\top s_k}.$$
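
A quick numerical check (not from the slides) that the BFGS formula above satisfies the secant equation $H_{k+1} y_k = s_k$ whenever $y_k^\top s_k > 0$; the random test data is an assumption:

```python
import numpy as np

def bfgs_update_H(H, s, y):
    """BFGS update of the inverse Hessian approximation (the formula above)."""
    rho = 1.0 / (y @ s)
    V = np.eye(len(s)) - rho * np.outer(s, y)
    return V @ H @ V.T + rho * np.outer(s, s)

rng = np.random.default_rng(0)
n = 5
H = np.eye(n)
s = rng.standard_normal(n)
y = s + 0.1 * rng.standard_normal(n)       # keeps y^T s > 0 with high probability
H_new = bfgs_update_H(H, s, y)
print(np.allclose(H_new @ y, s))           # True: the secant equation holds
```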

The complete BFGS method
Given an initial point $x_0$ and $B_0 \succ 0$, repeat for $k = 0, 1, 2, \ldots$ until a stopping criterion is satisfied:
1. compute the quasi-Newton direction $p_k = -B_k^{-1} \nabla f(x_k)$
2. determine a step size $t_k$ (via backtracking line search)
3. update $x_{k+1} = x_k + t_k p_k$ and compute $\nabla f(x_{k+1})$
4. compute $B_{k+1}$
A compact implementation sketch follows.
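
Putting the pieces together, a Python sketch of this loop that maintains the inverse approximation $H_k$ and applies the BFGS update from the previous slide (the parameter values and the quadratic test problem are illustrative assumptions):

```python
import numpy as np

def bfgs(f, grad, x0, alpha=1e-4, beta=0.5, tol=1e-8, max_iter=200):
    """BFGS with backtracking line search; H approximates the inverse Hessian."""
    x = np.asarray(x0, dtype=float)
    n = len(x)
    H = np.eye(n)                          # a common simple choice for H_0
    g = grad(x)
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:        # stopping criterion
            break
        p = -H @ g                         # quasi-Newton direction
        t = 1.0
        while f(x + t * p) > f(x) + alpha * t * (g @ p):   # Armijo backtracking
            t *= beta
        s = t * p                          # s_k = x_{k+1} - x_k
        x_new, g_new = x + s, grad(x + s)
        y = g_new - g                      # y_k = grad f(x_{k+1}) - grad f(x_k)
        rho = 1.0 / (y @ s)
        V = np.eye(n) - rho * np.outer(s, y)
        H = V @ H @ V.T + rho * np.outer(s, s)             # BFGS update
        x, g = x_new, g_new
    return x

# Illustrative run on a strongly convex quadratic.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
x_min = bfgs(lambda x: 0.5 * x @ A @ x - b @ x, lambda x: A @ x - b, x0=[0.0, 0.0])
print(x_min, np.linalg.solve(A, b))        # the two should agree
```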

Convergence of BFGS method
Global convergence: if $f$ is strongly convex, then BFGS with backtracking line search converges to the optimum for any $x_0$ and any $B_0 \succ 0$.
Local convergence: if $f$ is strongly convex and $\nabla^2 f(x)$ is Lipschitz continuous, then local convergence is superlinear: for sufficiently large $k$,
$$\|x_{k+1} - x^\star\|_2 \le c_k \|x_k - x^\star\|_2, \quad \text{where } c_k \to 0.$$
