Math 164: Optimization Barzilai-Borwein Method

Math 164: Optimization, Barzilai-Borwein Method. Instructor: Wotao Yin, Department of Mathematics, UCLA, Spring 2015. Online discussions on piazza.com.

Main features of the Barzilai-Borwein (BB) method
The BB method was published in an 8-page paper¹ in 1988.
It is a gradient method with modified step sizes, which are motivated by Newton's method but do not involve any Hessian.
At nearly no extra cost, the method often significantly improves the performance of a standard gradient method.
The method is used along with a nonmonotone line search as a safeguard.
¹ J. Barzilai and J. Borwein. Two-point step size gradient methods. IMA J. Numerical Analysis 8, 141-148, 1988.

Motivation of the BB method
Let g^{(k)} = ∇f(x^{(k)}) and F^{(k)} = ∇²f(x^{(k)}).
Gradient method: x^{(k+1)} = x^{(k)} - α_k g^{(k)}
  choice of α_k: fixed, exact line search, or fixed initial step size + line search
  pros: simple
  cons: no use of 2nd-order information, sometimes zig-zags
Newton's method: x^{(k+1)} = x^{(k)} - (F^{(k)})^{-1} g^{(k)}
  pros: 2nd-order information, one step for a quadratic function, fast convergence near the solution
  cons: forming and computing (F^{(k)})^{-1} is expensive, needs modifications if F^{(k)} is not positive definite
The BB method chooses α_k so that α_k g^{(k)} approximates (F^{(k)})^{-1} g^{(k)} without computing F^{(k)}.

Derive the BB method
Consider minimize_x f(x) = (1/2) x^T A x - b^T x, where A ≻ 0 is symmetric.
The gradient is g^{(k)} = Ax^{(k)} - b. The Hessian is A.
Newton step: d_newton^{(k)} = -A^{-1} g^{(k)}
Goal: choose α_k so that α_k g^{(k)} = (α_k^{-1} I)^{-1} g^{(k)} approximates A^{-1} g^{(k)}.
Define s^{(k-1)} := x^{(k)} - x^{(k-1)} and y^{(k-1)} := g^{(k)} - g^{(k-1)}. Then A satisfies A s^{(k-1)} = y^{(k-1)}.
Therefore, given s^{(k-1)} and y^{(k-1)}, how about choosing α_k so that (α_k^{-1} I) s^{(k-1)} ≈ y^{(k-1)}?

Goal: (α_k^{-1} I) s^{(k-1)} ≈ y^{(k-1)}.
BB method. Least-squares problem (let β = α^{-1}):
  α_k^{-1} = argmin_β (1/2) ‖s^{(k-1)} β - y^{(k-1)}‖²  ⟹  α_k^1 = ((s^{(k-1)})^T s^{(k-1)}) / ((s^{(k-1)})^T y^{(k-1)})
Alternative least-squares problem:
  α_k = argmin_α (1/2) ‖s^{(k-1)} - y^{(k-1)} α‖²  ⟹  α_k^2 = ((s^{(k-1)})^T y^{(k-1)}) / ((y^{(k-1)})^T y^{(k-1)})
α_k^1 and α_k^2 are called the BB step sizes.
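To make the two formulas concrete, here is a minimal NumPy sketch of the BB step sizes; the function names bb_step_1 and bb_step_2 are illustrative, not from the slides.

```python
import numpy as np

def bb_step_1(s, y):
    """alpha_k^1 = (s^T s) / (s^T y), with s = x^(k) - x^(k-1) and y = g^(k) - g^(k-1)."""
    return float(s @ s) / float(s @ y)

def bb_step_2(s, y):
    """alpha_k^2 = (s^T y) / (y^T y)."""
    return float(s @ y) / float(y @ y)
```

On the quadratic above, y^{(k-1)} = A s^{(k-1)}, so both step sizes are reciprocals of Rayleigh quotients of A and lie between 1/λ_max(A) and 1/λ_min(A).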

Apply the BB method
Since x^{(k-1)} and g^{(k-1)}, and thus s^{(k-1)} and y^{(k-1)}, are unavailable at k = 0, we apply standard gradient descent at k = 0 and start BB at k = 1.
We can use either α_k^1 or α_k^2, or alternate between them.
We can also fix α_k = α_k^1 or α_k = α_k^2 for a few consecutive steps.
It performs very well on minimizing quadratics and many other functions.
However, f_k and ‖∇f_k‖ are not monotonic!
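A bare sketch of the resulting iteration for a general differentiable f, assuming a user-supplied gradient; the fixed first step size alpha0, the stopping test, and the exclusive use of α_k^1 are illustrative choices, not prescriptions from the slides (in practice the method is paired with the nonmonotone line search discussed below).

```python
import numpy as np

def bb_gradient_method(grad, x0, alpha0=1e-3, tol=1e-8, max_iter=1000):
    """Gradient method with Barzilai-Borwein step size alpha_k^1 (no safeguard)."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    x_new = x - alpha0 * g                   # k = 0: plain gradient step (s, y not available yet)
    for _ in range(max_iter):
        g_new = grad(x_new)
        if np.linalg.norm(g_new) < tol:
            break
        s = x_new - x                        # s^(k-1)
        y = g_new - g                        # y^(k-1)
        alpha = float(s @ s) / float(s @ y)  # BB step size alpha_k^1
        x, g = x_new, g_new
        x_new = x - alpha * g                # BB step; pair with a line search as a safeguard
    return x_new
```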

Steepest descent versus BB on quadratic programming
Model: minimize_x f(x) := (1/2) x^T A x - b^T x.
Gradient iteration: x^{(k+1)} = x^{(k)} - α_k (Ax^{(k)} - b).
Steepest descent selects α_k as argmin_α f(x^{(k)} - α(Ax^{(k)} - b)), which gives
  α_k = ((r^{(k)})^T r^{(k)}) / ((r^{(k)})^T A r^{(k)}), where r^{(k)} := b - Ax^{(k)}.
BB selects α_k as
  α_k^1 = ((s^{(k-1)})^T s^{(k-1)}) / ((s^{(k-1)})^T y^{(k-1)}).

Numerical example
Set the symmetric matrix A to have condition number λ_max(A)/λ_min(A) = 50.
Stopping criterion: ‖r^{(k)}‖ < 10^{-8}.
Steepest descent stops in 90 iterations; BB stops in 10 iterations.
[Figure: contour plot of f with the iterates of gradient (steepest) descent, 90 steps, and Barzilai-Borwein, 10 steps.]
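A self-contained sketch of this kind of experiment, assuming a randomly generated 2-by-2 symmetric positive definite A with condition number 50; the construction of A, the starting point, and the first BB step size are illustrative, and the exact iteration counts depend on the instance (the slide reports 90 versus 10).

```python
import numpy as np

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((2, 2)))
A = Q @ np.diag([1.0, 50.0]) @ Q.T          # symmetric, condition number 50
b = rng.standard_normal(2)
x0 = rng.standard_normal(2)

def steepest_descent(A, b, x, tol=1e-8):
    k, r = 0, b - A @ x
    while np.linalg.norm(r) >= tol:
        alpha = (r @ r) / (r @ (A @ r))     # exact line search on the quadratic
        x, k = x + alpha * r, k + 1
        r = b - A @ x
    return k

def bb_method(A, b, x, tol=1e-8, alpha0=1e-3):
    k, g = 0, A @ x - b
    x_old, g_old = x, g
    x = x - alpha0 * g                      # plain gradient step at k = 0
    while True:
        g, k = A @ x - b, k + 1
        if np.linalg.norm(g) < tol:
            return k
        s, y = x - x_old, g - g_old
        alpha = (s @ s) / (s @ y)           # BB step size alpha_k^1
        x_old, g_old = x, g
        x = x - alpha * g

print("steepest descent:", steepest_descent(A, b, x0.copy()), "iterations")
print("BB:", bb_method(A, b, x0.copy()), "iterations")
```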

Properties of Barzilai-Borwein
For quadratic functions, it has R-linear convergence².
For 2D quadratic functions, it has Q-superlinear convergence³.
There is no convergence guarantee for smooth convex problems. On these problems, we pair up BB with a nonmonotone line search.
[Figure: f - f_min (log scale, 10^{-12} to 10^2) versus iteration number (0 to 600) for BB on Laplace2: min (1/2) x^T A x - b^T x + (h²/4) Σ_{ijk} u_{ijk}^4.]
² Dai and Liao [2002]
³ Barzilai and Borwein [1988], Dai [2013]

Nonmonotone line search
Some growth in the function value is permitted.
This sometimes improves the likelihood of finding a global optimum.
It also improves convergence speed when a monotone scheme is forced to creep along the bottom of a narrow curved valley.
An early nonmonotone line search method⁴ was developed for Newton's methods: accept α if
  f(x^{(k)} + α d^{(k)}) ≤ max_{0 ≤ j ≤ m_k} f(x^{(k-j)}) + c₁ α ∇f_k^T d^{(k)}.
However, it may still kill R-linear convergence. Example: x ∈ ℝ, minimize f(x) = (1/2) x², with x^{(0)} ≠ 0 and d^{(k)} = -x^{(k)}. The step-size choice
  α_k = 1 - 2^{-k} if k = i² for some integer i, and α_k = 2 otherwise,
converges R-linearly but fails to satisfy the condition for k large.
⁴ Grippo, Lampariello, and Lucidi [1986]
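A minimal sketch of backtracking under this max-type (Grippo-Lampariello-Lucidi style) condition; the memory-window handling, the parameter values, and the function name are illustrative assumptions.

```python
def nonmonotone_armijo_gll(f, x, g, d, recent_f, alpha0=1.0, c1=1e-4, rho=0.5, max_backtracks=50):
    """Backtrack until f(x + alpha d) <= max_{0<=j<=m_k} f(x^(k-j)) + c1 * alpha * grad_f^T d.

    recent_f: the last few objective values, e.g. a collections.deque with
    maxlen = m_k + 1 that the caller appends to after each accepted step.
    """
    f_ref = max(recent_f)          # reference value: max over the memory window
    gTd = float(g @ d)             # d must be a descent direction, i.e. gTd < 0
    alpha = alpha0
    for _ in range(max_backtracks):
        if f(x + alpha * d) <= f_ref + c1 * alpha * gTd:
            break
        alpha *= rho
    return alpha
```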

Zhang-Hager nonmonotone line search⁵
1. initialize 0 < c₁ < c₂ < 1, C_0 ← f(x^{(0)}), Q_0 ← 1, η < 1, k ← 0
2. while not converged do
3a. compute α_k satisfying the modified Wolfe conditions, OR
3b. find α_k by backtracking, to satisfy the modified Armijo condition (sufficient decrease):
    f(x^{(k)} + α_k d^{(k)}) ≤ C_k + c₁ α_k ∇f_k^T d^{(k)}
4. x^{(k+1)} ← x^{(k)} + α_k d^{(k)}
5. Q_{k+1} ← η Q_k + 1, C_{k+1} ← (η Q_k C_k + f(x^{(k+1)})) / Q_{k+1}
Comments: If η = 1, then C_k = (1/(k+1)) Σ_{j=0}^{k} f_j. Since η < 1, C_k is a weighted average of all past f_j, with more weight on recent f_j.
⁵ Zhang and Hager [2004]
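A minimal sketch of one iteration of step 3b (backtracking against C_k) together with the step-5 updates; the parameter values, the backtracking factor rho, and the function name are illustrative assumptions.

```python
def zhang_hager_step(f, grad, x, g, d, C, Q, alpha_trial, c1=1e-4, eta=0.85, rho=0.5, max_backtracks=50):
    """One nonmonotone backtracking step against the Zhang-Hager reference value C_k,
    followed by the updates of Q_{k+1} and C_{k+1}."""
    gTd = float(g @ d)                      # d must be a descent direction, i.e. gTd < 0
    alpha = alpha_trial
    for _ in range(max_backtracks):         # modified Armijo: f(x + a d) <= C_k + c1 a grad_f^T d
        if f(x + alpha * d) <= C + c1 * alpha * gTd:
            break
        alpha *= rho
    x_new = x + alpha * d
    f_new = f(x_new)
    Q_new = eta * Q + 1.0                   # step 5
    C_new = (eta * Q * C + f_new) / Q_new   # weighted average, more weight on recent values
    return x_new, grad(x_new), C_new, Q_new
```

Initialize C = f(x^{(0)}) and Q = 1, and pass a BB step size such as α_k^1 as alpha_trial to obtain a safeguarded BB iteration.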

Convergence (advanced topic)
The results below are left to the reader as an exercise.
If f ∈ C¹ and bounded below and ∇f_k^T d^{(k)} < 0, then
  f_k ≤ C_k ≤ (1/(k+1)) Σ_{j=0}^{k} f_j, and
  there exists α_k satisfying the modified Wolfe or Armijo conditions.
In addition, if ∇f is Lipschitz with constant L, then
  α_k > C |∇f_k^T d^{(k)}| / ‖d^{(k)}‖²
for some constant C depending on c₁, c₂, L, and the backtracking factor.
Furthermore, if for all sufficiently large k we have the uniform bounds
  -∇f_k^T d^{(k)} ≥ c₃ ‖∇f_k‖² and ‖d^{(k)}‖ ≤ c₄ ‖∇f_k‖,
then lim_{k→∞} ‖∇f_k‖ = 0.
Once again, paired with a nonmonotone line search, Barzilai-Borwein gradient methods work very well on general unconstrained differentiable problems.

References:
Yu-Hong Dai and Li-Zhi Liao. R-linear convergence of the Barzilai and Borwein gradient method. IMA Journal of Numerical Analysis, 22(1):1-10, 2002.
J. Barzilai and J. M. Borwein. Two-point step size gradient methods. IMA Journal of Numerical Analysis, 8(1):141-148, 1988.
Yu-Hong Dai. A new analysis on the Barzilai-Borwein gradient method. Journal of the Operations Research Society of China, pages 1-12, 2013.
Luigi Grippo, Francesco Lampariello, and Stefano Lucidi. A nonmonotone line search technique for Newton's method. SIAM Journal on Numerical Analysis, 23(4):707-716, 1986.
Hongchao Zhang and William W. Hager. A nonmonotone line search technique and its application to unconstrained optimization. SIAM Journal on Optimization, 14(4):1043-1056, 2004.