Part 2: Linesearch methods for unconstrained optimization. Nick Gould (RAL)


minimize_{x ∈ R^n} f(x)

MSc course on nonlinear optimization

UNCONSTRAINED MINIMIZATION

minimize_{x ∈ R^n} f(x)

- the objective function f : R^n → R
- assume that f ∈ C^1 (sometimes C^2) and Lipschitz continuous
- in practice this assumption is often violated, but it is not essential

ITERATIVE METHODS

- in practice it is very rare to be able to provide an explicit minimizer
- iterative method: given a starting guess x_0, generate a sequence {x_k}, k = 1, 2, ...
- AIM: ensure that (a subsequence) has some favourable limiting properties:
  - satisfies first-order necessary conditions
  - satisfies second-order necessary conditions
- Notation: f_k = f(x_k), g_k = g(x_k), H_k = H(x_k)

LINESEARCH METHODS

- calculate a search direction p_k from x_k
- ensure that this direction is a descent direction, i.e.,
    g_k^T p_k < 0 if g_k ≠ 0,
  so that, for small steps along p_k, the objective function will be reduced
- calculate a suitable steplength α_k > 0 so that
    f(x_k + α_k p_k) < f_k
- the computation of α_k is the linesearch, and may itself be an iteration
- generic linesearch method:
    x_{k+1} = x_k + α_k p_k

STEPS MIGHT BE TOO LONG

[Figure: the objective function f(x) = x^2 and the iterates x_{k+1} = x_k + α_k p_k generated by the descent directions p_k = (-1)^{k+1} and steps α_k = 2 + 3/2^{k+1} from x_0 = 2. The iterates oscillate between values near ±1, so f_k decreases at every step but never approaches the minimum.]

STEPS MIGHT BE TOO SHORT

[Figure: the objective function f(x) = x^2 and the iterates x_{k+1} = x_k + α_k p_k generated by the descent directions p_k = -1 and steps α_k = 1/2^{k+1} from x_0 = 2. The steps shrink so quickly that the iterates stall at x = 1.]
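The two failure modes above are easy to reproduce numerically. The following sketch uses exactly the iterate formulas from the figures; in both runs f decreases monotonically, yet the iterates stall at |x| = 1 and f_k → 1 rather than the minimum value 0:

```python
def f(x):
    return x * x

# "Too long": descent directions p_k = (-1)**(k+1) with steps alpha_k = 2 + 3/2**(k+1)
x_long = 2.0
for k in range(30):
    x_long += (2 + 3 / 2 ** (k + 1)) * (-1) ** (k + 1)

# "Too short": p_k = -1 with steps alpha_k = 1/2**(k+1)
x_short = 2.0
for k in range(30):
    x_short -= 1 / 2 ** (k + 1)

# In both cases |x_k| = 1 + 2**(-k), so f_k -> 1, not the minimum value 0.
```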

PRACTICAL LINESEARCH METHODS

- in the early days: pick α_k to minimize f(x_k + α p_k) exactly -- exact linesearch
  - a univariate minimization, rather expensive and certainly not cost effective
- modern methods: inexact linesearch
  - ensure steps are neither too long nor too short
  - try to pick a useful initial stepsize for fast convergence
- the best methods are either backtracking-Armijo or Armijo-Goldstein based

BACKTRACKING LINESEARCH

Procedure to find the stepsize α_k:

  Given α_init > 0 (e.g., α_init = 1),
    let α^(0) = α_init and l = 0
  Until f(x_k + α^(l) p_k) < f_k
    set α^(l+1) = τ α^(l), where τ ∈ (0, 1) (e.g., τ = 1/2),
    and increase l by 1
  Set α_k = α^(l)

- this prevents the step from getting too small ...
- ... but does not prevent steps that are too large relative to the decrease in f
- need to tighten the requirement f(x_k + α^(l) p_k) < f_k

ARMIJO CONDITION

In order to prevent steps that are too large relative to the decrease in f, instead require that

  f(x_k + α_k p_k) ≤ f(x_k) + α_k β g_k^T p_k

for some β ∈ (0, 1) (e.g., β = 0.1 or even β = 0.0001).

[Figure: f(x_k + α p_k) as a function of α, together with the lines f(x_k) + α g_k^T p_k and f(x_k) + α β g_k^T p_k; the Armijo condition accepts exactly those α for which the curve lies below the latter line.]

BACKTRACKING-ARMIJO LINESEARCH

Procedure to find the stepsize α_k:

  Given α_init > 0 (e.g., α_init = 1),
    let α^(0) = α_init and l = 0
  Until f(x_k + α^(l) p_k) ≤ f(x_k) + α^(l) β g_k^T p_k
    set α^(l+1) = τ α^(l), where τ ∈ (0, 1) (e.g., τ = 1/2),
    and increase l by 1
  Set α_k = α^(l)
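The backtracking-Armijo procedure can be sketched in a few lines. This is a minimal illustration, not production code; the function name, the default parameter values (α_init = 1, β = 0.1, τ = 1/2) and the iteration cap are choices of this sketch, though the defaults match the examples on the slides:

```python
import numpy as np

def backtracking_armijo(f, x, p, g, alpha_init=1.0, beta=0.1, tau=0.5, max_iter=60):
    """Shrink alpha by tau until f(x + alpha*p) <= f(x) + alpha * beta * g^T p."""
    alpha = alpha_init
    fx = f(x)
    slope = beta * np.dot(g, p)   # beta * g_k^T p_k (negative for a descent direction)
    for _ in range(max_iter):
        if f(x + alpha * p) <= fx + alpha * slope:
            return alpha
        alpha *= tau
    raise RuntimeError("linesearch failed; is p a descent direction?")

# Example: f(x) = x^2 at x = 2 with the steepest-descent direction p = -g = -4.
# alpha = 1 is rejected (f(-2) = 4 > 2.4), alpha = 1/2 is accepted (f(0) = 0 <= 3.2).
f = lambda x: float(x[0] ** 2)
x = np.array([2.0]); g = np.array([4.0]); p = -g
alpha = backtracking_armijo(f, x, p, g)
```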

SATISFYING THE ARMIJO CONDITION

Theorem 2.1. Suppose that f ∈ C^1, that g(x) is Lipschitz continuous with Lipschitz constant γ(x), that β ∈ (0, 1) and that p is a descent direction at x. Then the Armijo condition

  f(x + αp) ≤ f(x) + α β g(x)^T p

is satisfied for all α ∈ [0, α_max(x)], where

  α_max(x) = 2(β - 1) g(x)^T p / (γ(x) ||p||_2^2)

(note that α_max(x) > 0, since β - 1 < 0 and g(x)^T p < 0).

PROOF OF THEOREM 2.1

Taylor's theorem (Theorem 1.1) and α ≤ 2(β - 1) g(x)^T p / (γ(x) ||p||_2^2) give

  f(x + αp) ≤ f(x) + α g(x)^T p + (1/2) γ(x) α^2 ||p||_2^2
            ≤ f(x) + α g(x)^T p + α (β - 1) g(x)^T p
            = f(x) + α β g(x)^T p.
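Theorem 2.1 can be checked numerically on the running example f(x) = x^2, where g(x) = 2x has Lipschitz constant γ = 2. The check below (a hand-rolled sketch; the variable names are its own) also shows that for this quadratic the bound is tight: the Armijo condition fails just beyond α_max:

```python
# f(x) = x^2, g(x) = 2x, gamma = 2; at x = 2 take p = -g(x) = -4 and beta = 0.1.
beta, gamma = 0.1, 2.0
x, g = 2.0, 4.0
p = -g
gtp = g * p                                  # g(x)^T p = -16
alpha_max = 2 * (beta - 1) * gtp / (gamma * p * p)   # = 2*(-0.9)*(-16)/32 = 0.9

f = lambda x: x * x
# The Armijo condition holds on the whole interval [0, alpha_max] ...
ok_inside = all(f(x + a * p) <= f(x) + a * beta * gtp + 1e-12
                for a in [alpha_max * i / 100 for i in range(101)])
# ... and, for this quadratic, fails just beyond it.
fails_outside = f(x + 1.01 * alpha_max * p) > f(x) + 1.01 * alpha_max * beta * gtp
```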

THE ARMIJO LINESEARCH TERMINATES

Corollary 2.2. Suppose that f ∈ C^1, that g(x) is Lipschitz continuous with Lipschitz constant γ_k at x_k, that β ∈ (0, 1) and that p_k is a descent direction at x_k. Then the stepsize generated by the backtracking-Armijo linesearch terminates with

  α_k ≥ min( α_init, 2τ(β - 1) g_k^T p_k / (γ_k ||p_k||_2^2) ).

PROOF OF COROLLARY 2.2

Theorem 2.1 ⟹ the linesearch will terminate as soon as α^(l) ≤ α_max. Two cases to consider:

1. It may be that α_init satisfies the Armijo condition ⟹ α_k = α_init.
2. Otherwise, there must be a last linesearch iteration (the l-th) for which α^(l) > α_max ⟹

  α_k ≥ α^(l+1) = τ α^(l) > τ α_max.

Combining these two cases gives the required result.
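The corollary's lower bound can also be verified on the f(x) = x^2 example (a small self-contained sketch; the parameter values are the defaults suggested earlier). With α_init = 1, τ = 1/2, β = 0.1 and α_max = 0.9, the linesearch returns α_k = 1/2, comfortably above the guaranteed bound τ α_max = 0.45:

```python
# f(x) = x^2 at x = 2: g = 4, p = -4, Lipschitz constant gamma = 2.
f = lambda x: x * x
x, g = 2.0, 4.0
p = -g
beta, tau, gamma = 0.1, 0.5, 2.0

alpha = 1.0                               # alpha_init
while f(x + alpha * p) > f(x) + alpha * beta * g * p:
    alpha *= tau                          # backtrack

# Corollary 2.2: alpha_k >= min(alpha_init, 2*tau*(beta-1)*g^T p / (gamma*||p||^2))
lower_bound = min(1.0, 2 * tau * (beta - 1) * g * p / (gamma * p * p))
```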

GENERIC LINESEARCH METHOD

  Given an initial guess x_0, let k = 0
  Until convergence:
    Find a descent direction p_k at x_k
    Compute a stepsize α_k using a backtracking-Armijo linesearch along p_k
    Set x_{k+1} = x_k + α_k p_k, and increase k by 1

GLOBAL CONVERGENCE THEOREM

Theorem 2.3. Suppose that f ∈ C^1 and that g is Lipschitz continuous on R^n. Then, for the iterates generated by the Generic Linesearch Method, either

  g_l = 0 for some l ≥ 0

or

  lim_{k→∞} f_k = -∞

or

  lim_{k→∞} min( |p_k^T g_k|, |p_k^T g_k| / ||p_k||_2 ) = 0.

PROOF OF THEOREM 2.3

Suppose that g_k ≠ 0 for all k and that lim f_k > -∞. The Armijo condition ⟹

  f_{k+1} - f_k ≤ α_k β p_k^T g_k for all k

⟹ summing over the first j iterations,

  f_{j+1} - f_0 ≤ Σ_{k=0}^{j} α_k β p_k^T g_k.

The LHS is bounded below by assumption ⟹ the RHS is bounded below. The sum is composed of negative terms ⟹

  lim α_k p_k^T g_k = 0.

Let

  K_1 def= { k : α_init > 2τ(β - 1) g_k^T p_k / (γ ||p_k||_2^2) },

where γ is the assumed uniform Lipschitz constant, and K_2 def= {1, 2, ...} \ K_1.

For k ∈ K_1, Corollary 2.2 gives α_k ≥ 2τ(β - 1) g_k^T p_k / (γ ||p_k||_2^2), so

  α_k p_k^T g_k ≤ (2τ(β - 1)/γ) ( g_k^T p_k / ||p_k||_2 )^2 < 0

  ⟹ lim_{k ∈ K_1} |p_k^T g_k| / ||p_k||_2 = 0.   (1)

For k ∈ K_2, α_k = α_init ⟹

  lim_{k ∈ K_2} p_k^T g_k = 0.   (2)

Combining (1) and (2) gives the required result.

EXAMPLES

Steepest-descent direction: p_k = -g_k

  lim min( |p_k^T g_k|, |p_k^T g_k| / ||p_k||_2 ) = 0 ⟹ lim g_k = 0

Newton-like direction: p_k = -B_k^{-1} g_k

  lim min( |p_k^T g_k|, |p_k^T g_k| / ||p_k||_2 ) = 0 ⟹ lim g_k = 0,
  provided B_k is uniformly positive definite

Conjugate-gradient direction: p_k = any conjugate-gradient approximation to the minimizer of

  f_k + p^T g_k + (1/2) p^T B_k p ≈ f(x_k + p)

  lim min( |p_k^T g_k|, |p_k^T g_k| / ||p_k||_2 ) = 0 ⟹ lim g_k = 0,
  provided B_k is uniformly positive definite

STEEPEST DESCENT EXAMPLE

[Figure: contours of the objective function f(x, y) = 10(y - x^2)^2 + (x - 1)^2, and the iterates generated by the Generic Linesearch steepest-descent method.]
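The steepest-descent example can be sketched as the Generic Linesearch Method with p_k = -g_k on the slides' objective f(x, y) = 10(y - x^2)^2 + (x - 1)^2, whose unique stationary point is (1, 1). The starting point, iteration budget and tolerance below are choices of this sketch; the slow linear convergence is exactly the behaviour the next slide warns about:

```python
import numpy as np

def f(v):
    x, y = v
    return 10 * (y - x ** 2) ** 2 + (x - 1) ** 2

def grad(v):
    x, y = v
    return np.array([-40 * x * (y - x ** 2) + 2 * (x - 1),
                     20 * (y - x ** 2)])

def backtracking_armijo(v, p, g, beta=0.1, tau=0.5):
    alpha, fv, slope = 1.0, f(v), beta * g.dot(p)
    while f(v + alpha * p) > fv + alpha * slope:
        alpha *= tau
    return alpha

# Generic Linesearch Method with the steepest-descent direction p_k = -g_k
v = np.array([0.0, 0.0])
for k in range(50000):
    g = grad(v)
    if np.linalg.norm(g) < 1e-6:
        break
    p = -g
    v = v + backtracking_armijo(v, p, g) * p
# v creeps toward the minimizer (1, 1), but only at a slow linear rate
```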

METHOD OF STEEPEST DESCENT (cont.)

- archetypical globally convergent method
- many other methods resort to steepest descent in bad cases
- not scale invariant
- convergence is usually very (very!) slow (linear)
- numerically often not convergent at all

NEWTON METHOD EXAMPLE

[Figure: contours of the objective function f(x, y) = 10(y - x^2)^2 + (x - 1)^2, and the iterates generated by the Generic Linesearch Newton method.]
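For contrast, the Newton example uses the direction obtained from H_k p = -g_k inside the same linesearch framework. This is only a sketch: the fallback to -g_k when the Hessian is singular or the Newton direction fails to be a descent direction is a common safeguard added here, not something stated on these slides, and the starting point and tolerances are again choices of the sketch:

```python
import numpy as np

def f(v):
    x, y = v
    return 10 * (y - x ** 2) ** 2 + (x - 1) ** 2

def grad(v):
    x, y = v
    return np.array([-40 * x * (y - x ** 2) + 2 * (x - 1),
                     20 * (y - x ** 2)])

def hess(v):
    x, y = v
    return np.array([[-40 * (y - x ** 2) + 80 * x ** 2 + 2, -40 * x],
                     [-40 * x, 20.0]])

def backtracking_armijo(v, p, g, beta=0.1, tau=0.5):
    alpha, fv, slope = 1.0, f(v), beta * g.dot(p)
    while f(v + alpha * p) > fv + alpha * slope:
        alpha *= tau
    return alpha

v = np.array([0.0, 0.0])
for k in range(100):
    g = grad(v)
    if np.linalg.norm(g) < 1e-10:
        break
    try:
        p = np.linalg.solve(hess(v), -g)   # Newton direction: H_k p = -g_k
    except np.linalg.LinAlgError:
        p = -g                             # safeguard (not on the slides)
    if g.dot(p) >= 0:                      # ensure a descent direction
        p = -g
    v = v + backtracking_armijo(v, p, g) * p
# convergence is far faster than steepest descent on the same problem
```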

MORE GENERAL DESCENT METHODS (cont.)

- may be viewed as scaled steepest descent
- convergence is often faster than steepest descent
- can be made scale invariant for suitable B_k