OPER 627: Nonlinear Optimization Lecture 9: Trust-region methods

OPER 627: Nonlinear Optimization
Lecture 9: Trust-region methods
Department of Statistical Sciences and Operations Research, Virginia Commonwealth University
Sept 25, 2013

Quiz

True or False:
- From any arbitrary initial point, the BFGS algorithm guarantees convergence to a stationary point.
- For a general nonlinear smooth function, the BFGS algorithm has superlinear local convergence under some mild conditions.
- Quasi-Newton methods are good at sparse large-scale optimization problems.

What is the idea of the L-BFGS method?

Today's Outline
- Introduction to trust region methods
- Trust region algorithm
- Trust region subproblems

An alternative perspective

Model function to approximate $f(x_k + p)$:
$$m_k(p) = f(x_k) + \nabla f(x_k)^\top p + \tfrac{1}{2} p^\top B_k p$$
This model is only locally reliable: $p$ cannot be too large. Let's go local:
1. Step 1: Choose a trust region (a compact convex neighborhood around the current point)
2. Step 2: Find a minimizer of some model function in the trust region

Q: Why do we need the trust region?
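To make "locally reliable" concrete, here is an illustrative check (the test function, expansion point, and step sizes are chosen for illustration and are not from the lecture) comparing the quadratic model $m_k$ with the true $f$ as the step grows:

```python
# Illustrative 1-D test function with its gradient and Hessian (not from the lecture).
f = lambda x: (x**2 - 1)**2
grad = lambda x: 4 * x * (x**2 - 1)
hess = lambda x: 12 * x**2 - 4

xk = 1.5
mk = lambda p: f(xk) + grad(xk) * p + 0.5 * hess(xk) * p**2  # quadratic model at xk

for p in [0.01, 0.1, 0.5, 1.0, 2.0]:
    print(f"p = {p:4.2f}   |f(xk + p) - mk(p)| = {abs(f(xk + p) - mk(p)):.4f}")
# The model error grows rapidly with |p|: the model is trustworthy only near xk.
```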

Trust region subproblem

1. Model function: $m_k(p) = f(x_k) + g_k^\top p + \tfrac{1}{2} p^\top B_k p$
2. Trust region radius $\Delta_k$: require $\|p\| \le \Delta_k$ (2-norm by default)
- $B_k$ could be the true Hessian
- $B_k$ could be a quasi-Newton approximate matrix

Trust region subproblem (TRP):
$$\min_p \ m_k(p) \quad \text{s.t.} \quad \|p\| \le \Delta_k$$

Make $p_k$ as large as possible (Why?), as long as $m_k(p)$ agrees closely with $f(x_k + p)$. How close?
- A-Red$_k$: actual reduction in $f$, $\Delta f_k := f(x_k) - f(x_k + p_k)$
- P-Red$_k$: predicted reduction in $f$, $\Delta m_k := f(x_k) - m_k(p_k)$
- Ratio: $\rho_k = \Delta f_k / \Delta m_k$; if close to 1, then we have a good agreement
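A minimal sketch of the agreement ratio, assuming the objective f, the model mk, the current iterate xk, and a candidate step pk are available as callables/arrays (the names are illustrative, not from the lecture):

```python
def agreement_ratio(f, mk, xk, pk):
    """rho_k = (actual reduction) / (predicted reduction)."""
    actual_red = f(xk) - f(xk + pk)     # Delta f_k
    predicted_red = f(xk) - mk(pk)      # Delta m_k, using m_k(0) = f(x_k)
    return actual_red / predicted_red
```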

Trust region: simple idea

- If $\rho$ is far from 1: the model is a poor fit, so shrink the trust region radius $\Delta$.
- If $\rho$ is close to 1 and $p$ hits the boundary ($\|p\| = \Delta$): the model fits well and wants a longer step, so expand $\Delta$.

Trust region algorithm

Algorithm 1: Trust region ($\bar{\Delta} > 0$, $\Delta_0 \in (0, \bar{\Delta})$, $\eta \in [0, \tfrac{1}{4})$)
1: Given $x_k$, $\Delta_k > 0$, calculate $g_k = \nabla f(x_k)$ and $B_k$
2: (Key) Solve TRP$_k$ for $p_k$
3: Evaluate $f(x_k + p_k)$, and calculate $\Delta f_k$, $\rho_k$
4: if $\rho_k < \tfrac{1}{4}$ (not a good model, reduce the trust region radius) then
5: $\Delta_{k+1} = \Delta_k / 4$
6: else if $\|p_k\| = \Delta_k$ (hit the boundary) then
7: $\Delta_{k+1} = \min(2\Delta_k, \bar{\Delta})$
8: else
9: $\Delta_{k+1} = \Delta_k$
10: end if
11: if $\rho_k > \eta$ (sufficient reduction relative to the model) then
12: $x_{k+1} = x_k + p_k$
13: else
14: $x_{k+1} = x_k$ (null step)
15: end if
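A minimal sketch of this loop, assuming a subproblem solver solve_trp(g, B, delta) and callables for the objective, gradient, and Hessian (or quasi-Newton matrix) are supplied; the stopping test and the default parameter values are illustrative, not from the slides:

```python
import numpy as np

def trust_region(f, grad, hess, solve_trp, x0, delta0=1.0, delta_max=10.0,
                 eta=0.1, max_iter=100, tol=1e-8):
    """Sketch of Algorithm 1: adjust the radius from rho_k, accept or reject the step."""
    x, delta = np.asarray(x0, dtype=float), delta0
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:              # simple stopping test (not in the slide)
            break
        B = hess(x)                              # true Hessian or a quasi-Newton B_k
        p = solve_trp(g, B, delta)
        m_p = f(x) + g @ p + 0.5 * p @ B @ p     # model value m_k(p_k)
        rho = (f(x) - f(x + p)) / (f(x) - m_p)   # agreement ratio rho_k
        if rho < 0.25:                           # poor agreement: shrink the radius
            delta = delta / 4.0
        elif np.isclose(np.linalg.norm(p), delta):  # step hit the boundary: expand
            delta = min(2.0 * delta, delta_max)
        if rho > eta:                            # enough actual reduction: accept the step
            x = x + p                            # otherwise: null step, keep x
    return x
```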

Trust region and line search

Example: $\min_x \ (x^2 - 1)^2$
1. Line search at $x = 0$: the gradient vanishes there, so line search cannot get over the saddle point
2. Trust region: escape the saddle point! The subproblem at $x = 0$ is
$$\min_d \ -2d^2 \quad \text{s.t.} \quad |d| \le \Delta$$
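Filling in the computation behind the subproblem above (it is left implicit on the slide): for $f(x) = (x^2-1)^2$ we have $f(0) = 1$, $f'(0) = 0$, and $f''(0) = -4$, so the quadratic model with the exact Hessian at $x_k = 0$ is
$$m_0(d) = f(0) + f'(0)\,d + \tfrac{1}{2} f''(0)\,d^2 = 1 - 2d^2,$$
and minimizing $1 - 2d^2$ (equivalently $-2d^2$) over $|d| \le \Delta$ gives $d = \pm\Delta$: the trust-region step moves away from the stationary point even though the gradient vanishes there.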

Solution of TRP

Trust region subproblem (TRP): $\min_p \ m_k(p)$ s.t. $\|p\| \le \Delta$

Theorem (Moré and Sorensen 1983)
Vector $p^*$ is a global minimizer of TRP if and only if $\|p^*\| \le \Delta$ and there exists $\lambda \ge 0$ such that:
1. $(B + \lambda I)p^* = -g$
2. $\lambda(\Delta - \|p^*\|) = 0$
3. $B + \lambda I$ is PSD

Two cases

1. $\lambda = 0$: this happens only if $B$ is PSD and $p = -B^{-1}g$ satisfies $\|p\| \le \Delta$
2. $\lambda > 0$: then we look for a number $\lambda$ that is big enough so that $B + \lambda I$ is PSD, and $p = -(B + \lambda I)^{-1}g$ satisfies $\|p\| = \Delta$

Let $p(\lambda) = -(B + \lambda I)^{-1}g$; we just need to solve $\|p(\lambda)\| = \Delta$, which is a single-variable root-finding problem!
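An illustrative check of why this is a well-behaved one-dimensional problem (the example matrix, vector, and $\lambda$ values are made up for illustration): $\|p(\lambda)\|$ decreases monotonically once $B + \lambda I$ is positive definite, so $\|p(\lambda)\| = \Delta$ has at most one root there.

```python
import numpy as np

B = np.array([[2.0, 0.0], [0.0, -1.0]])   # indefinite example "Hessian"
g = np.array([1.0, 1.0])
for lam in [1.5, 2.0, 5.0, 10.0]:          # lambda must exceed 1 so B + lam*I is PD
    p = -np.linalg.solve(B + lam * np.eye(2), g)
    print(f"lambda = {lam:5.1f}   ||p(lambda)|| = {np.linalg.norm(p):.4f}")
# The printed norms decrease as lambda grows.
```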

How to solve?

There is no explicit formula for $p(\lambda)$, so $\|p(\lambda)\| = \Delta$ cannot be solved in closed form. Let's do some derivation!
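The in-class derivation is not in the transcription; the following sketch of the standard argument (as in Nocedal and Wright, Chapter 4) shows where the update in Algorithm 2 below comes from. Rather than applying Newton's method to $\|p(\lambda)\| = \Delta$ directly (the left-hand side blows up as $\lambda$ approaches $-\lambda_{\min}(B)$), apply it to the nearly linear "secular equation"
$$\varphi(\lambda) = \frac{1}{\Delta} - \frac{1}{\|p(\lambda)\|} = 0.$$
With the Cholesky factorization $B + \lambda I = R^\top R$ and $q$ defined by $R^\top q = p(\lambda)$, one can show $\frac{d}{d\lambda}\|p(\lambda)\|^2 = -2\|q\|^2$, hence $\varphi'(\lambda) = -\|q\|^2 / \|p(\lambda)\|^3$. The Newton step $\lambda^{+} = \lambda - \varphi(\lambda)/\varphi'(\lambda)$ then simplifies to
$$\lambda^{+} = \lambda + \left(\frac{\|p(\lambda)\|}{\|q\|}\right)^2 \frac{\|p(\lambda)\| - \Delta}{\Delta},$$
which is exactly line 6 of Algorithm 2.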

Algorithm for solving TRP

Algorithm 2: Trust region subproblem
1: Given $\lambda^{(0)}$, $\Delta > 0$
2: for $l = 0, 1, \dots$ do
3: Factorize $B + \lambda^{(l)} I = R^\top R$ (safeguard here to ensure the Cholesky factorization exists)
4: Solve $R^\top R\, p_l = -g$
5: Solve $R^\top q_l = p_l$
6: Set $\lambda^{(l+1)} = \lambda^{(l)} + \left(\dfrac{\|p_l\|}{\|q_l\|}\right)^2 \dfrac{\|p_l\| - \Delta}{\Delta}$
7: end for

Good news: typically just 2 or 3 steps of this Newton iteration are enough!
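A minimal sketch of Algorithm 2; the initial $\lambda$, the stopping tests, and the simple safeguard (shift $\lambda$ upward until the Cholesky factorization succeeds) are illustrative choices, since the slide leaves them unspecified:

```python
import numpy as np

def solve_trp(g, B, delta, lam=0.0, max_iter=20, tol=1e-6):
    """Sketch of Algorithm 2: Newton's method on the secular equation
    phi(lambda) = 1/delta - 1/||p(lambda)||, using Cholesky factorizations."""
    n = len(g)
    for _ in range(max_iter):
        # Safeguard: increase lambda until B + lambda*I admits a Cholesky factorization.
        while True:
            try:
                R = np.linalg.cholesky(B + lam * np.eye(n)).T   # B + lam*I = R^T R, R upper triangular
                break
            except np.linalg.LinAlgError:
                lam = 2.0 * lam + 1e-3
        p = np.linalg.solve(R.T @ R, -g)         # solve (R^T R) p = -g
        q = np.linalg.solve(R.T, p)              # solve R^T q = p
        norm_p = np.linalg.norm(p)
        if lam == 0.0 and norm_p <= delta:       # case 1: unconstrained Newton step lies inside
            break
        if abs(norm_p - delta) <= tol * delta:   # case 2 solved: ||p(lambda)|| close enough to delta
            break
        lam += (norm_p / np.linalg.norm(q)) ** 2 * (norm_p - delta) / delta  # Newton update
        lam = max(lam, 0.0)                      # keep lambda nonnegative
    return p
```

This sketch can be passed as the solve_trp argument of the trust_region driver sketched after Algorithm 1 above.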