OPER 627: Nonlinear Optimization Lecture 14: Mid-term Review


OPER 627: Nonlinear Optimization
Lecture 14: Mid-term Review
Department of Statistical Sciences and Operations Research, Virginia Commonwealth University
Oct 16, 2013

Exam begins now...
Try to find Professor Song's technical mistakes (not including typos) in his terrible slides:
- If you find one that nobody else could find, you get one extra point
- Maximum extra points: 5
- Submit the exam paper with these mistakes (I will allocate some space for you to fill out)

An overall summary
1. Theory: optimality conditions in various cases
   - In general, FONC, SONC, and SOSC apply to functions defined on an open set
   - Optimality conditions with convexity
2. Algorithms: line search and trust region
   - All we learn is the Newton method
   - Algorithms in this class only guarantee convergence to a stationary point from any initial point

How do we use optimality conditions?
1. Use FONC to rule out non-stationary solutions
   - FONC is used in all algorithms that have global convergence
2. Use SONC to rule out saddle points
3. Use SOSC to validate quadratic convergence of the Newton method
   - SOSC is used to show fast convergence to a local minimizer
(A numerical check of these conditions is sketched below.)
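As a concrete illustration (not from the slides), the three conditions can be checked numerically at a candidate point by looking at the gradient norm and the Hessian eigenvalues; the function name check_optimality, the saddle-point example, and the tolerance below are my own choices.

import numpy as np

def check_optimality(grad, hess, x, tol=1e-8):
    """Report FONC/SONC/SOSC at a candidate point x (teaching sketch)."""
    g = grad(x)
    eigvals = np.linalg.eigvalsh(hess(x))    # Hessian assumed symmetric
    fonc = np.linalg.norm(g) <= tol          # gradient vanishes
    sonc = fonc and eigvals.min() >= -tol    # Hessian PSD at a stationary point
    sosc = fonc and eigvals.min() > tol      # Hessian PD -> strict local minimizer
    return fonc, sonc, sosc

# Example: f(x) = x1^2 - x2^2 has a saddle at the origin (FONC holds, SONC fails)
grad = lambda x: np.array([2 * x[0], -2 * x[1]])
hess = lambda x: np.diag([2.0, -2.0])
print(check_optimality(grad, hess, np.zeros(2)))   # (True, False, False)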

Convexity
1. First-order characterization of convex functions
2. Second-order characterization of convex functions defined on an open set
3. Free lunch, free dinner, ultimate gift
4. Strongly convex: $\nabla^2 f(x) \succeq mI$ for some $m > 0$; why is this important? (A numerical probe is sketched below.)
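To connect item 4 to something computable, here is a rough numerical probe (my own sketch, not course material): sample the smallest Hessian eigenvalue over a few points; a strictly positive lower bound m is consistent with strong convexity on the sampled region.

import numpy as np

def strong_convexity_modulus(hess, points):
    """Smallest sampled Hessian eigenvalue; > 0 suggests strong convexity (sketch)."""
    return min(np.linalg.eigvalsh(hess(x)).min() for x in points)

# f(x) = 0.5 x^T Q x with Q PD is strongly convex with modulus lambda_min(Q)
Q = np.array([[4.0, 1.0], [1.0, 3.0]])
hess = lambda x: Q                       # constant Hessian for a quadratic
samples = [np.random.randn(2) for _ in range(100)]
print(strong_convexity_modulus(hess, samples))   # ~ lambda_min(Q) ~ 2.38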

Optimization algorithms
Motivation: optimize the next step using information from the current iterate:
$f(x_k + p_k) \approx m(p_k) := f(x_k) + \nabla f(x_k)^\top p_k + \tfrac{1}{2} p_k^\top B_k p_k$
1. Line search: if $B_k$ is nice, we can find a descent direction $p_k$ easily, and the (approximate) minimizer along that direction is our next iterate
2. Trust region: $m(p_k)$ only approximates $f(x_k + p_k)$ well locally, so we look for the next iterate based on our confidence in how well $m(p_k)$ approximates $f(x_k + p_k)$, and adjust that confidence level adaptively

Line search
1. Wolfe conditions:
   - Sufficient decrease: $\phi(\alpha) \le \phi(0) + c_1 \alpha \phi'(0)$
   - Curvature: $\phi'(\alpha) \ge c_2 \phi'(0)$
   - What are their purposes?
2. Fundamental result for line search (Zoutendijk): assume only that the search direction $p_k$ is a descent direction and that the steps satisfy the Wolfe conditions; then
   $\sum_{k=0}^{\infty} \cos^2\theta_k \, \|\nabla f(x_k)\|^2 < \infty$, where $\cos\theta_k = \dfrac{-\nabla f(x_k)^\top p_k}{\|\nabla f(x_k)\|\,\|p_k\|}$
   - How do we use this result to prove global convergence for steepest descent? For Newton?
(A line search enforcing these conditions is sketched below.)
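A minimal line-search sketch that enforces the (weak) Wolfe conditions by simple bracketing and bisection; it is a teaching illustration rather than the zoom procedure used in production codes, and the function name, the defaults c1 = 1e-4 and c2 = 0.9, and the toy usage example are my own choices.

import numpy as np

def wolfe_line_search(f, grad, x, p, c1=1e-4, c2=0.9, alpha=1.0, max_iter=50):
    """Crude bracketing search for a step satisfying the weak Wolfe conditions (sketch)."""
    phi0, dphi0 = f(x), grad(x) @ p               # phi(0), phi'(0); p must be a descent direction
    lo, hi = 0.0, np.inf
    for _ in range(max_iter):
        if f(x + alpha * p) > phi0 + c1 * alpha * dphi0:
            hi = alpha                             # sufficient decrease fails: step too long
            alpha = 0.5 * (lo + hi)
        elif grad(x + alpha * p) @ p < c2 * dphi0:
            lo = alpha                             # curvature fails: step too short
            alpha = 2 * lo if np.isinf(hi) else 0.5 * (lo + hi)
        else:
            return alpha                           # both Wolfe conditions hold
    return alpha

# Usage with a steepest-descent direction on a toy function:
f    = lambda x: (x[0] - 1)**2 + 4 * x[1]**2
grad = lambda x: np.array([2 * (x[0] - 1), 8 * x[1]])
x = np.array([0.0, 1.0]); p = -grad(x)
print(wolfe_line_search(f, grad, x, p))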

Line search is all about Newton
Model function: $\min_p \; m(p_k) := f(x_k) + \nabla f(x_k)^\top p_k + \tfrac{1}{2} p_k^\top B_k p_k$
Approximate Hessian $B_k$: this is an unconstrained QP; when $B_k$ is PD it has a unique stationary point, which is its global minimizer
1. Choice 1: $B_k = I$, corresponding to steepest descent, $p_k = -\nabla f(x_k)$
   - Only first-order information is used
   - Linear local convergence
   - Convergence can be very slow if the condition number of the Hessian is large
2. Choice 2: $B_k = \nabla^2 f(x_k)$, corresponding to pure Newton, $p_k = -[\nabla^2 f(x_k)]^{-1} \nabla f(x_k)$
   - No line search is needed; the step size is always $\alpha_k = 1$
   - Quadratic local convergence to $x^*$ if $x^*$ satisfies SOSC
   - Fragile: may run into trouble if the Hessian is not PD
(Both directions are computed side by side in the sketch below.)
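The two choices can be compared directly in a few lines (my own toy example, an ill-conditioned quadratic with condition number 10).

import numpy as np

def f(x):    return 0.5 * x[0]**2 + 5.0 * x[1]**2
def grad(x): return np.array([x[0], 10.0 * x[1]])
def hess(x): return np.diag([1.0, 10.0])              # condition number 10

x = np.array([1.0, 1.0])
p_sd     = -grad(x)                                   # B_k = I: steepest descent
p_newton = -np.linalg.solve(hess(x), grad(x))         # B_k = Hessian (solve, don't invert)
print(p_sd, p_newton)

Because the example is quadratic, the Newton step lands exactly on the minimizer $x^* = 0$, while steepest descent would zig-zag at a rate governed by the condition number.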

Line search is all about Newton (cont'd)
Model function: $\min_p \; m(p_k) := f(x_k) + \nabla f(x_k)^\top p_k + \tfrac{1}{2} p_k^\top B_k p_k$
1. Choice 3: Modified Newton, $B_k = \nabla^2 f(x_k) + E_k$
   - If $\nabla^2 f(x_k)$ is PD, $E_k = 0$
   - Otherwise, $E_k$ is big enough to ensure $B_k$ is PD
   - Loses quadratic convergence, because we need line search
2. Choice 4: Quasi-Newton, construct/update a PD matrix $B_k$ as we go (one update is sketched below)
   - The updating formula enforces the secant equation $B_{k+1} s_k = y_k$, so using $B_k$ to approximate the Hessian makes sense
   - $B_k$ stays PD thanks to the Wolfe/curvature condition and the updating formula
   - BFGS can equivalently maintain an approximation $H_k$ of the inverse Hessian
   - Superlinear local convergence, but no global convergence guarantee in general
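A sketch of one BFGS update in its inverse-Hessian form, assuming the curvature condition $y^\top s > 0$ holds (which the Wolfe line search guarantees); the check at the end verifies the secant property $H_{k+1} y_k = s_k$. The function name and test vectors are my own.

import numpy as np

def bfgs_update_inverse(H, s, y):
    """One BFGS update of the inverse-Hessian approximation H (sketch).
    s = x_{k+1} - x_k, y = grad_{k+1} - grad_k; requires y @ s > 0."""
    rho = 1.0 / (y @ s)
    I = np.eye(len(s))
    V = I - rho * np.outer(s, y)
    return V @ H @ V.T + rho * np.outer(s, s)

H = np.eye(3)
s = np.array([0.1, -0.2, 0.3]); y = np.array([0.05, -0.1, 0.4])
H_new = bfgs_update_inverse(H, s, y)
print(np.allclose(H_new @ y, s))   # True: the updated H satisfies the (inverse) secant equation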

Trust region
All about solving the trust-region subproblem (TRP):
$\min_p \; m(p) := f(x_k) + \nabla f(x_k)^\top p + \tfrac{1}{2} p^\top B_k p \quad \text{s.t. } \|p\| \le \Delta_k$
1. Direct method: needs a matrix factorization and an iterative root-finding procedure
2. Cauchy points
3. Improved Cauchy points: dogleg methods
(The surrounding accept/reject and radius-update loop is sketched below.)
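For completeness, here is the standard trust-region outer loop that a subproblem solver plugs into (a sketch in the spirit of the usual textbook scheme with 1/4 and 3/4 thresholds; the function name and defaults are my own). The argument solve_subproblem can be, for example, the Cauchy-point or dogleg routine sketched after the next slide.

import numpy as np

def trust_region(f, grad, hess, x, solve_subproblem, delta=1.0, delta_max=10.0,
                 eta=0.1, tol=1e-6, max_iter=200):
    """Trust-region outer loop (sketch). solve_subproblem(g, B, delta) -> p returns an
    approximate minimizer of the model within ||p|| <= delta (assumed to give pred > 0)."""
    for _ in range(max_iter):
        g, B = grad(x), hess(x)
        if np.linalg.norm(g) < tol:
            break
        p = solve_subproblem(g, B, delta)
        pred = -(g @ p + 0.5 * p @ B @ p)              # predicted reduction m(0) - m(p)
        rho = (f(x) - f(x + p)) / pred                 # actual vs predicted reduction
        if rho < 0.25:
            delta *= 0.25                              # poor model: shrink the region
        elif rho > 0.75 and np.isclose(np.linalg.norm(p), delta):
            delta = min(2 * delta, delta_max)          # good model, step at boundary: expand
        if rho > eta:
            x = x + p                                  # accept the step
    return x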

Cauchy points and dogleg
Cauchy point: the best solution along the steepest-descent direction within the trust region
A constrained step-size problem:
$\min_{\tau_k} \; f_k + \tau_k g_k^\top p_k^s + \tfrac{1}{2}\tau_k^2 (p_k^s)^\top B_k p_k^s \quad \text{s.t. } 0 \le \tau_k \le 1, \qquad \text{where } p_k^s = -\dfrac{\Delta_k}{\|g_k\|} g_k$
Improvement: dogleg (only when $B_k$ is PD)
- Use two line segments to approximate the full trajectory, running from the minimizer along the steepest-descent direction, $p^U$, to the unconstrained model minimizer $p^B = -B_k^{-1} g_k$
- Optimization over the two line segments is easy because of the monotone structure
(Both steps are sketched in code below.)
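Both steps are short enough to sketch directly; these follow the standard formulas (the Cauchy step clips the 1-D minimizer at the boundary, and the dogleg step finds where the second leg crosses the boundary), with my own function names.

import numpy as np

def cauchy_point(g, B, delta):
    """Minimize the model along -g within the trust region (sketch)."""
    ps = -delta * g / np.linalg.norm(g)           # boundary step along steepest descent
    curv = ps @ B @ ps
    tau = 1.0 if curv <= 0 else min(1.0, -(g @ ps) / curv)
    return tau * ps

def dogleg(g, B, delta):
    """Dogleg step (requires B PD): follow -g to p_U, then turn toward p_B = -B^{-1} g."""
    p_B = -np.linalg.solve(B, g)
    if np.linalg.norm(p_B) <= delta:
        return p_B                                 # full step fits inside the region
    p_U = -(g @ g) / (g @ B @ g) * g               # minimizer along steepest descent
    if np.linalg.norm(p_U) >= delta:
        return delta * p_U / np.linalg.norm(p_U)   # even the first leg leaves the region
    # Find tau in (0, 1] with ||p_U + tau (p_B - p_U)|| = delta (a quadratic in tau)
    d = p_B - p_U
    a, b, c = d @ d, 2 * p_U @ d, p_U @ p_U - delta**2
    tau = (-b + np.sqrt(b**2 - 4 * a * c)) / (2 * a)
    return p_U + tau * d

Either routine can be passed as solve_subproblem to the trust-region loop sketched earlier.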

Line search vs. trust region
Line search:
1. First finds a direction, then chooses the step length
2. Has an easy problem to solve in each iteration
3. May not allow the true Hessian in the model function
Trust region:
1. First fixes a (maximum) length, then chooses the direction
2. Has a hard TRP to solve in each iteration
3. Allows the true Hessian in the model function

Important concepts
Condition number:
1. Condition number and convergence
   - Steepest descent
   - Newton / quasi-Newton
   - Conjugate gradient (preconditioned CG)
2. Condition number and numerical stability
   - Least squares: solving the normal equations $J^\top J x = J^\top y$ squares the condition number (see the sketch below)
Wolfe conditions:
1. Global convergence for inexact line search
2. Guarantee of PD quasi-Newton Hessian matrices
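A quick numerical illustration of the stability point (my own example): $\kappa(J^\top J) = \kappa(J)^2$, so forming the normal equations can lose roughly twice as many digits as a factorization-based least-squares solve.

import numpy as np

rng = np.random.default_rng(0)
J = rng.standard_normal((100, 5)) @ np.diag([1e3, 1e1, 1.0, 1e-1, 1e-3])
y = rng.standard_normal(100)

print(np.linalg.cond(J), np.linalg.cond(J.T @ J))     # roughly 1e6 vs 1e12

x_normal = np.linalg.solve(J.T @ J, J.T @ y)          # normal equations
x_svd, *_ = np.linalg.lstsq(J, y, rcond=None)         # SVD-based solve (numerically stable)
print(np.linalg.norm(x_normal - x_svd))               # noticeable discrepancy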

Optimization for large-scale problems
When problems get bigger, we have to compromise:
1. Quasi-Newton: the Hessian is hard to compute at large scale, so use first-order information to mimic its behavior
2. L-BFGS: use a limited-memory list of vector pairs $(s_k, y_k)$ to represent the quasi-Newton matrix implicitly (two-loop recursion sketched below)
3. Inexact Newton: solve for the Newton direction via conjugate gradient
   - CG is hunky-dory
   - CG enables inexact solutions, which suffice for superlinear convergence if the forcing sequence satisfies $\eta_k \to 0$
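The L-BFGS two-loop recursion is compact enough to sketch; this follows the standard algorithm, with the initial scaling gamma (commonly $s^\top y / y^\top y$ from the most recent pair) left as a parameter. The function name and argument layout are my own.

import numpy as np

def lbfgs_direction(g, pairs, gamma=1.0):
    """Two-loop recursion (sketch): compute -H_k g from the stored (s_i, y_i) pairs,
    ordered oldest to newest, with initial matrix H_k^0 = gamma * I."""
    q = g.copy()
    alphas = []
    for s, y in reversed(pairs):                  # most recent pair first
        rho = 1.0 / (y @ s)
        a = rho * (s @ q)
        q -= a * y
        alphas.append((rho, a, s, y))
    r = gamma * q                                 # apply the initial matrix H_k^0
    for rho, a, s, y in reversed(alphas):         # oldest pair first on the way back
        b = rho * (y @ r)
        r += (a - b) * s
    return -r                                     # search direction -H_k g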

Choice of algorithms for unconstrained nonlinear optimization
If first-order information is not available, you need to take another course!
1. Second-order information is not available
   - Steepest descent (if you are lazy)
   - Quasi-Newton: moderate dimensions (n on the order of 100)
   - Large-scale: L-BFGS, nonlinear CG, inexact quasi-Newton, etc.
2. Second-order information is available
   - Newton, if you know your problem is strongly convex, or you know you are very close to the optimum
   - Trust-region methods
   - Large-scale: Newton-CG, CG-trust
In practice, choose an implementation/software that best matches your application! (One way to experiment is sketched below.)
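One low-effort way to experiment with several of these choices on the same problem is an off-the-shelf solver interface; the sketch below assumes SciPy is available and uses its built-in Rosenbrock test function and derivatives.

import numpy as np
from scipy.optimize import minimize, rosen, rosen_der, rosen_hess

x0 = np.full(50, 1.2)          # a moderately sized test problem

# First-order information only: quasi-Newton vs limited-memory BFGS
print(minimize(rosen, x0, jac=rosen_der, method="BFGS").nit)
print(minimize(rosen, x0, jac=rosen_der, method="L-BFGS-B").nit)

# Second-order information available: line-search Newton-CG vs a trust-region method
print(minimize(rosen, x0, jac=rosen_der, hess=rosen_hess, method="Newton-CG").nit)
print(minimize(rosen, x0, jac=rosen_der, hess=rosen_hess, method="trust-ncg").nit)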

Good luck!