
Derivative-Free Trust-Region Methods
MTH6418, S. Le Digabel, École Polytechnique de Montréal, Fall 2015 (v4)

Plan

1. Quadratic models
2. Model Quality
3. Derivative-Free Trust-Region Framework
4. References

Part 1: Quadratic models

Quadratic model of f

Natural basis of the space of polynomials of degree $\leq 2$ in $\mathbb{R}^n$; it has $q + 1 = (n+1)(n+2)/2$ elements:

$$\varphi(x) = (\varphi_0(x), \varphi_1(x), \dots, \varphi_q(x))^T = \left(1,\ x_1,\ x_2,\ \dots,\ x_n,\ \tfrac{x_1^2}{2},\ \tfrac{x_2^2}{2},\ \dots,\ \tfrac{x_n^2}{2},\ x_1 x_2,\ x_1 x_3,\ \dots,\ x_{n-1} x_n\right)^T.$$

Model of $f$: $m_f$ defined by $\alpha \in \mathbb{R}^{q+1}$ via $m_f(x) = \alpha^T \varphi(x)$.
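
(Not in the original slides.) A minimal NumPy sketch of evaluating this natural basis at a point; the helper name `natural_basis` is mine and is reused by the later sketches:

```python
import numpy as np

def natural_basis(x):
    """Natural quadratic basis phi(x) = (1, x_1, ..., x_n,
    x_1^2/2, ..., x_n^2/2, x_1 x_2, ..., x_{n-1} x_n)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    terms = [1.0]
    terms += list(x)                                    # linear terms
    terms += [0.5 * xi**2 for xi in x]                  # squared terms / 2
    terms += [x[i] * x[j] for i in range(n) for j in range(i + 1, n)]
    return np.array(terms)                              # (n+1)(n+2)/2 entries

# A model is then evaluated as m_f(x) = alpha @ natural_basis(x).
```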

Interpolation set

Points at which $f$ is known and which are used to construct the model $m_f$: $p + 1$ elements of $\mathbb{R}^n$,

$$Y = \{y^0, y^1, \dots, y^p\}.$$

These points are also called data points, and

$$f(Y) = \left(f(y^0), f(y^1), \dots, f(y^p)\right)^T \in \mathbb{R}^{p+1}.$$

The geometry of $Y$ is important and will be studied later. How to select the data points from the cache points? One solution: take the points around the current iterate.

Construction of the model

Find $\alpha \in \mathbb{R}^{q+1}$ such that $\sum_{y \in Y} (f(y) - m_f(y))^2$ is minimal.

Idea: solve $M(\varphi, Y)\,\alpha = f(Y)$ with

$$M(\varphi, Y) = \begin{pmatrix} \varphi_0(y^0) & \varphi_1(y^0) & \cdots & \varphi_q(y^0) \\ \varphi_0(y^1) & \varphi_1(y^1) & \cdots & \varphi_q(y^1) \\ \vdots & \vdots & & \vdots \\ \varphi_0(y^p) & \varphi_1(y^p) & \cdots & \varphi_q(y^p) \end{pmatrix} \in \mathbb{R}^{(p+1)\times(q+1)}.$$

Cost in $O(p^3)$. Three cases:
- $p = q$: determined.
- $p > q$: overdetermined.
- $p < q$: underdetermined.
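
A sketch of building $M(\varphi, Y)$ and solving the determined case, reusing the `natural_basis` helper above; `interpolation_matrix` and `fit_model` are illustrative names:

```python
import numpy as np

def interpolation_matrix(Y):
    """M(phi, Y): row i is phi(y^i)^T, shape (p+1) x (q+1)."""
    return np.array([natural_basis(y) for y in Y])

def fit_model(Y, fY):
    """Determined case p = q: solve M alpha = f(Y) directly.
    Assumes Y is poised (M nonsingular); the dense solve costs O(p^3)."""
    M = interpolation_matrix(Y)
    return np.linalg.solve(M, np.asarray(fY, dtype=float))
```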

Number of necessary interpolation points

   n    q + 1 = (n+1)(n+2)/2
   2        6
   3       10
   4       15
   5       21
  10       66
  20      231
  50     1326

Typically in the DFO context $n \leq 20$, but:
- The number of evaluations is very limited.
- The data points are selected near the current iterate.

Hence the underdetermined case $p < q$ is the most common.

Overdetermined & determined cases: 1/2

More data points than necessary: $p > q$. Use regression to solve the system in the least-squares sense, i.e. solve

$$\min_{\alpha \in \mathbb{R}^{q+1}} \|M(\varphi, Y)\,\alpha - f(Y)\|^2.$$

If $M(\varphi, Y)$ has full column rank, the solution is unique and given analytically by $\alpha = M(\varphi, Y)^+ f(Y)$, where

$$M(\varphi, Y)^+ = \left[ M(\varphi, Y)^T M(\varphi, Y) \right]^{-1} M(\varphi, Y)^T$$

is the pseudoinverse of $M(\varphi, Y)$. This also works for the determined case $p = q$ (exact interpolation).
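
A corresponding sketch; in practice one avoids forming the normal equations explicitly and calls a least-squares solver instead (assumes `M` built as in the earlier sketch):

```python
import numpy as np

def fit_model_regression(M, fY):
    """Overdetermined case p > q: argmin ||M alpha - f(Y)||_2.
    Equals M^+ f(Y) when M has full column rank; also covers p = q."""
    alpha, residuals, rank, sv = np.linalg.lstsq(M, fY, rcond=None)
    return alpha
```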

Overdetermined & determined cases: 2/2

$M(\varphi, Y)$ can be decomposed using the Singular Value Decomposition (SVD): $M(\varphi, Y) = U \Sigma V^T$ with:
- $U \in \mathbb{R}^{(p+1)\times(p+1)}$, $U^T U = I_{p+1}$.
- $\Sigma \in \mathbb{R}^{(p+1)\times(q+1)}$, diagonal, with singular values $\geq 0$.
- $V \in \mathbb{R}^{(q+1)\times(q+1)}$, $V V^T = I_{q+1}$.

$M(\varphi, Y)$ does not have full rank if its smallest singular value is 0. Condition number of $M(\varphi, Y)$: largest singular value / smallest singular value.

$M(\varphi, Y)^+ = V \Sigma^+ U^T$, where $\Sigma^+$ is the pseudoinverse of $\Sigma$, obtained by replacing every nonzero singular value by its reciprocal and transposing the resulting matrix.

Cost of the SVD: $O((p+1)(q+1)^2)$.
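
A sketch of the SVD route; it is equivalent to `np.linalg.pinv(M)` and assumes all singular values are nonzero:

```python
import numpy as np

def pinv_and_condition(M):
    """Pseudoinverse V Sigma^+ U^T and condition number of M via the SVD."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    cond = s[0] / s[-1]                      # largest sv / smallest sv
    M_pinv = Vt.T @ np.diag(1.0 / s) @ U.T   # assumes no zero singular value
    return M_pinv, cond
```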

Underdetermined case: Infinite number of solutions

Minimum Frobenius Norm (MFN) interpolation: choose the solution that minimizes the Frobenius norm of the Hessian of the model (the curvature). Split

$$\alpha = \begin{bmatrix} \alpha_L \\ \alpha_Q \end{bmatrix}, \quad \alpha_L \in \mathbb{R}^{n+1}, \ \alpha_Q \in \mathbb{R}^{n_Q}, \ n_Q = n(n+1)/2,$$

and define

$$F(\varphi, Y) = \begin{bmatrix} M(\varphi_Q, Y) M(\varphi_Q, Y)^T & M(\varphi_L, Y) \\ M(\varphi_L, Y)^T & 0 \end{bmatrix} \in \mathbb{R}^{(p+n+2)\times(p+n+2)}.$$

Solve

$$F(\varphi, Y) \begin{bmatrix} \mu \\ \alpha_L \end{bmatrix} = \begin{bmatrix} f(Y) \\ 0 \end{bmatrix}, \quad \alpha_L \in \mathbb{R}^{n+1}, \ \mu \in \mathbb{R}^{p+1},$$

using a matrix decomposition, and then recover $\alpha_Q = M(\varphi_Q, Y)^T \mu \in \mathbb{R}^{n_Q}$.
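
A sketch of the MFN solve following the block system above; it reuses `interpolation_matrix` and relies on the natural-basis column ordering from earlier (constant and linear columns first):

```python
import numpy as np

def mfn_model(Y, fY):
    """Minimum-Frobenius-norm model in the underdetermined case p < q.
    Builds F(phi, Y), solves for (mu, alpha_L), recovers alpha_Q."""
    M = interpolation_matrix(Y)              # (p+1) x (q+1)
    n = len(Y[0])
    M_L = M[:, : n + 1]                      # constant + linear columns
    M_Q = M[:, n + 1 :]                      # the n_Q quadratic columns
    p1 = M.shape[0]                          # p + 1
    F = np.block([[M_Q @ M_Q.T, M_L],
                  [M_L.T, np.zeros((n + 1, n + 1))]])
    rhs = np.concatenate([np.asarray(fY, float), np.zeros(n + 1)])
    sol = np.linalg.solve(F, rhs)            # assumes a well-poised Y
    mu, alpha_L = sol[:p1], sol[p1:]
    alpha_Q = M_Q.T @ mu                     # recovers the quadratic part
    return np.concatenate([alpha_L, alpha_Q])
```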

Lagrange polynomials

Basis of Lagrange polynomials: $p + 1$ polynomials $\ell_j$, $j = 0, 1, \dots, p$, with

$$\ell_j(y^i) = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{if } i \neq j. \end{cases}$$

Model:

$$m_f(x) = \sum_{i=0}^{p} f(y^i)\,\ell_i(x).$$

Cost of constructing a model is in $O(p^3)$, but the cost of updating the model by one point is in $O(p^2)$.
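
In the determined case the Lagrange coefficients are simply the columns of $M^{-1}$; a sketch reusing the earlier helpers:

```python
import numpy as np

def lagrange_coefficients(Y):
    """Column j of C holds the coefficients of l_j in the natural basis:
    M c_j = e_j, so that l_j(y^i) = delta_ij (determined case p = q)."""
    M = interpolation_matrix(Y)
    return np.linalg.solve(M, np.eye(M.shape[0]))   # C = M^{-1}

def model_from_lagrange(Y, fY, x):
    """m_f(x) = sum_i f(y^i) l_i(x)."""
    C = lagrange_coefficients(Y)
    ell = C.T @ natural_basis(x)             # values l_0(x), ..., l_p(x)
    return np.asarray(fY, float) @ ell
```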

Lagrange polynomials: Example

$f(x, y) = x + y + 2x^2 + 3y^3$,

$$Y = \left\{ \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 2 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 0 \\ 2 \end{bmatrix} \right\}.$$

$\ell_0(x, y) = 1 - \tfrac{3}{2}x - \tfrac{3}{2}y + \tfrac{1}{2}x^2 + \tfrac{1}{2}y^2 + xy$,
$\ell_1(x, y) = 2x - x^2 - xy$,
$\ell_2(x, y) = 2y - y^2 - xy$,
$\ell_3(x, y) = -\tfrac{1}{2}x + \tfrac{1}{2}x^2$,
$\ell_4(x, y) = xy$,
$\ell_5(x, y) = -\tfrac{1}{2}y + \tfrac{1}{2}y^2$.

$$m_f(x, y) = 0\,\ell_0 + 3\,\ell_1 + 4\,\ell_2 + 10\,\ell_3 + 7\,\ell_4 + 26\,\ell_5 = 2x^2 + 9y^2 + x - 5y.$$
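
The example can be checked numerically with the sketches above:

```python
import numpy as np

Y  = [np.array(v, float) for v in [(0, 0), (1, 0), (0, 1), (2, 0), (1, 1), (0, 2)]]
f  = lambda z: z[0] + z[1] + 2 * z[0]**2 + 3 * z[1]**3
fY = np.array([f(y) for y in Y])             # [0, 3, 4, 10, 7, 26]

z = np.array([0.3, -0.7])
m = model_from_lagrange(Y, fY, z)            # interpolating quadratic at z
q = 2 * z[0]**2 + 9 * z[1]**2 + z[0] - 5 * z[1]
assert np.isclose(m, q)                      # matches 2x^2 + 9y^2 + x - 5y
```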

Part 2: Model Quality

FL and FQ models

A model $m_f$ is called:

Fully Linear (FL) on $B(y; \Delta)$, for $f \in C^1$ with $\nabla f$ Lipschitz continuous, if

$$|f(x) - m_f(x)| \leq \kappa_f \Delta^2 \quad \text{and} \quad \|\nabla f(x) - \nabla m_f(x)\| \leq \kappa_g \Delta.$$

Fully Quadratic (FQ) on $B(y; \Delta)$, for $f \in C^2$ with $\nabla^2 f$ Lipschitz continuous, if

$$|f(x) - m_f(x)| \leq \kappa_f \Delta^3, \quad \|\nabla f(x) - \nabla m_f(x)\| \leq \kappa_g \Delta^2, \quad \|\nabla^2 f(x) - \nabla^2 m_f(x)\| \leq \kappa_h \Delta.$$

For all $x \in B(y; \Delta)$ and some constants $\kappa_f, \kappa_g, \kappa_h$.

FL and FQ classes of models

A set of models $M = \{m : \mathbb{R}^n \to \mathbb{R},\ m \in C^2\}$ is called an FL (FQ) class of models if:
- There exists an FL (FQ) model in $M$.
- There exists a model-improvement algorithm (MIA) that, in a finite number of steps, can:
  - determine if a given model is FL (FQ) on $B(x; \Delta)$, or
  - find a model that is FL (FQ) on $B(x; \Delta)$.

Well-poisedness

A set $Y$ is said to be poised for polynomial interpolation or regression if $M(\varphi, Y)$ is nonsingular ($p = q$), or if $M(\varphi, Y)$ has full rank.

Well-poisedness: good geometry of $Y$ = well-poised set $Y$.

The condition number of $M(\varphi, Y)$ may be a good indicator, but only with some bases $\varphi$ and if some specific scaling is performed. Lagrange polynomials can be good indicators. Well-poisedness is quantified with $\Lambda$-poisedness.

Λ-poisedness

Let $\Lambda > 0$ and $B \subset \mathbb{R}^n$. $Y$ is $\Lambda$-poised in $B$ if

$$\Lambda \geq \Lambda_\ell = \max_{0 \leq i \leq p}\ \max_{x \in B(Y)} |\ell_i(x)|,$$

where $B(Y)$ is the smallest ball containing $Y$.

Equivalently: for all $x \in B$, there exists $\lambda \in \mathbb{R}^{p+1}$ such that

$$\varphi(x) = \sum_{i=0}^{p} \lambda_i\, \varphi(y^i) \quad \text{and} \quad \max_{0 \leq i \leq p} |\lambda_i| \leq \Lambda.$$

Equivalently: replacing any point in $Y$ by any $x \in B$ can increase the volume of the set $\varphi(Y) = \{\varphi(y^0), \varphi(y^1), \dots, \varphi(y^p)\}$ at most by a factor $\Lambda$, with the volume defined as $\frac{|\det(M(\varphi, Y))|}{(p+1)!}$.
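
(Not in the slides.) A crude Monte-Carlo estimate (a lower bound) of $\Lambda_\ell$, with $B(Y)$ approximated by the ball centred at the mean of $Y$; an exact value requires globally maximizing each $|\ell_i|$ over the ball:

```python
import numpy as np

def lambda_estimate(Y, n_samples=50_000, seed=0):
    """Sampled lower bound on max_i max_{x in B(Y)} |l_i(x)|."""
    rng = np.random.default_rng(seed)
    C = lagrange_coefficients(Y)             # from the earlier sketch
    Y = np.asarray(Y, dtype=float)
    centre = Y.mean(axis=0)
    radius = np.max(np.linalg.norm(Y - centre, axis=1))
    n = Y.shape[1]
    best = 0.0
    for _ in range(n_samples):
        u = rng.standard_normal(n)           # uniform sample in the ball
        x = centre + radius * rng.random() ** (1.0 / n) * u / np.linalg.norm(u)
        best = max(best, np.max(np.abs(C.T @ natural_basis(x))))
    return best
```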

Part 3: Derivative-Free Trust-Region Framework

Introduction

We consider the unconstrained problem $\min_{x \in \mathbb{R}^n} f(x)$. Bounds and linear constraints can be treated easily; more elaborate strategies are needed to handle general constraints (see Lesson #9 on constraints).

We present a first-order algorithm that ensures global convergence to first-order critical points using an FL class of models. This is the general DFTR framework from [Conn et al., 2009].

We suppose $f \in C^1$ with $\nabla f$ Lipschitz continuous, but derivatives are not available.

Notation for the DFTR framework

- $x_k$: current iterate.
- Model of $f$:
  $$m_f(x) = f(x_k) + g_k^T (x - x_k) + \tfrac{1}{2}(x - x_k)^T H_k (x - x_k),$$
  where $g_k$ and $H_k$ are the gradient and Hessian of the model at iteration $k$.
- $\Delta_k$: trust-region radius.
- For a candidate $t$:
  $$r_k(t) = \frac{f(x_k) - f(t)}{m_f(x_k) - m_f(t)}.$$
- $m_f \leftarrow m_f \cup \{t\}$ means: update the model with $t$.
- Parameters: $\varepsilon_c > 0$; $0 \leq \eta_0 \leq \eta_1 < 1$ with $\eta_1 \neq 0$; $0 < \gamma_{\mathrm{dec}} < 1 < \gamma_{\mathrm{inc}}$; $\mu > 0$.

First-order algorithm: 1/3

Step 0 [Initialization]
- Choose an FL class of models and a model-improvement algorithm (MIA).
- Choose $x_0$, $\Delta_{\max}$, $\Delta_0 \in \left]0; \Delta_{\max}\right]$; initialize the model $m_f$; $k \leftarrow 0$.

Step 1 [Criticality test]: if $\|g_k\| \leq \varepsilon_c$:
- Call the MIA to certify that $m_f$ is FL on $B(x_k; \Delta_k)$.
- If ($m_f$ is not FL on $B(x_k; \Delta_k)$) or ($\Delta_k > \mu \|g_k\|$) [model not good enough or trust region too large]: construct a new model.
- Check the stopping criterion; stop or go to [Step 1].

First-order algorithm: 2/3

Step 2 [Subproblem optimization]
- Find $t \in \operatorname{argmin}_{x \in B(x_k; \Delta_k)} m_f(x)$.
- Evaluate the candidate $t \in B(x_k; \Delta_k)$.
- Compute $r_k(t) = \dfrac{f(x_k) - f(t)}{m_f(x_k) - m_f(t)}$.

Step 3 [Acceptance of the candidate]
- If $r_k(t) \geq \eta_1$, or ($r_k(t) > \eta_0$ and $m_f$ is FL on $B(x_k; \Delta_k)$): $x_{k+1} \leftarrow t$ and $m_f \leftarrow m_f \cup \{t\}$.
- Otherwise: $x_{k+1} \leftarrow x_k$.

Step 4 [Model improvement]
- If $r_k(t) < \eta_1$ and $m_f$ is not FL: call the MIA to certify that $m_f$ is FL on $B(x_k; \Delta_k)$.

First-order algorithm: 3/3

Step 5 [Trust-region radius update]

$$\Delta_{k+1} \in \begin{cases} \left[\Delta_k;\ \min\{\gamma_{\mathrm{inc}}\Delta_k, \Delta_{\max}\}\right] & \text{if } r_k(t) \geq \eta_1, \\ \{\gamma_{\mathrm{dec}}\Delta_k\} & \text{if } r_k(t) < \eta_1 \text{ and } m_f \text{ is FL}, \\ \{\Delta_k\} & \text{if } r_k(t) < \eta_1 \text{ and } m_f \text{ is not FL}. \end{cases}$$

$k \leftarrow k + 1$; go to [Step 1].
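
(Not in the slides.) A condensed Python sketch of the loop above: the exact subproblem solve is replaced by a Cauchy step, FL certification is elided (so Step 5's third branch is dropped), and `build_model(f, x, delta) -> (g, H)` is a hypothetical hook standing in for model construction plus the MIA (it could wrap the MFN sketch from earlier):

```python
import numpy as np

def cauchy_step(g, H, delta):
    """Minimizer of the quadratic model along -g within the ball."""
    gn = np.linalg.norm(g)
    if gn == 0.0:
        return np.zeros_like(g)
    gHg = g @ H @ g
    tau = 1.0 if gHg <= 0.0 else min(gn**3 / (delta * gHg), 1.0)
    return -tau * (delta / gn) * g

def dftr_first_order(f, build_model, x0, delta0=1.0, delta_max=10.0,
                     eps_c=1e-3, eta0=0.0, eta1=0.25, gamma_dec=0.5,
                     gamma_inc=2.0, mu=1.0, max_iter=500, tol=1e-8):
    """Simplified DFTR sketch; not the certified algorithm."""
    x = np.asarray(x0, dtype=float)
    fx, delta = f(x), delta0
    g, H = build_model(f, x, delta)
    for _ in range(max_iter):
        # Step 1 (simplified criticality test): shrink the region, rebuild.
        if np.linalg.norm(g) <= eps_c:
            if delta <= tol:
                break                                  # approximate criticality
            delta = max(min(delta, mu * np.linalg.norm(g)), tol)
            g, H = build_model(f, x, delta)
            continue
        # Step 2: approximate subproblem solution and ratio r_k(t).
        s = cauchy_step(g, H, delta)
        t, ft = x + s, f(x + s)
        pred = -(g @ s + 0.5 * s @ H @ s)              # model decrease
        r = (fx - ft) / pred if pred > 0 else -np.inf
        # Steps 3 and 5: acceptance and radius update.
        if r >= eta1:                                  # successful
            x, fx = t, ft
            delta = min(gamma_inc * delta, delta_max)
        elif r > eta0:                                 # acceptable
            x, fx = t, ft
            delta *= gamma_dec
        else:                                          # unsuccessful
            delta *= gamma_dec
        g, H = build_model(f, x, delta)
    return x
```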

First-order algorithm: Comments

- Successful iteration if $r_k(t) \geq \eta_1$: then $\Delta_{k+1} \geq \Delta_k$.
- Acceptable iteration if $\eta_1 > r_k(t) > \eta_0$ and $m_f$ is FL: then $\Delta_{k+1} < \Delta_k$.
- Model-improving iteration if $r_k(t) < \eta_1$ and $m_f$ is not FL: then the model must be improved, and $x_k$, $\Delta_k$ are not updated.
- Unsuccessful iteration if $r_k(t) \leq \eta_0$ and $m_f$ is FL: then $\Delta_{k+1} < \Delta_k$ and $x_k$ is not updated.

Key point: do not reduce the trust-region radius when the model is not good.

Second-order algorithm

Global convergence to second-order critical points using an FQ class of models: $f \in C^2$ with $\nabla^2 f$ Lipschitz continuous.

Second-order stationarity measure of the model:

$$\sigma_k^m = \max\{\|g_k\|, -\lambda_{\min}(H_k)\},$$

where $\lambda_{\min}(H_k)$ denotes the smallest eigenvalue of $H_k$. The criticality test is based on $\sigma_k^m$ instead of $\|g_k\|$.
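
A one-line sketch of the measure:

```python
import numpy as np

def second_order_stationarity(g, H):
    """sigma^m = max(||g||, -lambda_min(H)); zero iff g = 0 and H is PSD."""
    return max(np.linalg.norm(g), -np.linalg.eigvalsh(H)[0])
```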

Definition of the subproblem

Trust-region subproblem: we want to solve

$$\min_{x \in B(x_k; \Delta_k)} m_f(x)$$

in order to obtain a candidate $t$.

The trust-region constraint can be expressed with different norms. We do not need an exact resolution.

Optimization of the subproblem

Some methods to solve the subproblem:
- Gradient projection.
- Moré–Sorensen.
- Generalized Lanczos trust-region.
- Sequential subspace.
- Gould–Robinson–Thorne.
- Rendl–Wolkowicz.
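
(Not in the slides.) A Moré–Sorensen-flavoured sketch that bisects on the shift $\lambda$ in $(H + \lambda I)s = -g$ until $\|s\| \approx \Delta$; the so-called hard case is ignored for brevity:

```python
import numpy as np

def trust_region_step(g, H, delta, n_bisect=60, shift=1e-10):
    """Approximate argmin of g^T s + s^T H s / 2 subject to ||s|| <= delta,
    via bisection on lambda >= max(0, -lambda_min(H)); hard case ignored."""
    n = len(g)
    lo = max(0.0, -np.linalg.eigvalsh(H)[0])
    step = lambda lam: np.linalg.solve(H + (lam + shift) * np.eye(n), -g)
    if lo == 0.0 and np.linalg.norm(step(0.0)) <= delta:
        return step(0.0)                     # interior (Newton-like) step
    hi = lo + 1.0
    while np.linalg.norm(step(hi)) > delta:  # bracket the boundary solution
        hi *= 2.0
    for _ in range(n_bisect):                # ||s(lambda)|| decreases in lambda
        mid = 0.5 * (lo + hi)
        if np.linalg.norm(step(mid)) > delta:
            lo = mid
        else:
            hi = mid
    return step(hi)
```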

Part 4: References

DFTR solvers

- BOBYQA
- COBYLA
- CONDOR
- DFO
- LINCOA
- NEWUOA
- ORBIT
- SNOBFIT
- Wedge

References I

Conn, A. R., Scheinberg, K., and Vicente, L. N. (2009). Introduction to Derivative-Free Optimization. MOS-SIAM Series on Optimization. SIAM, Philadelphia.

Golub, G. H. and Van Loan, C. F. (1996). Matrix Computations, chapter 2.5.3, The Singular Value Decomposition, pages 70–71. The Johns Hopkins University Press, Baltimore and London, third edition. (SVD).

Gould, N. I. M., Lucidi, S., Roma, M., and Toint, Ph. L. (1999). Solving the trust-region subproblem using the Lanczos method. SIAM Journal on Optimization, 9(2):504–525.

References II

Gould, N. I. M., Robinson, D. P., and Thorne, H. S. (2010). On solving trust-region and other regularised subproblems in optimization. Mathematical Programming Computation, 2(1):21–57.

Moré, J. J. and Sorensen, D. C. (1983). Computing a trust region step. SIAM Journal on Scientific and Statistical Computing, 4(3):553–572.

Nocedal, J. and Wright, S. J. (2006). Numerical Optimization. Springer Series in Operations Research and Financial Engineering. Springer, New York, second edition. (Gradient projection).

References III

Rendl, F. and Wolkowicz, H. (1997). A semidefinite framework for trust region subproblems with applications to large scale minimization. Mathematical Programming, 77(1):273–299.