Derivative-Free Trust-Region methods

MTH6418: DFTR. S. Le Digabel, École Polytechnique de Montréal, Fall 2015 (v4).

Plan
- Quadratic models
- Model Quality
- Derivative-Free Trust-Region Framework
- References


Quadratic model of f
Natural basis of the space of polynomials of degree at most 2 in $\mathbb{R}^n$. It has $q + 1 = (n + 1)(n + 2)/2$ elements:
$$\phi(x) = (\phi_0(x), \phi_1(x), \ldots, \phi_q(x))^T = \left(1, x_1, x_2, \ldots, x_n, \frac{x_1^2}{2}, \frac{x_2^2}{2}, \ldots, \frac{x_n^2}{2}, x_1 x_2, x_1 x_3, \ldots, x_{n-1} x_n\right)^T.$$
Model of $f$: $m_f$, defined by $\alpha \in \mathbb{R}^{q+1}$:
$$m_f(x) = \alpha^T \phi(x).$$
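As a minimal sketch, the natural basis and the model evaluation can be written in Python with NumPy. The names `natural_basis` and `model_value` are hypothetical, and the ordering of the cross terms ($x_i x_j$ for $i < j$, row by row) is an assumption chosen to match the listing above.

```python
import numpy as np

def natural_basis(x):
    """Evaluate the natural quadratic basis phi(x) at a point x in R^n.

    Assumed ordering: 1, x_1..x_n, x_1^2/2..x_n^2/2, then x_i*x_j for i < j.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    cross = [x[i] * x[j] for i in range(n) for j in range(i + 1, n)]
    return np.concatenate(([1.0], x, 0.5 * x**2, cross))

def model_value(alpha, x):
    """Evaluate m_f(x) = alpha^T phi(x)."""
    return float(np.dot(alpha, natural_basis(x)))

# In R^2 the basis has (2 + 1)(2 + 2)/2 = 6 elements.
phi = natural_basis([2.0, 3.0])
```

For $x = (2, 3)$ the vector `phi` holds $(1, 2, 3, 2, 4.5, 6)$.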

Interpolation set
Points at which $f$ is known and which are used to construct the model $m_f$.
$p + 1$ elements of $\mathbb{R}^n$: $Y = \{y^0, y^1, \ldots, y^p\}$. These points are also called data points.
$$f(Y) = (f(y^0), f(y^1), \ldots, f(y^p))^T \in \mathbb{R}^{p+1}.$$
The geometry of $Y$ is important and will be studied later.
How to select the data points from the cache points? One solution: take the points around the current iterate.

Construction of the model
Find $\alpha \in \mathbb{R}^{q+1}$ such that $\sum_{y \in Y} (f(y) - m_f(y))^2$ is minimal.
Idea: solve $M(\phi, Y)\alpha = f(Y)$ with
$$M(\phi, Y) = \begin{bmatrix} \phi_0(y^0) & \phi_1(y^0) & \cdots & \phi_q(y^0) \\ \phi_0(y^1) & \phi_1(y^1) & \cdots & \phi_q(y^1) \\ \vdots & \vdots & & \vdots \\ \phi_0(y^p) & \phi_1(y^p) & \cdots & \phi_q(y^p) \end{bmatrix} \in \mathbb{R}^{(p+1) \times (q+1)}.$$
Cost in $O(p^3)$.
3 cases:
- $p = q$: determined.
- $p > q$: overdetermined.
- $p < q$: underdetermined.
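The determined case $p = q$ can be illustrated with a short NumPy sketch. The helper names, the six points in $\mathbb{R}^2$, and the quadratic test function `f` are all assumptions made for the example; since `f` is itself quadratic, the model should reproduce it exactly.

```python
import numpy as np

def natural_basis(x):
    """phi(x): 1, x_i, x_i^2/2, then x_i*x_j for i < j (assumed ordering)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    cross = [x[i] * x[j] for i in range(n) for j in range(i + 1, n)]
    return np.concatenate(([1.0], x, 0.5 * x**2, cross))

def interpolation_matrix(Y):
    """Row i of M(phi, Y) is phi(y^i)^T."""
    return np.array([natural_basis(y) for y in Y])

# Six points in R^2, so p = q = 5 (determined case).
Y = [(0, 0), (1, 0), (0, 1), (2, 0), (1, 1), (0, 2)]

def f(x):
    # Hypothetical quadratic objective used only for illustration.
    return 1 + 2 * x[0] + 3 * x[1] + x[0] ** 2 + x[0] * x[1]

M = interpolation_matrix(Y)
fY = np.array([f(y) for y in Y], dtype=float)

# Square, nonsingular system: exact interpolation.
alpha = np.linalg.solve(M, fY)
```

Here `alpha` comes out as $(1, 2, 3, 2, 0, 1)$; the coefficient of $x_1^2$ in `f` is 1, which corresponds to 2 on the basis element $x_1^2/2$.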

Number of necessary interpolation points

n     q + 1 = (n+1)(n+2)/2
2     6
3     10
4     15
5     21
10    66
20    231
50    1326

Typically in the DFO context, $n \le 20$, but:
- Very limited number of evaluations.
- Selection of the data points near the current iterate.
The underdetermined case $p < q$ is the most common.
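The counts in the table follow directly from the formula $q + 1 = (n+1)(n+2)/2$. A small sketch (the function name `n_basis` is hypothetical) reproduces them:

```python
def n_basis(n):
    """Number q + 1 of elements in the quadratic basis in R^n."""
    return (n + 1) * (n + 2) // 2

for n in (2, 3, 4, 5, 10, 20, 50):
    print(n, n_basis(n))
```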

Overdetermined & determined cases: 1/2
More data points than necessary: $p > q$.
Use regression to solve the system in the least-squares sense, i.e. solve
$$\min_{\alpha \in \mathbb{R}^{q+1}} \|M(\phi, Y)\alpha - f(Y)\|^2.$$
If $M(\phi, Y)$ has full column rank, the solution is unique and given analytically by
$$\alpha = M(\phi, Y)^+ f(Y) \quad \text{with} \quad M(\phi, Y)^+ = \left[M(\phi, Y)^T M(\phi, Y)\right]^{-1} M(\phi, Y)^T$$
the pseudoinverse of $M(\phi, Y)$.
This also works for the determined case $p = q$ (exact interpolation).
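A minimal regression sketch, assuming a $3 \times 3$ grid of data points in $\mathbb{R}^2$ (so $p + 1 = 9 > q + 1 = 6$) and an illustrative non-quadratic objective. It compares NumPy's least-squares solver with the normal-equations formula above; the helper name `natural_basis` and the data are assumptions.

```python
import numpy as np

def natural_basis(x):
    """phi(x): 1, x_i, x_i^2/2, then x_i*x_j for i < j (assumed ordering)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    cross = [x[i] * x[j] for i in range(n) for j in range(i + 1, n)]
    return np.concatenate(([1.0], x, 0.5 * x**2, cross))

# 3 x 3 grid in R^2: 9 data points for 6 coefficients (overdetermined).
Y = [(i, j) for i in range(3) for j in range(3)]

def f(x):
    # Illustrative objective, not quadratic, so the fit is not exact.
    return x[0] + x[1] + 2 * x[0] ** 2 + 3 * x[1] ** 3

M = np.array([natural_basis(y) for y in Y])
fY = np.array([f(y) for y in Y], dtype=float)

# Least-squares solution computed by NumPy.
alpha, _, rank, _ = np.linalg.lstsq(M, fY, rcond=None)

# Same solution from alpha = (M^T M)^{-1} M^T f(Y), valid for full column rank.
alpha_normal = np.linalg.solve(M.T @ M, M.T @ fY)
```

In practice `lstsq` is usually preferred over forming $M^T M$ explicitly, because the normal equations square the condition number.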

Overdetermined & determined cases: 2/2
$M(\phi, Y)$ can be decomposed using the Singular Value Decomposition (SVD): $M(\phi, Y) = U \Sigma V^T$ with:
- $U \in \mathbb{R}^{(p+1) \times (p+1)}$, $U^T U = I_{p+1}$.
- $\Sigma \in \mathbb{R}^{(p+1) \times (q+1)}$, diagonal: singular values (sv) $\ge 0$.
- $V \in \mathbb{R}^{(q+1) \times (q+1)}$, $V V^T = I_{q+1}$.
$M(\phi, Y)$ does not have full rank if the smallest sv is 0.
Condition number of $M(\phi, Y)$: largest sv / smallest sv.
$M(\phi, Y)^+ = V \Sigma^+ U^T$ where $\Sigma^+$ is the pseudoinverse of $\Sigma$, obtained by replacing every non-zero sv by its reciprocal and transposing the resulting matrix.
Cost of the SVD: $O((p + 1)(q + 1)^2)$.
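The construction of the pseudoinverse from the SVD can be sketched as follows. The data set (a $3 \times 3$ grid in $\mathbb{R}^2$) and the tolerance `tol` used to decide which singular values count as non-zero are assumptions made for the example.

```python
import numpy as np

def natural_basis(x):
    """phi(x): 1, x_i, x_i^2/2, then x_i*x_j for i < j (assumed ordering)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    cross = [x[i] * x[j] for i in range(n) for j in range(i + 1, n)]
    return np.concatenate(([1.0], x, 0.5 * x**2, cross))

Y = [(i, j) for i in range(3) for j in range(3)]
M = np.array([natural_basis(y) for y in Y])          # shape (9, 6)

# Full SVD: M = U @ Sigma @ Vt, singular values s sorted in decreasing order.
U, s, Vt = np.linalg.svd(M, full_matrices=True)

# Sigma^+: reciprocals of the non-zero singular values, in the transposed shape.
tol = max(M.shape) * np.finfo(float).eps * s[0]      # assumed cutoff
Sigma_plus = np.zeros((M.shape[1], M.shape[0]))
for i, sv in enumerate(s):
    if sv > tol:
        Sigma_plus[i, i] = 1.0 / sv

M_plus = Vt.T @ Sigma_plus @ U.T                     # V Sigma^+ U^T
cond = s[0] / s[-1]                                  # largest sv / smallest sv
```

The result can be compared with `np.linalg.pinv(M)`, which computes the same pseudoinverse.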

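Underdetermined case: Infinite number of solutions
Minimum Frobenius Norm (MFN) interpolation: choose a solution that minimizes the Frobenius norm of the Hessian of the model (the curvature).
Split the coefficients into linear and quadratic parts, where $\phi_L$ and $\phi_Q$ denote the corresponding parts of the basis $\phi$:
$$\alpha = \begin{bmatrix} \alpha_L \\ \alpha_Q \end{bmatrix} \quad \text{with } \alpha_L \in \mathbb{R}^{n+1},\ \alpha_Q \in \mathbb{R}^{n_Q},\ n_Q = n(n + 1)/2.$$
$$F(\phi, Y) = \begin{bmatrix} M(\phi_Q, Y) M(\phi_Q, Y)^T & M(\phi_L, Y) \\ M(\phi_L, Y)^T & 0 \end{bmatrix} \in \mathbb{R}^{(p+n+2) \times (p+n+2)}.$$
$$F(\phi, Y) \begin{bmatrix} \mu \\ \alpha_L \end{bmatrix} = \begin{bmatrix} f(Y) \\ 0 \end{bmatrix} \quad \text{with } \alpha_L \in \mathbb{R}^{n+1} \text{ and } \mu \in \mathbb{R}^{p+1}.$$
Use a matrix decomposition to solve the system, and then:
$$\alpha_Q = M(\phi_Q, Y)^T \mu \in \mathbb{R}^{n_Q}.$$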
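A minimal sketch of the MFN system, assuming four data points in $\mathbb{R}^2$ (so $p + 1 = 4 < q + 1 = 6$) and illustrative function values. The names `split_basis` and `mfn_model` are hypothetical, and a dense `np.linalg.solve` stands in for whichever decomposition one would use in practice.

```python
import numpy as np

def split_basis(x):
    """Return (phi_L(x), phi_Q(x)): linear and quadratic parts of phi."""
    x = np.asarray(x, dtype=float)
    n = x.size
    cross = [x[i] * x[j] for i in range(n) for j in range(i + 1, n)]
    return np.concatenate(([1.0], x)), np.concatenate((0.5 * x**2, cross))

def mfn_model(Y, fY):
    """Solve F [mu; alpha_L] = [f(Y); 0], then alpha_Q = M_Q^T mu."""
    parts = [split_basis(y) for y in Y]
    ML = np.array([pl for pl, _ in parts])        # (p+1) x (n+1)
    MQ = np.array([pq for _, pq in parts])        # (p+1) x n_Q
    p1, n1 = ML.shape
    F = np.block([[MQ @ MQ.T, ML], [ML.T, np.zeros((n1, n1))]])
    rhs = np.concatenate((fY, np.zeros(n1)))
    sol = np.linalg.solve(F, rhs)                 # assumes F is nonsingular
    mu, alpha_L = sol[:p1], sol[p1:]
    alpha_Q = MQ.T @ mu
    return alpha_L, alpha_Q

# Illustrative data: four points and their (assumed) function values.
Y = [(0, 0), (1, 0), (0, 1), (1, 1)]
fY = np.array([0.0, 3.0, 4.0, 7.0])
alpha_L, alpha_Q = mfn_model(Y, fY)
```

For this particular data a linear function ($3x_1 + 4x_2$) already interpolates all four values, so the quadratic part `alpha_Q` vanishes: the model with the least curvature is the linear one.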

Lagrange polynomials
Basis of Lagrange polynomials: $p + 1$ polynomials $l_j$ for $j = 0, 1, \ldots, p$, with:
$$l_j(y^i) = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{if } i \ne j. \end{cases}$$
Model:
$$m_f(x) = \sum_{i=0}^{p} f(y^i)\, l_i(x).$$
The cost of constructing a model is in $O(p^3)$, but the cost of updating the model by one point is in $O(p^2)$.

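Lagrange polynomials: Example
$f(x, y) = x + y + 2x^2 + 3y^3$
$$Y = \left\{ \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 2 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 0 \\ 2 \end{bmatrix} \right\}$$
$l_0(x, y) = 1 - \frac{3}{2}x - \frac{3}{2}y + \frac{1}{2}x^2 + \frac{1}{2}y^2 + xy$,
$l_1(x, y) = 2x - x^2 - xy$,
$l_2(x, y) = 2y - y^2 - xy$,
$l_3(x, y) = -\frac{1}{2}x + \frac{1}{2}x^2$,
$l_4(x, y) = xy$,
$l_5(x, y) = -\frac{1}{2}y + \frac{1}{2}y^2$.
$$m_f(x, y) = 0\, l_0(x, y) + 3\, l_1(x, y) + 4\, l_2(x, y) + 10\, l_3(x, y) + 7\, l_4(x, y) + 26\, l_5(x, y) = 2x^2 + 9y^2 + x - 5y.$$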
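The example above can be checked numerically. In this sketch, the coefficients of each Lagrange polynomial in the natural basis $(1, x, y, x^2/2, y^2/2, xy)$ are obtained by solving $M(\phi, Y)\, C = I$; this is one straightforward way to compute them and is an assumption of the sketch, not necessarily how a solver would do it.

```python
import numpy as np

def natural_basis(x):
    """phi(x): 1, x_i, x_i^2/2, then x_i*x_j for i < j (assumed ordering)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    cross = [x[i] * x[j] for i in range(n) for j in range(i + 1, n)]
    return np.concatenate(([1.0], x, 0.5 * x**2, cross))

Y = [(0, 0), (1, 0), (0, 1), (2, 0), (1, 1), (0, 2)]

def f(x):
    return x[0] + x[1] + 2 * x[0] ** 2 + 3 * x[1] ** 3

M = np.array([natural_basis(y) for y in Y])
fY = np.array([f(y) for y in Y], dtype=float)

# Column j of C holds the coefficients of l_j in the natural basis,
# because M @ C = I encodes l_j(y^i) = 1 if i == j, else 0.
C = np.linalg.solve(M, np.eye(len(Y)))

# m_f = sum_i f(y^i) l_i, expressed in the natural basis.
alpha = C @ fY
print(fY)
print(alpha)
```

The function values are $(0, 3, 4, 10, 7, 26)$ and `alpha` is $(0, 1, -5, 4, 18, 0)$, i.e. $m_f(x, y) = x - 5y + 4 \cdot x^2/2 + 18 \cdot y^2/2 = 2x^2 + 9y^2 + x - 5y$, in agreement with the slide.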


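FL and FQ models
A model $m_f$ is called:
- Fully Linear (FL) on $B(y; \Delta)$, for $f \in C^1$ and $\nabla f$ Lipschitz continuous, if
$$|f(x) - m_f(x)| \le \kappa_f \Delta^2, \qquad \|\nabla f(x) - \nabla m_f(x)\| \le \kappa_g \Delta.$$
- Fully Quadratic (FQ) on $B(y; \Delta)$, for $f \in C^2$ and $\nabla^2 f$ Lipschitz continuous, if
$$|f(x) - m_f(x)| \le \kappa_f \Delta^3, \qquad \|\nabla f(x) - \nabla m_f(x)\| \le \kappa_g \Delta^2, \qquad \|\nabla^2 f(x) - \nabla^2 m_f(x)\| \le \kappa_h \Delta.$$
These bounds hold for all $x \in B(y; \Delta)$ and some constants $\kappa_f$, $\kappa_g$, $\kappa_h$.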

FL and FQ class of models
A set of models $\mathcal{M} = \{m : \mathbb{R}^n \to \mathbb{R},\ m \in C^2\}$ is called a FL (FQ) class of models if:
- There exists a FL (FQ) model in $\mathcal{M}$.
- There exists a model-improvement algorithm (MIA) that, in a finite number of steps, can:
  - determine if a given model is FL (FQ) on $B(x; \Delta)$,
  - or find a model that is FL (FQ) on $B(x; \Delta)$.

Well-poisedness
A set $Y$ is said to be poised for polynomial interpolation or regression if $M(\phi, Y)$ is nonsingular ($p = q$), or if $M(\phi, Y)$ has full rank.
Well-poisedness: a set $Y$ with good geometry is called well-poised.
The condition number of $M(\phi, Y)$ may be a good indicator, but only with some bases $\phi$ and if some specific scaling is performed.
Lagrange polynomials can be good indicators.
The well-poisedness is quantified with the $\Lambda$-poisedness.

Λ-poisedness
Let $\Lambda > 0$ and $B \subset \mathbb{R}^n$. $Y$ is $\Lambda$-poised in $B$ if
$$\Lambda \ge \Lambda_l = \max_{0 \le i \le p}\ \max_{x \in B(Y)} |l_i(x)|$$
where $B(Y)$ is the smallest ball containing $Y$.
Or: for all $x \in B$, there exists $\lambda \in \mathbb{R}^{p+1}$ such that
$$\phi(x) = \sum_{i=0}^{p} \lambda_i \phi(y^i) \quad \text{and} \quad \max_{0 \le i \le p} |\lambda_i| \le \Lambda.$$
Or: replacing any point in $Y$ by any $x \in B$ can increase the volume of the set $\phi(Y)$ at most by a factor $\Lambda$, with $\phi(Y) = \{\phi(y^0), \phi(y^1), \ldots, \phi(y^p)\}$ and its volume defined by $\dfrac{|\det(M(\phi, Y))|}{(p+1)!}$.
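The first characterization suggests a crude numerical estimate: sample points in a ball and take the largest absolute value reached by any Lagrange polynomial. The sketch below assumes the determined case $p = q$, a ball supplied by the caller (center and radius are assumptions, not the smallest enclosing ball), and data points lying inside that ball. Because it samples, the hypothetical `lambda_estimate` only returns a lower bound on $\Lambda_l$.

```python
import numpy as np

def natural_basis(x):
    """phi(x): 1, x_i, x_i^2/2, then x_i*x_j for i < j (assumed ordering)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    cross = [x[i] * x[j] for i in range(n) for j in range(i + 1, n)]
    return np.concatenate(([1.0], x, 0.5 * x**2, cross))

def lambda_estimate(Y, center, radius, n_samples=2000, seed=0):
    """Sampled lower bound on max_i max_{x in B} |l_i(x)| (case p = q)."""
    Y = np.asarray(Y, dtype=float)
    M = np.array([natural_basis(y) for y in Y])
    C = np.linalg.solve(M, np.eye(len(Y)))        # Lagrange coefficients
    rng = np.random.default_rng(seed)
    n = Y.shape[1]
    # Uniform samples in the ball: random direction times scaled radius.
    d = rng.normal(size=(n_samples, n))
    d /= np.linalg.norm(d, axis=1, keepdims=True)
    r = radius * rng.random(n_samples) ** (1.0 / n)
    X = np.asarray(center, dtype=float) + r[:, None] * d
    X = np.vstack((X, Y))                         # include the data points
    Phi = np.array([natural_basis(x) for x in X])
    return float(np.max(np.abs(Phi @ C)))         # entry [k, j] is l_j(x_k)

good = [(0, 0), (1, 0), (0, 1), (2, 0), (1, 1), (0, 2)]
# Same set with one point moved almost onto the line through three others.
bad = [(0, 0), (1, 0), (0, 1), (2, 0), (1, 0.001), (0, 2)]

lam_good = lambda_estimate(good, center=(0, 0), radius=2.0)
lam_bad = lambda_estimate(bad, center=(0, 0), radius=2.0)
```

The nearly degenerate set yields a much larger estimate than the well-spread one, which is the behavior the $\Lambda$-poisedness constant is meant to capture.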


Introduction
We consider the unconstrained problem
$$\min_{x \in \mathbb{R}^n} f(x).$$
Bounds and linear constraints can be easily treated. More elaborate strategies are needed to handle general constraints: see Lesson #9 on the constraints.
We present a first order algorithm that ensures global convergence to first order critical points using a FL class of models. This is the general DFTR framework from [Conn et al., 2009].
We suppose that $f \in C^1$ and that $\nabla f$ is Lipschitz continuous, but derivatives are not available.

Notations for the DFTR framework
- $x_k$: current iterate.
- Model of $f$: $m_f(x) = f(x_k) + g_k^T (x - x_k) + \frac{1}{2}(x - x_k)^T H_k (x - x_k)$, with $g_k$, $H_k$ the gradient and Hessian of the model at iteration $k$.
- $\Delta_k$: trust-region radius.
- For a candidate $t$: $r_k(t) = \dfrac{f(x_k) - f(t)}{m_f(x_k) - m_f(t)}$.
- $m_f \leftarrow m_f \cup \{t\}$ means: update the model with $t$.
- $\varepsilon_c > 0$.
- $0 \le \eta_0 \le \eta_1 < 1$, $\eta_1 \ne 0$.
- $0 < \gamma_{dec} < 1 < \gamma_{inc}$.
- $\mu > 0$.
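The ratio $r_k(t)$ compares the actual decrease of $f$ with the decrease predicted by the model. A minimal sketch, with hypothetical function names and assuming the predicted decrease is non-zero:

```python
import numpy as np

def model_value(x, x_k, f_k, g_k, H_k):
    """m_f(x) = f(x_k) + g_k^T (x - x_k) + 1/2 (x - x_k)^T H_k (x - x_k)."""
    s = np.asarray(x, dtype=float) - x_k
    return f_k + g_k @ s + 0.5 * s @ H_k @ s

def reduction_ratio(f, x_k, t, g_k, H_k):
    """r_k(t) = (f(x_k) - f(t)) / (m_f(x_k) - m_f(t)), with m_f(x_k) = f(x_k)."""
    f_k = f(x_k)
    actual = f_k - f(t)
    predicted = f_k - model_value(t, x_k, f_k, g_k, H_k)
    return actual / predicted   # assumes predicted != 0

# Illustration: f is quadratic and the model matches it exactly.
f = lambda x: float(x @ x)
x_k = np.array([1.0, 1.0])
g_k = 2.0 * x_k
H_k = 2.0 * np.eye(2)
t = np.array([0.5, 0.5])
r = reduction_ratio(f, x_k, t, g_k, H_k)
```

When the model reproduces $f$ exactly, as in this illustration, the ratio equals 1; values close to 1 indicate that the model predicts the decrease well.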

First order algorithm: 1/3
Step 0 [Initialization]
- Choose a FL class of models.
- Choose a model-improvement algorithm (MIA).
- Choose $x_0$, $\Delta_{\max}$, $\Delta_0 \in\, ]0; \Delta_{\max}]$, initialize the model $m_f$, $k \leftarrow 0$.
Step 1 [Criticality test]
If $\|g_k\| \le \varepsilon_c$:
    Call MIA to certify that $m_f$ is FL on $B(x_k; \Delta_k)$.
    If ($m_f$ is not FL on $B(x_k; \Delta_k)$) or ($\Delta_k > \mu \|g_k\|$) [model not good enough or trust region too large]:
        Construct a new model.
        Check the stopping criterion; stop or goto [Step 1].

First order algorithm: 2/3
Step 2 [Subproblem optimization]
- Find $t \in \arg\min_{x \in B(x_k; \Delta_k)} m_f(x)$.
- Evaluate the candidate $t$ in $B(x_k; \Delta_k)$.
- Compute $r_k(t) = \dfrac{f(x_k) - f(t)}{m_f(x_k) - m_f(t)}$.
Step 3 [Acceptance of candidate]
- If $r_k(t) \ge \eta_1$ or ($r_k(t) \ge \eta_0$ and $m_f$ is FL on $B(x_k; \Delta_k)$): $x_{k+1} \leftarrow t$, $m_f \leftarrow m_f \cup \{t\}$.
- Otherwise: $x_{k+1} \leftarrow x_k$.
Step 4 [Model improvement]
- If $r_k(t) < \eta_1$ and $m_f$ is not FL: call MIA to certify that $m_f$ is FL on $B(x_k; \Delta_k)$.

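First order algorithm: 3/3
Step 5 [Trust-region radius update]
$$\Delta_{k+1} \in \begin{cases} [\Delta_k;\ \min\{\gamma_{inc} \Delta_k, \Delta_{\max}\}] & \text{if } r_k(t) \ge \eta_1, \\ \{\gamma_{dec} \Delta_k\} & \text{if } r_k(t) < \eta_1 \text{ and } m_f \text{ is FL}, \\ \{\Delta_k\} & \text{if } r_k(t) < \eta_1 \text{ and } m_f \text{ is not FL}. \end{cases}$$
$k \leftarrow k + 1$, goto [Step 1].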
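The radius update of Step 5 can be sketched as a small function. The default parameter values are assumptions for illustration, and in the successful case the sketch picks the upper end of the allowed interval, which is only one admissible choice.

```python
def update_radius(delta, r, model_is_fl,
                  eta1=0.75, gamma_dec=0.5, gamma_inc=2.0, delta_max=10.0):
    """Trust-region radius update (Step 5); default values are illustrative."""
    if r >= eta1:
        # Any value in [delta, min(gamma_inc * delta, delta_max)] is allowed;
        # this sketch takes the largest one.
        return min(gamma_inc * delta, delta_max)
    if model_is_fl:
        # Poor agreement with a fully linear model: shrink the region.
        return gamma_dec * delta
    # Model not fully linear: keep the radius and improve the model instead.
    return delta
```

For instance, `update_radius(1.0, 0.9, True)` doubles the radius, `update_radius(1.0, 0.1, True)` halves it, and `update_radius(1.0, 0.1, False)` leaves it unchanged.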

First order algorithm: Comments
- Successful iteration if $r_k(t) \ge \eta_1$. Then $\Delta_{k+1} \ge \Delta_k$.
- Acceptable iteration if $\eta_1 > r_k(t) \ge \eta_0$ and $m_f$ is FL. Then $\Delta_{k+1} < \Delta_k$.
- Model-improving iteration if $r_k(t) < \eta_1$ and $m_f$ is not FL. Then the model must be improved and $x_k$, $\Delta_k$ are not updated.
- Unsuccessful iteration if $r_k(t) < \eta_0$ and $m_f$ is FL. Then $\Delta_{k+1} < \Delta_k$ and $x_k$ is not updated.
Do not reduce the trust-region radius when the model is not good.
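The four iteration types can be expressed as a simple classification. The function name and the string labels are hypothetical; the conditions follow the comments above.

```python
def classify_iteration(r, model_is_fl, eta0, eta1):
    """Classify an iteration from the ratio r = r_k(t) and the model status."""
    if r >= eta1:
        return "successful"        # candidate accepted, radius may increase
    if not model_is_fl:
        return "model-improving"   # x_k and the radius are kept, model improved
    if r >= eta0:
        return "acceptable"        # candidate accepted, radius reduced
    return "unsuccessful"          # candidate rejected, radius reduced
```

With $\eta_0 = 0.1$ and $\eta_1 = 0.75$ (illustrative values), a ratio of 0.3 with a fully linear model gives an acceptable iteration, while the same ratio with a model that is not fully linear gives a model-improving iteration.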

Second order algorithm
Global convergence to second order critical points using a FQ class of models.
$f \in C^2$ and $\nabla^2 f$ Lipschitz continuous.
Second order stationarity of the model:
$$\sigma_k^m = \max\{\|g_k\|, -\lambda_{\min}(H_k)\}$$
where $\lambda_{\min}(H_k)$ denotes the smallest eigenvalue of $H_k$.
The criticality test is based on $\sigma_k^m$ instead of $\|g_k\|$.
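A minimal sketch of the second order measure, assuming a symmetric model Hessian so that `np.linalg.eigvalsh` applies; the function name is hypothetical.

```python
import numpy as np

def second_order_measure(g_k, H_k):
    """sigma_k^m = max(||g_k||, -lambda_min(H_k)) for symmetric H_k."""
    lam_min = np.linalg.eigvalsh(H_k)[0]   # eigenvalues in ascending order
    return max(float(np.linalg.norm(g_k)), float(-lam_min))

# Zero gradient but negative curvature: the measure stays positive.
sigma_saddle = second_order_measure(np.zeros(2), np.diag([1.0, -3.0]))
# Positive definite Hessian: the gradient norm dominates.
sigma_convex = second_order_measure(np.array([3.0, 4.0]), np.eye(2))
```

The first call returns 3 and the second returns 5: a point with zero model gradient is still not second order critical when the model Hessian has a negative eigenvalue.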

Definition of the subproblem
Trust-region subproblem: we want to solve
$$\min_{x \in B(x_k; \Delta_k)} m_f(x)$$
in order to obtain a candidate $t$.
The trust-region constraint can be expressed with different norms.
The subproblem does not need to be solved exactly.
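As an illustration of an inexact solve, the sketch below computes the Cauchy point: the minimizer of the quadratic model along the steepest-descent direction $-g_k$ within a Euclidean ball. This is a standard textbook choice used here as an assumption; it is not claimed to be the method used in the course or in any particular solver.

```python
import numpy as np

def cauchy_point(x_k, g_k, H_k, delta):
    """Minimize the quadratic model along -g_k inside the ball of radius delta."""
    g_norm = np.linalg.norm(g_k)
    if g_norm == 0.0:
        return np.array(x_k, dtype=float)     # no descent direction available
    curvature = g_k @ H_k @ g_k
    if curvature <= 0.0:
        tau = 1.0                             # go all the way to the boundary
    else:
        # Unconstrained minimizer along -g_k, clipped to the boundary.
        tau = min(1.0, g_norm**3 / (delta * curvature))
    return x_k - tau * (delta / g_norm) * g_k

x_k = np.zeros(2)
g_k = np.array([1.0, 0.0])
t_interior = cauchy_point(x_k, g_k, np.eye(2), delta=2.0)
t_boundary = cauchy_point(x_k, g_k, np.eye(2), delta=0.5)
t_negative = cauchy_point(x_k, g_k, -np.eye(2), delta=2.0)
```

With $H_k = I$ the unconstrained minimizer along $-g_k$ is at $(-1, 0)$, which is returned when $\Delta_k = 2$; with $\Delta_k = 0.5$ or with negative curvature the step stops on the boundary of the trust region.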

Optimization of the subproblem
Some methods to solve the subproblem:
- Gradient projection.
- Moré-Sorensen.
- Generalized Lanczos trust-region.
- Sequential Subspace.
- Gould-Robinson-Thorne.
- Rendl-Wolkowicz.


DFTR solvers
- BOBYQA.
- COBYLA.
- CONDOR.
- DFO.
- LINCOA.
- NEWUOA.
- ORBIT.
- SNOBFIT.
- Wedge.

References
Conn, A., Scheinberg, K., and Vicente, L. (2009). Introduction to Derivative-Free Optimization. MOS-SIAM Series on Optimization. SIAM, Philadelphia.
Golub, G. and Van Loan, C. (1996). Matrix Computations, chapter 2.5.3 The Singular Value Decomposition, pages 70-71. The Johns Hopkins University Press, Baltimore and London, third edition. (SVD).
Gould, N., Lucidi, S., and Toint, P. (1999). Solving the trust-region subproblem using the Lanczos method. SIAM Journal on Optimization, 9(2):504-525.

Gould, N., Robinson, D., and Thorne, H. (2010). On solving trust-region and other regularised subproblems in optimization. Mathematical Programming Computation, 2(1):21-57.
Moré, J. and Sorensen, D. (1983). Computing a trust region step. SIAM Journal on Scientific and Statistical Computing, 4(3):553-572.
Nocedal, J. and Wright, S. (2006). Numerical Optimization. Springer Series in Operations Research and Financial Engineering. Springer, Berlin, second edition. (Gradient projection).

Rendl, F. and Wolkowicz, H. (1997). A semidefinite framework for trust region subproblems with applications to large scale minimization. Mathematical Programming, 77(1):273-299.