A New Trust Region Algorithm Using Radial Basis Function Models


A New Trust Region Algorithm Using Radial Basis Function Models
Seppo Pulkkinen
University of Turku, Department of Mathematics
July 14, 2010

Outline
1 Introduction
2 Background
  Taylor series approximations
3 Radial basis function models
  Radial basis functions: overview
  Limiting interpolants
4 The trust region framework
  The trust region framework: overview
  Updating the model function
  Solving the subproblem via d.c. decompositions
5 Numerical results

Introduction

Problem definition: we consider a constrained nonlinear optimization problem
$\min_{x \in \mathbb{R}^n} f(x), \quad \text{subject to } l_i \le x_i \le u_i, \; i = 1, \dots, n, \quad Ax \le b,$
where $f : \mathbb{R}^n \to \mathbb{R}$. We also assume that the objective function
- is nonconvex and not necessarily differentiable,
- is expensive to evaluate.

Such problems frequently appear in image registration and data clustering.
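As a minimal illustration of the constraint set above (the helper and its names are hypothetical, not part of the talk), a feasibility test for the box and linear constraints might look as follows:

```python
import numpy as np

def is_feasible(x, l, u, A, b, tol=1e-12):
    """Check the bound constraints l <= x <= u and the linear constraints A x <= b."""
    in_box = np.all(x >= l - tol) and np.all(x <= u + tol)
    in_polytope = np.all(A @ x <= b + tol)
    return bool(in_box and in_polytope)
```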

Taylor Series Approximations

Most gradient-based methods are based on the quadratic Taylor series approximation
$m(x) = f(x_0) + \nabla f(x_0)^T (x - x_0) + \tfrac{1}{2} (x - x_0)^T H(x_0) (x - x_0).$
This can be expressed in a more generic form,
$m(x) = c + b^T (x - x_0) + \tfrac{1}{2} (x - x_0)^T A (x - x_0),$
where $c \in \mathbb{R}$, $b \in \mathbb{R}^n$ and $A \in \mathbb{R}^{n \times n}$.

Problem: Can the model parameters $c$, $b$ and $A$ be determined without evaluating derivatives of the objective function?
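For concreteness, a small sketch of evaluating the generic quadratic model (function and variable names are illustrative only):

```python
import numpy as np

def quadratic_model(x, x0, c, b, A):
    """Evaluate m(x) = c + b^T (x - x0) + 1/2 (x - x0)^T A (x - x0)."""
    d = x - x0
    return c + b @ d + 0.5 * d @ (A @ d)
```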

Determining the Model Parameters

Interpolation-based approach: determine the quadratic model parameters $c$, $b$ and $A$ from the interpolation equations
$m(x_i) = f(x_i), \quad i = 1, \dots, |X|,$
where $X = \{x_1, \dots, x_m\}$ is the set of interpolation points.

A model defined by the above equations
- can be uniquely determined if $|X| = \frac{(n+1)(n+2)}{2}$,
- requires no derivatives of the objective function,
- is not restricted to a small neighbourhood.

However...
- The number of interpolation points is $O(n^2)$.
- A quadratic model can only produce local approximations.

A Novel Approach: Radial Basis Function Models

Definition: a typical radial basis function model is of the form
$m(x) = \sum_{i=1}^{|X|} \lambda_i \varphi(\|x - x_i\|) + p(x),$
where the $\lambda_i$ are weighting coefficients and $p$ is a low-order polynomial.

Such a model function is more flexible than a quadratic model:
- The minimum number of interpolation points is $n + 2$.
- It can use an arbitrary number of interpolation points.
- It is ideal for approximating functions with multiple minima.

A fundamental property (assuming uniformly distributed points):
$\lim_{n \to \infty} m_n(x) = f(x), \quad |X| = n.$
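A minimal sketch of evaluating such an RBF model, assuming the centers are stored as rows of a matrix and the polynomial tail is passed in as a callable (names are illustrative, not from the talk):

```python
import numpy as np

def rbf_model(x, centers, lam, phi, poly):
    """Evaluate m(x) = sum_i lam_i * phi(||x - x_i||) + p(x)."""
    r = np.linalg.norm(x - centers, axis=1)  # distances to the interpolation points
    return lam @ phi(r) + poly(x)

# Example with a cubic radial function and a linear tail p(x) = b^T x + c:
# m_val = rbf_model(x, X, lam, lambda r: r**3, lambda x: b @ x + c)
```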

Radial Basis Functions: Overview

The choice of the radial basis function $\varphi$ is crucial for the accuracy and numerical stability of the approximation. Commonly used radial basis functions ($r \ge 0$):
$\varphi(r) = r$                       linear
$\varphi(r) = r^3$                     cubic
$\varphi(r) = r^2 \log r$              thin plate
$\varphi(r) = (\gamma r^2 + 1)^{3/2}$  multiquadric
$\varphi(r) = e^{-\gamma r^2}$         Gaussian

Important applications of radial basis functions include:
- solving partial differential equations,
- neural networks,
- interpolation of spatial data.
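The radial functions listed above can be collected into a small lookup table; this sketch assumes vectorized NumPy inputs and guards the thin plate spline at r = 0:

```python
import numpy as np

RBFS = {
    "linear":       lambda r: r,
    "cubic":        lambda r: r**3,
    # r^2 log r, with the removable singularity at r = 0 set to 0
    "thin_plate":   lambda r: np.where(r > 0, r**2 * np.log(np.maximum(r, 1e-300)), 0.0),
    "multiquadric": lambda r, gamma=1.0: (gamma * r**2 + 1.0)**1.5,
    "gaussian":     lambda r, gamma=1.0: np.exp(-gamma * r**2),
}
```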

An Illustrative Example (1)

Cubic RBF interpolation with 30 randomly chosen interpolation points, Rastrigin function.
[Figure: contour plots of the objective function (left) and the RBF model (right).]

An Illustrative Example (2)

[Figure: the objective function and the RBF model.]

Limiting Functions of Flat RBF Models (1)

Examples of RBF models with an adjustable shape parameter:
$\varphi(r, \gamma) = (\gamma r^2 + 1)^{3/2}$  multiquadric
$\varphi(r, \gamma) = e^{-\gamma r^2}$         Gaussian

The limit $\gamma \to 0$ (Fornberg et al. 2004): when $|X| = \frac{(n+1)(n+2)}{2}$ and the set $X$ is poised for quadratic interpolation, the limit $\gamma \to 0$ yields a quadratic polynomial, i.e. there exist $A \in \mathbb{R}^{n \times n}$, $b \in \mathbb{R}^n$ and $c \in \mathbb{R}$ such that
$\lim_{\gamma \to 0} \sum_{i=1}^{|X|} \lambda_i \varphi(\|x - x_i\|, \gamma) + p(x) = \tfrac{1}{2} x^T A x + b^T x + c.$

Implication: RBF models yield accurate local approximations by letting $\gamma \to 0$ near a minimum.

Limiting Functions of Flat RBF Models (2)

[Figure: contour plots of the objective function, multiquadric RBF models (γ = 5), and the limiting quadratic model.]

Determining the RBF Model Parameters (1)

We are particularly interested in multiquadric RBF models
$m(x) = \sum_{i=1}^{|X|} \lambda_i (\gamma \|x - x_i\|^2 + 1)^{3/2} + p(x - x_0),$
where the linear polynomial tail $p(x) = b^T x + c$
- guarantees a unique interpolant (Powell 1992),
- provides an estimate for the gradient of the objective function.

The interpolation equations uniquely determining the model parameters $\lambda$ are
$m(x_i) = f(x_i), \quad i = 1, \dots, |X|,$
$\sum_{i=1}^{|X|} \lambda_i p_j(x_i - x_0) = 0, \quad j = 1, \dots, n + 1,$
where the set $\{p_1, \dots, p_{n+1}\}$ spans a linear polynomial space.

Determining the RBF Model Parameters (2)

The interpolation equations in matrix form are
$\begin{bmatrix} \Phi & \Pi \\ \Pi^T & 0 \end{bmatrix} \begin{bmatrix} \lambda \\ c \end{bmatrix} = \begin{bmatrix} F \\ 0 \end{bmatrix},$
where
$\lambda = (\lambda_1, \dots, \lambda_{|X|})^T, \quad c = (c_1, \dots, c_{n+1})^T, \quad F = (f(x_1), \dots, f(x_{|X|}))^T$
and
$\Phi_{ij} = \varphi(\|x_i - x_j\|), \quad \Pi_{ij} = p_j(x_i - x_0).$

A sufficient condition for a unique solution is that a subset $Y' \subset Y$, $|Y'| = n + 1$, where $Y = \{x_1 - x_0, \dots, x_{|X|} - x_0\}$, is linearly independent. The solution of these equations can be updated in $O(n^2)$ operations with Cholesky and QR factorizations (Powell 1996).
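A direct dense solve of this saddle-point system is sketched below; it is only meant to show the assembly of Φ and Π, not Powell's O(n²) updating scheme based on Cholesky and QR factorizations (all names are illustrative):

```python
import numpy as np

def fit_rbf(points, fvals, x0, phi):
    """Solve [[Phi, Pi], [Pi^T, 0]] [lam; c] = [F; 0] for an RBF model with a linear tail."""
    m, n = points.shape
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    Phi = phi(dists)                                # Phi_ij = phi(||x_i - x_j||)
    Pi = np.hstack([points - x0, np.ones((m, 1))])  # p_j(x_i - x0): linear terms + constant
    K = np.block([[Phi, Pi], [Pi.T, np.zeros((n + 1, n + 1))]])
    rhs = np.concatenate([fvals, np.zeros(n + 1)])
    sol = np.linalg.solve(K, rhs)                   # assumes the poisedness condition holds
    return sol[:m], sol[m:]                         # RBF weights lam, tail coefficients c
```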

The Trust Region Framework

Mathematical formulation: at each iteration $k$, solve the trust region subproblem
$x^* = \arg\min_x \{ m_k(x) \mid x \in B_k \}, \quad B_k = \{ x \in F : \|x - x_k\| < \Delta_k \},$
where $F$ is the feasible set and $x_k = \arg\min \{ f(x) \mid x \in X \}$.

At each iteration step, obtain the index of the replaced point from
$i^* = \arg\max_{i = 1, \dots, |X|} \|x_i - x_k\|$
and set $x_{i^*} = x^*$. Also adjust the trust region radius $\Delta_k$:
- If the step $x^*$ yields a sufficiently smaller objective function value, set $\Delta_{k+1} > \Delta_k$.
- Otherwise, set $\Delta_{k+1} < \Delta_k$.
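One iteration of the bookkeeping described above might look like the following sketch; the acceptance test and the radius factors are placeholders, since the talk only states that the radius grows after a sufficiently good step and shrinks otherwise:

```python
import numpy as np

def update_interpolation_set(X, fvals, x_new, f_new, delta,
                             eta=1e-4, gamma_inc=2.0, gamma_dec=0.5):
    """Replace the point farthest from the current best point and adjust the radius."""
    k = int(np.argmin(fvals))                        # index of the best point x_k
    i_star = int(np.argmax(np.linalg.norm(X - X[k], axis=1)))
    sufficient = f_new < fvals[k] - eta * delta      # simplified sufficient-decrease test
    X[i_star], fvals[i_star] = x_new, f_new
    delta = gamma_inc * delta if sufficient else gamma_dec * delta
    return X, fvals, delta
```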

Updating the Model Under Geometric Constraints

Notation: $y_i = x_i - x_k$.
[Figure: the trust region with the infeasible wedge region shaded.]

The Wedge Condition (Marazzi and Nocedal 2002)
$S = \mathrm{span}(\{y_1, \dots, y_{n+1}\} \setminus \{y^*\}).$
Compute a vector $\hat{n}$ that is orthogonal to $S$. The feasible region containing sufficiently linearly independent points is defined by
$F = \{ x \in B_k : (x - x_k)^T \hat{n} > \gamma \|x - x_k\| \}.$

These constraints ensure that the set $\{y_1, \dots, y_{n+1}\}$ remains poised for linear interpolation. The Gram-Schmidt construction of $\{y_1, \dots, y_{n+1}\}$ can be updated in $O(n^2)$ operations.
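A one-shot way to obtain the wedge normal n̂ is to take the right-singular vector of the remaining displacement vectors with the smallest singular value; this sketch recomputes it from scratch rather than using the O(n²) Gram-Schmidt update mentioned above:

```python
import numpy as np

def wedge_normal(Y, j_replace):
    """Vector orthogonal to span({y_i : i != j_replace}), with y_1, ..., y_{n+1} as rows of Y."""
    S = np.delete(Y, j_replace, axis=0)  # remaining displacement vectors
    _, _, Vt = np.linalg.svd(S)
    # when the remaining vectors span an (n-1)-dimensional subspace, the last
    # right-singular vector spans its orthogonal complement
    return Vt[-1]
```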

The Special Structure of RBF Models

$m(x) = \sum_{\lambda_i > 0} \lambda_i \varphi(\|x - x_i\|) + \sum_{\lambda_i < 0} \lambda_i \varphi(\|x - x_i\|) + p(x - x_0),$
where the first sum and the polynomial tail are convex and the second sum is concave.

Motivation: RBF models are linear combinations of convex and concave functions. Hence, it is natural to express the model function in the form
$m(x) = g(x) - h(x),$
where $g$ and $h$ are convex.

Implication: it is possible to develop efficient d.c. (diff-convex) algorithms for minimizing the RBF model function.

Diff-convex Decomposition of an RBF Model

Regularization (Hoai An, Vaz and Vicente 2009):
$g(x) = \tfrac{\rho}{2} \|x - x_0\|^2 + p(x), \qquad h(x) = \tfrac{\rho}{2} \|x - x_0\|^2 - \sum_{i=1}^{|X|} \lambda_i \varphi(\|x - x_i\|).$

With this decomposition, solving the d.c. subproblem
$x_{k+1} = \arg\min_{x \in F} \{ g(x) - (h(x_k) + \nabla h(x_k)^T (x - x_k)) \}$
is equivalent to solving
$x_{k+1} = \arg\min_{x \in F} \left\| x - \left( x_0 + x_k - \tfrac{\nabla m(x_k)}{\rho} \right) \right\|.$
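Following the projection form above, a single linearized d.c. step reduces to projecting a gradient-type target point onto the feasible set; the sketch below handles only the bound constraints (projection by clipping) and leaves the full feasible set F out of scope:

```python
import numpy as np

def dc_step(x_k, x_0, grad_m, rho, l, u):
    """One linearized d.c. step: project x_0 + x_k - grad m(x_k)/rho onto the box [l, u]."""
    target = x_0 + x_k - grad_m / rho
    return np.clip(target, l, u)
```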

How to Determine the ρ-Parameter?

The adaptive d.c. algorithm: at each iteration, set
$\rho \leftarrow \gamma_1 \rho$ if $f(x_{k+1}) \ge f(x_k)$ (reject $x_{k+1}$),
$\rho \leftarrow \gamma_2 \rho$ if $f(x_{k+1}) < f(x_k)$,
where $\gamma_1 > 1$ and $0 < \gamma_2 < 1$.

- The convergence rate is inversely proportional to $\rho$.
- The convexity of $h$ within the trust region $B$ is guaranteed if
  $\rho \ge \rho^* = \max\{ \max_{x \in B} \mathrm{eig}(\nabla^2 m(x)), 0 \}.$

Problem: derive an upper bound for the minimal $\rho$ that ensures convexity.
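The adaptive ρ update itself is a two-branch rule; a minimal sketch, with the factors γ₁ and γ₂ left as parameters:

```python
def update_rho(rho, f_new, f_old, gamma1=2.0, gamma2=0.5):
    """Increase rho and reject the step if there is no decrease; otherwise decrease rho."""
    if f_new >= f_old:
        return gamma1 * rho, False   # reject x_{k+1}
    return gamma2 * rho, True        # accept x_{k+1}
```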

Estimating the Eigenvalues of the Model Hessian

The Hessian matrix of an RBF model is of the form
$\nabla^2 m(x) = \sum_{i=1}^{|X|} \lambda_i \left[ \alpha(\|r_i\|) I_n + \beta(\|r_i\|) \tfrac{r_i r_i^T}{\|r_i\|^2} \right], \quad r_i = x - x_i.$

An estimate for the greatest positive eigenvalue: from the above expression,
$e_{\max}(x) \le \sum_{i=1}^{|X|} \lambda_i \alpha(r_i^{\max}) + \sum_{i=1, \lambda_i > 0}^{|X|} \lambda_i \beta(r_i^{\max}),$
where $e_{\max}(x) = \max\{ \mathrm{eig}(\nabla^2 m(x)) : x \in B(x_0, \Delta) \}.$

Interval Analysis of RBF Models

Example: estimating an upper bound of the RBF model in a sphere.
[Figure: the interpolation points $x_i$ relative to the sphere $B(x_j, \Delta)$, illustrating Cases 1, 2a and 2b.]

Given the index $j$ of the origin point, we define (for $\lambda_i > 0$ and $\lambda_i < 0$, respectively)
$r_i^{\max} = \|x_i - x_j\| + \Delta, \qquad r_i^{\min} = \begin{cases} \|x_i - x_j\| - \Delta, & x_i \notin B(x_j, \Delta), \\ 0, & x_i \in B(x_j, \Delta). \end{cases}$

The Effect of the Shape Parameter on Convergence Rates

Convergence rates of algorithms with quadratic, multiquadric RBF and cubic RBF model functions were compared.
[Figure: $\|x_k - x^*\|$ versus the number of iterations for the QUAD, RBF MQ and RBF C variants.]

- The adaptive multiquadric RBF exhibits a rapid convergence rate.
- The cubic RBF without a shape parameter exhibits a very slow local convergence rate.

Thank you! Questions?