A New Trust Region Algorithm Using Radial Basis Function Models

Seppo Pulkkinen
University of Turku, Department of Mathematics
July 14, 2010

Outline
1. Introduction
2. Background
   - Taylor series approximations
3. Radial basis function models
   - Radial basis functions: overview
   - Limiting interpolants
4. The trust region framework
   - The trust region framework: overview
   - Updating the model function
   - Solving the subproblem via d.c. decompositions
5. Numerical results

Introduction
Problem definition: We consider a constrained nonlinear optimization problem
$$\min_{x \in \mathbb{R}^n} f(x), \quad f : \mathbb{R}^n \to \mathbb{R},$$
$$l_i \le x_i \le u_i, \quad i = 1, \dots, n, \qquad Ax \le b.$$
We also assume that the objective function
- is nonconvex and not necessarily differentiable.
- is expensive to evaluate.
Such problems frequently appear in
- image registration.
- data clustering.

Taylor Series Approximations
Most gradient-based methods are based on the quadratic Taylor series approximation
$$m(x) = f(x_0) + \nabla f(x_0)^T (x - x_0) + \tfrac{1}{2} (x - x_0)^T H(x_0) (x - x_0).$$
This can be expressed in a more generic form,
$$m(x) = c + b^T (x - x_0) + \tfrac{1}{2} (x - x_0)^T A (x - x_0),$$
where $c \in \mathbb{R}$, $b \in \mathbb{R}^n$ and $A \in \mathbb{R}^{n \times n}$.
Problem: Can the model parameters $c$, $b$ and $A$ be determined without evaluating objective function derivatives?

Determining the Model Parameters
Interpolation-based approach: Determine the quadratic model parameters $c$, $b$ and $A$ from the interpolation equations
$$m(x_i) = f(x_i), \quad i = 1, \dots, |X|,$$
where $X = \{x_1, \dots, x_m\}$ is the set of interpolation points.
A model defined by the above equations
- can be uniquely determined if $|X| = \frac{(n+1)(n+2)}{2}$.
- requires no derivatives of the objective function.
- is not restricted to a small neighbourhood.
However...
- The number of interpolation points is $O(n^2)$.
- A quadratic model can only produce local approximations.
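The interpolation-based approach can be illustrated with a minimal sketch for $n = 2$, where $(n+1)(n+2)/2 = 6$ points determine the quadratic. The function names `quadratic_basis` and `fit_quadratic`, the monomial parametrization and the sample points are assumptions made for this example and are not taken from the slides.

```python
import numpy as np

def quadratic_basis(x):
    """Monomial basis 1, x_i, x_i*x_j (i <= j) at a point x in R^n."""
    x = np.asarray(x, dtype=float)
    n = x.size
    quad = [x[i] * x[j] for i in range(n) for j in range(i, n)]
    return np.concatenate(([1.0], x, quad))

def fit_quadratic(points, values):
    """Solve the square interpolation system m(x_i) = f(x_i)."""
    M = np.array([quadratic_basis(p) for p in points])
    return np.linalg.solve(M, values)

# Six points in R^2 that are poised for quadratic interpolation.
points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
                   [-1.0, 0.0], [0.0, -1.0], [1.0, 1.0]])

def f(x):
    # Hypothetical test objective: an exact quadratic, so the fit recovers it.
    return 1.0 + 2.0 * x[0] - x[1] + 0.5 * x[0] ** 2 + x[0] * x[1] + 3.0 * x[1] ** 2

values = np.array([f(p) for p in points])
# Coefficients in the order 1, x0, x1, x0^2, x0*x1, x1^2; no derivatives used.
coeffs = fit_quadratic(points, values)
```

Only function values enter the linear system, which is the point of the approach; the cost is that the number of required points grows quadratically with $n$.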

A Novel Approach: Radial Basis Function Models
Definition: A typical radial basis function model is of the form
$$m(x) = \sum_{i=1}^{|X|} \lambda_i \phi(\|x - x_i\|) + p(x),$$
where $\lambda_i$ are weighting coefficients and $p$ is a low-order polynomial.
Such a model function is more flexible than a quadratic model:
- The minimum number of interpolation points is $n + 2$.
- It can use an arbitrary number of interpolation points.
- It is ideal for approximating functions with multiple minima.
A fundamental property (assuming uniformly distributed points):
$$\lim_{n \to \infty} m_n(x) = f(x), \quad n = |X|.$$

Radial Basis Functions: Overview
The choice of the radial basis function $\phi$ is crucial for the accuracy and numerical stability of the approximation.
Commonly used radial basis functions ($r \ge 0$):
- linear: $\phi(r) = r$
- cubic: $\phi(r) = r^3$
- thin plate: $\phi(r) = r^2 \log r$
- multiquadric: $\phi(r) = (\gamma r^2 + 1)^{3/2}$
- Gaussian: $\phi(r) = e^{-\gamma r^2}$
Important applications of radial basis functions include
- solving partial differential equations.
- neural networks.
- interpolation of spatial data.
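The radial functions listed above can be written down directly. The following is a small sketch; the function name `rbf`, the string keys and the default `gamma=1.0` are choices made for this example only.

```python
import numpy as np

def rbf(kind, r, gamma=1.0):
    """Evaluate a radial basis function phi(r) for r >= 0."""
    r = np.asarray(r, dtype=float)
    if kind == "linear":
        return r
    if kind == "cubic":
        return r ** 3
    if kind == "thin_plate":
        # r^2 log r, extended by its limit 0 at r = 0.
        safe_r = np.where(r > 0, r, 1.0)
        return np.where(r > 0, r ** 2 * np.log(safe_r), 0.0)
    if kind == "multiquadric":
        return (gamma * r ** 2 + 1.0) ** 1.5
    if kind == "gaussian":
        return np.exp(-gamma * r ** 2)
    raise ValueError(f"unknown radial basis function: {kind}")

r = np.array([0.0, 1.0, 2.0])
kinds = ("linear", "cubic", "thin_plate", "multiquadric", "gaussian")
values = {kind: rbf(kind, r) for kind in kinds}
```

Only the multiquadric and Gaussian functions depend on the shape parameter; the others ignore `gamma`.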

An Illustrative Example (1)
Cubic RBF interpolation with 30 randomly chosen interpolation points, Rastrigin function.
[Figure: two panels, "Function" and "RBF model".]

An Illustrative Example (2)
[Figure: two panels, "Objective function" and "RBF model".]

Limiting Functions of Flat RBF Models (1)
Examples of RBF models with adjustable shape parameter:
- multiquadric: $\phi(r, \gamma) = (\gamma r^2 + 1)^{3/2}$
- Gaussian: $\phi(r, \gamma) = e^{-\gamma r^2}$
The limit $\gamma \to 0$ (Fornberg et al. 2004): When $|X| = \frac{(n+1)(n+2)}{2}$ and the set $X$ is poised for quadratic interpolation, the limit $\gamma \to 0$ yields a quadratic polynomial, i.e. there exist $A \in \mathbb{R}^{n \times n}$, $b \in \mathbb{R}^n$ and $c \in \mathbb{R}$ such that
$$\lim_{\gamma \to 0} \sum_{i=1}^{|X|} \lambda_i \phi(\|x - x_i\|, \gamma) + p(x) = \tfrac{1}{2} x^T A x + b^T x + c.$$
Implication: RBF models yield accurate local approximations by letting $\gamma \to 0$ near a minimum.

Limiting Functions of Flat RBF Models (2)
[Figure: four panels, "Function", "Multiquadric RBF model (γ=5)", "Multiquadric RBF model (γ=5)" and "Quadratic model".]

Determining the RBF Model Parameters (1)
We are particularly interested in multiquadric RBF models
$$m(x) = \sum_{i=1}^{|X|} \lambda_i (\gamma \|x - x_i\|^2 + 1)^{3/2} + p(x - x_0),$$
where the linear polynomial tail $p(x) = b^T x + c$
- guarantees a unique interpolant (Powell 1992).
- provides an estimate for the function gradient.
The interpolation equations uniquely determining the model parameters $\lambda$ are
$$m(x_i) = f(x_i), \quad i = 1, \dots, |X|,$$
$$\sum_{i=1}^{|X|} \lambda_i p_j(x_i - x_0) = 0, \quad j = 1, \dots, n + 1,$$
where the set $\{p_1, \dots, p_{n+1}\}$ spans the space of linear polynomials.

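Determining the RBF Model Parameters (2)
The interpolation equations in matrix form are
$$\begin{bmatrix} \Phi & \Pi \\ \Pi^T & 0 \end{bmatrix} \begin{bmatrix} \lambda \\ c \end{bmatrix} = \begin{bmatrix} F \\ 0 \end{bmatrix},$$
where
$$\lambda = (\lambda_1, \dots, \lambda_{|X|})^T, \quad c = (c_1, \dots, c_{n+1})^T, \quad F = (f(x_1), \dots, f(x_{|X|}))^T$$
and
$$\Phi_{ij} = \phi(\|x_i - x_j\|), \quad \Pi_{ij} = p_j(x_i - x_0).$$
A sufficient condition for a unique solution is that a subset $\tilde{Y} \subseteq Y$ with $|\tilde{Y}| = n + 1$, where $Y = \{x_1 - x_0, \dots, x_{|X|} - x_0\}$, is linearly independent.
The solution to these equations can be updated in $O(n^2)$ operations with Cholesky and QR factorizations (Powell 1996).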
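The block system above can be assembled and solved directly for a small point set. The sketch below assumes the multiquadric basis with a linear tail in the basis $\{y_1, \dots, y_n, 1\}$ of $y = x - x_0$, takes the first interpolation point as $x_0$, and uses a dense solve in place of the factorization updates mentioned on the slide. The name `fit_multiquadric_rbf` and the sample data are hypothetical.

```python
import numpy as np

def fit_multiquadric_rbf(points, values, gamma=1.0):
    """Solve [[Phi, Pi], [Pi^T, 0]] [lam; c] = [F; 0] with a dense solver."""
    X = np.asarray(points, dtype=float)
    F = np.asarray(values, dtype=float)
    m, n = X.shape
    x0 = X[0]  # assumption: the first point serves as the centre x_0
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    Phi = (gamma * sq_dist + 1.0) ** 1.5
    Pi = np.hstack([X - x0, np.ones((m, 1))])  # columns p_j(x_i - x_0)
    K = np.block([[Phi, Pi], [Pi.T, np.zeros((n + 1, n + 1))]])
    rhs = np.concatenate([F, np.zeros(n + 1)])
    sol = np.linalg.solve(K, rhs)
    lam, c = sol[:m], sol[m:]

    def model(x):
        x = np.asarray(x, dtype=float)
        sq = np.sum((x - X) ** 2, axis=1)
        return lam @ (gamma * sq + 1.0) ** 1.5 + c[:n] @ (x - x0) + c[n]

    return lam, c, model

points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0],
                   [0.5, 0.5], [-1.0, 0.3], [0.2, -0.8]])
# Hypothetical sample data: a Rastrigin-like function without its constant.
values = np.sum(points ** 2 - np.cos(2 * np.pi * points), axis=1)
lam, c, model = fit_multiquadric_rbf(points, values)
```

The first `n` entries of `c` are the coefficients of the linear tail, which is what provides the gradient estimate mentioned earlier.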

The Trust Region Framework
Mathematical formulation: At each iteration $k$, solve the trust region subproblem
$$x^* = \arg\min \{ m_k(x) \mid x \in B_k \}, \quad B_k = \{ x \in F \mid \|x - x_k\| < \Delta_k \},$$
where $F$ is the feasible set and $x_k = \arg\min \{ f(x) \mid x \in X \}$.
At each iteration step, obtain the index of the replaced point from
$$i = \arg\max_{i = 1, \dots, |X|} \|x_i - x_k\|$$
and set $x_i = x^*$. Also adjust the trust region radius $\Delta_k$:
- If the step $x^*$ yields a sufficiently smaller objective function value, set $\Delta_{k+1} > \Delta_k$.
- Otherwise, set $\Delta_{k+1} < \Delta_k$.
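The bookkeeping of one iteration can be sketched as follows. The slides do not say how "sufficiently smaller" is measured, so this example assumes the usual ratio of actual to predicted decrease; the threshold `eta` and the factors `grow` and `shrink` are hypothetical values, as are the function names.

```python
import numpy as np

def farthest_point_index(X, x_k):
    """Index of the interpolation point farthest from the current best point."""
    return int(np.argmax(np.linalg.norm(X - x_k, axis=1)))

def update_radius(f_k, f_trial, model_decrease, delta,
                  eta=0.1, grow=2.0, shrink=0.5):
    """Return (new_radius, accepted) from an assumed sufficient-decrease test."""
    if model_decrease > 0:
        ratio = (f_k - f_trial) / model_decrease
    else:
        ratio = -np.inf  # the model predicted no decrease: treat as failure
    if ratio >= eta:
        return grow * delta, True
    return shrink * delta, False

X = np.array([[0.0, 0.0], [3.0, 4.0], [1.0, 1.0]])
replace_index = farthest_point_index(X, x_k=np.array([0.0, 0.0]))
good_step = update_radius(f_k=10.0, f_trial=8.0, model_decrease=2.0, delta=1.0)
poor_step = update_radius(f_k=10.0, f_trial=9.99, model_decrease=2.0, delta=1.0)
```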

Updating the Model Under Geometric Constraints
Notation: $y_i = x_i - x_k$.
The Wedge Condition (Marazzi, Nocedal 2002):
- $S = \operatorname{span}(\{y_1, \dots, y_{n+1}\} \setminus \{y\})$.
- Compute a vector $\hat{n}$ that is orthogonal to $S$.
- The feasible region containing sufficiently linearly independent points is defined by
$$F = \{ x \in B_k \mid (x - x_k)^T \hat{n} > \gamma \|x\| \}.$$
These constraints ensure that the set $\{y_1, \dots, y_{n+1}\}$ remains poised for linear interpolation.
The Gram-Schmidt construction of $\{y_1, \dots, y_{n+1}\}$ can be updated in $O(n^2)$ operations.
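Computing the vector $\hat{n}$ orthogonal to $S$ can be sketched with a singular value decomposition. This is an illustration only: the slide refers to an updated Gram-Schmidt construction, which is not reproduced here, and the name `wedge_normal` is hypothetical. The sketch assumes the remaining displacement vectors span a hyperplane, i.e. have rank $n - 1$.

```python
import numpy as np

def wedge_normal(remaining):
    """Unit vector orthogonal to the span of the rows of `remaining`.

    Assumes the rows have rank n - 1 in R^n, so the orthogonal
    complement is one-dimensional.
    """
    Y = np.atleast_2d(np.asarray(remaining, dtype=float))
    # With full_matrices=True (the default) Vt is n-by-n; its last row
    # belongs to the smallest (here: zero) singular value.
    _, _, Vt = np.linalg.svd(Y)
    return Vt[-1]

# Two remaining displacement vectors y_i = x_i - x_k in R^3.
Y = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
n_hat = wedge_normal(Y)
```

The sign of `n_hat` is arbitrary, so any test against it should account for both orientations.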

The Special Structure of RBF Models
$$m(x) = \underbrace{\sum_{\lambda_i > 0} \lambda_i \phi(\|x - x_i\|)}_{\text{convex}} + \underbrace{\sum_{\lambda_i < 0} \lambda_i \phi(\|x - x_i\|)}_{\text{concave}} + \underbrace{p(x - x_0)}_{\text{convex}}$$
Motivation: RBF models are linear combinations of convex and concave functions. Hence, it is natural to express the model function in the form
$$m(x) = g(x) - h(x),$$
where $g$ and $h$ are convex.
Implications: It is possible to develop efficient d.c. (diff-convex) algorithms for minimizing the RBF model function.

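Diff-convex Decomposition of an RBF Model
Regularization (Hoai An, Vaz and Vicente 2009):
$$g(x) = \frac{\rho}{2} \|x - x_0\|^2 + p(x),$$
$$h(x) = \frac{\rho}{2} \|x - x_0\|^2 - \sum_{i=1}^{|X|} \lambda_i \phi(\|x - x_i\|).$$
With this decomposition, solving the d.c. subproblem
$$x_{k+1} = \arg\min_{x \in F} \{ g(x) - (h(x_k) + \nabla h(x_k)^T (x - x_k)) \}$$
is equivalent to solving the projection problem
$$x_{k+1} = \arg\min_{x \in F} \left\| x - \left( x_k - \frac{\nabla m(x_k)}{\rho} \right) \right\|.$$
This follows because $p$ is linear and $\nabla g(x) - \nabla h(x_k) = \rho (x - x_k) + \nabla m(x_k)$, so the linearized objective is a quadratic in $x$ whose unconstrained minimizer is $x_k - \nabla m(x_k) / \rho$.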
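For the decomposition above with a linear tail $p$, the linearized objective is a strongly convex quadratic whose unconstrained minimizer is $x_k - \nabla m(x_k) / \rho$, so each d.c. step amounts to projecting that point onto the feasible set. The sketch below assumes that $F$ consists of box constraints only, in which case the projection is a componentwise clip; general linear constraints $Ax \le b$ would need a quadratic programming solver. The name `dc_step` and the toy model are hypothetical.

```python
import numpy as np

def dc_step(x_k, grad_m, rho, lower, upper):
    """One d.c. step: project x_k - grad m(x_k) / rho onto the box."""
    target = x_k - grad_m(x_k) / rho
    return np.clip(target, lower, upper)

# Toy model m(x) = 0.5 * ||x - a||^2 with gradient x - a.
a = np.array([2.0, -0.5])

def grad_m(x):
    return x - a

x_next = dc_step(np.zeros(2), grad_m, rho=1.0, lower=-1.0, upper=1.0)
```

With $\rho = 1$ the unprojected step lands exactly on `a`; the clip then pulls the first coordinate back to the bound.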

How to Determine the ρ-parameter?
The adaptive d.c. algorithm: At each iteration, set
$$\rho = \begin{cases} \gamma_1 \rho, & f(x_{k+1}) \ge f(x_k) \ \text{(reject } x_{k+1}\text{)}, \\ \gamma_2 \rho, & f(x_{k+1}) < f(x_k), \end{cases}$$
where $\gamma_1 > 1$ and $0 < \gamma_2 < 1$.
The convergence rate is inversely proportional to $\rho$.
The convexity of $h$ within the trust region $B$ is guaranteed if
$$\rho \ge \rho^* = \max\{ \max_{x \in B} \{ \operatorname{eig}(\nabla^2 m(x)) \}, 0 \}.$$
Problem: Derive an upper bound for the minimal $\rho$ that ensures convexity.
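The adaptive rule can be sketched as a loop that enlarges $\rho$ after a rejected step and reduces it after an accepted one. This example assumes box constraints (so the d.c. step is a clipped gradient step), a fixed iteration count, and a generic callable `obj` standing in for the function compared in the acceptance test. All names and parameter values are hypothetical.

```python
import numpy as np

def adaptive_dc(obj, grad, x, lower, upper, rho=1.0,
                gamma1=2.0, gamma2=0.5, iterations=20):
    """Projected d.c. iteration with adaptive rho (gamma1 > 1 > gamma2 > 0)."""
    fx = obj(x)
    for _ in range(iterations):
        trial = np.clip(x - grad(x) / rho, lower, upper)
        f_trial = obj(trial)
        if f_trial < fx:
            x, fx = trial, f_trial  # accept the step and relax rho
            rho *= gamma2
        else:
            rho *= gamma1           # reject the step and increase rho
    return x, fx, rho

a = np.array([0.5, -0.25])

def obj(x):
    return 2.0 * np.sum((x - a) ** 2)

def grad(x):
    return 4.0 * (x - a)

x_min, f_min, rho_final = adaptive_dc(obj, grad, np.zeros(2), -1.0, 1.0)
```

In this toy run the initial $\rho = 1$ is too small for the curvature of `obj`, so the first two trial points are rejected before $\rho = 4$ produces an accepted step.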

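Estimating the Eigenvalues of the Model Hessian
The Hessian matrix of an RBF model is of the form
$$\nabla^2 m(x) = \sum_{i=1}^{|X|} \lambda_i \left[ \alpha(\|r_i\|) I_n + \beta(\|r_i\|) \frac{r_i r_i^T}{\|r_i\|^2} \right], \quad \text{where } r_i = x - x_i.$$
An estimate for the greatest positive eigenvalue: From the above expression, we have
$$e_{\max}(x) \le \sum_{i=1}^{|X|} \lambda_i \alpha(r_i^{\max}) + \sum_{i=1, \lambda_i > 0}^{|X|} \lambda_i \beta(r_i^{\max}),$$
where $e_{\max}(x) = \max\{ \operatorname{eig}(\nabla^2 m(x)) \mid x \in B(x_0, \Delta) \}$.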
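The Hessian expression can be evaluated numerically once $\alpha$ and $\beta$ are fixed. The slides do not spell these functions out; the sketch below assumes the multiquadric $\phi(r) = (\gamma r^2 + 1)^{3/2}$, for which differentiating twice gives $\alpha(r) = 3\gamma (\gamma r^2 + 1)^{1/2}$ and $\beta(r) = 3\gamma^2 r^2 (\gamma r^2 + 1)^{-1/2}$. The function names and sample data are hypothetical.

```python
import numpy as np

def alpha(r, gamma):
    """phi'(r) / r for the multiquadric (gamma r^2 + 1)^(3/2)."""
    return 3.0 * gamma * np.sqrt(gamma * r ** 2 + 1.0)

def beta(r, gamma):
    """phi''(r) - phi'(r) / r for the same multiquadric."""
    return 3.0 * gamma ** 2 * r ** 2 / np.sqrt(gamma * r ** 2 + 1.0)

def rbf_hessian(x, centers, lam, gamma):
    """Hessian of sum_i lam_i * phi(||x - x_i||); a linear tail adds nothing."""
    n = x.size
    H = np.zeros((n, n))
    for x_i, lam_i in zip(centers, lam):
        r = x - x_i
        rn = np.linalg.norm(r)
        H += lam_i * alpha(rn, gamma) * np.eye(n)
        if rn > 0:  # the beta term vanishes at r = 0
            H += lam_i * beta(rn, gamma) * np.outer(r, r) / rn ** 2
    return H

centers = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
lam = np.array([1.0, -0.5, -0.5])
H = rbf_hessian(np.array([0.3, 0.2]), centers, lam, gamma=1.0)
largest_eigenvalue = np.linalg.eigvalsh(H)[-1]
```

Both $\alpha$ and $\beta$ are nonnegative and increasing in $r$ for this basis function, which is what makes distance bounds over the trust region usable for an eigenvalue estimate.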

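Interval Analysis of RBF Models
Example: estimating an upper bound of the RBF model in a sphere.
[Figure: interpolation points $x_i$ relative to the sphere, labelled Case 1, Case 2a and Case 2b.]
Given the index $j$ of the origin point, we define
- for $\lambda_i > 0$ (Case 1): $r_i^{\max} = \|x_i - x_j\| + \Delta$
- for $\lambda_i < 0$ and $x_i \notin B(x_j, \Delta)$ (Case 2a): $r_i^{\min} = \|x_i - x_j\| - \Delta$
- for $\lambda_i < 0$ and $x_i \in B(x_j, \Delta)$ (Case 2b): $r_i^{\min} = 0$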
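The distance bounds used in the interval analysis follow from the triangle inequality: over the ball $B(x_j, \Delta)$, the distance to $x_i$ is at most $\|x_i - x_j\| + \Delta$ and at least $\max(\|x_i - x_j\| - \Delta, 0)$. A minimal sketch, with the hypothetical helper name `distance_bounds`:

```python
import numpy as np

def distance_bounds(x_i, x_j, delta):
    """Smallest and largest distance from x_i to any point of B(x_j, delta)."""
    d = np.linalg.norm(np.asarray(x_i, dtype=float) - np.asarray(x_j, dtype=float))
    r_min = max(d - delta, 0.0)  # zero when x_i lies inside the ball
    r_max = d + delta
    return r_min, r_max

origin = np.zeros(2)
outside = distance_bounds([3.0, 4.0], origin, delta=1.0)
inside = distance_bounds([0.5, 0.0], origin, delta=1.0)
```

The first call corresponds to a point outside the ball, the second to a point inside it, where the lower bound collapses to zero.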

The Effect of Shape Parameter on Convergence Rates
Convergence rates of algorithms with quadratic, multiquadric RBF and cubic RBF model functions were compared.
[Figure: $\|x_k - x^*\|$ on a logarithmic scale versus number of iterations; legend: QUAD, RBF MQ, RBF C.]
- Adaptive multiquadric RBF exhibits a rapid convergence rate.
- Cubic RBF without shape parameter exhibits a very slow local convergence rate.

Thank you! Questions?