
UNIVERSITY OF CAMBRIDGE Numerical Analysis Reports

A radial basis function method for global optimization

H.-M. Gutmann

DAMTP 1999/NA22

December, 1999

Department of Applied Mathematics and Theoretical Physics
Silver Street
Cambridge CB3 9EW
England

DAMTP 1999/NA22

A radial basis function method for global optimization

H.-M. Gutmann

Abstract: We introduce a method that aims to find the global minimum of a continuous nonconvex function on a compact subset of R^d. It is assumed that function evaluations are expensive and that no additional information is available. Radial basis function interpolation is used to define a utility function. The maximizer of this function is the next point where the objective function is evaluated. We show that, for most types of radial basis functions that are considered in this paper, convergence can be achieved without further assumptions on the objective function. Besides, it turns out that our method is closely related to a statistical global optimization method, the P-algorithm. A general framework for both methods is presented. Finally, a few numerical examples show that on the set of Dixon-Szegő test functions our method yields favourable results in comparison to other global optimization methods.

Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Silver Street, Cambridge CB3 9EW, England. December, 1999.

1 Introduction

Global optimization has attracted a lot of attention in the last 20 years. In many applications, the objective function is nonlinear and nonconvex. Often, the number of local minima is large. Therefore standard nonlinear programming methods may fail to locate the global minimum. In its most general form, the Global Optimization Problem can be stated as

(GOP)  find x* ∈ D such that f(x*) ≤ f(x) for all x ∈ D,

where D ⊆ R^d is compact, and f : D → R is a continuous function defined on D. Under these assumptions, (GOP) is solvable, because f attains its minimum on D. Numerous methods to solve (GOP) have been developed (see e.g. Horst and Pardalos [4] and Törn and Žilinskas [19]). Stochastic methods like simulated annealing and genetic algorithms, which use only function values, are very popular among users, although their rate of convergence is usually rather slow. Deterministic methods like Branch-and-Bound, however, assume that one can compute a lower bound of f on a subset of D. This can be done, for example, when the Lipschitz constant of f is available. The further assumptions make these methods very powerful, but often they are not satisfied or it is too expensive to provide the necessary information.

For the method investigated in this paper, we have in mind problems where the only information available is the possibility to evaluate the objective function, and each evaluation is very expensive. This may mean that it takes several hours to calculate a function value. For example, a function evaluation at a point may be done by building an experiment, by running a long computer simulation or by using a finite element method. Therefore, as the duration of the optimization process is dominated by the function evaluations, our goal is to require as few function evaluations as possible to find an adequate estimate of the global minimum. The method is based on a general technique proposed by Jones [8].
Let A be a linear space of functions, and assume that, for s ∈ A, σ(s) is a measure of the "bumpiness" of s. Now assume that we have calculated x_1, ..., x_n and the function values f(x_1), ..., f(x_n). A target value f* is chosen that can be regarded as an estimate of the optimal value, but it might be very crude. For each y ∉ {x_1, ..., x_n}, let s_y ∈ A be defined by the interpolation conditions

s_y(x_i) = f(x_i), i = 1, ..., n,   s_y(y) = f*. (1.1)

The new point x_{n+1} is chosen to be the value of y that minimizes σ(s_y), y ∉ {x_1, ..., x_n}. Thus the view is taken that the "least bumpy" of the functions s_y

yields the most useful location of the new point. We will use radial basis functions as interpolants. Their interpolation properties are very suitable. Specifically, the uniqueness of an interpolant is achieved under very mild conditions on the location of the interpolation points, and a measure of bumpiness is also available.

Close relations can be established between our method and one from statistical global optimization, namely the P-algorithm (Žilinskas [22]). Although it is derived using a completely different approach, it is very similar to our method. One special case of a P-algorithm, developed by Kushner [12], is even equivalent to a special case of our radial basis function method. Other global optimization methods based on radial basis functions have been developed. Alotto et al. [1] use interpolation by multiquadrics to accelerate a simulated annealing method. Ishikawa et al. [6], [7] employ radial basis functions to estimate the global minimizer and run an SQP algorithm to locate it.

The properties of radial basis functions that are necessary for the description of our method are introduced in the following section. In particular, we address the question of interpolation and introduce a suitable measure of "bumpiness". The global optimization method is described in detail in Section 3. Convergence of the method is the subject of Section 4. Subsection 4.1 contains various convergence results, and the proof of the main theorem can be found in Subsection 4.2. The relation between our method and the P-algorithm is addressed in Section 5. The final section deals with search strategies, but a complete analysis is beyond the scope of this paper.

2 Interpolation by radial basis functions and a measure of bumpiness

Let n pairwise different points x_1, ..., x_n ∈ R^d and data f_1, ..., f_n ∈ R be given, where n and d are any positive integers. We seek a function s of the form

s(x) = Σ_{i=1}^n λ_i φ(||x − x_i||) + p(x),  x ∈ R^d, (2.1)
that interpolates the data (x_1, f_1), ..., (x_n, f_n). The coefficients λ_i, i = 1, ..., n, are real numbers, and p is from Π_m, the space of polynomials of degree less than or equal to m. The norm ||·|| is the Euclidean norm on R^d. The following choices of φ are considered:

φ(r) = r (linear),
φ(r) = r^3 (cubic),
φ(r) = r^2 log r (thin plate spline),
φ(r) = √(r^2 + γ^2) (multiquadric),
φ(r) = e^{−γ r^2} (Gaussian),   r ≥ 0, (2.2)
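For readers who want to experiment, the five choices in (2.2) can be coded directly. The sketch below is our own, not the paper's; the name `gamma` stands for the prescribed positive constant γ, and the function and argument names are illustrative:

```python
import numpy as np

# Minimal sketch of the radial basis functions listed in (2.2).
# "gamma" stands for the prescribed positive constant; all names are
# illustrative choices, not taken from the paper.
def phi(r, kind="cubic", gamma=1.0):
    r = np.asarray(r, dtype=float)
    if kind == "linear":
        return r
    if kind == "cubic":
        return r ** 3
    if kind == "thin_plate":
        # r^2 log r, with the removable singularity at r = 0 set to 0
        return np.where(r > 0, r ** 2 * np.log(np.maximum(r, 1e-300)), 0.0)
    if kind == "multiquadric":
        return np.sqrt(r ** 2 + gamma ** 2)
    if kind == "gaussian":
        return np.exp(-gamma * r ** 2)
    raise ValueError(kind)
```

Note that only the multiquadric and Gaussian cases depend on γ; the other three kernels are parameter-free.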

where γ is a prescribed positive constant. Let the matrix Φ ∈ R^{n×n} be defined by

(Φ)_{ij} := φ(||x_i − x_j||), i, j = 1, ..., n. (2.3)

Further, we introduce the linear space V_m ⊆ R^n containing all λ ∈ R^n that satisfy

Σ_{i=1}^n λ_i q(x_i) = 0 for all q ∈ Π_m. (2.4)

Formally, we set V_{−1} := R^n. Obviously, V_{m+1} ⊆ V_m for all m ≥ −1. Powell [15] shows that, in the cubic and thin plate spline cases,

λ^T Φ λ > 0, λ ∈ V_1 \ {0}, (2.5)

in the linear and multiquadric cases,

λ^T Φ λ < 0, λ ∈ V_0 \ {0}, (2.6)

and in the Gaussian case

λ^T Φ λ > 0, λ ∈ R^n \ {0}. (2.7)

We let m_0 be 1 in the cubic and thin plate spline cases, 0 in the linear and multiquadric cases and −1 in the Gaussian case. Then the inequalities (2.5) – (2.7) can be merged into

(−1)^{m_0+1} λ^T Φ λ > 0, λ ∈ V_{m_0} \ {0}. (2.8)

After choosing φ, we let m be an integer that is not less than m_0, and λ is confined to V_m. Let m̂ be the dimension of Π_m, let p_1, ..., p_m̂ be a basis of this linear space, and let P be the matrix

P := ( p_1(x_1) ... p_m̂(x_1) ; ... ; p_1(x_n) ... p_m̂(x_n) ). (2.9)

Then it can be shown (see [15]) that the matrix

A = ( Φ P ; P^T 0 ) ∈ R^{(n+m̂)×(n+m̂)} (2.10)

is nonsingular if and only if x_1, ..., x_n satisfy the condition

q ∈ Π_m and q(x_i) = 0, i = 1, ..., n  ⟹  q ≡ 0. (2.11)
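As a concrete illustration, the following Python sketch (our own construction, not part of the paper) assembles Φ from (2.3) and P from (2.9) for the cubic φ(r) = r³ with linear polynomials (m = 1, so m̂ = d + 1), forms the block matrix A shown above, and solves the interpolation system A(λ; c) = (F; 0) for the coefficients; generic random points are assumed to satisfy the unisolvency condition:

```python
import numpy as np

# Sketch: assemble Phi (2.3) and P (2.9) for the cubic phi(r) = r^3 with
# linear polynomials (m = 1, basis 1, x^(1), ..., x^(d), so m_hat = d + 1),
# and form the block interpolation matrix A.  All names here are ours.
def interp_matrix(X):
    n, d = X.shape
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    Phi = dists ** 3                      # (2.3) with phi(r) = r^3
    P = np.hstack([np.ones((n, 1)), X])   # (2.9)
    return np.block([[Phi, P], [P.T, np.zeros((d + 1, d + 1))]])

rng = np.random.default_rng(0)
X = rng.random((6, 2))                    # generic points, unisolvent
A = interp_matrix(X)
assert np.linalg.matrix_rank(A) == A.shape[0]   # A nonsingular

F = np.sin(X[:, 0]) + X[:, 1]             # some data values f_1, ..., f_n
rhs = np.concatenate([F, np.zeros(3)])
coef = np.linalg.solve(A, rhs)            # A (lambda; c) = (F; 0)
lam, c = coef[:6], coef[6:]               # RBF and polynomial coefficients
```

The first n components of the solution are the λ_i and the last m̂ components are the coefficients of the polynomial p.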

In the Gaussian case with m = −1, P and condition (2.11) are omitted. Therefore the coefficients of the function s in (2.1) are defined uniquely by the system

s(x_i) = f_i, i = 1, ..., n,
Σ_{i=1}^n λ_i p_j(x_i) = 0, j = 1, ..., m̂. (2.12)

Let F be the vector whose entries are the data values f_1, ..., f_n. Then the system (2.12) becomes

( Φ P ; P^T 0 ) ( λ ; c ) = ( F ; 0_m̂ ), (2.13)

where λ = (λ_1, ..., λ_n)^T ∈ R^n, c ∈ R^m̂ and 0_m̂ is the zero in R^m̂. The components of c are the coefficients of the polynomial p with respect to the basis p_1, ..., p_m̂.

The motivation for the measurement of the bumpiness of a radial basis function interpolant can be developed from the theory of natural cubic splines in one dimension. They can be written in the form (2.1), where φ(r) = r³, λ ∈ V_1 and p ∈ Π_1. It is well known (e.g. Powell [14]) that the interpolant s that is defined by the system (2.12) minimizes

I(g) := ∫_R [g''(x)]² dx

among all functions g : R → R that satisfy the interpolation conditions g(x_i) = f_i, i = 1, ..., n, and for which I(g) exists and is finite. Therefore I(g) is a suitable measure of bumpiness. The second derivative s'' is piecewise linear and vanishes outside a bounded interval. Thus one obtains by integration by parts

I(s) = ∫_R [s''(x)]² dx = 12 Σ_{i=1}^n λ_i s(x_i) = 12 Σ_{i=1}^n λ_i ( Σ_{j=1}^n λ_j |x_i − x_j|³ + p(x_i) ) = 12 Σ_{i,j=1}^n λ_i λ_j Φ_{ij},

where the last equation follows from λ ∈ V_1. This relation suggests that expression (2.8) can provide a semi-inner product and a semi-norm for each φ in (2.2) and m ≥ m_0. Also, the semi-norm will be the measure of bumpiness of a radial basis function (2.1). A semi-inner product ⟨·,·⟩ satisfies the same properties as an inner product, except that ⟨s, s⟩ = 0 need not imply s = 0. Similarly, for a semi-norm ||·||, ||s|| = 0 does not imply s = 0. We choose any radial basis function φ from (2.2) and m ≥ m_0, and we define A_{φ,m} to be the linear space of all functions of the form

Σ_{i=1}^N λ_i φ(||x − y_i||) + p(x), x ∈ R^d,

where N ∈ N, y_1, ..., y_N ∈ R^d, p ∈ Π_m, and λ = (λ_1, ..., λ_N)^T satisfies (2.4) for n = N. On this space, the semi-inner product and the semi-norm are defined as

follows. Let s and u be any functions in A_{φ,m}, i.e.

s(x) = Σ_{i=1}^{N(s)} λ_i φ(||x − y_i||) + p(x)  and  u(x) = Σ_{j=1}^{N(u)} μ_j φ(||x − z_j||) + q(x).

We let the semi-inner product be the expression

⟨s, u⟩ := (−1)^{m_0+1} Σ_{i=1}^{N(s)} λ_i u(y_i). (2.14)

Clearly, it is bilinear. To show symmetry, we use

Σ_{i=1}^{N(s)} λ_i q(y_i) = 0  and  Σ_{j=1}^{N(u)} μ_j p(z_j) = 0

to deduce

⟨s, u⟩ = (−1)^{m_0+1} Σ_{i=1}^{N(s)} λ_i ( Σ_{j=1}^{N(u)} μ_j φ(||y_i − z_j||) + q(y_i) )
= (−1)^{m_0+1} Σ_{i=1}^{N(s)} Σ_{j=1}^{N(u)} λ_i μ_j φ(||y_i − z_j||)
= (−1)^{m_0+1} Σ_{j=1}^{N(u)} μ_j ( Σ_{i=1}^{N(s)} λ_i φ(||z_j − y_i||) + p(z_j) )
= (−1)^{m_0+1} Σ_{j=1}^{N(u)} μ_j s(z_j) = ⟨u, s⟩.

By (2.8),

⟨s, s⟩ = (−1)^{m_0+1} Σ_{i=1}^{N(s)} λ_i s(y_i) = (−1)^{m_0+1} Σ_{i,j=1}^{N(s)} λ_i λ_j φ(||y_i − y_j||) (2.15)

is strictly positive if λ ∈ V_m \ {0} and m ≥ m_0, i.e. if s ∈ A_{φ,m} is not a polynomial in Π_m. Thus (2.14) is a semi-inner product on A_{φ,m} that induces the semi-norm ⟨s, s⟩^{1/2} with null space Π_m (for details see Powell [16] and Schaback [17]). In analogy to the variational principle for cubic splines in one dimension, mentioned above, there is a theorem which states that the given interpolant is the solution to a minimization problem.

Theorem 1 (Schaback [17]) Let φ be any radial basis function from (2.2), and let m be chosen such that m ≥ m_0. Given are points x_1, ..., x_n in R^d having the property (2.11) and values f_1, ..., f_n in R. Let s be the radial function of the form (2.1) that solves the system (2.12). Then s minimizes the semi-norm ⟨g, g⟩^{1/2} on the set of functions g ∈ A_{φ,m} that satisfy

g(x_i) = f_i, i = 1, ..., n. (2.16)

3 A radial basis function method

It will be shown how radial basis functions can be used in the general method of Jones [8] to solve the problem (GOP) (cf. Powell [16]). As in Section 2, we pick φ from (2.2) and m ≥ m_0. Let p_1, ..., p_m̂ be a basis of Π_m, where m̂ = dim Π_m. Assume that we have chosen x_1, ..., x_n ∈ D that satisfy (2.11), and that we know the function values f(x_1), ..., f(x_n). Let the function

s_n(x) = Σ_{i=1}^n λ_i φ(||x − x_i||) + p(x), x ∈ R^d,

interpolate (x_1, f(x_1)), ..., (x_n, f(x_n)). Our task is to determine x_{n+1}. For a target value f_n* and a point y ∈ D \ {x_1, ..., x_n}, the radial basis function s_y that satisfies (1.1) can be written as

s_y(x) = s_n(x) + [f_n* − s_n(y)] ℓ_n(y, x), x ∈ R^d, (3.1)

where ℓ_n(y, ·) is the radial basis function solution to the interpolation conditions

ℓ_n(y, x_i) = 0, i = 1, ..., n,   ℓ_n(y, y) = 1. (3.2)

Therefore ℓ_n(y, ·) can be expressed as

ℓ_n(y, x) = Σ_{i=1}^n α_i(y) φ(||x − x_i||) + μ_n(y) φ(||x − y||) + Σ_{i=1}^{m̂} b_i(y) p_i(x), x ∈ R^d, (3.3)

where the coefficients of ℓ_n(y, ·) are defined by the equations

A(y) ( α(y) ; μ_n(y) ; b(y) ) = ( 0_n ; 1 ; 0_m̂ ). (3.4)
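The coefficient μ_n(y) can be computed numerically by solving (3.4): one solves the interpolation system for the points x_1, ..., x_n, y with the right-hand side (0_n, 1, 0_m̂) and reads off the entry that corresponds to the trial point y. A sketch for the cubic case with m = 1 (the function and variable names are our own):

```python
import numpy as np

# Sketch of the system (3.4): for current points X and a trial point y,
# solve for the coefficients of the cardinal function l_n(y, .), which is
# zero at every x_i and one at y, and return mu_n(y), the coefficient
# attached to phi(||x - y||).  Cubic phi(r) = r^3 and m = 1 are assumed.
def mu_n(X, y):
    n, d = X.shape
    Z = np.vstack([X, y])                       # points x_1, ..., x_n, y
    Phi = np.linalg.norm(Z[:, None] - Z[None, :], axis=2) ** 3
    P = np.hstack([np.ones((n + 1, 1)), Z])
    A = np.block([[Phi, P], [P.T, np.zeros((d + 1, d + 1))]])
    rhs = np.zeros(n + 1 + d + 1)
    rhs[n] = 1.0                                # l_n(y, x_i) = 0, l_n(y, y) = 1
    return np.linalg.solve(A, rhs)[n]           # entry in the y-slot: mu_n(y)

rng = np.random.default_rng(0)
X = rng.random((5, 2))
value = mu_n(X, np.array([0.5, 0.5]))
```

By (3.7) below, (−1)^{m_0+1} μ_n(y) equals the squared semi-norm of ℓ_n(y, ·); with m_0 = 1 in the cubic case, the returned value is therefore positive.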

Here α(y) = (α_1(y), ..., α_n(y))^T ∈ R^n, b(y) = (b_1(y), ..., b_m̂(y))^T ∈ R^m̂, μ_n(y) ∈ R, and 0_n and 0_m̂ denote the zero in R^n and R^m̂, respectively, and, as in equation (2.10), A(y) has the form

A(y) := ( Φ(y) P(y) ; P(y)^T 0 ). (3.5)

Specifically, letting u(y) and π(y) be the vectors

u(y) := ( φ(||y − x_1||), ..., φ(||y − x_n||) )^T  and  π(y) := ( p_1(y), ..., p_m̂(y) )^T,

respectively, Φ(y) and P(y) are the matrices

Φ(y) := ( Φ u(y) ; u(y)^T φ(0) )  and  P(y) := ( P ; π(y)^T ). (3.6)

The square of the semi-norm ⟨s_y, s_y⟩ of the new interpolant (3.1), as defined in the previous section, has the value

⟨s_y, s_y⟩ = ⟨s_n, s_n⟩ + 2 [f_n* − s_n(y)] ⟨s_n, ℓ_n(y, ·)⟩ + [f_n* − s_n(y)]² ⟨ℓ_n(y, ·), ℓ_n(y, ·)⟩.

Equations (2.14) and (3.2) imply

⟨s_n, ℓ_n(y, ·)⟩ = (−1)^{m_0+1} Σ_{i=1}^n λ_i ℓ_n(y, x_i) = 0,

and, using expressions (3.2) and (3.3), we find the expression

⟨ℓ_n(y, ·), ℓ_n(y, ·)⟩ = (−1)^{m_0+1} [ Σ_{i=1}^n α_i(y) ℓ_n(y, x_i) + μ_n(y) ℓ_n(y, y) ] = (−1)^{m_0+1} μ_n(y). (3.7)

Thus we deduce the formula

⟨s_y, s_y⟩ = ⟨s_n, s_n⟩ + (−1)^{m_0+1} μ_n(y) [f_n* − s_n(y)]².

Further, we define the function g_n : D \ {x_1, ..., x_n} → R as the difference

g_n(y) := ⟨s_y, s_y⟩ − ⟨s_n, s_n⟩ = (−1)^{m_0+1} μ_n(y) [f_n* − s_n(y)]²,

which is nonnegative. Since ⟨s_n, s_n⟩ is independent of y, the required minimization of ⟨s_y, s_y⟩ and the minimization of g_n(y) are equivalent.

The choice of f_n* determines the location of x_{n+1}. If

f_n* ≥ min_{y∈D} s_n(y),

then g_n(y) = 0 can be achieved. However, if

f_n* < min_{y∈D} s_n(y),

then x_{n+1} will be away from the x_i, i = 1, ..., n. In particular, for f_n* → −∞, we make the following deduction.

Remark 2 For f_n* < min_{y∈D} s_n(y), let x*(f_n*) be the minimizer of g_n, i.e.

(−1)^{m_0+1} μ_n(x*(f_n*)) [s_n(x*(f_n*)) − f_n*]² ≤ (−1)^{m_0+1} μ_n(y) [s_n(y) − f_n*]²  for all y ∈ D \ {x_1, ..., x_n}.

This is equivalent to

μ_n(x*(f_n*)) / μ_n(y) ≤ [ 1 + (s_n(y) − s_n(x*(f_n*))) / (s_n(x*(f_n*)) − f_n*) ]².

As f_n* → −∞, the boundedness of s_n on D implies

(−1)^{m_0+1} μ_n(x*(−∞)) ≤ (−1)^{m_0+1} μ_n(y)  for all y ∈ D \ {x_1, ..., x_n}.

Therefore, the choice f_n* = −∞ requires the minimization of (−1)^{m_0+1} μ_n(y). This process puts x_{n+1} in a large gap between the points x_i, i = 1, ..., n, a property that is of fundamental importance to global optimization.

The following basic algorithm employs the given method.

Algorithm 3

Initial step: Pick φ from (2.2) and m ≥ m_0. Choose points x_1, ..., x_{n_0} ∈ D that satisfy (2.11). Compute the radial function s_{n_0} that minimizes ⟨s, s⟩ on A_{φ,m}, subject to the interpolation conditions s(x_i) = f(x_i), i = 1, ..., n_0.

Iteration step: x_1, ..., x_n are the points in D where the value of f is known, and s_n minimizes ⟨s, s⟩, subject to s(x_i) = f(x_i), i = 1, ..., n. Choose a target value f_n* ∈ [−∞, min_{y∈D} s_n(y)]. (The choice f_n* = min s_n(y) is admissible only if none of the x_i is a global minimizer of s_n.) Calculate x_{n+1}, which is the value of y that minimizes the function

g_n(y) = (−1)^{m_0+1} μ_n(y) [s_n(y) − f_n*]²,  y ∈ D \ {x_1, ..., x_n}. (3.8)

Evaluate f at x_{n+1} and set n := n + 1. Stop if n is greater than a prescribed n_max.
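To make the structure of Algorithm 3 concrete, here is a deliberately crude one-dimensional Python sketch. It uses the cubic φ with m = 1 (so (−1)^{m_0+1} = 1 and g_n(y) = μ_n(y)[s_n(y) − f_n*]²), fixes the target value at f_n* = min s_n − 1, and replaces every continuous minimization by a search over a fixed grid. None of these simplifications is prescribed by the paper, and all names are ours:

```python
import numpy as np

# Crude 1-D sketch of Algorithm 3 on D = [lo, hi]: cubic phi(r) = r^3 with
# m = 1, target value fixed at f_n* = min s_n - 1, and every continuous
# minimization replaced by a search over a fixed grid.
def rbf_glob_min(f, lo, hi, n_init=3, n_max=12, grid=201):
    X = list(np.linspace(lo, hi, n_init))       # initial, pairwise different points
    F = [f(x) for x in X]
    ygrid = np.linspace(lo, hi, grid)
    for n in range(n_init, n_max):
        Xa = np.array(X)
        # interpolant s_n: solve the interpolation system, basis 1, x
        Phi = np.abs(Xa[:, None] - Xa[None, :]) ** 3
        P = np.column_stack([np.ones(n), Xa])
        A = np.block([[Phi, P], [P.T, np.zeros((2, 2))]])
        coef = np.linalg.solve(A, np.concatenate([F, np.zeros(2)]))
        sn = np.array([(np.abs(y - Xa) ** 3) @ coef[:n] + coef[n] + coef[n + 1] * y
                       for y in ygrid])
        fstar = sn.min() - 1.0                  # a target value below min s_n
        # next point: grid minimizer of g_n(y) = mu_n(y) [s_n(y) - f_n*]^2
        best, best_g = None, np.inf
        for y, sy in zip(ygrid, sn):
            if min(abs(y - x) for x in X) < 1e-9:
                continue                        # g_n is undefined at the x_i
            Za = np.append(Xa, y)
            Phz = np.abs(Za[:, None] - Za[None, :]) ** 3
            Pz = np.column_stack([np.ones(n + 1), Za])
            Az = np.block([[Phz, Pz], [Pz.T, np.zeros((2, 2))]])
            rhs = np.zeros(n + 3)
            rhs[n] = 1.0
            mu = np.linalg.solve(Az, rhs)[n]    # mu_n(y), positive for cubic phi
            g = mu * (sy - fstar) ** 2
            if g < best_g:
                best, best_g = y, g
        X.append(best)
        F.append(f(best))
    i = int(np.argmin(F))
    return X[i], F[i]

x_best, f_best = rbf_glob_min(lambda x: (x - 0.3) ** 2, 0.0, 1.0)
```

With an expensive objective the grid search over y would of course be replaced by a proper subproblem solver; the paper defers the discussion of search strategies to its final section.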

The function g_n is infinitely differentiable on D \ {x_1, ..., x_n}, but is not defined at the interpolation points. If f_n* = min_{y∈D} s_n(y), and if s_n(x_i) > f_n*, i = 1, ..., n, then the global minimizers of g_n are the global minimizers of s_n. Thus one can minimize s_n, which is defined on the whole of D, to obtain x_{n+1}. If f_n* < min s_n(y), however, then g_n(x) tends to infinity as x tends to x_i, i = 1, ..., n. Let h_n : D → R be defined as

h_n(y) := 1/g_n(y) if y ∉ {x_1, ..., x_n},  and  h_n(y) := 0 if y = x_i, i = 1, ..., n. (3.9)

The maximization of h_n on D is equivalent to the minimization of g_n. Further, h_n is infinitely differentiable on D \ {x_1, ..., x_n}. It can also be shown, using the system (3.4), that it is in C(D) in the linear case, in C^1(D) in the thin plate spline case, in C^2(D) in the cubic case, and in C^∞(D) in the multiquadric and Gaussian cases.

Under certain conditions on f and the values f_n*, n → ∞, it can be proved that a subsequence of the generated points (x_n)_{n∈N} converges to a global minimum. This is the subject of the following section.

4 Convergence of the method

Our aim is to prove convergence of the method for any continuous function f. A theorem by Törn and Žilinskas [19] tells us that the sequence that is generated by Algorithm 3 should be dense. Applied to our method, it states

Theorem 4 The algorithm converges for every continuous function f if and only if it generates a sequence of points that is dense in D.

So our task is to establish the density of the sequence of generated points. Our main convergence result (Theorem 7), however, does not allow all choices of φ and m. The first part of this section states Theorem 7 and derives a few corollaries. The second part gives the proof of the theorem.

4.1 Convergence results

Assume that f_n* is set to min_{y∈D} s_n(y) on each iteration and that this choice is admissible.
Then, if there are large function values in a neighbourhood U of a global minimizer of f, the global minimizer of s_n might be outside U for every n ≥ n_0, where n_0 is the number of initial points. In this case, U is not explored and the global optimum of f might be missed. Therefore, we have to assume that

enough of the numbers min s_n(y) − f_n* are sufficiently large. Specifically, let γ > 0 and α ≥ 0 be constants, where additionally α < 1 in the linear case and α < 2 in the thin plate spline and cubic cases, and define

Δ_n := min_{i=1,...,n−1} ||x_n − x_i||. (4.1)

Then the condition

min_{y∈D} s_n(y) − f_n* > γ Δ_n^{α/2} ||s_n||_∞ (4.2)

for infinitely many n ∈ N, where ||·||_∞ denotes the maximum norm of a function on D, will lead to the required result. We note that the norms ||s_n||_∞ may diverge as n → ∞.

Unfortunately, the choice of φ and m is restricted. In the proof of Theorem 7 we need the result that, for any y ∈ D and any neighbourhood U of y, (−1)^{m_0+1} μ_n(y) can be bounded above by a number that does not depend on n, if none of the points x_1, ..., x_n is in U. This condition is achieved if there is a function that takes the value 1 at y, that is identically zero outside U, and that is in the function space N_{φ,m}(R^d) as defined below.

Definition 5 Let φ from (2.2) and m ≥ m_0 be given. A continuous function F : D → R, D ⊆ R^d, belongs to the function space N_{φ,m}(D) if there exists a positive constant C such that, for any choice of interpolation points x_1, ..., x_n ∈ D for which (2.11) holds, the interpolant s_n ∈ A_{φ,m} to F at these points has the property ⟨s_n, s_n⟩ ≤ C.

The characterization of N_{φ,m}(D) is rather abstract. In the linear, cubic and thin plate spline cases, the following proposition, taken from Gutmann [3], provides a useful criterion to check whether it is satisfied. In the multiquadric and Gaussian cases, however, no such criterion is known.

Proposition 6 Let φ(r) = r, φ(r) = r² log r or φ(r) = r³. Further, let κ = 1 in the linear case, κ = 2 in the thin plate spline case and κ = 3 in the cubic case, and choose the integer m such that 0 ≤ m ≤ d in the linear case, 1 ≤ m ≤ d + 1 in the thin plate spline case and 1 ≤ m ≤ d + 2 in the cubic case. Define ν := (d + κ)/2 if d + κ is even, and ν := (d + κ + 1)/2 otherwise. If F ∈ C^ν(R^d) and has bounded support, then F ∈ N_{φ,m}(R^d).
Global convergence will be established only for the cases covered by this proposition. It remains an open problem whether a similar property is achieved in other cases. Thus we have the following theorem.

Theorem 7 Let φ(r) = r, φ(r) = r² log r or φ(r) = r³. Further, choose the integer m such that 0 ≤ m ≤ d in the linear case, 1 ≤ m ≤ d + 1 in the thin plate spline case and 1 ≤ m ≤ d + 2 in the cubic case. Let (x_n)_{n∈N} be the sequence generated by Algorithm 3, and let s_n be the radial function that interpolates (x_i, f(x_i)), i = 1, ..., n. Assume that, for infinitely many n ∈ N, the choice of f_n* satisfies (4.2). Then the sequence (x_n) is dense in D.

A particular convergence result follows immediately from Theorems 4 and 7, because the right-hand side of (4.2) is some real number.

Corollary 8 Let the assumptions of Theorem 7 on φ and m hold. Further, let f be continuous, and, for infinitely many n ∈ N, let f_n* = −∞. Then the method converges.

An interesting question is to find conditions on f such that the maximum norm of an interpolant is uniformly bounded. If they hold, then the right-hand side of (4.2) can be replaced by γ Δ_n^{α/2}, so this constraint on f_n* can be checked easily. We consider the special case of linear splines in one dimension, when d = 1, φ(r) = r and m = 0. For arbitrary points x_1, ..., x_n, the piecewise linear interpolant s_n attains its maximum and minimum values at interpolation points. Thus ||s_n||_∞ is bounded by ||f||_∞, a number that does not depend on the interpolation points. Therefore in this case the term ||s_n||_∞ may be dropped from (4.2). For other radial basis functions and other dimensions this simplification may fail. It is shown in the next lemma, however, that the uniform boundedness of the semi-norm of an interpolant is sufficient for the uniform boundedness of the maximum norm. Thus, the second convergence result applies to functions f in N_{φ,m}(D).

Lemma 9 Let f be in N_{φ,m}(D). Further, let (x_n)_{n∈N} be a sequence in D with pairwise different elements, such that (2.11) holds for n = n_0. For n ≥ n_0, denote the radial basis function interpolant to f at x_1, ..., x_n by s_n.
Then ||s_n||_∞ is uniformly bounded by a number that depends only on x_1, ..., x_{n_0}.

Proof: We fix n, and we let y be any point of D \ {x_1, ..., x_n}. Let s̃_n be the radial function that interpolates (y, f(y)) and (x_i, f(x_i)), i = 1, ..., n. By analogy with equation (3.1), it can be written as

s̃_n(x) = s_n(x) + [f(y) − s_n(y)] ℓ_n(y, x), x ∈ R^d,

where ℓ_n(y, ·) is still the cardinal function that interpolates (x_i, 0), i = 1, ..., n, and (y, 1). Thus, as shown in Section 3,

⟨s̃_n, s̃_n⟩ = ⟨s_n, s_n⟩ + (−1)^{m_0+1} μ_n(y) [f(y) − s_n(y)]²,

which gives the equation

[f(y) − s_n(y)]² = ( ⟨s̃_n, s̃_n⟩ − ⟨s_n, s_n⟩ ) / ( (−1)^{m_0+1} μ_n(y) ), (4.3)

the value of (−1)^{m_0+1} μ_n(y) being strictly positive. Next we show that it is bounded away from zero. Let ℓ_{n_0}(y, ·) be the cardinal function that interpolates (x_1, 0), ..., (x_{n_0}, 0) and (y, 1). Then the semi-norm properties of ⟨·,·⟩ and Theorem 1 imply

0 < (−1)^{m_0+1} μ_{n_0}(y) = ⟨ℓ_{n_0}(y, ·), ℓ_{n_0}(y, ·)⟩ ≤ ⟨ℓ_n(y, ·), ℓ_n(y, ·)⟩ = (−1)^{m_0+1} μ_n(y).

For n = n_0, let A and A(y) be the matrices (2.10) and (3.5), respectively. By using Cramer's Rule to solve (3.4), we find

μ_{n_0}(y) = det A / det A(y).

Now det A is a nonzero constant, and det A(y) is bounded on D. It follows that (−1)^{m_0+1} μ_{n_0}(y) is bounded away from zero. Therefore there exists a constant β > 0 such that

(−1)^{m_0+1} μ_n(y) ≥ β for all y ∈ D \ {x_1, ..., x_{n_0}} and n ≥ n_0. (4.4)

As f ∈ N_{φ,m}(D), ⟨s̃_n, s̃_n⟩ is bounded above by a positive constant C. Further, ⟨s_n, s_n⟩ is nonnegative. It follows from (4.3) and (4.4) that

|f(y) − s_n(y)| ≤ √(C/β), y ∈ D \ {x_1, ..., x_n}.

Further, because f is bounded on D, we obtain

|s_n(y)| ≤ √(C/β) + ||f||_∞.

Note that the right-hand side is independent of n and y, as required. Alternatively, if y ∈ {x_1, ..., x_n}, we have |s_n(y)| = |f(y)| ≤ ||f||_∞, which completes the proof. □

Next, by applying Proposition 6, we obtain a criterion that ensures that f is in N_{φ,m}(D).
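The Cramer's Rule identity μ_{n_0}(y) = det A / det A(y) used in the proof above can be checked numerically. The sketch below does so in the cubic case with linear polynomials (the point set and all names are our own choices):

```python
import numpy as np

# Numeric check of the Cramer's Rule identity mu_{n0}(y) = det A / det A(y)
# from the proof above, in the cubic case with linear polynomials (m = 1).
def block_matrix(Z):
    n, d = Z.shape
    Phi = np.linalg.norm(Z[:, None] - Z[None, :], axis=2) ** 3
    P = np.hstack([np.ones((n, 1)), Z])
    return np.block([[Phi, P], [P.T, np.zeros((d + 1, d + 1))]])

rng = np.random.default_rng(1)
X = rng.random((4, 2))                 # x_1, ..., x_{n_0}
y = np.array([2.0, 2.0])               # a point bounded away from X

A = block_matrix(X)                    # interpolation matrix of x_1, ..., x_{n_0}
Ay = block_matrix(np.vstack([X, y]))   # bordered matrix A(y)
rhs = np.zeros(4 + 1 + 3)
rhs[4] = 1.0                           # cardinality conditions (3.2)
mu = np.linalg.solve(Ay, rhs)[4]       # coefficient mu_{n0}(y)
ratio = np.linalg.det(A) / np.linalg.det(Ay)
```

Expanding det A(y) along the column belonging to y explains the identity: replacing that column by the unit right-hand side leaves the minor det A with a positive cofactor sign.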

Proposition 10 Let φ, m and ν be defined as in Proposition 6, and let f ∈ C^ν(D), where D ⊆ R^d is compact. Then f ∈ N_{φ,m}(D).

Proof: By Whitney's theorem ([20]), f can be extended to a function F ∈ C^ν(R^d) that is equal to f on D. Now D is contained in a closed ball of radius ρ, say, and there is an infinitely differentiable function g with g(x) = 1, ||x|| ≤ ρ, and g(x) = 0, ||x|| ≥ 2ρ. Thus Fg is in C^ν(R^d), and by Proposition 6 it is in N_{φ,m}(R^d). Since Fg is equal to f on D, it follows from the definition of the semi-norm that f ∈ N_{φ,m}(D). □

We complete this subsection by combining Theorem 7, Lemma 9 and Proposition 10.

Corollary 11 Let the assumptions of Theorem 7 on φ and m hold. Let ν be as in Proposition 6, and let f ∈ C^ν(D). Further, assume that, for infinitely many n ∈ N, f_n* has the property

min_{y∈D} s_n(y) − f_n* ≥ γ Δ_n^{α/2},

where γ > 0 is a constant, and where Δ_n and α are as in the first paragraph of this subsection. Then the method converges.

4.2 Proof of Theorem 7

In order to establish Theorem 7, some lemmas on the behaviour of the coefficients μ_n are needed.

Lemma 12 Let φ be any of the radial basis functions in (2.2), and let m_0 and m ≥ m_0 be chosen as in Section 2. Let D ⊆ R^d be compact, and let (x_n)_{n∈N} be a convergent sequence in D with pairwise different elements. Further, let (y_n)_{n∈N} be a sequence in D such that y_n ≠ x_n, n ∈ N, and

lim_{n→∞} ||x_n − y_n|| = 0.

Choose k points z_1, ..., z_k ∈ D that satisfy condition (2.11). Assume that (x_n) converges to x* ∈ D \ {z_1, ..., z_k}. For any y ∈ D \ {z_1, ..., z_k, y_{n+1}}, let ℓ̃_y be the cardinal spline that interpolates the data (z_1, 0), ..., (z_k, 0), (y_{n+1}, 0) and (y, 1), and let μ̃_n(y) be the coefficient of ℓ̃_y that satisfies (−1)^{m_0+1} μ̃_n(y) = ⟨ℓ̃_y, ℓ̃_y⟩. Then, for 0 ≤ α < 1 in the linear case and 0 ≤ α < 2 in the other cases,

lim_{n→∞} (−1)^{m_0+1} ||y_{n+1} − x_{n+1}||^α μ̃_n(x_{n+1}) = ∞. (4.5)

Proof: Let A_n and A_n(x_{n+1}) be the matrices of the form (2.10) for the interpolation points z_1, ..., z_k, y_{n+1} and z_1, ..., z_k, y_{n+1}, x_{n+1}, respectively. For sufficiently large n, neither x_{n+1} nor y_{n+1} is in the set {z_1, ..., z_k}. Thus A_n and A_n(x_{n+1}) are nonsingular. Cramer's Rule implies

μ̃_n(x_{n+1}) = det A_n / det A_n(x_{n+1}). (4.6)

Also, let A* be the matrix of the form (2.10) for the interpolation points z_1, ..., z_k, x*. By the continuity of the determinant,

lim_{n→∞} det A_n = det A* ≠ 0. (4.7)

In order to investigate the behaviour of ||y_{n+1} − x_{n+1}||^{−α} det A_n(x_{n+1}), we let

v(y) := ( φ(||y − z_1||), ..., φ(||y − z_k||) )^T, y ∈ D,

and

p(y) := ( p_1(y), ..., p_m̂(y) )^T, y ∈ D,

where m̂ = dim Π_m and p_1, ..., p_m̂ are as in Section 2. Thus A_n(x_{n+1}) is the matrix

( Φ             v(y_{n+1})                  v(x_{n+1})                  P
  v(y_{n+1})^T   φ(0)                        φ(||y_{n+1} − x_{n+1}||)    p(y_{n+1})^T
  v(x_{n+1})^T   φ(||y_{n+1} − x_{n+1}||)    φ(0)                        p(x_{n+1})^T
  P^T           p(y_{n+1})                  p(x_{n+1})                  0 ), (4.8)

where Φ and P correspond to (2.3) and (2.9), respectively, if we set n = k and {x_1, ..., x_n} = {z_1, ..., z_k}. Now the rows

( v(y_{n+1})^T  φ(0)  φ(||y_{n+1} − x_{n+1}||)  p(y_{n+1})^T )

and

( v(x_{n+1})^T  φ(||y_{n+1} − x_{n+1}||)  φ(0)  p(x_{n+1})^T )

have the same limit as n → ∞, so det A_n(x_{n+1}) tends to zero. Hence the properties (4.6) and (4.7) prove the assertion (4.5) for α = 0. For α > 0, note that the value of the determinant of the matrix (4.8) does not change if we replace the second row by the difference between the second and third rows, and subsequently replace the second column by the difference between

the second and third columns. Then det A_n(x_{n+1}) becomes

| Φ                            v(y_{n+1}) − v(x_{n+1})                 v(x_{n+1})                      P
  v(y_{n+1})^T − v(x_{n+1})^T  2[φ(0) − φ(||y_{n+1} − x_{n+1}||)]      φ(||y_{n+1} − x_{n+1}||) − φ(0)  p(y_{n+1})^T − p(x_{n+1})^T
  v(x_{n+1})^T                 φ(||y_{n+1} − x_{n+1}||) − φ(0)         φ(0)                            p(x_{n+1})^T
  P^T                         p(y_{n+1}) − p(x_{n+1})                 p(x_{n+1})                      0 |. (4.9)

We have to divide the determinant by ||y_{n+1} − x_{n+1}||^α, so we divide the second row and then the second column of (4.9) by ||y_{n+1} − x_{n+1}||^{α/2}. Then the following remarks are helpful. For each choice of φ and each j = 1, ..., k, the function φ(||z_j − x||), x ∈ D, is Lipschitz continuous, so the components of v(y_{n+1}) − v(x_{n+1}) satisfy

|φ(||z_j − y_{n+1}||) − φ(||z_j − x_{n+1}||)| ≤ const ||x_{n+1} − y_{n+1}||, j = 1, ..., k.

Thus for α < 2 we have

lim_{n→∞} ||y_{n+1} − x_{n+1}||^{−α/2} [ φ(||z_j − y_{n+1}||) − φ(||z_j − x_{n+1}||) ] = 0. (4.10)

Similarly, for α < 2, the components of p(y_{n+1}) − p(x_{n+1}) have the property

lim_{n→∞} ||y_{n+1} − x_{n+1}||^{−α/2} [ p_i(y_{n+1}) − p_i(x_{n+1}) ] = 0. (4.11)

Finally, we deduce

lim_{n→∞} ||y_{n+1} − x_{n+1}||^{−α} [ φ(||y_{n+1} − x_{n+1}||) − φ(0) ] = 0 (4.12)

for α < 1 in the linear case, for α < 2 in the thin plate spline, multiquadric and Gaussian cases, and for α < 3 in the cubic case. This is clear in the linear, thin plate spline and cubic cases. In the other two cases it follows from a second order Taylor expansion of φ, because φ'(0) = 0 and φ'' is bounded on R_+. Thus (4.9) – (4.12) provide

lim_{n→∞} ||y_{n+1} − x_{n+1}||^{−α} det A_n(x_{n+1}) = 0

for the given values of α. Hence (4.6) and (4.7) imply (4.5). □

Now we obtain

Lemma 13 Let φ, m_0 and m be chosen as in Lemma 12, and let (x_n)_{n∈N} be the sequence generated by Algorithm 3. Further, let 0 ≤ α < 1 in the linear case and

18 8 H.-M. Gutmann 0 < 2 in the other cases. Then, for every convergent subsequence (x n k ) k2in of (x n ), lim k! (?)m 0+ nk nk?(x n k ) = ; where n (:) is dened in Section 3 and n k is expression (4.) for n = n k. Proof: For n 2, dene j n to be the natural number j that minimizes kx n? x j k; j < n, so n = kx n? x jn k. Further, let (y n ) n2in be the sequence x2 ; n = ; y n := x jn ; n 2: Let (x ) n k k2in be a subsequence of (x n ) n2in, that converges to x, say. Convergence and the choice of (y n ) n2in imply lim k! kx? y n k nk k = 0. The initial step of Algorithm 3 provides a nite number of points that satisfy (2.), so the initial interpolation matrix (2.0) is nonsingular. If one of these points is x, we pick x n k 0 in a neighbourhood of x so that the interpolation matrix to x n k 0 and the other initial points is also nonsingular. Therefore there exist points ^x ; : : : ; ^x l in (x n ) n2in such that their interpolation matrix is nonsingular, and x 62 f^x ; : : : ; ^x l g. For suciently large k 2 IN, such that y n k 62 f^x ; : : : ; ^x l g, and for any y 2 D n fx ; : : : ; x n k?g, we let ^`k(y; :) be the radial function that interpolates (^x ; 0); : : : ; (^x l ; 0); (y ; 0) and (y; ), and we let `nk?(y; n k :) be the interpolant to (x ; 0); : : : ; (x n k?; 0) and (y; ). Because `nk?(y; :) interpolates (y n k ; 0) and (^x i ; 0); i = ; : : : ; l, for suciently large k, (3.7) and Theorem imply the inequality ^ k (y) = h^`k(y; :); ^`k(y; :)i h`nk?(y; :); `nk?(y; :)i = n k?(y) (4.3) for the coecients ^ k and n k?. We apply Lemma 2 in the case when fz ; : : : ; z k g is the set f^x ; : : : ; ^x l g and n = n k?. It follows that lim k! (?)m 0+ nk ^ k(x n k ) = lim k! (?)m 0+ kx n k? y nk k^ k (x n k ) = ; with the choice of stated in Lemma 2. Thus, setting y = x n k in (4.3), we obtain that nk nk?(x n k ) tends to innity as k!. 
□

Finally we show, using Proposition 6, that the coefficients μ_n(y) are uniformly bounded if y is bounded away from the points that are generated by the algorithm.

Lemma 14 Let φ(r) = r, φ(r) = r² log r or φ(r) = r³. Further, choose the integer m such that 0 ≤ m ≤ d in the linear case, 1 ≤ m ≤ d + 1 in the thin plate spline case and 1 ≤ m ≤ d + 2 in the cubic case. Let (x_n)_{n∈N} be the sequence generated by Algorithm 3, and let n_0 be the number of points chosen in the initial step. Assume that there exist y_0 ∈ D and a neighbourhood N := {x ∈ R^d : ||x − y_0|| < δ}, δ > 0, that does not contain any point of the sequence. Then there exists K > 0, depending only on y_0 and δ, such that

(−1)^{m_0+1} μ_n(y_0) ≤ K for all n ≥ n_0.

Proof: For any n ≥ n_0, let ℓ_n be the radial function that is defined by ℓ_n(x_i) = 0, i = 1, ..., n, and ℓ_n(y_0) = 1. There exists a compactly supported, infinitely differentiable function F that takes the value 1 at y_0 and 0 on R^d \ N. It follows from Proposition 6 that F ∈ N_{φ,m}(R^d). Now ℓ_n interpolates F at x_1, ..., x_n and y_0. Therefore, there is a positive number K, depending on y_0 and δ, such that

(−1)^{m_0+1} μ_n(y_0) = ⟨ℓ_n, ℓ_n⟩ ≤ K, n ≥ n_0. □

Now we are ready to prove Theorem 7.

Proof of Theorem 7: Assume there is y_0 ∈ D and an open neighbourhood U = {x ∈ R^d : ||x − y_0|| < δ}, δ > 0, that does not contain an interpolation point. The iteration step of Algorithm 3 gives

g_n(x_{n+1}) ≤ g_n(y_0), n ≥ n_0,

where n_0 is the number of points chosen in the initial step of the algorithm. By assumption (4.2), there is a subsequence (n_k)_{k∈N} of the natural numbers such that

min_{y∈D} s_{n_k−1}(y) − f_{n_k−1}* > γ Δ_{n_k}^{α/2} ||s_{n_k−1}||_∞ ≥ 0, k ∈ N, (4.14)

with γ > 0, Δ_{n_k} being the expression (4.1) for n = n_k, and 0 ≤ α < 1 in the linear and 0 ≤ α < 2 in the thin plate spline and cubic cases. The sequence (x_{n_k})_{k∈N} is a sequence in a compact set, thus it contains a convergent subsequence. Therefore, without loss of generality, we assume that (x_{n_k})_{k∈N} itself converges. For all k ∈ N, x_{n_k} is the minimizer of g_{n_k−1}(x). Thus, if f_{n_k−1}* > −∞,

(−1)^{m_0+1} μ_{n_k−1}(x_{n_k}) [s_{n_k−1}(x_{n_k}) − f_{n_k−1}*]² ≤ (−1)^{m_0+1} μ_{n_k−1}(y_0) [s_{n_k−1}(y_0) − f_{n_k−1}*]². (4.15)

If ‖s_{n_k−1}‖ > 0, this inequality, condition (4.4) and the definition of ‖·‖ provide

(−1)^{m₀+1} μ_{n_k−1}(x_{n_k})
  ≤ (−1)^{m₀+1} μ_{n_k−1}(y₀) [ (s_{n_k−1}(y₀) − f*_{n_k−1}) / (s_{n_k−1}(x_{n_k}) − f*_{n_k−1}) ]²
  ≤ (−1)^{m₀+1} μ_{n_k−1}(y₀) [ (s_{n_k−1}(x_{n_k}) − f*_{n_k−1} + |s_{n_k−1}(y₀) − s_{n_k−1}(x_{n_k})|) / (s_{n_k−1}(x_{n_k}) − f*_{n_k−1}) ]²
  ≤ (−1)^{m₀+1} μ_{n_k−1}(y₀) [ 1 + |s_{n_k−1}(y₀) − s_{n_k−1}(x_{n_k})| / (ε Δ_{n_k−1}^{α/2} ‖s_{n_k−1}‖) ]².

If ‖s_{n_k−1}‖ = 0, then s_{n_k−1}(y) − f*_{n_k−1} is a positive number independent of y, so (4.5) gives

(−1)^{m₀+1} μ_{n_k−1}(x_{n_k}) ≤ (−1)^{m₀+1} μ_{n_k−1}(y₀) ≤ (−1)^{m₀+1} μ_{n_k−1}(y₀) [ 1 + c / (ε Δ_{n_k−1}^{α/2}) ]²

for any positive c, as before. Remark 2 shows that this inequality holds also in the case f*_{n_k−1} = −∞. Multiplying both sides by Δ_{n_k−1}^α yields

Δ_{n_k−1}^α (−1)^{m₀+1} μ_{n_k−1}(x_{n_k}) ≤ (−1)^{m₀+1} μ_{n_k−1}(y₀) [ Δ_{n_k−1}^{α/2} + |s_{n_k−1}(y₀) − s_{n_k−1}(x_{n_k})| / (ε ‖s_{n_k−1}‖) ]². (4.6)

By Lemma 3, the left-hand side of (4.6) tends to infinity as k tends to infinity. However, Lemma 4 states that μ_n(y₀) is bounded above by a constant that does not depend on n. Thus the right-hand side of (4.6) is bounded by a constant that is independent of k, which contradicts (4.6). Therefore there is a point of the sequence in U. This implies that each neighbourhood of an arbitrary y₀ ∈ D contains infinitely many elements of (x_n)_{n∈ℕ}, so the sequence is dense in D. □

5 Relations to statistical global optimization

In this section we consider the similarities between the given radial basis function method and the P-algorithm. The idea of that method was proposed by Kushner [11], [12] for one-dimensional problems. Here the objective function is regarded as a realization of a Brownian motion stochastic process. If real numbers x_1 < ⋯ < x_n

are given, and their function values f(x_1), …, f(x_n) have been calculated, the model yields, for each x in the feasible set, a mean value Mean(x) and a variance Var(x). Mean(x) serves as a prediction of the true function value at x, while Var(x) is a measure of uncertainty. It turns out that Mean is the piecewise linear interpolant of the given data. The variance is piecewise quadratic on [x_1, x_n], nonnegative, and takes the value zero at x_1, …, x_n. For a real number x, let F_x be the normally distributed random variable with mean Mean(x) and variance Var(x). Now a nonnegative ε_n is chosen, and the next point x_{n+1} will be the one that maximizes the utility function

P( F_x ≤ min_{i=1,…,n} f(x_i) − ε_n ), x ∈ D, (5.1)

where P denotes probability. One can show that maximizing (5.1) is equivalent to maximizing

Var(x) / [ Mean(x) − min{f(x_1), …, f(x_n)} + ε_n ]², x ∈ D. (5.2)

Compare our method in one dimension with the choice of linear splines, i.e. φ(r) = r and m = 0, and with the target values f*_n = min{f(x_1), …, f(x_n)} − ε_n. In this case, the interpolant s_n is identical to Mean. Further, except for a constant factor,

Var(x) = 1/μ_n(x).

Therefore, Kushner's method and our method using linear splines are equivalent. Zilinskas [21], [22] extends this approach to Gaussian random processes in several dimensions. He uses the selection rule (5.1) and introduces the name "P-algorithm". In addition, he gives an axiomatic description of the terms involved in (5.1), namely the mean value function, the variance function and the utility function. We relate these results to our method. Consider a symmetric function ρ : ℝ^d × ℝ^d → ℝ, (x, z) ↦ ρ(x, z), and assume that ρ is conditionally positive or negative definite of order m. This means that there exists σ ∈ {0, 1} such that, given n different points x_1, …, x_n ∈ ℝ^d and multipliers λ_1, …, λ_n ∈ ℝ, we have

(−1)^σ Σ_{i,j=1}^n λ_i λ_j ρ(x_i, x_j) > 0

if the λ_i, i = 1, …, n, are not all zero and satisfy Σ_{i=1}^n λ_i p(x_i) = 0 for all p ∈ Π_m.
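To make this definition concrete, the following small check (our own illustration, not code from the paper) verifies numerically that φ(r) = r is conditionally negative definite when the multipliers are orthogonal to constants, so the sign factor in the definition above is −1:

```python
import itertools, random

def quad_form(points, lam):
    """Quadratic form sum_ij lam_i lam_j phi(|x_i - x_j|) for phi(r) = r."""
    return sum(li * lj * abs(xi - xj)
               for (xi, li), (xj, lj) in itertools.product(zip(points, lam), repeat=2))

random.seed(0)
points = [0.0, 0.7, 1.3, 2.9]           # distinct points on the real line
for _ in range(100):
    lam = [random.uniform(-1, 1) for _ in points]
    lam[-1] -= sum(lam)                  # enforce sum(lam) = 0 (orthogonality to constants)
    if any(abs(l) > 1e-12 for l in lam):
        # the form is strictly negative on this subspace, i.e. -form > 0
        assert -quad_form(points, lam) > 0
print("phi(r) = r: quadratic form is negative whenever the multipliers sum to zero")
```

The same experiment with the sum-to-zero constraint dropped produces quadratic forms of either sign (e.g. it is zero for a single nonzero multiplier and positive for two equal ones), which is why the polynomial side condition is essential.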

Denote by Φ the matrix with the elements ρ(x_i, x_j), i, j = 1, …, n, and by P the matrix with the elements p_j(x_i), i = 1, …, n, j = 1, …, m̂, where {p_j : j = 1, …, m̂} is a basis of Π_m and m̂ its dimension. The analogue of expression (2.10) is the matrix

A = ( Φ, P ; P^T, 0 ). (5.3)

Further, we now let the interpolant s to the components of F = (f(x_1), …, f(x_n))^T be the function

s(y) = Σ_{i=1}^n c_i ρ(y, x_i) + Σ_{j=1}^{m̂} d_j p_j(y),

whose coefficients c ∈ ℝ^n, d ∈ ℝ^m̂ solve the system A (c; d) = (F; 0_m̂). It can be written as

s(y) = v_m(y)^T A^{−1} (F; 0_m̂), y ∈ D, (5.4)

where v_m(y) is the vector

v_m(y) := ( ρ(y, x_1), …, ρ(y, x_n), p_1(y), …, p_m̂(y) )^T. (5.5)

The nonnegative function

Var(y) = | ρ(y, y) − v_m(y)^T A^{−1} v_m(y) |, y ∈ D, (5.6)

is assumed to be a measure of uncertainty. Note that v_m(x_i) is the i-th column of A, i = 1, …, n, so we obtain

Var(x_i) = | ρ(x_i, x_i) − v_m(x_i)^T A^{−1} v_m(x_i) | = | ρ(x_i, x_i) − v_m(x_i)^T e_i | = 0.

Thus there is no uncertainty at the interpolation points, which is meaningful because we know the true function values there. For the P-algorithm, ρ is interpreted as the correlation function of a Gaussian stochastic process. The use of a normal distribution, for example, gives ρ(x, y) := e^{−‖x−y‖²/2}, but other choices of ρ are also considered in the literature. All of them are positive definite, so we set m = −1. The conditional mean and the conditional

variance can be expressed as ([21])

Mean(y) = ( f(x_1), …, f(x_n) ) Φ^{−1} ( ρ(y, x_1), …, ρ(y, x_n) )^T, (5.7)

Var(y) = ρ(y, y) − ( ρ(y, x_1), …, ρ(y, x_n) ) Φ^{−1} ( ρ(y, x_1), …, ρ(y, x_n) )^T, (5.8)

which agree with (5.4) and (5.6). For our method, given φ and m, it is suitable to define ρ(x, y) := φ(‖x − y‖). Thus Φ coincides with the interpolation matrix of Section 2, and the coefficients of the interpolant s solve (2.3). This gives the form

s(y) = v_m(y)^T A^{−1} (F; 0), (5.9)

which is the same as the function (5.4). We have seen already that |1/μ_n| can be regarded as a variance in the case of linear splines in one dimension. An expression for it containing the matrix A and the vector v_m(y) can be derived in the following way. For any y ∈ D \ {x_1, …, x_n}, consider the cardinal function (3.3). The second cardinality condition from (3.2) implies

1/μ_n(y) = φ(0) + Σ_{i=1}^n [α_i(y)/μ_n(y)] φ(‖y − x_i‖) + Σ_{j=1}^{m̂} [b_j(y)/μ_n(y)] p_j(y). (5.10)

The coefficients α(y), μ_n(y) and b(y) solve the system (3.4). Moreover, the vector (5.5) contains the first n and the last m̂ elements of the (n+1)-th column of A(y). Therefore α(y) and b(y) also solve

A (α(y); b(y)) = −μ_n(y) v_m(y),

which implies

(α(y); b(y)) / μ_n(y) = −A^{−1} v_m(y).

Thus, replacing the terms α_i(y)/μ_n(y) and b_j(y)/μ_n(y) in (5.10), we find

1/μ_n(y) = φ(0) − v_m(y)^T A^{−1} v_m(y). (5.11)

It follows from ρ(x, x) = φ(0), x ∈ D, that expression (5.6) is equivalent to |1/μ_n(y)|.
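As a sanity check of the variance formula (5.6), the sketch below (our own code, using an assumed Gaussian correlation ρ(x, z) = e^{−(x−z)²/2} so that m = −1 and A = Φ) evaluates Var(y) = |ρ(y, y) − v_m(y)^T A^{−1} v_m(y)| and confirms that it vanishes at the interpolation points and is positive elsewhere:

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]   # augmented matrix, A left intact
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

rho = lambda x, z: math.exp(-(x - z) ** 2 / 2)      # Gaussian correlation, m = -1
pts = [0.0, 1.0, 2.5]                                # interpolation points x_1, ..., x_n
Phi = [[rho(xi, xj) for xj in pts] for xi in pts]    # here A = Phi (no polynomial part)

def var(y):
    v = [rho(y, xi) for xi in pts]                   # the vector v_m(y) of (5.5)
    w = solve(Phi, v)                                # w = A^{-1} v_m(y)
    return abs(rho(y, y) - sum(vi * wi for vi, wi in zip(v, w)))

assert all(var(xi) < 1e-10 for xi in pts)  # no uncertainty at interpolation points
assert var(0.5) > 0.0                      # positive uncertainty elsewhere
```

Because v_m(x_i) is the i-th column of A, the solve returns the unit vector e_i at a data point, which is exactly the mechanism behind Var(x_i) = 0 in the text above.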

Finally, we consider the selection rule for the next point. For the P-algorithm, it has already been noted that the maximization of (5.1) is equivalent to the maximization of (5.2). For a given target value f* define the function U : ℝ² → ℝ as

U(M, V) := V² / (M − f*)². (5.12)

It is increasing in V and decreasing in M, if f* ≤ M. Also, it satisfies the axioms of rational search stated in Zilinskas [22]. Then, employing (5.4) and (5.6), both methods choose the point that maximizes

U( s(y), √Var(y) ), y ∈ D.

6 Search strategies and practical questions

Practical features of our method have received little attention in this paper. Several questions arise concerning the choice of parameters in Algorithm 3.

1. What radial basis function should be chosen, and what polynomial degree m?
2. What is a good strategy for the choice of the target values f*_n?
3. Given a target value, how should the minimization of g_n in (3.8) (or the maximization of h_n in (3.9)) be carried out? Should we approximate the global optimum of g_n (or h_n), or compute a (possibly non-global) local minimum?

The first problem has not been investigated thoroughly. Experiments using cubic and thin plate splines on a few test functions suggest that one cannot say in general that one of them is better than the other. Experiments have not yet been tried for the other types. The choice of target values is crucial for the performance of the method. The interpretation of the two extreme cases has been noted in Section 3. Specifically, the choice f*_n = min_{y∈D} s_n(y) means that we trust our model and assume that the minimizer of s_n is close to the global minimizer of f. In the other case, namely f*_n = −∞, we try to find a point in a region that has not been explored at all. It may be best to employ a mix between values of f*_n that are suitable for convergence to a local minimizer and values that provide points in previously unexplored regions of the domain. Research is going on for the third question.
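The monotonicity of the utility U(M, V) = V²/(M − f*)² is easy to check numerically; this fragment (ours, with arbitrary sample values) treats M as the predicted function value and V as the uncertainty:

```python
# Utility U(M, V) = V**2 / (M - f_star)**2, defined for M > f_star.
f_star = -1.0                                   # an arbitrary target value below M
U = lambda M, V: V ** 2 / (M - f_star) ** 2

# increasing in the uncertainty V at fixed prediction M ...
assert U(0.5, 0.4) > U(0.5, 0.2)
# ... and decreasing in the prediction M at fixed uncertainty V, when M >= f_star
assert U(0.3, 0.4) > U(0.8, 0.4)
```

So the utility rises where the model is uncertain and falls where the prediction lies far above the target, which is the exploration/exploitation trade-off that the choice of f*_n controls.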
We prefer to maximize h_n, as this function is defined everywhere on D. It might seem strange that we consider computing the global optimum of h_n, i.e. that one global optimization problem is replaced by another. However, unlike f, h_n can be evaluated quickly, and derivatives are available as well. Also, we know roughly where the local maxima

of h_n lie. Thus the maximization of h_n is much easier than the minimization of f. In addition, as the problem (GOP) is very difficult under our assumptions, it would take too long to compute a global minimizer accurately. Therefore, from a practical point of view, we are interested in an approximate solution of (GOP). So it should suffice to determine an approximation to the maximizer of (3.9). As far as the second option in question 3 is concerned, we have to find a way to choose starting points or search regions in order to ensure fast convergence, which is still an open problem. Experiments show that large differences between function values can cause the interpolant to oscillate very strongly. Thus its minimal value can be much below the least calculated function value. We have found in numerical computations that these inefficiencies are reduced if large function values are replaced by the median of all available function values. Some experiments were performed using the test functions proposed by Dixon and Szego [2]. Table 1 gives the name of each function, the dimension, the domain and the number of local and global minima in that domain.

function        | dimension | no. of local minima | no. of global minima | domain
Branin          | 2         | 3                   | 3                    | [-5, 10] x [0, 15]
Goldstein-Price | 2         | 4                   | 1                    | [-2, 2]^2
Hartman 3       | 3         | 4                   | 1                    | [0, 1]^3
Shekel 5        | 4         | 5                   | 1                    | [0, 10]^4
Shekel 7        | 4         | 7                   | 1                    | [0, 10]^4
Shekel 10       | 4         | 10                  | 1                    | [0, 10]^4
Hartman 6       | 6         | 4                   | 1                    | [0, 1]^6

Table 1: Dixon-Szego test functions and their dimension, the domain and the number of local and global minima.

The maximization of (3.9) was carried out using a version of the tunneling method (Levy and Montalvo [13]). The target values f*_n are determined as follows. The idea is to perform cycles of N + 1 iterations for some N ∈ ℕ, where each cycle employs a range of target values, starting with a low one (global search) and ending with a value of f*_n that is close to min_{y∈D} s_n(y) (local search). Then we go back to a global search, starting the cycle again.
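For reference, the first entry of Table 1 can be written out explicitly; the paper does not restate the formula, so the constants below follow the standard Dixon-Szego parametrisation of the Branin function:

```python
import math

def branin(x1, x2):
    """Branin test function on [-5, 10] x [0, 15]; three global minima."""
    a, b, c = 1.0, 5.1 / (4 * math.pi ** 2), 5 / math.pi
    r, s, t = 6.0, 10.0, 1 / (8 * math.pi)
    return a * (x2 - b * x1 ** 2 + c * x1 - r) ** 2 + s * (1 - t) * math.cos(x1) + s

# the three global minimizers, all with value 10/(8*pi) ~ 0.397887
minima = [(-math.pi, 12.275), (math.pi, 2.275), (3 * math.pi, 2.475)]
for x1, x2 in minima:
    assert abs(branin(x1, x2) - 0.397887) < 1e-4
```

All three minimizers attain the same value 10/(8π), so the function has three global and no further local minima, as recorded in Table 1.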
The results that we report have been obtained using the following strategy. We choose the cycle length N = 5. Let the number of initial points be n 0, let the cycle start at n = ~n, and let the function values be

ordered, i.e. f(x_1) ≤ ⋯ ≤ f(x_n). If f(x_1) = f(x_n), the interpolant is a constant function, because we pick m ≥ 0, so the maximization of (3.9) is equivalent to the maximization of |1/μ_n| if f*_n < f(x_1). In this case, we choose f*_n = −∞. Otherwise, for ñ ≤ n ≤ ñ + N − 1, we set

f*_n = min_{y∈D} s_n(y) − ((ñ + N − n)/N)² [ f(x_{κ(n)}) − min_{y∈D} s_n(y) ],

where κ(ñ) = ñ and κ(n) = κ(n−1) − ⌊(n − n₀)/N⌋ for ñ + 1 ≤ n ≤ ñ + N − 1. When n = ñ + N we set f*_n = min_{y∈D} s_n(y). However, we have to be careful here, since this choice is only admissible if x_{n+1} is not one of the global minimizers of s_n. Thus we do not accept this choice if f(x_1) − min_{y∈D} s_n(y) < 10⁻⁴ |f(x_1)|, provided f(x_1) is nonzero, or if f(x_1) − min_{y∈D} s_n(y) < 10⁻⁴ when f(x_1) = 0. In these cases, we set f*_n = min_{y∈D} s_n(y) − 10⁻² |f(x_1)| and f*_n = min_{y∈D} s_n(y) − 10⁻², respectively, to try to obtain a yet lower function value. We use thin plate splines as the radial basis function, except for the Hartman 3 function, where we take cubic splines. The algorithm is stopped when the relative error |f_best − f*|/|f*| becomes smaller than a fixed ε, where f* is the global minimum and f_best the current best function value. The optimal values of all test functions in Table 1 are nonzero, so this stopping criterion is valid. Table 2 reports the number of function evaluations needed to achieve a relative error less than 1% and 0.01%. RBF denotes our method using the target value strategy described above. DIRECT (Jones, Perttunen and Stuckman [9]) and MCS (Multilevel Coordinate Search, Huyer and Neumaier [5]) are recent methods that, according to the results presented in those papers, are more efficient than most of their competitors on the Dixon-Szego testbed. DE (Differential Evolution, Storn and Price [18]) is an evolutionary method that operates only at the global level, which explains the large number of function evaluations. EGO (Efficient Global Optimization, Jones, Schonlau and Welch [10]) is the latest method known to us.
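The cyclic target-value schedule described above can be caricatured as follows. This is our own sketch, not the paper's code: the cycle weight is written as ((N − j)/N)² for step j of a cycle, the interpolant minimum and the reference function value are frozen to placeholder numbers, and the admissibility safeguards are omitted:

```python
# Hypothetical sketch of one target-value cycle of length N + 1.
# min_s stands in for min_y s_n(y) and f_ref for the reference value
# f(x_kappa(n)); neither the interpolant nor kappa is modelled here.
N = 5
min_s, f_ref = 2.0, 10.0

def target(j, min_s, f_ref, N=5):
    """Target value at step j = 0..N of the cycle: weight ((N - j)/N) squared."""
    w = ((N - j) / N) ** 2
    return min_s - w * (f_ref - min_s)

cycle = [target(j, min_s, f_ref) for j in range(N + 1)]
assert cycle[0] == min_s - (f_ref - min_s)           # most global choice of the cycle
assert cycle[-1] == min_s                            # pure local step: f*_n = min s_n
assert all(a < b for a, b in zip(cycle, cycle[1:]))  # monotone toward min s_n
```

A full implementation would recompute min_y s_n(y) at every iteration and apply the 10⁻⁴/10⁻² safeguards before accepting the final step f*_n = min s_n.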
Unfortunately, no tests are reported on the Shekel test functions. All the results for these methods are quoted from the papers mentioned above. For the DIRECT method, numbers of evaluations for both the 1% and the 0.01% stopping criterion are reported. For DE and EGO only results for the 1% criterion are available, whereas MCS only uses the 0.01% criterion. It should be noted that MCS uses a local search method at some stages of the algorithm, and in all the cases of Table 2 the first local minimum found is the global one.

7 Conclusions

Our global optimization method converges to the global minimizer of an arbitrary continuous function f, if we choose the sequence of target values carefully. If f is

sufficiently smooth, there is even a suitable condition on this sequence that can be checked by the algorithm. However, it is unsatisfactory that the multiquadric and Gaussian cases are excluded from the statement of Theorem 7. It is believed that the convergence result is true also in these cases, although they are not covered by the analysis in [3].

[Table 2: Number of function evaluations for our method (RBF) in comparison to DIRECT, DE, EGO and MCS with two different stopping criteria: RBF, DIRECT, DE and EGO are compared at relative error below 1%, and RBF, DIRECT and MCS at relative error below 0.01%, on the test functions Branin, Goldstein-Price, Hartman 3, Shekel 5, Shekel 7, Shekel 10 and Hartman 6; no EGO results are available for the Shekel functions.]

Table 2 shows that the method is able to compete with other global optimization methods on the set of the Dixon-Szego test functions. The test functions in this testbed, however, are of relatively low dimension, and the number of local and global minima is very small. Therefore, it is necessary to test the method on other sets of test functions and, of course, on real-world applications. The relation to the P-algorithm is very interesting. It is hoped that the connections can be exploited further. In particular, the choice of the target values and the determination of the point of highest utility are common problems. Solutions to these problems may be developed that are useful for both methods.

Acknowledgement

I am very grateful to my supervisor, Prof. M.J.D. Powell, for his constant help and his guidance. Also, I would like to thank the German Academic Exchange Service for supporting this research with a doctoral scholarship (HSP III) and the Engineering and Physical Sciences Research Council for further support with a research grant.

References

[1] P. Alotto, A. Caiti, G. Molinari, and M. Repetto. A Multiquadrics-based Algorithm for the Acceleration of Simulated Annealing Optimization Procedures. IEEE Transactions on Magnetics, 32(3):1198-1201, 1996.

[2] L.C.W. Dixon and G.P. Szego. The Global Optimization Problem: An Introduction. In L.C.W. Dixon and G.P. Szego, editors, Towards Global Optimization 2, pages 1-15. North-Holland, Amsterdam, 1978.

[3] H.-M. Gutmann. On the semi-norm of radial basis function interpolants. In preparation.

[4] R. Horst and P.M. Pardalos. Handbook of Global Optimization. Kluwer, Dordrecht, 1994.

[5] W. Huyer and A. Neumaier. Global optimization by multilevel coordinate search. Journal of Global Optimization, 14(4):331-355, 1999.

[6] T. Ishikawa and M. Matsunami. An Optimization Method Based on Radial Basis Functions. IEEE Transactions on Magnetics, 33(2):1868-1871, 1997.

[7] T. Ishikawa, Y. Tsukui, and M. Matsunami. A Combined Method for the Global Optimization Using Radial Basis Function and Deterministic Approach. IEEE Transactions on Magnetics, 35(3):1730-1733, 1999.

[8] D.R. Jones. Global optimization with response surfaces. Presented at the Fifth SIAM Conference on Optimization, Victoria, Canada, 1996.

[9] D.R. Jones, C.D. Perttunen, and B.E. Stuckman. Lipschitzian Optimization Without the Lipschitz Constant. Journal of Optimization Theory and Applications, 79(1):157-181, 1993.

[10] D.R. Jones, M. Schonlau, and W.J. Welch. Efficient Global Optimization of Expensive Black-Box Functions. Journal of Global Optimization, 13(4):455-492, 1998.

[11] H.J. Kushner. A Versatile Model of a Function of Unknown and Time Varying Form. Journal of Mathematical Analysis and Applications, 5:150-167, 1962.

[12] H.J. Kushner. A New Method of Locating the Maximum Point of an Arbitrary Multipeak Curve in the Presence of Noise. Journal of Basic Engineering, 86:97-106, 1964.

[13] A.V. Levy and A. Montalvo. The Tunneling Algorithm for the Global Minimization of Functions. SIAM Journal on Scientific and Statistical Computing, 6(1):15-29, 1985.


More information

Fraction-free Row Reduction of Matrices of Skew Polynomials

Fraction-free Row Reduction of Matrices of Skew Polynomials Fraction-free Row Reduction of Matrices of Skew Polynomials Bernhard Beckermann Laboratoire d Analyse Numérique et d Optimisation Université des Sciences et Technologies de Lille France bbecker@ano.univ-lille1.fr

More information

Average Reward Parameters

Average Reward Parameters Simulation-Based Optimization of Markov Reward Processes: Implementation Issues Peter Marbach 2 John N. Tsitsiklis 3 Abstract We consider discrete time, nite state space Markov reward processes which depend

More information

Least Squares Approximation

Least Squares Approximation Chapter 6 Least Squares Approximation As we saw in Chapter 5 we can interpret radial basis function interpolation as a constrained optimization problem. We now take this point of view again, but start

More information

Chapter 3 Least Squares Solution of y = A x 3.1 Introduction We turn to a problem that is dual to the overconstrained estimation problems considered s

Chapter 3 Least Squares Solution of y = A x 3.1 Introduction We turn to a problem that is dual to the overconstrained estimation problems considered s Lectures on Dynamic Systems and Control Mohammed Dahleh Munther A. Dahleh George Verghese Department of Electrical Engineering and Computer Science Massachuasetts Institute of Technology 1 1 c Chapter

More information

Novel Approach to Analysis of Nonlinear Recursions. 1 Department of Physics, Bar-Ilan University, Ramat-Gan, ISRAEL

Novel Approach to Analysis of Nonlinear Recursions. 1 Department of Physics, Bar-Ilan University, Ramat-Gan, ISRAEL Novel Approach to Analysis of Nonlinear Recursions G.Berkolaiko 1 2, S. Rabinovich 1,S.Havlin 1 1 Department of Physics, Bar-Ilan University, 529 Ramat-Gan, ISRAEL 2 Department of Mathematics, Voronezh

More information

and P RP k = gt k (g k? g k? ) kg k? k ; (.5) where kk is the Euclidean norm. This paper deals with another conjugate gradient method, the method of s

and P RP k = gt k (g k? g k? ) kg k? k ; (.5) where kk is the Euclidean norm. This paper deals with another conjugate gradient method, the method of s Global Convergence of the Method of Shortest Residuals Yu-hong Dai and Ya-xiang Yuan State Key Laboratory of Scientic and Engineering Computing, Institute of Computational Mathematics and Scientic/Engineering

More information

Extension of continuous functions in digital spaces with the Khalimsky topology

Extension of continuous functions in digital spaces with the Khalimsky topology Extension of continuous functions in digital spaces with the Khalimsky topology Erik Melin Uppsala University, Department of Mathematics Box 480, SE-751 06 Uppsala, Sweden melin@math.uu.se http://www.math.uu.se/~melin

More information

The subject of this paper is nding small sample spaces for joint distributions of

The subject of this paper is nding small sample spaces for joint distributions of Constructing Small Sample Spaces for De-Randomization of Algorithms Daphne Koller Nimrod Megiddo y September 1993 The subject of this paper is nding small sample spaces for joint distributions of n Bernoulli

More information

Projected Gradient Methods for NCP 57. Complementarity Problems via Normal Maps

Projected Gradient Methods for NCP 57. Complementarity Problems via Normal Maps Projected Gradient Methods for NCP 57 Recent Advances in Nonsmooth Optimization, pp. 57-86 Eds..-Z. u, L. Qi and R.S. Womersley c1995 World Scientic Publishers Projected Gradient Methods for Nonlinear

More information

Global optimization on Stiefel manifolds some particular problem instances

Global optimization on Stiefel manifolds some particular problem instances 6 th International Conference on Applied Informatics Eger, Hungary, January 27 31, 2004. Global optimization on Stiefel manifolds some particular problem instances János Balogh Department of Computer Science,

More information

5 Eigenvalues and Diagonalization

5 Eigenvalues and Diagonalization Linear Algebra (part 5): Eigenvalues and Diagonalization (by Evan Dummit, 27, v 5) Contents 5 Eigenvalues and Diagonalization 5 Eigenvalues, Eigenvectors, and The Characteristic Polynomial 5 Eigenvalues

More information

LECTURE NOTES IN CRYPTOGRAPHY

LECTURE NOTES IN CRYPTOGRAPHY 1 LECTURE NOTES IN CRYPTOGRAPHY Thomas Johansson 2005/2006 c Thomas Johansson 2006 2 Chapter 1 Abstract algebra and Number theory Before we start the treatment of cryptography we need to review some basic

More information

A BASIC FAMILY OF ITERATION FUNCTIONS FOR POLYNOMIAL ROOT FINDING AND ITS CHARACTERIZATIONS. Bahman Kalantari 3. Department of Computer Science

A BASIC FAMILY OF ITERATION FUNCTIONS FOR POLYNOMIAL ROOT FINDING AND ITS CHARACTERIZATIONS. Bahman Kalantari 3. Department of Computer Science A BASIC FAMILY OF ITERATION FUNCTIONS FOR POLYNOMIAL ROOT FINDING AND ITS CHARACTERIZATIONS Bahman Kalantari 3 Department of Computer Science Rutgers University, New Brunswick, NJ 08903 Iraj Kalantari

More information

2 Garrett: `A Good Spectral Theorem' 1. von Neumann algebras, density theorem The commutant of a subring S of a ring R is S 0 = fr 2 R : rs = sr; 8s 2

2 Garrett: `A Good Spectral Theorem' 1. von Neumann algebras, density theorem The commutant of a subring S of a ring R is S 0 = fr 2 R : rs = sr; 8s 2 1 A Good Spectral Theorem c1996, Paul Garrett, garrett@math.umn.edu version February 12, 1996 1 Measurable Hilbert bundles Measurable Banach bundles Direct integrals of Hilbert spaces Trivializing Hilbert

More information

Coins with arbitrary weights. Abstract. Given a set of m coins out of a collection of coins of k unknown distinct weights, we wish to

Coins with arbitrary weights. Abstract. Given a set of m coins out of a collection of coins of k unknown distinct weights, we wish to Coins with arbitrary weights Noga Alon Dmitry N. Kozlov y Abstract Given a set of m coins out of a collection of coins of k unknown distinct weights, we wish to decide if all the m given coins have the

More information

8 Singular Integral Operators and L p -Regularity Theory

8 Singular Integral Operators and L p -Regularity Theory 8 Singular Integral Operators and L p -Regularity Theory 8. Motivation See hand-written notes! 8.2 Mikhlin Multiplier Theorem Recall that the Fourier transformation F and the inverse Fourier transformation

More information

MATH 590: Meshfree Methods

MATH 590: Meshfree Methods MATH 590: Meshfree Methods Chapter 14: The Power Function and Native Space Error Estimates Greg Fasshauer Department of Applied Mathematics Illinois Institute of Technology Fall 2010 fasshauer@iit.edu

More information

The WENO Method for Non-Equidistant Meshes

The WENO Method for Non-Equidistant Meshes The WENO Method for Non-Equidistant Meshes Philip Rupp September 11, 01, Berlin Contents 1 Introduction 1.1 Settings and Conditions...................... The WENO Schemes 4.1 The Interpolation Problem.....................

More information

Chapter 2: Linear Independence and Bases

Chapter 2: Linear Independence and Bases MATH20300: Linear Algebra 2 (2016 Chapter 2: Linear Independence and Bases 1 Linear Combinations and Spans Example 11 Consider the vector v (1, 1 R 2 What is the smallest subspace of (the real vector space

More information

using the Hamiltonian constellations from the packing theory, i.e., the optimal sphere packing points. However, in [11] it is shown that the upper bou

using the Hamiltonian constellations from the packing theory, i.e., the optimal sphere packing points. However, in [11] it is shown that the upper bou Some 2 2 Unitary Space-Time Codes from Sphere Packing Theory with Optimal Diversity Product of Code Size 6 Haiquan Wang Genyuan Wang Xiang-Gen Xia Abstract In this correspondence, we propose some new designs

More information

Linear stochastic approximation driven by slowly varying Markov chains

Linear stochastic approximation driven by slowly varying Markov chains Available online at www.sciencedirect.com Systems & Control Letters 50 2003 95 102 www.elsevier.com/locate/sysconle Linear stochastic approximation driven by slowly varying Marov chains Viay R. Konda,

More information

The best expert versus the smartest algorithm

The best expert versus the smartest algorithm Theoretical Computer Science 34 004 361 380 www.elsevier.com/locate/tcs The best expert versus the smartest algorithm Peter Chen a, Guoli Ding b; a Department of Computer Science, Louisiana State University,

More information

Real Analysis Math 131AH Rudin, Chapter #1. Dominique Abdi

Real Analysis Math 131AH Rudin, Chapter #1. Dominique Abdi Real Analysis Math 3AH Rudin, Chapter # Dominique Abdi.. If r is rational (r 0) and x is irrational, prove that r + x and rx are irrational. Solution. Assume the contrary, that r+x and rx are rational.

More information

58 Appendix 1 fundamental inconsistent equation (1) can be obtained as a linear combination of the two equations in (2). This clearly implies that the

58 Appendix 1 fundamental inconsistent equation (1) can be obtained as a linear combination of the two equations in (2). This clearly implies that the Appendix PRELIMINARIES 1. THEOREMS OF ALTERNATIVES FOR SYSTEMS OF LINEAR CONSTRAINTS Here we consider systems of linear constraints, consisting of equations or inequalities or both. A feasible solution

More information

Viewed From Cubic Splines. further clarication and improvement. This can be described by applying the

Viewed From Cubic Splines. further clarication and improvement. This can be described by applying the Radial Basis Functions Viewed From Cubic Splines Robert Schaback Abstract. In the context of radial basis function interpolation, the construction of native spaces and the techniques for proving error

More information

4.1 Eigenvalues, Eigenvectors, and The Characteristic Polynomial

4.1 Eigenvalues, Eigenvectors, and The Characteristic Polynomial Linear Algebra (part 4): Eigenvalues, Diagonalization, and the Jordan Form (by Evan Dummit, 27, v ) Contents 4 Eigenvalues, Diagonalization, and the Jordan Canonical Form 4 Eigenvalues, Eigenvectors, and

More information

Construction of Orthogonal Wavelets Using. Fractal Interpolation Functions. Abstract

Construction of Orthogonal Wavelets Using. Fractal Interpolation Functions. Abstract Construction of Orthogonal Wavelets Using Fractal Interpolation Functions George Donovan, Jerey S. Geronimo Douglas P. Hardin and Peter R. Massopust 3 Abstract Fractal interpolation functions are used

More information

Nader H. Bshouty Lisa Higham Jolanta Warpechowska-Gruca. Canada. (

Nader H. Bshouty Lisa Higham Jolanta Warpechowska-Gruca. Canada. ( Meeting Times of Random Walks on Graphs Nader H. Bshouty Lisa Higham Jolanta Warpechowska-Gruca Computer Science University of Calgary Calgary, AB, T2N 1N4 Canada (e-mail: fbshouty,higham,jolantag@cpsc.ucalgary.ca)

More information

Lecture 8 - Algebraic Methods for Matching 1

Lecture 8 - Algebraic Methods for Matching 1 CME 305: Discrete Mathematics and Algorithms Instructor: Professor Aaron Sidford (sidford@stanford.edu) February 1, 2018 Lecture 8 - Algebraic Methods for Matching 1 In the last lecture we showed that

More information

Rough Sets, Rough Relations and Rough Functions. Zdzislaw Pawlak. Warsaw University of Technology. ul. Nowowiejska 15/19, Warsaw, Poland.

Rough Sets, Rough Relations and Rough Functions. Zdzislaw Pawlak. Warsaw University of Technology. ul. Nowowiejska 15/19, Warsaw, Poland. Rough Sets, Rough Relations and Rough Functions Zdzislaw Pawlak Institute of Computer Science Warsaw University of Technology ul. Nowowiejska 15/19, 00 665 Warsaw, Poland and Institute of Theoretical and

More information

PROOF OF TWO MATRIX THEOREMS VIA TRIANGULAR FACTORIZATIONS ROY MATHIAS

PROOF OF TWO MATRIX THEOREMS VIA TRIANGULAR FACTORIZATIONS ROY MATHIAS PROOF OF TWO MATRIX THEOREMS VIA TRIANGULAR FACTORIZATIONS ROY MATHIAS Abstract. We present elementary proofs of the Cauchy-Binet Theorem on determinants and of the fact that the eigenvalues of a matrix

More information

Symbolic Representations of Iterated Maps November 4, 2000 Xin-Chu Fu, Weiping Lu 2, Peter Ashwin and Jinqiao Duan 3. School of Mathematical Sciences,

Symbolic Representations of Iterated Maps November 4, 2000 Xin-Chu Fu, Weiping Lu 2, Peter Ashwin and Jinqiao Duan 3. School of Mathematical Sciences, Symbolic Representations of Iterated Maps November 4, 2000 Xin-Chu Fu, Weiping Lu 2, Peter Ashwin and Jinqiao Duan 3. School of Mathematical Sciences, Laver Building University of Exeter, Exeter EX4 4QJ,

More information

The Best Circulant Preconditioners for Hermitian Toeplitz Systems II: The Multiple-Zero Case Raymond H. Chan Michael K. Ng y Andy M. Yip z Abstract In

The Best Circulant Preconditioners for Hermitian Toeplitz Systems II: The Multiple-Zero Case Raymond H. Chan Michael K. Ng y Andy M. Yip z Abstract In The Best Circulant Preconditioners for Hermitian Toeplitz Systems II: The Multiple-ero Case Raymond H. Chan Michael K. Ng y Andy M. Yip z Abstract In [0, 4], circulant-type preconditioners have been proposed

More information

4.6 Montel's Theorem. Robert Oeckl CA NOTES 7 17/11/2009 1

4.6 Montel's Theorem. Robert Oeckl CA NOTES 7 17/11/2009 1 Robert Oeckl CA NOTES 7 17/11/2009 1 4.6 Montel's Theorem Let X be a topological space. We denote by C(X) the set of complex valued continuous functions on X. Denition 4.26. A topological space is called

More information

CHAPTER 10 Shape Preserving Properties of B-splines

CHAPTER 10 Shape Preserving Properties of B-splines CHAPTER 10 Shape Preserving Properties of B-splines In earlier chapters we have seen a number of examples of the close relationship between a spline function and its B-spline coefficients This is especially

More information

Other Things & Some Applications

Other Things & Some Applications Chapter 4 Other Things & Some Applications 4. Unimodality, Log-concavity & Real-rootedness (4.) Denition Let A = (a 0, a,..., a n ) be a sequence of positive real numbers. The sequence A is said to be:

More information

only nite eigenvalues. This is an extension of earlier results from [2]. Then we concentrate on the Riccati equation appearing in H 2 and linear quadr

only nite eigenvalues. This is an extension of earlier results from [2]. Then we concentrate on the Riccati equation appearing in H 2 and linear quadr The discrete algebraic Riccati equation and linear matrix inequality nton. Stoorvogel y Department of Mathematics and Computing Science Eindhoven Univ. of Technology P.O. ox 53, 56 M Eindhoven The Netherlands

More information

ACI-matrices all of whose completions have the same rank

ACI-matrices all of whose completions have the same rank ACI-matrices all of whose completions have the same rank Zejun Huang, Xingzhi Zhan Department of Mathematics East China Normal University Shanghai 200241, China Abstract We characterize the ACI-matrices

More information

Chapter 8. P-adic numbers. 8.1 Absolute values

Chapter 8. P-adic numbers. 8.1 Absolute values Chapter 8 P-adic numbers Literature: N. Koblitz, p-adic Numbers, p-adic Analysis, and Zeta-Functions, 2nd edition, Graduate Texts in Mathematics 58, Springer Verlag 1984, corrected 2nd printing 1996, Chap.

More information

We consider the problem of finding a polynomial that interpolates a given set of values:

We consider the problem of finding a polynomial that interpolates a given set of values: Chapter 5 Interpolation 5. Polynomial Interpolation We consider the problem of finding a polynomial that interpolates a given set of values: x x 0 x... x n y y 0 y... y n where the x i are all distinct.

More information

Werner Romisch. Humboldt University Berlin. Abstract. Perturbations of convex chance constrained stochastic programs are considered the underlying

Werner Romisch. Humboldt University Berlin. Abstract. Perturbations of convex chance constrained stochastic programs are considered the underlying Stability of solutions to chance constrained stochastic programs Rene Henrion Weierstrass Institute for Applied Analysis and Stochastics D-7 Berlin, Germany and Werner Romisch Humboldt University Berlin

More information

Max-Planck-Institut fur Mathematik in den Naturwissenschaften Leipzig Uniformly distributed measures in Euclidean spaces by Bernd Kirchheim and David Preiss Preprint-Nr.: 37 1998 Uniformly Distributed

More information

The iterative convex minorant algorithm for nonparametric estimation

The iterative convex minorant algorithm for nonparametric estimation The iterative convex minorant algorithm for nonparametric estimation Report 95-05 Geurt Jongbloed Technische Universiteit Delft Delft University of Technology Faculteit der Technische Wiskunde en Informatica

More information

Richard DiSalvo. Dr. Elmer. Mathematical Foundations of Economics. Fall/Spring,

Richard DiSalvo. Dr. Elmer. Mathematical Foundations of Economics. Fall/Spring, The Finite Dimensional Normed Linear Space Theorem Richard DiSalvo Dr. Elmer Mathematical Foundations of Economics Fall/Spring, 20-202 The claim that follows, which I have called the nite-dimensional normed

More information

MATH 117 LECTURE NOTES

MATH 117 LECTURE NOTES MATH 117 LECTURE NOTES XIN ZHOU Abstract. This is the set of lecture notes for Math 117 during Fall quarter of 2017 at UC Santa Barbara. The lectures follow closely the textbook [1]. Contents 1. The set

More information

1 Matrices and Systems of Linear Equations

1 Matrices and Systems of Linear Equations Linear Algebra (part ) : Matrices and Systems of Linear Equations (by Evan Dummit, 207, v 260) Contents Matrices and Systems of Linear Equations Systems of Linear Equations Elimination, Matrix Formulation

More information

Our goal is to solve a general constant coecient linear second order. this way but that will not always happen). Once we have y 1, it will always

Our goal is to solve a general constant coecient linear second order. this way but that will not always happen). Once we have y 1, it will always October 5 Relevant reading: Section 2.1, 2.2, 2.3 and 2.4 Our goal is to solve a general constant coecient linear second order ODE a d2 y dt + bdy + cy = g (t) 2 dt where a, b, c are constants and a 0.

More information

ON THE ARITHMETIC-GEOMETRIC MEAN INEQUALITY AND ITS RELATIONSHIP TO LINEAR PROGRAMMING, BAHMAN KALANTARI

ON THE ARITHMETIC-GEOMETRIC MEAN INEQUALITY AND ITS RELATIONSHIP TO LINEAR PROGRAMMING, BAHMAN KALANTARI ON THE ARITHMETIC-GEOMETRIC MEAN INEQUALITY AND ITS RELATIONSHIP TO LINEAR PROGRAMMING, MATRIX SCALING, AND GORDAN'S THEOREM BAHMAN KALANTARI Abstract. It is a classical inequality that the minimum of

More information

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra.

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra. DS-GA 1002 Lecture notes 0 Fall 2016 Linear Algebra These notes provide a review of basic concepts in linear algebra. 1 Vector spaces You are no doubt familiar with vectors in R 2 or R 3, i.e. [ ] 1.1

More information

Calculus and linear algebra for biomedical engineering Week 3: Matrices, linear systems of equations, and the Gauss algorithm

Calculus and linear algebra for biomedical engineering Week 3: Matrices, linear systems of equations, and the Gauss algorithm Calculus and linear algebra for biomedical engineering Week 3: Matrices, linear systems of equations, and the Gauss algorithm Hartmut Führ fuehr@matha.rwth-aachen.de Lehrstuhl A für Mathematik, RWTH Aachen

More information

1 Outline Part I: Linear Programming (LP) Interior-Point Approach 1. Simplex Approach Comparison Part II: Semidenite Programming (SDP) Concludin

1 Outline Part I: Linear Programming (LP) Interior-Point Approach 1. Simplex Approach Comparison Part II: Semidenite Programming (SDP) Concludin Sensitivity Analysis in LP and SDP Using Interior-Point Methods E. Alper Yldrm School of Operations Research and Industrial Engineering Cornell University Ithaca, NY joint with Michael J. Todd INFORMS

More information

Contents. 4 Arithmetic and Unique Factorization in Integral Domains. 4.1 Euclidean Domains and Principal Ideal Domains

Contents. 4 Arithmetic and Unique Factorization in Integral Domains. 4.1 Euclidean Domains and Principal Ideal Domains Ring Theory (part 4): Arithmetic and Unique Factorization in Integral Domains (by Evan Dummit, 018, v. 1.00) Contents 4 Arithmetic and Unique Factorization in Integral Domains 1 4.1 Euclidean Domains and

More information

Linear Algebra, 4th day, Thursday 7/1/04 REU Info:

Linear Algebra, 4th day, Thursday 7/1/04 REU Info: Linear Algebra, 4th day, Thursday 7/1/04 REU 004. Info http//people.cs.uchicago.edu/laci/reu04. Instructor Laszlo Babai Scribe Nick Gurski 1 Linear maps We shall study the notion of maps between vector

More information

Optimal global rates of convergence for interpolation problems with random design

Optimal global rates of convergence for interpolation problems with random design Optimal global rates of convergence for interpolation problems with random design Michael Kohler 1 and Adam Krzyżak 2, 1 Fachbereich Mathematik, Technische Universität Darmstadt, Schlossgartenstr. 7, 64289

More information

MATH 590: Meshfree Methods

MATH 590: Meshfree Methods MATH 590: Meshfree Methods Chapter 34: Improving the Condition Number of the Interpolation Matrix Greg Fasshauer Department of Applied Mathematics Illinois Institute of Technology Fall 2010 fasshauer@iit.edu

More information

2 Two-Point Boundary Value Problems

2 Two-Point Boundary Value Problems 2 Two-Point Boundary Value Problems Another fundamental equation, in addition to the heat eq. and the wave eq., is Poisson s equation: n j=1 2 u x 2 j The unknown is the function u = u(x 1, x 2,..., x

More information

Monte Carlo Methods for Statistical Inference: Variance Reduction Techniques

Monte Carlo Methods for Statistical Inference: Variance Reduction Techniques Monte Carlo Methods for Statistical Inference: Variance Reduction Techniques Hung Chen hchen@math.ntu.edu.tw Department of Mathematics National Taiwan University 3rd March 2004 Meet at NS 104 On Wednesday

More information

Carnegie Mellon University Forbes Ave. Pittsburgh, PA 15213, USA. fmunos, leemon, V (x)ln + max. cost functional [3].

Carnegie Mellon University Forbes Ave. Pittsburgh, PA 15213, USA. fmunos, leemon, V (x)ln + max. cost functional [3]. Gradient Descent Approaches to Neural-Net-Based Solutions of the Hamilton-Jacobi-Bellman Equation Remi Munos, Leemon C. Baird and Andrew W. Moore Robotics Institute and Computer Science Department, Carnegie

More information

Equations (3) and (6) form a complete solution as long as the set of functions fy n (x)g exist that satisfy the Properties One and Two given by equati

Equations (3) and (6) form a complete solution as long as the set of functions fy n (x)g exist that satisfy the Properties One and Two given by equati PHYS/GEOL/APS 661 Earth and Planetary Physics I Eigenfunction Expansions, Sturm-Liouville Problems, and Green's Functions 1. Eigenfunction Expansions In seismology in general, as in the simple oscillating

More information

QUASI-UNIFORMLY POSITIVE OPERATORS IN KREIN SPACE. Denitizable operators in Krein spaces have spectral properties similar to those

QUASI-UNIFORMLY POSITIVE OPERATORS IN KREIN SPACE. Denitizable operators in Krein spaces have spectral properties similar to those QUASI-UNIFORMLY POSITIVE OPERATORS IN KREIN SPACE BRANKO CURGUS and BRANKO NAJMAN Denitizable operators in Krein spaces have spectral properties similar to those of selfadjoint operators in Hilbert spaces.

More information

Introduction to Real Analysis

Introduction to Real Analysis Introduction to Real Analysis Joshua Wilde, revised by Isabel Tecu, Takeshi Suzuki and María José Boccardi August 13, 2013 1 Sets Sets are the basic objects of mathematics. In fact, they are so basic that

More information