Krzysztof Tesch. Continuous optimisation algorithms


Krzysztof Tesch

Continuous optimisation algorithms

Gdańsk 2016

GDAŃSK UNIVERSITY OF TECHNOLOGY PUBLISHERS

CHAIRMAN OF EDITORIAL BOARD: Janusz T. Cieśliński
REVIEWER: Krzysztof Kosowski
COVER DESIGN: Katarzyna Olszonowicz

Published under the permission of the Rector of Gdańsk University of Technology.

No part of this publication may be reproduced, transmitted, transcribed, stored in a retrieval system or translated into any human or computer language in any form by any means without permission in writing of the copyright holder.

Copyright by Gdańsk University of Technology Publishers, Gdańsk 2016

ISBN

Edition I.

Contents

1 Introduction
   1.1 Standard problem formulation
   1.2 Global and local minima
   1.3 Feasibility problem
   1.4 Example
   1.5 Classification of optimisation problems
   1.6 Classification of algorithms
   1.7 Hyperoptimisation
   1.8 Test functions
       1.8.1 Multimodal test function
       1.8.2 Unimodal test function
   1.9 Products

2 Single-point, derivative-based algorithms
   2.1 Introduction
       2.1.1 Classification
       2.1.2 Gradient and Hessian of a function
   2.2 Newton's method
   2.3 Modified Newton's method
   2.4 Method of steepest descent
   2.5 Quasi-Newton methods
       2.5.1 Secant method
       2.5.2 Other methods
   2.6 Conjugate gradient method
   2.7 Conditions for optimality

3 Single-point, derivative-free algorithms
   3.1 Random variables and stochastic processes
       3.1.1 Selected random variables
       3.1.2 Selected stochastic processes
   3.2 Random walk
       3.2.1 Uncontrolled random walk
       3.2.2 Domain controlled random walk
       3.2.3 Position controlled random walk
   3.3 Simulated annealing
   3.4 Random jumping

4 Multi-point, derivative-free algorithms
   4.1 Introduction
       4.1.1 (Meta)heuristic
       4.1.2 Nature-inspired algorithms
   4.2 Physics-based algorithms
       4.2.1 Gravitational search algorithm
   4.3 Bio-inspired algorithms
       4.3.1 Genetic algorithms
       4.3.2 Differential evolution
       4.3.3 Flower pollination algorithm
   4.4 Swarm intelligence based algorithms
       4.4.1 Particle swarm optimisation
       4.4.2 Accelerated particle swarm optimisation
       4.4.3 Firefly algorithm
       4.4.4 Bat algorithm
       4.4.5 Cuckoo search

5 Constraints
   5.1 Unconstrained and constrained optimisation
   5.2 Lagrange multipliers
       (The method; Equality constraints; Inequality constraints; Equality and inequality constraints; Box constraints)
   5.3 Penalty function method
   5.4 Barrier method

6 Variational calculus
   6.1 Functional and its variation
   6.2 Necessary condition for an extremum
   6.3 The Euler equation
   6.4 Constraints
   6.5 Classic problems
       (Shortest path on a plane; Brachistochrone; Minimal surface of revolution; Isoperimetric problem; Geodesics; Minimal surface passing through a closed curve in space)
   6.6 Variational formulation of elliptic partial differential equations
   6.7 Variational method of finding streamlines in ring cascades for creeping flows
       (Introduction; Conservation equation in curvilinear coordinate systems; Dissipation function and dissipation power; Analytical solutions; Dissipation functional; Dissipation functional vs. equations of motion; Streamlines: both ends constrained, one end partly constrained, one end unconstrained; Summary)
   6.8 Minimum drag shape bodies moving in inviscid fluid
       (Problem formulation; Fluid resistance; Drag force; Pressure coefficient and its approximation: two-dimensional and three-dimensional problems; Functional and Euler equation; Exact pseudo solution; Approximate solution due to the functional; Approximate solution due to the form of the function; Approximate solution by means of a Bézier curve; Summary)

7 Multi-objective optimisation
   7.1 Definitions
       (Domination; The Pareto set; The Pareto front)
   7.2 Scalarisation
       (Method of weighted-sum; Method of target vector; Method of minimax)
   7.3 SPEA
   7.4 Examples
       (Two objective fitness functions of a single variable and of two variables: analytical solution, single objective reconstruction of the Pareto set, multi-objective SPEA)
   7.5 Multi-objective description of Murray's law
       (Introduction; Multi-objective description)

8 Statistical analysis
   8.1 Distributions
   8.2 Discrepancy
   8.3 Single-problem statistical analysis
   8.4 Multiple-problem statistical analysis
       (D = , 1 evaluations; D = , evaluations; D = , evaluations; D = 1, 1 evaluations)

Bibliography

A Codes
   A.1 Single-point, derivative-free algorithms
   A.2 Multi-point, derivative-free algorithms
   A.3 Miscellaneous

B AGA Advanced Genetic Algorithm
   B.1 Brief introduction
   B.2 Detailed introduction
   B.3 I/O Console
   B.4 Script writing

Chapter 1

Introduction

1.1 Standard problem formulation

If the objective function to be minimised is $f : \mathbb{R}^D \to \mathbb{R}$, then the standard (unconstrained) optimisation problem is
$$\min_x f(x) = f^* \tag{1.1}$$
where $x = (x_1, x_2, \ldots, x_D)$ is a $D$-dimensional point. What is more, $x \in \mathbb{R}^D$ is also referred to as the independent variable. As a consequence, the general problem of unconstrained optimisation is the process of optimising (minimising or maximising) an objective function $f$ in the absence of constraints on the independent variable. The objective function may, however, be subjected to equality and inequality constraints
$$g_i(x) = 0, \tag{1.2a}$$
$$h_j(x) \le 0. \tag{1.2b}$$
Hence, the constrained optimisation problem is the process of optimising an objective function $f$ in the presence of constraints on the independent variable $x$. One can deal with a maximisation problem by negating the objective function
$$\max_x f(x) = -\min_x \left(-f(x)\right). \tag{1.3}$$
The argument $x^*$ of the minimum value of the objective function $f$ is expressed as
$$x^* = \arg\min_x f(x) \tag{1.4}$$
so that $f^* = f(x^*)$. The $\arg\min$ operator (1.4) is defined by
$$\arg\min_x f(x) := \{x : \forall y \; f(y) \ge f(x)\} \tag{1.5}$$
and gives a point $x \in \mathbb{R}^D$ or a set of points, whereas the $\min$ operator (1.1) gives the minimum value $f^* \in \mathbb{R}$
$$\min_x f(x) := \{f(x) : \forall y \; f(y) \ge f(x)\}. \tag{1.6}$$
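To make the distinction between the two operators concrete, here is a minimal Python sketch (mine, not the book's code) that evaluates both over the discrete search domain used later in the example of section 1.4:

    def sphere(x):
        """Sphere function, equation (1.12): the sum of squared coordinates."""
        return sum(c * c for c in x)

    # Discrete search domain from the example of section 1.4.
    omega = [(1, 2), (3, 1), (1, 0)]

    f_star = min(sphere(x) for x in omega)   # the min operator (1.1): a value
    x_star = min(omega, key=sphere)          # the arg min operator (1.4): a point
    print(f_star, x_star)                    # prints: 1 (1, 0)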

1.2 Global and local minima

Firstly, let us introduce the so-called neighbourhood of a point $x^*$ with radius $r > 0$, namely
$$B(x^*, r) := \{x : 0 < \|x - x^*\| < r\}. \tag{1.7}$$
Consequently, a local minimum $x^*$ is defined as a point for which
$$x^* = \arg\min_{x \in B(x^*, r)} f(x). \tag{1.8}$$
In other words, a point $x^*$ is a local minimum of the objective function $f$ if $f(x^*) \le f(x)$ for all $x$ fulfilling $0 < \|x - x^*\| < r$. A global minimum $g$ is defined as a point for which
$$g = \arg\min_{x \in \Omega \subseteq \mathbb{R}^D} f(x) \tag{1.9}$$
or, in other words, a point $g$ is a global minimum of the objective function $f$ if $f(g) \le f(x)$ for all $x \in \Omega$. If $\Omega = \mathbb{R}^D$, one deals with unconstrained optimisation. On the other hand, if $\Omega \subset \mathbb{R}^D$ then the problem is constrained and $\Omega$ is a feasible region, or simply the search (optimisation) space.

1.3 Feasibility problem

If there is no objective function $f : \mathbb{R}^D \to \mathbb{R}$ to be minimised, or when the objective function values are the same for all $x \in \Omega$, then the optimisation problem is called a feasibility problem. That is to say, any feasible point $x \in \Omega$ is an optimal solution. The feasibility problem is also referred to as the satisfiability problem. The feasible region $\Omega$ is the set of points that satisfies all constraints (discussed in chapter 5), namely the equality and inequality constraints
$$g_i(x) = 0, \tag{1.10a}$$
$$h_j(x) \le 0 \tag{1.10b}$$
or
$$\Omega = \{x : g_i(x) = 0, \; h_j(x) \le 0\}. \tag{1.11}$$

1.4 Example

Let us consider the following two-dimensional objective function, the so-called sphere function
$$f(x) := \sum_{i=1}^{2} x_i^2. \tag{1.12}$$
For the sake of simplicity, we assume first a discrete search domain $\Omega = \{x_1, x_2, x_3\}$ where $x_1 = (1, 2)$, $x_2 = (3, 1)$, $x_3 = (1, 0)$. Consequently, according to equation (1.12), the values of the objective function are $f(x_1) = 5$, $f(x_2) = 10$, $f(x_3) = 1$. Thus the best solution (the argument of the minimum value of $f$) is
$$g = \arg\min_{x_j \in \Omega} f(x_j) = \arg\min\{f(1, 2), f(3, 1), f(1, 0)\} = (1, 0) \tag{1.13}$$
and the minimum value $f^* = f(g)$ is
$$f^* = \min_{x_j \in \Omega} f(x_j) = \min\{f(1, 2), f(3, 1), f(1, 0)\} = \min\{5, 10, 1\} = 1. \tag{1.14}$$
Next, let us consider a continuous search domain $\Omega = \mathbb{R}^2$, meaning that our problem is unconstrained. A two-dimensional plot of equation (1.12) is shown in figure 2.2. Obviously, the argument of the minimum value of $f$ is
$$g = \arg\min_{x \in \mathbb{R}^2} f(x) = (0, 0) \tag{1.15}$$
and the minimum value $f^* = f(g) = f(0, 0)$ is
$$f^* = \min_{x \in \mathbb{R}^2} f(x) = f(0, 0) = 0. \tag{1.16}$$

1.5 Classification of optimisation problems

Generally speaking, various optimisation problems can be loosely classified as follows, based on:

Objective function
- Single objective. In the case of single objective optimisation we deal with only one objective function. Most of the presented problems are single objective.
- Multi-objective. More than one objective function is minimised simultaneously. Importantly, the considered objective functions should be in conflict. Typically, multi-objective optimisation gives a set of solutions as a result. Chapter 7 deals with multi-objective optimisation.

Modality
- Unimodal. A problem (function) is unimodal if there is only one local minimum. Single-point, derivative-based algorithms (chapter 2) are particularly suitable for such problems. Figure 1.2 shows an example of a unimodal function.
- Multimodal. A problem (function) is multimodal if there is more than one local minimum. Such a problem is not suitable for derivative-based algorithms; derivative-free algorithms (chapters 3 and 4) are able to deal with multimodal functions better. Figure 1.1 shows a multimodal function.

Linearity
- Linear. The objective function is linear, together with the constraints, if any.
- Nonlinear. The objective function or the constraints, or both, are nonlinear. All of the presented algorithms are suitable for nonlinear functions.

Variable type
- Continuous. The optimisation variables are continuous (continuous sets of real numbers). All of the presented algorithms are suitable for continuous variables.
- Discrete. The optimisation variables are discrete (integer numbers).
- Mixed. A combination of the two above. For instance, one variable is continuous and the second discrete.

Constraints
- Constrained. The process of minimising the objective function $f$ in the presence of constraints on the independent variable. We can distinguish equality and inequality constraints. Figures 5.1, 5.2 and 5.3 display examples of constrained functions.
- Unconstrained. The process of minimising the objective function $f$ in the absence of constraints on the independent variable.

1.6 Classification of algorithms

Optimisation algorithms can be divided based on:

Derivative
- Derivative-based. Derivative-based algorithms require first or second derivatives of the objective function. Ideally, the objective function should be twice differentiable. Derivative-based algorithms (chapter 2) are regarded as classical optimisation algorithms, suitable for unimodal problems.
- Derivative-free. Derivative-free algorithms do not require derivatives of the objective function. Moreover, the objective function does not have to be continuous.

Point
- Single-point. Single-point algorithms (chapters 2 and 3) process a single point iteratively, constantly modifying and improving it.
- Multi-point.
  - Sequential. Algorithms process single points sequentially. Typically, there is no exchange of information.
  - Parallel. Algorithms process many points in parallel in order to communicate and exchange information (chapter 4).

Randomness
- Deterministic. Algorithms comprise only known parameters. There is no uncertainty and no randomisation.
- Stochastic. Randomisation through stochastic variables is introduced in order to explore the feasible region efficiently.
- Hybrid. A combination of the two above.

Globality
- Local. Derivative-based algorithms are typically local optimisation algorithms (chapter 2), unless the objective function is unimodal.
- Global. Single-point (chapter 3) and multi-point, derivative-free algorithms (chapter 4) are considered global optimisation algorithms.

1.7 Hyperoptimisation

Hyperoptimisation, or metaoptimisation, is regarded as the optimisation of optimisation algorithms. It is also referred to as tuning. Parameter tuning may be relevant in order to improve the performance of stochastic methods, for instance in terms of minimising the number of iterations. It is obvious that a poor set of parameters can decrease the performance of an algorithm. Ideally, properly tuned algorithms should be able to solve a whole variety of different problems, or at least a given set of problems, with very good performance. What is important is the performance measure utilised during the tuning. The obvious choice, though not the only one, is the number of iterations of the tuned algorithm or, more generally, a computational cost. Hyperoptimisation is by no means a trivial problem. At least two approaches to it are considered []: configuring an algorithm by choosing optimal parameters, and analysing an algorithm by studying how its performance depends on its parameters. Two types of parameters are also distinguished, i.e., qualitative (e.g. the type of representation, binary vs floating-point) and quantitative (e.g. the value of the crossover probability), which makes the whole problem even more complicated. Apart from parameter tuning, discussed above, there is also the so-called parameter control problem, where parameters undergo changes while the algorithm is running.

1.8 Test functions

Two simple functions are introduced here in order to evaluate graphically the characteristics of the discussed algorithms. More complicated test functions, typically used as benchmarks, are discussed in chapter 8. These include, among others, unimodal, multimodal, composition, separable and non-separable functions.

[Figure 1.1: Multimodal test function]

1.8.1 Multimodal test function

The multimodal test function given by
$$f(x, y) := -5 \sin x \sin y - \sin 7x \sin 7y \tag{1.17}$$
is a two-dimensional, nonlinear function. It is shown in figure 1.1. There are several local minima. The search space $\Omega$ is
$$\Omega = \{(x, y) : (x, y) \in [0; \pi]^2\}. \tag{1.18}$$
It is also regarded as a box constraint set. The global minimum value of the function (1.17) is
$$\min_{(x,y) \in \Omega} f(x, y) = -6. \tag{1.19}$$
The argument of the minimum value $-6$ of the function (1.17) is located at the centre of the search space $\Omega$
$$\arg\min_{(x,y) \in \Omega} f(x, y) = \left(\frac{\pi}{2}, \frac{\pi}{2}\right). \tag{1.20}$$
The multimodal function (1.17) is utilised in order to evaluate graphically the characteristics of derivative-free algorithms.

[Figure 1.2: Unimodal test function]

1.8.2 Unimodal test function

The unimodal test function given by
$$f(x, y) := (x - 1)^2 + (y - 1)^2 \tag{1.21}$$
is a two-dimensional, nonlinear function. It is shown in figure 1.2. There is only one global minimum. The search space $\Omega$ (a box constraint set) is
$$\Omega = \{(x, y) : (x, y) \in [\;;\;]^2\}. \tag{1.22}$$

The global minimum value of the function (1.21) is
$$\min_{(x,y) \in \Omega} f(x, y) = 0 \tag{1.23}$$
and the argument of the minimum value of the function (1.21) is located at
$$\arg\min_{(x,y) \in \Omega} f(x, y) = (1, 1). \tag{1.24}$$
The unimodal function (1.21) is utilised in order to evaluate graphically the characteristics of derivative-based algorithms.

1.9 Products

There are several products of two vectors commonly met in optimisation. These include:

Dot product. The dot product, denoted as $\cdot$, of two vectors $x, y$ of the same size is a scalar. It is defined as
$$x \cdot y = \sum_{i=1}^{D} x_i y_i. \tag{1.25}$$
The dot product is also referred to as the inner or scalar product. What is more, the dot product is commutative, meaning that $x \cdot y = y \cdot x$.

Dyadic product. The dyadic product, denoted with no multiplication sign, of two vectors $x, y$ of the same size is a matrix. It is defined as
$$xy = (x_i y_j) = \begin{pmatrix} x_1 y_1 & x_1 y_2 & \ldots & x_1 y_D \\ x_2 y_1 & x_2 y_2 & \ldots & x_2 y_D \\ \vdots & \vdots & \ddots & \vdots \\ x_D y_1 & x_D y_2 & \ldots & x_D y_D \end{pmatrix}. \tag{1.26}$$
The dyadic product is also referred to as the outer or tensor product. If the first vector is an operator, such as the gradient, or the vectors are not of the same size, then the dyadic product is not commutative.

Hadamard product. The Hadamard product, denoted as $\circ$, of two vectors $x, y$ of the same size is a vector whose each element is the product of the corresponding elements of the two creating vectors. It is defined as
$$x \circ y = (x_i y_i) = (x_1 y_1, \ldots, x_D y_D). \tag{1.27}$$
The Hadamard product is also referred to as the entrywise product. Consequently, the Hadamard product is commutative, i.e., $x \circ y = y \circ x$.
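The three products are one-liners in NumPy; a quick sketch (mine, not the book's code):

    import numpy as np

    x = np.array([1.0, 2.0, 3.0])
    y = np.array([4.0, 5.0, 6.0])

    dot = np.dot(x, y)       # scalar, equation (1.25): 1*4 + 2*5 + 3*6 = 32
    dyadic = np.outer(x, y)  # D x D matrix, equation (1.26)
    hadamard = x * y         # elementwise vector, equation (1.27): [4, 10, 18]

    # The dot and Hadamard products commute, the dyadic product does not.
    assert np.isclose(dot, np.dot(y, x))
    assert np.allclose(hadamard, y * x)
    assert not np.allclose(dyadic, np.outer(y, x))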

Chapter 2

Single-point, derivative-based algorithms

2.1 Introduction

2.1.1 Classification

Single-point, derivative-based algorithms can be divided into three main groups, based on the information about derivatives necessary in order to find a minimum of the objective function, namely:
- Newton's and modified Newton's method
- Method of steepest descent
- Quasi-Newton methods
  - Secant method
  - Other methods (DFP, BFGS)
  - Conjugate gradient method

Newton's method and its modified version use the gradient (first derivatives) and the Hessian matrix (second derivatives) of the objective function, while the method of steepest descent does not. Other quasi-Newton methods try to approximate the Hessian matrix and can be regarded as a certain generalisation of the secant method.

2.1.2 Gradient and Hessian of a function

2.1.2.1 Gradient

For differentiable, scalar functions of several variables $f : \mathbb{R}^D \to \mathbb{R}$ the gradient is the vector whose components consist of the partial derivatives of $f$
$$\nabla f := \left(\frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_D}\right) = \left(\frac{\partial f}{\partial x_i}\right). \tag{2.1}$$
The gradient can also be regarded as a vector field pointing in the direction in which the function $f$ displays the largest rate of increase. Apart from the direction, the magnitude of the gradient $\|\nabla f\|$ determines the rate of change towards that direction.

If the gradient cannot be determined analytically, finite difference approximations of the first order partial derivatives are used instead. The central difference of $\partial f / \partial x_i$, being second order accurate, is then
$$\frac{\partial f}{\partial x_i} \approx \frac{f(\ldots, x_i + h, \ldots) - f(\ldots, x_i - h, \ldots)}{2h} \tag{2.2}$$
where $h$ is a small, fixed differentiation step size. Alternatively, a relative step size $\varepsilon$ can be assumed, resulting in
$$h = \begin{cases} \varepsilon |x|, & \text{if } |x| > \varepsilon; \\ \varepsilon, & \text{if } |x| \le \varepsilon. \end{cases} \tag{2.3}$$
If the function $f : \mathbb{R}^2 \to \mathbb{R}$ is two-dimensional then the central difference approximations simplify to
$$\frac{\partial f}{\partial x} \approx \frac{f(x + h, y) - f(x - h, y)}{2h}, \tag{2.4a}$$
$$\frac{\partial f}{\partial y} \approx \frac{f(x, y + h) - f(x, y - h)}{2h}. \tag{2.4b}$$
This is, however, true for the same step size towards the $x$ and $y$ directions.

2.1.2.2 Hessian

For twice differentiable scalar functions of several variables $f : \mathbb{R}^D \to \mathbb{R}$ the Hessian matrix is a square matrix whose components consist of the second order partial derivatives of $f$. Provided that the second order derivatives are continuous, the Hessian matrix is symmetric
$$H := \left(\frac{\partial^2 f}{\partial x_i \partial x_j}\right) = \begin{pmatrix} \frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1 \partial x_2} & \ldots & \frac{\partial^2 f}{\partial x_1 \partial x_D} \\ \frac{\partial^2 f}{\partial x_1 \partial x_2} & \frac{\partial^2 f}{\partial x_2^2} & \ldots & \frac{\partial^2 f}{\partial x_2 \partial x_D} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_1 \partial x_D} & \frac{\partial^2 f}{\partial x_2 \partial x_D} & \ldots & \frac{\partial^2 f}{\partial x_D^2} \end{pmatrix}. \tag{2.5}$$
Finite difference approximations of the second order partial derivatives can be used in order to evaluate the Hessian matrix. The symmetric difference of $\partial^2 f / \partial x_i^2$, being second order accurate, is
$$\frac{\partial^2 f}{\partial x_i^2} \approx \frac{f(\ldots, x_i + h, \ldots) - 2 f(\ldots, x_i, \ldots) + f(\ldots, x_i - h, \ldots)}{h^2} \tag{2.6}$$
and of the mixed derivatives $\partial^2 f / \partial x_i \partial x_j$, respectively,
$$\frac{\partial^2 f}{\partial x_i \partial x_j} \approx \frac{f(\ldots, x_i + h, x_j + h, \ldots) + f(\ldots, x_i - h, x_j - h, \ldots) - f(\ldots, x_i + h, x_j - h, \ldots) - f(\ldots, x_i - h, x_j + h, \ldots)}{4h^2}. \tag{2.7}$$

If the function $f : \mathbb{R}^2 \to \mathbb{R}$ is two-dimensional then the symmetric difference approximations simplify to
$$\frac{\partial^2 f}{\partial x^2} \approx \frac{f(x + h, y) - 2 f(x, y) + f(x - h, y)}{h^2}, \tag{2.8a}$$
$$\frac{\partial^2 f}{\partial y^2} \approx \frac{f(x, y + h) - 2 f(x, y) + f(x, y - h)}{h^2}, \tag{2.8b}$$
$$\frac{\partial^2 f}{\partial x \partial y} \approx \frac{f(x + h, y + h) + f(x - h, y - h) - f(x - h, y + h) - f(x + h, y - h)}{4h^2}. \tag{2.8c}$$
Again, this is true for the same step size towards the $x$ and $y$ directions.

2.2 Newton's method

The idea behind Newton's method is to approximate $f$ by a quadratic function around $x$ at each iteration. Subsequently, an attempt to minimise that approximation is undertaken. Let us consider a one-dimensional function $f : \mathbb{R} \to \mathbb{R}$ first. Assuming that $f$ has continuous derivatives over a certain interval, the Taylor expansion is used
$$f(x_0 + \Delta x) = \sum_{n=0}^{m-1} \frac{d^n f(x_0)}{dx^n} \frac{\Delta x^n}{n!} + \frac{d^m f(c)}{dx^m} \frac{\Delta x^m}{m!} \tag{2.9}$$
where $x = x_0 + \Delta x$, $c = x_0 + \theta \Delta x$ and $\theta \in \,]0; 1[$. The above equation may also be written as
$$f(x) = f(x_0) + f'(x_0)\, \Delta x + \tfrac{1}{2} f''(x_0)\, \Delta x^2 + \tfrac{1}{6} f'''(c)\, \Delta x^3. \tag{2.10}$$
The third derivative is evaluated at the unknown point $c$. Discarding (truncating) the last term, one gets a quadratic approximation to $f$
$$f(x) \approx f(x_0) + f'(x_0)\, \Delta x + \tfrac{1}{2} f''(x_0)\, \Delta x^2. \tag{2.11}$$
A necessary condition for optimality of $f$ is $f'(x) = 0$. Differentiating the above equation with respect to $x$, or $\Delta x = x - x_0$, and taking advantage of the necessary condition, one gets
$$0 = f'(x_0) + f''(x_0)\, \Delta x. \tag{2.12}$$
Solving the above for $\Delta x$, it is possible to provide the following equation
$$\Delta x = -\frac{f'(x_0)}{f''(x_0)}. \tag{2.13}$$
Finally, an iterative sequence can now be constructed in order to get a better approximation $x_{n+1}$ to the equation $f'(x) = 0$
$$x_{n+1} = x_n - \frac{f'(x_n)}{f''(x_n)}. \tag{2.14}$$
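As a quick illustration of iteration (2.14), here is a minimal Python sketch; the test function $f(x) = x^4/4 - x$ and the starting point are arbitrary choices, not taken from the book:

    def newton_1d(df, d2f, x0, eps_max=1e-10, n_max=50):
        """One-dimensional Newton minimisation, equation (2.14):
        x_{n+1} = x_n - f'(x_n)/f''(x_n). Stops when the step is
        below eps_max or after n_max iterations."""
        x = x0
        for _ in range(n_max):
            step = df(x) / d2f(x)
            x -= step
            if abs(step) <= eps_max:
                break
        return x

    # f(x) = x**4/4 - x, so f'(x) = x**3 - 1 and f''(x) = 3*x**2; minimum at x = 1.
    print(newton_1d(lambda x: x**3 - 1.0, lambda x: 3.0 * x**2, x0=2.0))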

Following the same line of reasoning for functions of several variables $f : \mathbb{R}^D \to \mathbb{R}$, we have an equivalent of equation (2.11)
$$f(x) \approx f(x_0) + \nabla f(x_0) \cdot \Delta x + \tfrac{1}{2}\, \Delta x \cdot H(x_0)\, \Delta x. \tag{2.15}$$
A necessary condition for optimality of $f$ is now $\nabla f(x) = 0$. Differentiating equation (2.15) with respect to $x$, or $\Delta x$, and taking advantage of the necessary condition, we have an equivalent of equation (2.12)
$$0 = \nabla f(x_0) + H(x_0)\, \Delta x. \tag{2.16}$$
It is now possible to solve the above equation for $\Delta x$
$$\Delta x = -H^{-1}(x_0)\, \nabla f(x_0) \tag{2.17}$$
and provide the following iterative scheme, equivalent to (2.14),
$$x_{n+1} = x_n - H^{-1}(x_n)\, \nabla f(x_n). \tag{2.18}$$
The structure of Newton's method is shown in listing 2.1. The algorithm stops when $\|x_{n+1} - x_n\| \le \varepsilon_{\max}$, i.e., when the difference between the previous and the current solution is below an assumed accuracy $\varepsilon_{\max}$, or when the maximum number of iterations $n_{\max}$ is reached.

Input: n_max, ε_max, x_0
Output: x*
1 n := 0;
2 repeat
3     x_{n+1} := x_n − H^{−1}(x_n) ∇f(x_n);
4     ε := ∥x_{n+1} − x_n∥;
5     n := n + 1;
6 until n ≥ n_max or ε ≤ ε_max;
7 x* := x_n;
Algorithm 2.1: Newton's method pseudocode

Newton's method does not only take advantage of the maximal direction of change, as the method of steepest descent (discussed further on) does. It also corrects the search direction by weighting the gradient with the inverse of the Hessian matrix. This means that it directs the search towards the minimum rather than towards the maximal direction of change, which is possible because of the second order derivatives. There is, however, a drawback of Newton's method, namely the cost of the additional function evaluations. Furthermore, the method converges for initial points close to the optimal value. What is more, the Hessian matrix $H$ has to be positive definite, otherwise the method can be divergent.

2.3 Modified Newton's method

One possible approach to generalising Newton's method is the relaxation factor $\alpha_n$, which can control the step size
$$x_{n+1} = x_n - \alpha_n H^{-1}(x_n)\, \nabla f(x_n). \tag{2.19}$$

The value of the relaxation factor can be determined by the solution of a one-dimensional optimisation problem
$$\alpha_n = \arg\min_\alpha f\left(x_n - \alpha H^{-1}(x_n)\, \nabla f(x_n)\right). \tag{2.20}$$
The one-dimensional equivalent of equation (2.19) is
$$x_{n+1} = x_n - \alpha_n \frac{f'(x_n)}{f''(x_n)}. \tag{2.21}$$
For $\alpha_n := 1$ we recover equation (2.14). The relaxation factor $\alpha_n$ can be either constant, $\alpha \in \,]0; 1]$, or adjustable. The structure of the modified Newton's method is shown in listing 2.2.

Input: n_max, ε_max, x_0
Output: x*
1 n := 0;
2 repeat
3     α_n := arg min_α f(x_n − α H^{−1}(x_n) ∇f(x_n));
4     x_{n+1} := x_n − α_n H^{−1}(x_n) ∇f(x_n);
5     ε := ∥x_{n+1} − x_n∥;
6     n := n + 1;
7 until n ≥ n_max or ε ≤ ε_max;
8 x* := x_n;
Algorithm 2.2: Modified Newton's method pseudocode

2.4 Method of steepest descent

The method of steepest descent is also known as the method of gradient descent. The steepest descent method directs the search towards the maximal direction of change, i.e., towards the direction of the negative gradient. Therefore, it is enough to set $H(x_n) := \delta$ (the unit matrix) in equation (2.19)
$$x_{n+1} = x_n - \alpha_n \nabla f(x_n). \tag{2.22}$$
Following the same logic, with $f''(x_n) := 1$ it is possible to obtain the one-dimensional equivalent of the above equation
$$x_{n+1} = x_n - \alpha_n f'(x_n). \tag{2.23}$$
As previously, the step size $\alpha_n$ can be either constant, $\alpha \in \,]0; 1]$, or adjustable. The actual value of it can be determined by the solution of a one-dimensional optimisation problem
$$\alpha_n = \arg\min_\alpha f\left(x_n - \alpha \nabla f(x_n)\right). \tag{2.24}$$
The structure of the steepest descent method is shown in listing 2.3 for adjustable step size $\alpha_n$. For constant $\alpha$ it is enough to replace line 3 with $\alpha_n := \alpha$.

Input: n_max, ε_max, x_0
Output: x*
1 n := 0;
2 repeat
3     α_n := arg min_α f(x_n − α ∇f(x_n));
4     x_{n+1} := x_n − α_n ∇f(x_n);
5     ε := ∥x_{n+1} − x_n∥;
6     n := n + 1;
7 until n ≥ n_max or ε ≤ ε_max;
8 x* := x_n;
Algorithm 2.3: Steepest descent method pseudocode

The method of steepest descent may behave poorly near the optimal value, since the closer to the minimum, the smaller the gradients, and hence the step sizes, become. This is especially true for a constant step size $\alpha$, because there is no additional information available to correct the direction and the step size of the next iteration. Figure 2.1 compares 9 evaluations of Newton's method with iterations of steepest descent with constant $\alpha = 0.15$; for the latter approach it was not possible to reach the optimal value. However, markedly fewer iterations of steepest descent with adjustable $\alpha_n$ according to equation (2.24) were necessary to reach the optimal value within the assumed accuracy $\varepsilon_{\max} := 10^{-5}$. As for the modified Newton's method with adjustable $\alpha_n$ according to equation (2.20), fewer iterations still are necessary.

[Figure 2.1: Newton's method vs steepest descent]
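A compact Python sketch of listing 2.3 with the adjustable step of equation (2.24); the central-difference gradient follows equation (2.2), while SciPy's scalar minimiser stands in for the exact line search (an implementation choice of this sketch, not the book's):

    import numpy as np
    from scipy.optimize import minimize_scalar

    def grad(f, x, h=1e-6):
        """Central-difference gradient, equation (2.2)."""
        g = np.zeros_like(x)
        for i in range(x.size):
            e = np.zeros_like(x)
            e[i] = h
            g[i] = (f(x + e) - f(x - e)) / (2.0 * h)
        return g

    def steepest_descent(f, x0, n_max=100, eps_max=1e-5):
        x = np.asarray(x0, dtype=float)
        for _ in range(n_max):
            d = grad(f, x)
            alpha = minimize_scalar(lambda a: f(x - a * d)).x  # equation (2.24)
            x_new = x - alpha * d
            if np.linalg.norm(x_new - x) <= eps_max:
                return x_new
            x = x_new
        return x

    f = lambda x: (x[0] - 1.0)**2 + (x[1] - 1.0)**2  # unimodal test function (1.21)
    print(steepest_descent(f, [3.0, 0.5]))           # ~ [1, 1]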

2.5 Quasi-Newton methods

2.5.1 Secant method

The secant method for one-dimensional optimisation approximates the second derivative in Newton's equation (2.14) by means of the first order accurate backward finite difference
$$f''(x_n) \approx \frac{f'(x_n) - f'(x_{n-1})}{x_n - x_{n-1}}. \tag{2.25}$$
By that means, the iterative sequence takes the following form
$$x_{n+1} = x_n - f'(x_n)\, \frac{x_n - x_{n-1}}{f'(x_n) - f'(x_{n-1})}. \tag{2.26}$$
It has to be noted that two starting values of $f'$ (i.e. $f'(x_n)$ and $f'(x_{n-1})$) are needed, in comparison with Newton's method equation (2.14). Given that one can store the previously evaluated $f'(x_{n-1})$, this can hardly be regarded as a drawback of the method.

2.5.2 Other methods

Many quasi-Newton methods, and optimisation methods in general, consist of two steps, namely the formulation of a direction $d_n$ with step size $\alpha_n$, and the following update formula
$$x_{n+1} := x_n + \alpha_n d_n. \tag{2.27}$$
In other words, a sequence of points $(x_n)_{n=0}^{\infty}$ is created, hopefully leading to an optimal value. The infinite sequence is truncated if $\|x_{n+1} - x_n\| \le \varepsilon_{\max}$, i.e., the two subsequent points are close enough. Ideally, an initial point $x_0$ should be located close to the optimal value in order to ensure convergence. Assuming $d_n := -\nabla f(x_n)$ in equation (2.27), it is possible to obtain the steepest descent method, equation (2.22). If
$$d_n := -H^{-1}(x_n)\, \nabla f(x_n) \tag{2.28}$$
then one obtains the modified Newton's method according to equation (2.19), which is the starting point for quasi-Newton methods. These methods make an attempt to approximate the inverse of the Hessian matrix, which is now denoted as $M_n$. Thus the direction is now
$$d_n := -M_n \nabla f(x_n). \tag{2.29}$$
Furthermore, the adjustable step size $\alpha_n$ is determined by the solution of a one-dimensional optimisation problem
$$\alpha_n = \arg\min_\alpha f(x_n + \alpha d_n). \tag{2.30}$$
The structure of the general quasi-Newton algorithm is shown in listing 2.4. Quasi-Newton methods do not take advantage of the explicit use of the Hessian matrix $H_n$ or its inverse. A subsequent approximation of $H_n^{-1}$ is used by means of $M_n$ instead. Typically, $M_0 := \delta$.

Input: n_max, ε_max, x_0, M_0
Output: x*
1 n := 0;
2 repeat
3     d_n := −M_n ∇f(x_n);
4     α_n := arg min_α f(x_n + α d_n);
5     x_{n+1} := x_n + α_n d_n;
6     Calculate ∇f(x_{n+1});
7     Update M_{n+1};
8     ε := ∥x_{n+1} − x_n∥;
9     n := n + 1;
10 until n ≥ n_max or ε ≤ ε_max;
11 x* := x_n;
Algorithm 2.4: General quasi-Newton method pseudocode

The DFP method (Davidon-Fletcher-Powell) updates $M_n$ by means of the following equation
$$M_{n+1} = M_n + \frac{\Delta x_n\, \Delta x_n}{\Delta x_n \cdot w_n} - \frac{(M_n w_n)(w_n M_n)}{w_n \cdot M_n w_n} \tag{2.31}$$
where $\Delta x_n := x_{n+1} - x_n$, $w_n := \nabla f(x_{n+1}) - \nabla f(x_n)$, and vectors written side by side denote the dyadic product of section 1.9. The above update uses subsequent gradients. The same concerns the BFGS method (Broyden-Fletcher-Goldfarb-Shanno). This time $M_n$ is updated by
$$M_{n+1} = \left(\delta - \frac{\Delta x_n w_n}{\Delta x_n \cdot w_n}\right) M_n \left(\delta - \frac{w_n \Delta x_n}{\Delta x_n \cdot w_n}\right) + \frac{\Delta x_n \Delta x_n}{\Delta x_n \cdot w_n}. \tag{2.32}$$

2.6 Conjugate gradient method

The first step of the conjugate gradient method (listing 2.5) is simply a steepest descent step, i.e. $d_0 := -\nabla f(x_0)$. The adjustable step size $\alpha_n$ in the update formula (2.27) is calculated according to the one-dimensional optimisation in equation (2.30). Subsequent iterations include an additional term in the update formula, namely $\beta_n d_n$. Together, the gradient descent direction and the additional term are referred to as the conjugate direction $d_{n+1}$
$$d_{n+1} := -\nabla f(x_{n+1}) + \beta_n d_n. \tag{2.33}$$
Similarly to the first step, the adjustable step size $\alpha_n$ is obtained as a result of a one-dimensional optimisation
$$\alpha_n := \arg\min_\alpha f\left(x_n + \alpha \left(-\nabla f(x_{n+1}) + \beta_n d_n\right)\right). \tag{2.34}$$
The most popular choice of $\beta_n$ is due to Fletcher and Reeves
$$\beta_n := \frac{\nabla f(x_{n+1}) \cdot \nabla f(x_{n+1})}{\nabla f(x_n) \cdot \nabla f(x_n)}. \tag{2.35}$$

Input: n_max, ε_max, x_0
Output: x*
1 n := 0;
2 d_n := −∇f(x_n);
3 repeat
4     α_n := arg min_α f(x_n + α d_n);
5     x_{n+1} := x_n + α_n d_n;
6     Calculate β_n;
7     d_{n+1} := −∇f(x_{n+1}) + β_n d_n;
8     ε := ∥x_{n+1} − x_n∥;
9     n := n + 1;
10 until n ≥ n_max or ε ≤ ε_max;
11 x* := x_n;
Algorithm 2.5: Conjugate gradient method pseudocode

2.7 Conditions for optimality

A necessary condition for optimality of a twice continuously differentiable function $f : \mathbb{R}^D \to \mathbb{R}$ in unconstrained optimisation problems is
$$\nabla f(x^*) = 0. \tag{2.36}$$
The point $x^*$, or points, if any, are called stationary points or critical points. The necessary condition (2.36) results in a set of typically nonlinear algebraic equations. Sufficient conditions for optimality of $f : \mathbb{R}^D \to \mathbb{R}$ in unconstrained optimisation problems require examining the Hessian matrix at the stationary points
$$H(x^*) := \left(\frac{\partial^2 f(x^*)}{\partial x_i \partial x_j}\right). \tag{2.37}$$
This is because at a stationary point we can localise a minimum, a maximum, or neither of those. To be more precise, the eigenvalues of the Hessian matrix at the stationary point need to be examined. The determinant
$$|H(x^*) - \lambda \delta| = 0 \tag{2.38}$$
results in a characteristic polynomial with $D$ roots (eigenvalues) $\lambda_i$. At $x^*$ we have:
- a minimum if $H(x^*)$ is positive definite (all $\lambda_i > 0$),
- a minimum or a saddle point if $H(x^*)$ is positive semi-definite (all $\lambda_i \ge 0$ and at least one $\lambda_i = 0$),
- a maximum if $H(x^*)$ is negative definite (all $\lambda_i < 0$),
- a maximum or a saddle point if $H(x^*)$ is negative semi-definite (all $\lambda_i \le 0$ and at least one $\lambda_i = 0$),
- a saddle point if $H(x^*)$ is indefinite (certain $\lambda_i > 0$ and certain $\lambda_i < 0$).

Alternatively, the Hessian matrix is positive definite if all the subdeterminants (principal minors)
$$H_n(x^*) := \begin{vmatrix} \frac{\partial^2 f(x^*)}{\partial x_1^2} & \ldots & \frac{\partial^2 f(x^*)}{\partial x_1 \partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial^2 f(x^*)}{\partial x_1 \partial x_n} & \ldots & \frac{\partial^2 f(x^*)}{\partial x_n^2} \end{vmatrix} \tag{2.39}$$
for $n \in \{1, \ldots, D\}$ are positive, i.e.
$$\forall_{n \in \{1, \ldots, D\}} \; H_n(x^*) > 0. \tag{2.40}$$
However, the above criterion cannot be used in order to verify whether the Hessian matrix is positive semi-definite.

[Figure 2.2: $f(x, y) := x^2 + y^2$ plot]

The two succeeding examples are provided in order to illustrate the conditions for optimality. Let us first consider a two-dimensional function $f : \mathbb{R}^2 \to \mathbb{R}$ given by the following equation
$$f(x, y) := x^2 + y^2. \tag{2.41}$$
The necessary condition for optimality, $\nabla f = 0$, results in a stationary point $(0, 0)$. Now, the Hessian matrix is
$$H := \begin{pmatrix} \frac{\partial^2 f}{\partial x^2} & \frac{\partial^2 f}{\partial x \partial y} \\ \frac{\partial^2 f}{\partial x \partial y} & \frac{\partial^2 f}{\partial y^2} \end{pmatrix} = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}. \tag{2.42}$$
The next step is to examine the eigenvalues of the Hessian matrix. This leads to the determinant
$$|H(x^*) - \lambda \delta| = 0 \tag{2.43}$$
or
$$\begin{vmatrix} 2 - \lambda & 0 \\ 0 & 2 - \lambda \end{vmatrix} = 0 \tag{2.44}$$
resulting in the characteristic polynomial $(2 - \lambda)^2 = 0$. The solutions are $\lambda_1 = \lambda_2 = 2$, i.e. $H(0, 0)$ is positive definite. There is a local minimum at $(0, 0)$, see figure 2.2. Let us now consider a two-dimensional function $f : \mathbb{R}^2 \to \mathbb{R}$ given by the following equation
$$f(x, y) := x^2 - y^2. \tag{2.45}$$
The necessary condition for optimality, $\nabla f = 0$, results in exactly the same stationary point $(0, 0)$ as previously. However, the Hessian matrix is different
$$H := \begin{pmatrix} \frac{\partial^2 f}{\partial x^2} & \frac{\partial^2 f}{\partial x \partial y} \\ \frac{\partial^2 f}{\partial x \partial y} & \frac{\partial^2 f}{\partial y^2} \end{pmatrix} = \begin{pmatrix} 2 & 0 \\ 0 & -2 \end{pmatrix}. \tag{2.46}$$
Examining the eigenvalues of the Hessian matrix, we have the determinant
$$|H(x^*) - \lambda \delta| = 0 \tag{2.47}$$
or
$$\begin{vmatrix} 2 - \lambda & 0 \\ 0 & -2 - \lambda \end{vmatrix} = 0 \tag{2.48}$$
resulting in the characteristic polynomial $\lambda^2 - 4 = 0$. The solutions are $\lambda = \pm 2$, i.e. $H(0, 0)$ is indefinite. There is a saddle point at $(0, 0)$, see figure 2.3.

[Figure 2.3: $f(x, y) := x^2 - y^2$ plot]
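The eigenvalue test of section 2.7 is straightforward to automate; a small NumPy sketch (mine, not the book's code) classifying the two example Hessians:

    import numpy as np

    def classify_stationary_point(H, tol=1e-12):
        """Classify a stationary point from the eigenvalues of the
        Hessian, following the rules of section 2.7."""
        lam = np.linalg.eigvalsh(np.asarray(H, dtype=float))  # symmetric Hessian
        if np.all(lam > tol):
            return "minimum"
        if np.all(lam < -tol):
            return "maximum"
        if np.any(lam > tol) and np.any(lam < -tol):
            return "saddle point"
        return "semi-definite case: inconclusive"

    print(classify_stationary_point([[2, 0], [0, 2]]))   # f = x^2 + y^2 -> minimum
    print(classify_stationary_point([[2, 0], [0, -2]]))  # f = x^2 - y^2 -> saddle point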

Chapter 3

Single-point, derivative-free algorithms

3.1 Random variables and stochastic processes

3.1.1 Selected random variables

A random variable $X$ is a function $X : \Omega \to \mathbb{R}$ from the set of elementary events $\Omega$ to the set of real numbers $\mathbb{R}$, provided that the set $\{\omega \in \Omega : X(\omega) < x\}$ is an elementary event. By a random variate one understands a realisation of a random variable, i.e., a random outcome according to the probability distribution function of the random variable. The set of realisations $X(\Omega) := \{X(\omega) : \omega \in \Omega\}$ is called the set of values of the variable $X$. There are two types of random variables, namely discrete and continuous. The former takes a finite or countable list of values associated with a probability mass function, whereas the latter takes any numerical value, associated with a probability distribution function.

Discrete uniform distribution

The discrete uniform distribution is given in table 3.1. The finite number $n$ of values $x_i$ are equally probable, each with probability $1/n$. Furthermore, the probability mass function for $n = 5$ is shown in figure 3.1; such a plot is also referred to as a histogram.

Table 3.1: Discrete uniform distribution
    x_i :  x_1  ...  x_n
    p_i :  1/n  ...  1/n

The expected value of the discrete uniform distribution is
$$\mathrm{E}X = \frac{1}{n} \sum_i x_i =: \mu \tag{3.1}$$
whereas the variance is
$$\mathrm{D}^2 X = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2. \tag{3.2}$$
Any particular realisation, or simply random variate, of the discrete uniform distribution is denoted as $U\{x_1, x_n\}$. We have $U\{x_1, x_n\} \in \{x_1, x_2, \ldots, x_n\}$ with equal probability $1/n$.

[Figure 3.1: Probability mass function of a discrete uniform distribution. Figure 3.2: Probability density functions of continuous uniform distributions for various a and b]

Continuous uniform distribution

The continuous uniform distribution is given by the following probability distribution function
$$f(x) := \begin{cases} \frac{1}{b - a}, & a \le x \le b; \\ 0, & \text{otherwise} \end{cases} \tag{3.3}$$
which is shown in figure 3.2 for various $a$ and $b$. The expected value of the continuous uniform distribution is
$$\mathrm{E}X = \int_{-\infty}^{+\infty} x f(x)\, \mathrm{d}x = \frac{a + b}{2} =: \mu \tag{3.4}$$
and the variance
$$\mathrm{D}^2 X = \int_{-\infty}^{+\infty} (x - \mu)^2 f(x)\, \mathrm{d}x = \frac{1}{12} (b - a)^2. \tag{3.5}$$
Any particular realisation or random variate of the continuous uniform distribution is denoted as $U(a, b)$, or $\mathbf{U}(a, b)$ for more than one dimension. For the standard continuous uniform distribution, denoted as $U(0, 1)$, we have $\mathrm{E}X = \frac{1}{2}$ and $\mathrm{D}^2 X = \frac{1}{12}$.

Normal distribution

The normal distribution is given by the following probability distribution function
$$f(x) := \frac{1}{\sigma \sqrt{2\pi}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}} \tag{3.6}$$
which is shown in figure 3.3 for various $\sigma$ and $\mu = 0$. The expected value of the normal distribution is $\mathrm{E}X = \mu$ and the variance $\mathrm{D}^2 X = \sigma^2$. Any particular realisation or random variate of the normal distribution is denoted as $N(\mu, \sigma^2)$, or $\mathbf{N}(\mu, \sigma^2)$ for more than one dimension. The standard normal distribution is denoted as $N(0, 1)$, for which the expected value is $\mathrm{E}X = 0$ and the variance $\mathrm{D}^2 X = 1$.

[Figure 3.3: Probability density functions of normal distributions for various σ and µ = 0. Figure 3.4: Probability density functions of symmetrical Lévy stable distributions for various α]

Lévy alpha-stable distribution

The Lévy alpha-stable distribution is a four-parameter family of distributions. These are: $\alpha$, the stability parameter; $\beta$, the skewness parameter; $\mu$, the location parameter; $\gamma$, the scale parameter. The probability distribution function $f(x, \alpha, \beta, \mu, \gamma)$ can be expressed analytically only for a selected group of parameters. It is possible to provide the expected value $\mathrm{E}X = \mu$ when $\alpha > 1$ and the variance $\mathrm{D}^2 X = 2\gamma^2$ when $\alpha = 2$. When $\beta = \mu = 0$ the Lévy alpha-stable distribution is known as the symmetrical Lévy stable distribution $L_{\alpha,\gamma}$ with the following probability distribution function
$$f(x, \alpha, \gamma) := \frac{1}{\pi} \int_0^\infty e^{-\gamma y^\alpha} \cos yx \; \mathrm{d}y. \tag{3.7}$$
As the Lévy distributions are difficult to deal with both analytically and numerically, the following approximation of $L_{\alpha,\gamma}$ can be used [16]
$$L_{\alpha,\sigma} := \frac{X}{|Y|^{1/\alpha}} \tag{3.8}$$
where $Y$ is a random variable with the standard normal distribution and $X$ is a random variable with the normal distribution with $\mu = 0$ and the standard deviation $\sigma$ given by
$$\sigma^\alpha := \frac{\Gamma(1 + \alpha) \sin\frac{\pi\alpha}{2}}{\Gamma\left(\frac{1 + \alpha}{2}\right) \alpha\, 2^{(\alpha - 1)/2}}. \tag{3.9}$$

Any particular realisation or random variate of the symmetrical Lévy stable distribution is denoted as
$$L(\alpha, \sigma) := \frac{\sigma N(0, 1)}{|N(0, 1)|^{1/\alpha}}. \tag{3.10}$$

3.1.2 Selected stochastic processes

A real-valued function $X : T \times \Omega \to \mathbb{R}$ is a random function provided that the set $\{\omega \in \Omega : X(t, \omega) < x\}$ is an elementary event. For a fixed $t$ the function $X$ is a random variable $X_t$, sometimes denoted as $X(t)$, $X_t(\omega)$ or even $X(t, \omega)$. A stochastic process is a set of random variables $X_t$ depending on one parameter, typically time $t$
$$\{X_t : t \in T\}. \tag{3.11}$$
If the set $T$ is countable, i.e. $T := \{1, 2, \ldots\}$, then the stochastic process (3.11) can be regarded as a stochastic series $(x_n)_{n=0}^{\infty}$.

[Figure 3.5: Wiener process realisations]

Wiener process

The Wiener process is an example of a continuous time stochastic process and is characterised by the following properties:
- $W(0) = 0$ with probability one.
- If $0 < t_1 < t_2 < t_3 < t_4 < \tau$ then $W(t_2) - W(t_1)$ and $W(t_4) - W(t_3)$ are independent.
- If $0 < t_1 < t_2 < \tau$ then $W(t_2) - W(t_1) \sim \sqrt{t_2 - t_1}\, N(0, 1)$, meaning that the difference $W(t_2) - W(t_1)$ is a random variable with the normal distribution with $\mu = 0$ and variance $t_2 - t_1$, i.e. $N(0, t_2 - t_1)$.

A method of summing increments is applied to the discrete approximation of the continuous Wiener process, namely
$$\mathrm{d}W = \sqrt{\Delta t}\, N(0, 1) \tag{3.12}$$
where $\Delta t = \tau / n_{\max}$. In order to form a $D$-dimensional Wiener process, a limited sequence of points $(x_n)_{n=0}^{n_{\max}}$ is created, where
$$x_{n+1} := x_n + \alpha\, \epsilon \tag{3.13}$$
and the random vector $\epsilon$ is drawn from the standard normal distribution
$$\epsilon := \mathbf{N}(0, 1). \tag{3.14}$$
The scale coefficient $\alpha$, obviously, is
$$\alpha := \sqrt{\Delta t} = \sqrt{\frac{\tau}{n_{\max}}}. \tag{3.15}$$
Figure 3.5 displays an example realisation of the Wiener process.

[Figure 3.6: Lévy flight realisations]

Lévy flight

The Lévy flight, or in fact the Lévy alpha-stable walk, is another example of a continuous time stochastic process and is characterised by the following properties:
- $X(0) = 0$ with probability one.
- If $0 < t_1 < t_2 < t_3 < t_4 < \tau$ then $X(t_2) - X(t_1)$ and $X(t_4) - X(t_3)$ are independent.
- If $0 < t_1 < t_2 < \tau$ then $X(t_2) - X(t_1) \sim (t_2 - t_1)^{1/\alpha} L(\alpha, 1)$, meaning that the difference $X(t_2) - X(t_1)$ is a random variable with the symmetrical Lévy stable distribution with the scale parameter $(t_2 - t_1)^{1/\alpha}$, i.e. $L(\alpha, (t_2 - t_1)^{1/\alpha})$.

As previously, a method of summing increments is applied to the discrete approximation of the Lévy flight, i.e.
$$\mathrm{d}X = \Delta t^{1/\alpha} L(\alpha, 1). \tag{3.16}$$
A limited sequence of points $(x_n)_{n=0}^{n_{\max}}$ is created in order to form a $D$-dimensional Lévy flight
$$x_{n+1} := x_n + \alpha_n \epsilon. \tag{3.17}$$

This time, however, the random vector $\epsilon$ is drawn from the symmetrical Lévy stable distribution
$$\epsilon = \mathbf{L}(\alpha, \sigma) := \frac{\sigma \mathbf{N}(0, 1)}{|\mathbf{N}(0, 1)|^{1/\alpha}}. \tag{3.18}$$
The scale coefficient $\alpha_n$, not to be confused this time with the stability parameter $\alpha$, is
$$\alpha_n := \Delta t^{1/\alpha} = \left(\frac{\tau}{n_{\max}}\right)^{1/\alpha}. \tag{3.19}$$
Figure 3.6 (left side) displays an example realisation of the Lévy flight. Long jumps are typically parallel to either the $x$ or the $y$ axis; simultaneous long jumps are hardly probable. In order to simulate such jumps, one can propose the following random vector
$$\epsilon := \frac{\sigma N(0, 1)}{|N(0, 1)|^{1/\alpha}}\, \frac{\varepsilon}{\|\varepsilon\|} \tag{3.20}$$
where
$$\varepsilon := \mathbf{N}(0, 1). \tag{3.21}$$
However, the proposed random process, shown in figure 3.6 (right side), is not a strict Lévy flight.

3.2 Random walk

3.2.1 Uncontrolled random walk

A sequence of points $(x_n)_{n=0}^{n_{\max}}$ is randomly generated in a similar manner to the Wiener process, given by equation (3.13),
$$x_{n+1} := x_n + \alpha\, \epsilon. \tag{3.22}$$
The random vector is drawn from the standard normal distribution $N(0, 1)$ for every coordinate, and the step size $\alpha (U_i - L_i)$ accounts for the search domain size. The upper and lower domain constraints are denoted as $U_i$ and $L_i$ respectively. Comparing the Wiener process step size (3.15) with the uncontrolled random walk version, one can observe the difference: it does not depend on the maximum step number $n_{\max}$ alone, but on the search domain size instead. In vector notation we have
$$\epsilon := (U - L) \circ \mathbf{N}(0, 1). \tag{3.23}$$
One possible form of the $\alpha$ constant could be
$$\alpha := \frac{1}{D \sqrt{n_{\max}}}. \tag{3.24}$$
Furthermore, the whole step size $\alpha (U_i - L_i)$ can also be regarded as the standard deviation, or the square root of the variance, of $N(0, (\alpha (U_i - L_i))^2)$. Finally, the update formula is
$$x_{n+1} := x_n + \frac{(U - L) \circ \mathbf{N}(0, 1)}{D \sqrt{n_{\max}}}. \tag{3.25}$$

The algorithm is shown in listing 3.1. As there is no control over whether the random walk stays within the search domain, the method is called the uncontrolled random walk. The last line of algorithm 3.1 stores the current best point, which eventually becomes the global best solution when the maximum number of evaluations is reached. Figure 3.7 displays an example realisation of an uncontrolled random walk.

Input: α, n_max, L, U
Output: g
1 g := x := L + (U − L) ∘ U(0, 1);
2 for n := 1 to n_max − 1 do
3     ε := (U − L) ∘ N(0, 1);
4     x := x + α ε;
5     g := arg min {f(g), f(x)};
Algorithm 3.1: Uncontrolled random walk pseudocode

[Figure 3.7: Evaluations of an uncontrolled random walk]

3.2.2 Domain controlled random walk

The domain controlled random walk is a natural extension of its uncontrolled version. In order to control whether the random walk is within the search domain $\Omega$, the next step $x_{n+1}$ is only accepted if $x_{n+1} \in \Omega$
$$x_{n+1} := \begin{cases} x_{n+1}, & \text{if } x_{n+1} \in \Omega; \\ x_n, & \text{if } x_{n+1} \notin \Omega. \end{cases} \tag{3.26}$$
This approach does not introduce additional function evaluations $f(x_{n+1})$, as only the positions are checked. Analogously, the update formula is given by equation (3.25). The algorithm is shown in listing 3.2.

Input: α, n_max, L, U
Output: g
1 g := x := L + (U − L) ∘ U(0, 1);
2 for n := 1 to n_max − 1 do
3     repeat
4         ε := (U − L) ∘ N(0, 1);
5         y := x + α ε;
6     until y ∈ Ω;
7     x := y;
8     g := arg min {f(g), f(x)};
Algorithm 3.2: Domain controlled random walk pseudocode

Figure 3.8 displays an example realisation of a domain controlled random walk; it can be compared with figure 3.7.

[Figure 3.8: Evaluations of a domain controlled random walk]

3.2.3 Position controlled random walk

A sequence of random points $(x_n)_{n=0}^{n_{\max}}$ is generated in a different manner in comparison to the Wiener process (3.13) or the uncontrolled random walk (3.22). First of all, a temporary point $y$ is created
$$y := g + \alpha\, \epsilon. \tag{3.27}$$
Then the next point $x_{n+1}$ of the sequence is accepted only if the objective function value $f(y)$ is lower than that of the predecessor
$$x_{n+1} := \begin{cases} y, & \text{if } f(y) < f(g); \\ x_n, & \text{otherwise}. \end{cases} \tag{3.28}$$
Thus, the predecessor is always regarded as the current global best $g$; otherwise the predecessor is preserved and the next new point is generated randomly. Moreover, the step constant $\alpha$ is given by the following equation
$$\alpha := \frac{1}{10 D}, \tag{3.29}$$
being one among many possibilities. Ultimately, the update equation is now
$$x_{n+1} := g + \frac{(U - L) \circ \mathbf{N}(0, 1)}{10 D}. \tag{3.30}$$

[Figure 3.9: Evaluations of a position controlled random walk]

The position controlled random walk is one of the simplest global optimisation, nature-inspired algorithms. It is shown in listing 3.3.

Input: α, n_max, L, U
Output: g
1 g := x := L + (U − L) ∘ U(0, 1);
2 for n := 1 to n_max − 1 do
3     ε := (U − L) ∘ N(0, 1);
4     x := g + α ε;
5     if f(x) − f(g) < 0 then g := x;
Algorithm 3.3: Position controlled random walk pseudocode

Figure 3.9 displays an example realisation of a position controlled random walk. The solid polyline represents the sequence of points forming the optimisation path; separate points depict probing of the search domain, when the condition $f(y) < f(g)$ was not satisfied.
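A direct Python transcription of listing 3.3 might look as follows (a sketch: the bounds, the evaluation budget and the sphere test function are illustrative choices, and the step constant follows equation (3.29) as reconstructed above):

    import numpy as np

    def position_controlled_walk(f, L, U, n_max=1000, rng=None):
        """Position controlled random walk, listing 3.3: propose points
        around the current global best g and accept only improvements."""
        rng = rng or np.random.default_rng()
        L, U = np.asarray(L, float), np.asarray(U, float)
        D = L.size
        alpha = 1.0 / (10.0 * D)               # step constant, equation (3.29)
        g = L + (U - L) * rng.uniform(size=D)  # random initial point
        for _ in range(n_max - 1):
            x = g + alpha * (U - L) * rng.standard_normal(D)  # proposal (3.27)
            if f(x) < f(g):                    # acceptance rule (3.28)
                g = x
        return g

    sphere = lambda x: np.sum(x**2)
    print(position_controlled_walk(sphere, L=[-3.0, -3.0], U=[3.0, 3.0]))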

[Figure 3.10: Evaluations of simulated annealing]

3.3 Simulated annealing

Simulated annealing [1] is somewhat similar to the position controlled random walk given by equation (3.27)
$$y := x_n + \alpha (U - L) \circ \mathbf{N}(0, 1). \tag{3.31}$$
The only difference is that the predecessor $x_n$ does not have to be better than the temporary point $y$ in terms of the objective function value. The function difference is customarily used as an improvement indicator
$$\Delta := f(y) - f(x_n). \tag{3.32}$$
In this way, the next point is always accepted if $\Delta < 0$. Alternatively, it may also be accepted if $\Delta > 0$, with a certain probability $p$
$$x_{n+1} := \begin{cases} y, & \text{if } \Delta < 0 \text{ or } p > U(0, 1); \\ x_n, & \text{otherwise}. \end{cases} \tag{3.33}$$
The probability, however, has to fulfil at least two conditions: it should decrease as the algorithm progresses, and it should also decrease as $\Delta$ increases. In order to fulfil these conditions, the Boltzmann distribution is taken into consideration, or in fact the ratio of a Boltzmann distribution for two states,
$$p \sim e^{-\frac{\Delta E}{kT}} \tag{3.34}$$
as it describes the distribution of particle energy differences $\Delta E$ over various states. It is also loosely connected with the transition of a physical system, in this case annealing, i.e. the slow cooling of metals with temperature $T$. The slow cooling assumption allows for another simplification, namely an equilibrium state at all times, which leads to minimum energy configurations of particles. Further, the Boltzmann constant is assumed to be $k := 1$ and the energy difference is identified with the improvement indicator,
$$\Delta E \sim \Delta. \tag{3.35}$$
Thus, the probability $p$ of acceptance of a worse solution is now given by the following approximation
$$p \sim e^{-\frac{\Delta}{T}} \tag{3.36}$$
and the next point may be accepted if
$$\Delta > 0 \quad \text{and} \quad e^{-\frac{\Delta}{T}} > U(0, 1). \tag{3.37}$$
Proportion (3.36) fulfils the condition that $p$ decreases as $\Delta$ increases. In order to implement the remaining requirement, i.e. that $p$ should decrease as the algorithm progresses, it is necessary to introduce the so-called cooling schedule
$$T_{n+1} \le T_n. \tag{3.38}$$
There are several possibilities, for instance
$$T_{n+1} := T_n\, \delta^n, \tag{3.39a}$$
$$T_{n+1} := T_n\, \delta^{1/n_{\max}}, \tag{3.39b}$$
$$T_{n+1} := T_0\, \delta^n \tag{3.39c}$$
where $\delta$ is another constant of the algorithm, together with the step size constant $\alpha$ and the initial temperature $T_0$. The cooling rate (3.39), controlled by the constant $\delta$, cannot be too quick, in order to avoid getting trapped in local minima, nor too slow, because the process then becomes too costly.

Input: T_0, α, δ, n_max, L, U
Output: g
1 g := x := L + (U − L) ∘ U(0, 1);
2 for n := 1 to n_max − 1 do
3     T := T δ^{1/n_max};
4     ε := (U − L) ∘ N(0, 1);
5     y := x + α ε;
6     Δ := f(y) − f(x);
7     if Δ < 0 or e^{−Δ/T} > U(0, 1) then x := y;
8     g := arg min {f(g), f(x)};
Algorithm 3.4: Simulated annealing pseudocode

Simulated annealing is another example of a global optimisation, nature-inspired algorithm. It is shown in listing 3.4. Figure 3.10 displays an example realisation of simulated annealing. The solid polyline represents the sequence of points forming the optimisation path; separate points depict probing of the search domain, when neither $\Delta < 0$ nor $e^{-\Delta/T} > U(0, 1)$ was satisfied.
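Listing 3.4 translates almost line by line into Python; a sketch in which the constants T0, alpha and delta are illustrative, not the book's recommended values:

    import numpy as np

    def simulated_annealing(f, L, U, T0=1.0, alpha=0.1, delta=0.1,
                            n_max=1000, rng=None):
        """Simulated annealing, listing 3.4: accept worse points with
        probability exp(-Delta/T), cooling T every iteration."""
        rng = rng or np.random.default_rng()
        L, U = np.asarray(L, float), np.asarray(U, float)
        x = L + (U - L) * rng.uniform(size=L.size)
        g, T = x, T0
        for _ in range(n_max - 1):
            T *= delta ** (1.0 / n_max)                 # cooling schedule (3.39b)
            y = x + alpha * (U - L) * rng.standard_normal(L.size)  # proposal (3.31)
            d = f(y) - f(x)                             # improvement indicator (3.32)
            if d < 0 or np.exp(-d / T) > rng.uniform(): # acceptance rule (3.33), (3.37)
                x = y
            if f(x) < f(g):
                g = x
        return g

    sphere = lambda x: np.sum(x**2)
    print(simulated_annealing(sphere, L=[-3.0, -3.0], U=[3.0, 3.0]))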

3.4 Random jumping

Random jumping is the simplest and most naive way of dealing with objective function optimisation. Simply, a sequence of completely random points is generated, with no relation to one another whatsoever
$$x_{n+1} := \alpha\, \epsilon. \tag{3.40}$$
The random vector $\epsilon$ can be drawn from the standard normal distribution according to equation (3.23), for instance, or from any other distribution. A formula for generating points is given by
$$x_{n+1} := \alpha (U - L) \circ \mathbf{U}(0, 1). \tag{3.41}$$

[Figure 3.11: Evaluations of random jumping]

Figure 3.11 displays an example plot of random jumping. The solid polyline represents the order of point generation; it does not constitute any optimisation path. Despite the fact that the algorithm is simple and naive, it may, however, perform better than an uncontrolled random walk.

Chapter 4

Multi-point, derivative-free algorithms

4.1 Introduction

Metaheuristics and nature-inspired algorithms are two key concepts in global optimisation; "nature-inspired metaheuristic" is also a commonly used term. Actually, all algorithms in this chapter can be classified as metaheuristics inspired by nature.

4.1.1 (Meta)heuristic

A heuristic is typically a trial and error approach to problem solving or, in other words, a method developed on the basis of experience. Metaheuristics (higher level heuristics) are non-problem-specific, stochastic algorithms with randomisation and local search. Properties of (meta)heuristics are:
- There is no guarantee that a globally optimal solution can be found. This is because (meta)heuristic algorithms are approximate in nature.
- A sufficiently good solution can be found in a reasonable amount of time.
- A balance between exploitation and exploration should exist. The former concept (exploitation) has a local search character, whereas the latter (exploration) is of global nature.

Moreover, most of the (meta)heuristic optimisation algorithms are global due to their stochastic nature.

4.1.2 Nature-inspired algorithms

Metaheuristic, nature-inspired algorithms can be classified according to their sources of inspiration [7]:
- Physics-based. The inspiration comes from physics or chemistry; certain laws are imitated. Examples of physics-inspired algorithms include, for instance, the Gravitational Search Algorithm. Two more examples, known from the previous chapter, are Random Walk and Simulated Annealing; however, these are also classified as single-point algorithms.
- Bio-inspired. The inspiration comes from biology. Examples of bio-inspired algorithms are Genetic Algorithms, Differential Evolution and the Flower Pollination Algorithm. Furthermore, bio-inspired algorithms are not swarm intelligence based.
- Swarm intelligence based. The inspiration comes from swarm intelligence, i.e. the collective behaviour of decentralised agents following a small set of simple rules. Examples are Particle Swarm Optimisation, the Firefly Algorithm, the Bat Algorithm and Cuckoo Search.
- Other methods.

4.2 Physics-based algorithms

4.2.1 Gravitational search algorithm

The gravitational search algorithm [19] mimics Newton's law of gravitation, which states that every mass attracts every other individual mass by a force $f_{ij}$ proportional to the product $m_i m_j$ of the two individual masses and inversely proportional to the square of the distance $\|x_{ij}\|$ between them. The force is directed along the line $x_{ij} / \|x_{ij}\|$ intersecting both masses. If the gravitational potential is
$$V_{ij} = -G\, \frac{m_i m_j}{\|x_{ij}\|} \tag{4.1}$$
then the force acting between the two masses is given by the negative gradient of the potential $V_{ij}$, namely $f_{ij} = -\nabla_{x_i} V_{ij}$, where $G$ stands for the gravitational constant. A vector form of Newton's law of gravitation is now given by
$$f_{ij} = -G\, \frac{m_i m_j}{\|x_{ij}\|^2}\, \frac{x_{ij}}{\|x_{ij}\|}. \tag{4.2}$$
Considering a system which consists of $N$ individual masses $m_i$, it is possible to utilise Newton's equation of motion in order to track the evolution in time of all the individual masses
$$m_i a_i = \sum_{j=1,\, j \ne i}^{N} f_{ij}. \tag{4.3}$$
The evolution in time depends on the potential solely, provided that the initial positions and velocities are known. Furthermore, Newton's equation of motion (4.3) in the following form
$$\frac{d^2 x_i}{dt^2} = a_i = \frac{1}{m_i} \sum_{j=1,\, j \ne i}^{N} f_{ij} \tag{4.4}$$
can now be discretised and solved by means of the Störmer-Verlet method, for instance. A simpler approach, known as the semi-implicit Euler method, is used instead, as the accuracy of the time evolution is not an issue here. Thus, equation (4.4) is equivalent to a pair of differential equations
$$\frac{dx_i}{dt} = v_i, \tag{4.5a}$$
$$\frac{dv_i}{dt} = a_i. \tag{4.5b}$$
The discrete version of the above system is obtained from the linear Taylor expansion of the velocity
$$v_i(t + \Delta t) \approx v_i(t) + a_i(t)\, \Delta t \tag{4.6}$$
and of the position
$$x_i(t + \Delta t) \approx x_i(t) + v_i(t + \Delta t)\, \Delta t. \tag{4.7}$$
What is more, the linear Taylor expansion means that this method is first order accurate, in contrast with the Störmer-Verlet method, which is second order accurate. The discrete form (4.6) and (4.7) of the system (4.5) indicates that the initial positions and velocities should be known.

In the gravitational search algorithm, masses are associated with agents (points) in such a way that the objective function values determine the individual masses. Heavier masses attract lighter masses by a gravitational force analogous to (4.2). According to equation (4.4), the acceleration of an individual agent is inversely proportional to its mass; hence, the heavier the mass, the slower its movement. This provides a mechanism for exploitation, whereas exploration exists due to lighter masses and faster movements. Positions of agents are associated with solutions in terms of the arguments of the objective function. In order to account for the mass conservation $\sum_{i=1}^{N} M_i = 1$, two auxiliary points are calculated every iteration, namely the current best agent
$$b := \arg\min_{x_i^n} f(x_i^n) \tag{4.8}$$
and the current worst agent
$$w := \arg\max_{x_i^n} f(x_i^n). \tag{4.9}$$
The actual mass per iteration is then calculated as
$$m_i^n := \frac{f(x_i^n) - f(w)}{f(b) - f(w)}. \tag{4.10}$$
However, the above equation does not account for the mass conservation. This is because, as the algorithm progresses, both $f(b)$ and $f(w)$ become smaller. Hence, the individual masses are normalised in the following way
$$M_i^n := \frac{m_i^n}{\sum_{i=1}^{N} m_i^n}. \tag{4.11}$$
Thus, the mass of the system is conserved, $\sum_{i=1}^{N} M_i = 1$, and simply redistributed among the individual agents according to the objective function values $f$.
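Equations (4.10) and (4.11) amount to a two-line computation. A small NumPy sketch (mine, not the book's code; it assumes the best and worst fitness values differ):

    import numpy as np

    def gsa_masses(fitness):
        """Normalised GSA masses, equations (4.10)-(4.11), for a
        minimisation problem: the best agent receives the largest mass."""
        fitness = np.asarray(fitness, dtype=float)
        best, worst = fitness.min(), fitness.max()
        m = (fitness - worst) / (best - worst)  # raw masses in [0, 1], equation (4.10)
        return m / m.sum()                      # normalisation, equation (4.11)

    print(gsa_masses([5.0, 10.0, 1.0]))  # sums to 1; the agent with f = 1 is heaviest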

41 . Multi-point, derivative-free algorithms The force coming from agent j acting on agent i is similar to that given by equation (.), namely fij n := G n Mi n Mj n U(, 1) ( x n j ) xn i x n i. (.1) xn j 1 + ε There are, however, three differences. Firstly and most importantly, it is no longer an inverse square law as the force in not proportional to the square of a distance x ij. It is simply a distance x ij 1 instead. In order to avoid division by zero, a small constant ε is always added to the denominator. As the algorithm progresses the gravitational constant is reduced according to n G n := G e α nmax. (.13) An algorithm constant α is introduced in order to control reduction of the gravitation constant. Another algorithm constant is the initial value of G. Secondly, the distance (x n j xn i ) between the two individual agents is not normalised. Lastly, randomisation is introduced to the force by means of the realisation of a stochastic vector variable with uniform continuous distribution U(, 1) y x 3 Figure.1: evaluations of gravitational search algorithm The discrete Newton s equation of motion (.) makes it now possible to calculate the acceleration a n i of individual agents a n i := 1 M n i N j=1 i f n ij. (.1) Assuming unit time step t := 1, the updated velocity, according to equation (.6), is now v n+1 i := U(, 1) v n i + a n i (.15) where additional stochastic vector variable with uniform continuous distribution U(, 1) is added in order to introduce randomisation. Finally, the next position of individual

42 .3. Bio-inspired algorithms 1 agents is updated directly according to equation (.7), resulting in x n+1 := x n + v n+1. (.16) The algorithm is shown in listing.1. Figure.1 displays an example realisation of the gravitational search algorithm for n max = and N = agents which is equivalent to evaluations of the objective function. The solid polylines represent trajectories of individual agents. Input: α, G, N, n max, L, U Output: g 1 for i := to N 1 do x i := L + (U L) U(, 1); 3 v i := ; for n := to n max 1 do 5 b := arg min xi f(x i ); 6 w := arg max xi f(x i ); 7 if n = then g := b; 8 M := f(x) f(w) f(b) f(w) ; 9 M := M N i=1 Mi ; 1 G := G e α n nmax ; 11 E := ; 1 for i := to N 1 do 13 for j := to N 1 do 1 if i j then 15 Mj U(,1) (xj xi) E i := E i + x j x i +ε ; 16 v := U(, 1) v + G E; 17 x := x + v; Algorithm.1: Gravitational search algorithm pesudocode.3 Bio-inspired algorithms.3.1 Genetic algorithms Evolutionary algorithms Evolutionary Algorithms (EA) are multi-point (population based) optimisation algorithms. EA are classified as bio-inspired in the sense that they mimic Darwinian evolution. They evolve better solutions by means of recombination, mutation and survival. Also, EA operate on populations as other multi-point algorithms. One of the huge advantage of EA over traditional optimisation method is that they typically does not need any additional information about the objective function. Another desirable property EA is parallelism. This is because all individuals of a population

43 . Multi-point, derivative-free algorithms (generation) perform independently. Further, randomisation of the EA is introduced through the probability of crossover and mutation. One can distinguish at least the two groups of algorithms: Genetic Algorithms (GA) [1, 1]. Binary representation of individuals is used. This means that they are encoded as vectors of bits and all genetic operators such as crossover are performed on vectors. The disadvantage of this approach is a discretisation error due to the limited length of vectors. This is one of the reasons why the genetic algorithms with floating-point representation are better suited for continuous optimisation. Evolution Strategies (ES) [17]. Floating-point representation is used in order to represent individuals. All genetic operators perform directly on floating-point numbers, meaning that no discretisation error is introduced. Traditionally, both representations, i.e. binary and floating-point, are commonly termed genetic algorithms Binary representation Initialisation No Population Parent selection Parents Recombination Offspring Survivor selection Converged? Yes Stop = Figure.: Genetic algorithms flowchart Figure.3: Crossover and mutation The flowchart of the genetic algorithm is shown in figure.. The first step called Initialisation includes encoding all individuals. In this case, binary representation is chosen. Also, random initial population is created and the fitness function is evaluated. Thus, the Population step is achieved. Next, parents for further generations are selected in order to produced offspring. This can be achieved by various method. Two most popular and common methods are roulette wheel and tournament selection and the process is called Parent selection. The next step is recombination where offspring is produced. Typically, two parent produce two offspring by means of the genetic operators such as crossover (figure.3 top) with high crossover probability p c. A random point is selected and exchange of bits from the left of that point with first parent with bits on the right with second parent follows. As a results, two offspring

inherit a portion of each parent. The next genetic operator is random mutation, applied with a low probability p_m. This results in altering a certain number of bits, as shown in figure 4.3 (bottom). Mutation alters a 1 to 0 or, conversely, a 0 to 1. The next generation (population) is then created through the process called Survivor selection. Two strategies are possible, discussed further below. Finally, the new population is evaluated by means of the objective function and a stop criterion is checked in the last step, Converged?. Detailed descriptions of steps such as selection, recombination methods and survivor selection are given in the next paragraphs.

Floating-point representation

Input: p_c, p_m, T, N, n_max, L, U
Output: g
1   for i := 0 to N − 1 do
2       x_i := L + (U − L) U(0, 1);
3       y_i := 0;
4   g := arg min_{x_i} f(x_i);
5   for n := 1 to n_max − 1 do
6       for i := 0 to N − 1 do
7           a := Tournament(x, T);
8           b := Tournament(x, T);
9           p_1 := x_a;
10          p_2 := x_b;
11          (c_1, c_2) := Crossover(p_1, p_2, p_c);
12          y_i := Mutation(c_1, i, p_m);
13          y_{i+1} := Mutation(c_2, i, p_m);
14          i := i + 2;
15      l := arg min_{x_i} f(x_i);
16      g := arg min {f(g), f(l)};
17      x := Selection(x, y);
Algorithm 4.2: Genetic algorithm pseudocode

The genetic algorithm in pseudocode form, regardless of how the individuals are represented, is shown in listing 4.2. However, the details of the internal functions are given for the floating-point representation, since it is better suited for continuous optimisation. Lines 7 and 8 represent the parent selection step by means of tournament selection, shown in listing 4.3. The tournament size T is necessary in order to select T individuals out of a parent population of N members. When T individuals are selected, the best of them is chosen to be a parent. Typically, T is low for small populations, i.e. 2 or 3, the lowest possible value being 2. Obviously, tournament selection is of random character and the whole process resembles a competition for the right to pass genetic material on to the offspring. Lines 1 and 3 in listing 4.3 represent random variates of a discrete uniform distribution used to select a random member

of the parent population. Once the parents are selected, crossover takes place (line 11 in listing 4.2).

Input: T, N, x
Output: k
1   k := U{0, N − 1};
2   for i := 1 to T − 1 do
3       j := U{0, N − 1};
4       if f(x_j) < f(x_k) then k := j;
Algorithm 4.3: GA parent selection (tournament) pseudocode

Crossover provides mixing of the solutions. Several methods are in use; the most popular one, arithmetical crossover, is discussed here, being simple and elegant. Two parents x_1, x_2 are crossed with probability p_c. If U(0, 1) < p_c, then a random number drawn from a continuous uniform distribution is generated,

a := U(0, 1).  (4.17)

Further, the two parent vectors x_1 and x_2 produce two offspring vectors y_1 and y_2 according to

y_1 := a x_1 + (1 − a) x_2,  (4.18a)
y_2 := a x_2 + (1 − a) x_1.  (4.18b)

This also means that the two offspring vectors are linear combinations of the two parent vectors. This method guarantees that y_1, y_2 remain within the optimisation domain Ω if either the optimisation problem is unconstrained or the domain Ω is constrained and convex (e.g. box constraints). The arithmetical crossover pseudocode is shown in listing 4.4.

Input: x_1, x_2, p_c
Output: y_1, y_2
1   y_1 := x_1;
2   y_2 := x_2;
3   if U(0, 1) < p_c then
4       a := U(0, 1);
5       y_1 := a x_1 + (1 − a) x_2;
6       y_2 := a x_2 + (1 − a) x_1;
Algorithm 4.4: GA arithmetical crossover pseudocode

As soon as the parents produce offspring, the children are mutated (lines 12, 13 in listing 4.2) with probability p_m. Mutation increases the diversity of the population and provides a mechanism for escaping from local optima. Two types of mutation are in common use:

– uniform,
– nonuniform.

A child is uniformly mutated if U(0, 1) < p_m. Then a random individual is generated within the search space according to the following equation:

y_i := L + (U − L) U(0, 1).  (4.19)

The uniform mutation pseudocode is shown in listing 4.5.

Input: x_i, p_m, L, U
Output: y_i
1   y_i := x_i;
2   if U(0, 1) < p_m then
3       y_i := L + (U − L) U(0, 1);
Algorithm 4.5: GA uniform mutation pseudocode

Nonuniform mutation takes place if U(0, 1) < p_m. If so, an additional number Δ ∈ [0; 1] is generated:

Δ := 1 − U(0, 1)^{(1 − n/n_max)}.  (4.20)

As the algorithm progresses, the value of Δ decreases. This leads to mutation damping as the algorithm approaches its end. Finally, the components of a mutated child are given by

x_{ik} := x_{ik} + Δ (U_k − x_{ik}), if U{0, 1} = 0;
x_{ik} := x_{ik} − Δ (x_{ik} − L_k), otherwise,  (4.21)

where x_i = (x_{i1}, ..., x_{iD}). The nonuniform mutation pseudocode is shown in listing 4.6.

Input: x_i, p_m, D, n, n_max, L, U
Output: y_i
1   y_i := x_i;
2   for k := 0 to D − 1 do
3       if U(0, 1) < p_m then
4           Δ := 1 − U(0, 1)^{(1 − n/n_max)};
5           if U{0, 1} = 0 then
6               y_{ik} := y_{ik} + Δ (U_k − y_{ik});
7           else
8               y_{ik} := y_{ik} − Δ (y_{ik} − L_k);
Algorithm 4.6: GA nonuniform mutation pseudocode

The last main step in the genetic algorithm (listing 4.2) is Selection (line 17), which is in fact survivor selection. Its purpose is to pass the best solutions on to the next generations. Traditionally, µ denotes the total number of parent vectors x_i and λ stands for the number of offspring vectors y_i. In this case both are equal, µ = λ = N. In general, however, at least two selection strategies may be distinguished, keeping in mind that λ ≥ µ:

– the (µ, λ) strategy,
– the (µ + λ) strategy.

The (µ, λ) strategy selects the best µ out of λ offspring vectors y to become the next generation of parent vectors x. Listing 4.7 shows the straightforward (µ, λ) strategy pseudocode (µ = λ).

Input: x, y
Output: x
1   x := y;
Algorithm 4.7: GA (µ, λ) strategy pseudocode

The (µ + λ) strategy creates the next parent vector generation from the best µ vectors of the combined parent x and offspring y population of µ + λ vectors. The (µ + λ) strategy pseudocode is shown in listing 4.8.

Input: x, y, N
Output: x
1   x := x ∪ y;
2   Sort(x_i) based on f(x_i);
3   x := x \ {x_N, ..., x_{2N−1}};
Algorithm 4.8: GA (µ + λ) strategy pseudocode

Figure 4.4 displays an example realisation of the genetic algorithm for given n_max and N individuals, which is equivalent to N · n_max evaluations of the objective function. As usual, the solid polylines represent trajectories of individuals.

Figure 4.4: Evaluations of the genetic algorithm
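The building blocks above can be assembled into a short, runnable Python sketch. It is illustrative only: the sphere function serves as a placeholder objective and the parameter values (N, n_max, p_c, p_m, T, the bounds) are arbitrary assumptions, not values used in this book. It combines tournament selection (listing 4.3), arithmetical crossover (listing 4.4), uniform mutation (listing 4.5) and the (µ + λ) survivor strategy (listing 4.8).

import numpy as np

rng = np.random.default_rng(1)

def f(x):                      # placeholder objective (sphere), not the book's test function
    return np.sum(x**2)

def tournament(pop, fit, T):   # listing 4.3: best of T randomly chosen members
    idx = rng.integers(0, len(pop), size=T)
    return pop[idx[np.argmin(fit[idx])]]

def crossover(p1, p2, pc):     # listing 4.4: arithmetical crossover with probability pc
    if rng.random() < pc:
        a = rng.random()
        return a*p1 + (1 - a)*p2, a*p2 + (1 - a)*p1
    return p1.copy(), p2.copy()

def mutate(x, pm, L, U):       # listing 4.5: uniform mutation with probability pm
    if rng.random() < pm:
        return L + (U - L)*rng.random(x.size)
    return x

def ga(D=2, N=20, n_max=50, pc=0.9, pm=0.1, T=2, L=-5.0, U=5.0):
    x = L + (U - L)*rng.random((N, D))            # initial population
    for _ in range(n_max):
        fx = np.array([f(xi) for xi in x])
        y = []
        while len(y) < N:                          # produce N offspring, two at a time
            c1, c2 = crossover(tournament(x, fx, T), tournament(x, fx, T), pc)
            y += [mutate(c1, pm, L, U), mutate(c2, pm, L, U)]
        z = np.vstack([x, np.array(y[:N])])        # listing 4.8: (mu + lambda) strategy
        fz = np.array([f(zi) for zi in z])
        x = z[np.argsort(fz)[:N]]                  # keep the best N of parents + offspring
    return x[0], f(x[0])

print(ga())

Replacing the survivor step by keeping the offspring alone would yield the (µ, λ) variant of listing 4.7.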

4.3.2 Differential evolution

Differential evolution [] is a simple, fast and effective metaheuristic algorithm. Like other metaheuristic algorithms, DE does not need any additional information about the objective function. DE is similar to genetic algorithms; what is more, DE is regarded as the next step in the evolution of genetic algorithms. Crossover and mutation are applied to floating-point vectors, which makes it similar to GA. Additionally, selection is also present in DE. Most importantly, an explicit update equation is provided, in contrast with GA.

Input: C, F, N, n_max, L, U
Output: g
1   for i := 0 to N − 1 do
2       x_i := L + (U − L) U(0, 1);
3       y_i := 0;
4   g := arg min_{x_i} f(x_i);
5   for n := 1 to n_max − 1 do
6       for i := 0 to N − 1 do
7           K := H(C − U(0, 1));
8           K_{U{0, D−1}} := 1;
9           a := RandomPermutation({0, ..., N − 1} \ {i});
10          y_i := K (x_{a_3} + F (x_{a_1} − x_{a_2})) + (1 − K) x_i;
11      for i := 0 to N − 1 do
12          x_i := arg min {f(x_i), f(y_i)};
13          g := arg min {f(g), f(y_i)};
Algorithm 4.9: Differential evolution pseudocode

Differential evolution consists of four main steps, namely: selection of three different individuals, mutation, crossover and selection. The first step, i.e. choosing three different, random individuals x^n_a, x^n_b, x^n_c out of a population x, means that {x^n_a, x^n_b, x^n_c} ⊂ x. Additionally, the population size is N. Once the three individuals are selected, a mutant vector v_i is generated according to

v_i := x^n_a + F (x^n_b − x^n_c).  (4.22)

The scale factor F, or the so-called differential weight F ∈ ]0; 1[, is used in order to control the rate of population development. Furthermore, the trial vector y_i is created via binomial crossover with probability C:

y_{ij} := v^n_{ij}, if U(0, 1) < C;
y_{ij} := x^n_{ij}, otherwise.  (4.23)

Other crossover techniques, such as exponential crossover, are possible. The crossover probability C ∈ [0; 1] regulates how much of the mutant vector is copied to the trial vector. Alternatively, one can combine mutation and binomial crossover in a single

vector equation by means of the Heaviside step (theta) function H. Introducing an auxiliary D-dimensional vector K consisting of 0s and 1s,

K := H(C − U(0, 1)),  (4.24)

it is now possible to combine equations (4.22) and (4.23), i.e. mutation and crossover, into a single vector formula:

y_i := K (x^n_a + F (x^n_b − x^n_c)) + (1 − K) x^n_i.  (4.25)

The above equation is present in line 10 of the DE pseudocode, listing 4.9, together with equation (4.24) (line 7). The three different, randomly chosen individuals are indexed on the basis of a random permutation vector a (line 9). Additionally, line 8 corresponds to setting a random component of the vector K to 1 in order to guarantee that y_i ≠ x^n_i. The last step is selection (line 12 in algorithm 4.9). Simply, the better of the trial vector y_i and the original individual x^n_i, in terms of the objective function value, is passed on to the next generation x^{n+1}_i:

x^{n+1}_i := y_i, if f(y_i) < f(x^n_i);
x^{n+1}_i := x^n_i, otherwise.  (4.26)

This step is fully deterministic, in contrast with mutation and crossover. Furthermore, the four essential steps are applied to all members x^n_i of the population until the new population x^{n+1}_i is created. The algorithm terminates if a given stop criterion is satisfied; this could be, for instance, a maximum number of objective function evaluations.

Figure 4.5: Evaluations of differential evolution

Figure 4.5 presents an example realisation of differential evolution for given n_max and N individuals. This is equivalent to N · n_max evaluations of the objective function. Trajectories of individuals are represented by means of the solid polylines.
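A minimal Python sketch of equations (4.24)–(4.26) is given below. It is illustrative only: the sphere objective, the parameter values C and F and the stopping rule are assumptions, not taken from the text.

import numpy as np

rng = np.random.default_rng(2)

def f(x):                                  # placeholder objective (sphere)
    return np.sum(x**2)

def de(D=2, N=20, n_max=100, C=0.9, F=0.7, L=-5.0, U=5.0):
    x = L + (U - L)*rng.random((N, D))     # initial population
    fx = np.array([f(xi) for xi in x])
    for _ in range(n_max):
        for i in range(N):
            a, b, c = rng.permutation([j for j in range(N) if j != i])[:3]
            K = (rng.random(D) < C).astype(float)   # eq. (4.24): binomial crossover mask
            K[rng.integers(D)] = 1.0                # force at least one mutant component
            y = K*(x[a] + F*(x[b] - x[c])) + (1 - K)*x[i]   # eq. (4.25)
            if f(y) < fx[i]:                        # eq. (4.26): deterministic selection
                x[i], fx[i] = y, f(y)
    i = np.argmin(fx)
    return x[i], fx[i]

print(de())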

Various differential evolution variants are in use, each of which has its own notation. These include, among others:

– DE/Rand/1/Bin:  y_i := x^n_a + F (x^n_b − x^n_c).  (4.27)
– DE/Best/1/Bin:  y_i := g^n + F (x^n_b − x^n_c).  (4.28)
– DE/Rand/m/Bin:  y_i := x^n_{a_1} + Σ_{j=1}^{m} F_j (x^n_{a_{2j}} − x^n_{a_{2j+1}}).  (4.29)
– DE/Best/m/Bin:  y_i := g^n + Σ_{j=1}^{m} F_j (x^n_{a_{2j}} − x^n_{a_{2j+1}}).  (4.30)

The word Rand stands for the first, randomly chosen individual x_a, whereas Best represents the current global best g^n. The third symbol in the above notation (1 or m) gives the number of individual differences added to the first individual. Finally, Bin denotes binomial crossover.

4.3.3 Flower pollination algorithm

The flower pollination algorithm [9], as the name suggests, is inspired by the phenomenon of flower pollination. Two types of pollination are considered, i.e. global and local. Global pollination, taking place over long distances, is mimicked by a Lévy flight and termed global search. Local pollination (short distances) is mimicked by a local search. The interaction between local and global pollination is controlled by a probability p. In other words, the FPA is simply a combination of a global and a local random walk. Thus, the update formula is

y := x^n_i + α ε, if p < U(0, 1);
y := x^n_i + U(0, 1) (x^n_j − x^n_k), otherwise,  (4.31)

where the random vector ε is drawn from the symmetrical Lévy stable distribution (3.18):

ε := σ N(0, 1) / |N(0, 1)|^{1/λ} (g − x^n_i).  (4.32)

Two different individuals x^n_j and x^n_k are taken randomly from the current population and their difference is scaled by a random variate of the continuous uniform distribution. Consequently, this resembles a local random search (walk). Equation (4.31) is present in lines 7–12 of algorithm 4.10. If the new individual y is better than its predecessor x^n_i, then it is passed on to the new population:

x^{n+1}_i := y, if f(y) < f(x^n_i);
x^{n+1}_i := x^n_i, otherwise.  (4.33)

Input: α, λ, N, n_max, p, L, U
Output: g
1   for i := 0 to N − 1 do
2       x_i := L + (U − L) U(0, 1);
3   g := arg min_{x_i} f(x_i);
4   σ := (Γ(1 + λ) sin(πλ/2) / (Γ((1 + λ)/2) λ 2^{(λ−1)/2}))^{1/λ};
5   for n := 1 to n_max − 1 do
6       for i := 0 to N − 1 do
7           if p < U(0, 1) then
8               ε := σ N(0, 1) / |N(0, 1)|^{1/λ} (g − x_i);
9               y := x_i + α ε;
10          else
11              R := RandomPermutation({0, ..., N − 1});
12              y := x_i + U(0, 1) (x_{R_1} − x_{R_2});
13          CheckRange(y);
14          x_i := arg min {f(x_i), f(y)};
15          g := arg min {f(x_i), f(g)};
Algorithm 4.10: Flower pollination algorithm pseudocode

The above condition is represented by line 14 in algorithm 4.10. Figure 4.6 shows an example realisation of the flower pollination algorithm for given n_max and N individuals (equivalent to N · n_max evaluations of the objective function). Trajectories of individuals are represented by means of the solid polylines.

Figure 4.6: Evaluations of the flower pollination algorithm
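The heavy-tailed random step used in line 8 of algorithm 4.10 is in practice generated by Mantegna's method, which is exactly what the σ of line 4 corresponds to. A minimal Python sketch of such a generator is given below; the exponent value λ = 1.5 is an assumed, typical choice, not one prescribed by the text.

import numpy as np
from math import gamma, sin, pi

rng = np.random.default_rng(3)

def levy_step(size, lam=1.5):
    # Mantegna's method: sigma as in line 4 of algorithm 4.10
    sigma = (gamma(1 + lam)*sin(pi*lam/2) /
             (gamma((1 + lam)/2)*lam*2**((lam - 1)/2)))**(1/lam)
    u = rng.normal(0.0, sigma, size)      # numerator: N(0, sigma)
    v = rng.normal(0.0, 1.0, size)        # denominator: N(0, 1)
    return u/np.abs(v)**(1/lam)           # occasional very long jumps

print(levy_step(5))

Most of the resulting steps are small, but occasionally a very large one occurs; this is the mechanism behind the long-distance global pollination.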

4.4 Swarm intelligence based algorithms

4.4.1 Particle swarm optimisation

Particle swarm optimisation [13] is a bio-inspired and, at the same time, swarm intelligence based global optimisation algorithm. Consequently, the algorithm is inspired by the swarm or collective behaviour frequently observed among certain animals. What is important, there is no central coordination, and swarm behaviour is regarded as the collective motion of agents following a small set of simple rules. Agents interact with one another at the local scale, and the whole process leads to an intelligent-like global behaviour.

Input: α, β, θ_∞, θ, N, n_max, δ, L, U
Output: g
1   for i := 0 to N − 1 do
2       x*_i := x_i := L + (U − L) U(0, 1);
3       v_i := 0;
4   g := arg min_{x_i} f(x_i);
5   for n := 1 to n_max − 1 do
6       θ := θ δ^{1/n_max};
7       for i := 0 to N − 1 do
8           ε_1, ε_2 := U(0, 1);
9           v_i := (θ_∞ + θ) v_i + α ε_1 (x*_i − x_i) + β ε_2 (g − x_i);
10          x_i := x_i + v_i;
11          x*_i := arg min {f(x_i), f(x*_i)};
12          g := arg min {f(g), f(x_i)};
Algorithm 4.11: Particle swarm optimisation pseudocode

Assuming a unit time step Δt := 1, the position update formula of an individual particle is

x^{n+1}_i := x^n_i + v^{n+1}_i  (4.34)

and the velocity

v^{n+1}_i := v^n_i + α ε_1 (x*_i − x^n_i) + β ε_2 (g − x^n_i).  (4.35)

The two random vectors ε_1, ε_2 are drawn from the continuous uniform distribution,

ε_1 := U(0, 1),  (4.36a)
ε_2 := U(0, 1),  (4.36b)

thus introducing randomisation to the update formula. Clearly, two main components can be distinguished in equation (4.35), i.e. attraction towards the particle's best position x*_i found so far and attraction towards the global best position g. Additionally, the ratio between these two attractions is balanced by means of the two coefficients α and β.
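The plain update (4.34)–(4.36) is already a complete algorithm and can be sketched in a few lines of Python before any damping is introduced. The sketch below is illustrative only: the sphere function is a placeholder objective, the coefficient values are arbitrary assumptions and a constant inertia θ is used in place of the damping function discussed next.

import numpy as np

rng = np.random.default_rng(4)

def f(x):                                   # placeholder objective (sphere)
    return np.sum(x**2)

def pso(D=2, N=20, n_max=100, alpha=2.0, beta=2.0, theta=0.7, L=-5.0, U=5.0):
    x = L + (U - L)*rng.random((N, D))      # positions
    v = np.zeros((N, D))                    # velocities
    xs = x.copy()                           # particles' best positions x*_i
    fs = np.array([f(xi) for xi in x])
    g = xs[np.argmin(fs)].copy()            # global best position
    for _ in range(n_max):
        e1, e2 = rng.random((N, D)), rng.random((N, D))
        v = theta*v + alpha*e1*(xs - x) + beta*e2*(g - x)   # eq. (4.35), fixed inertia
        x = x + v                                           # eq. (4.34)
        fx = np.array([f(xi) for xi in x])
        better = fx < fs                                    # update personal bests
        xs[better], fs[better] = x[better], fx[better]
        g = xs[np.argmin(fs)].copy()                        # update global best
    return g, f(g)

print(pso())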

In order to reduce the velocity as the algorithm progresses, the so-called damping function θ is introduced, having the following property, analogous to the cooling formula (3.39):

θ^{n+1} ≤ θ^n.  (4.37)

As a result, the stabilised version of the velocity update formula (4.35) is now

v^{n+1}_i := (θ_∞ + θ^n) v^n_i + α ε_1 (x*_i − x^n_i) + β ε_2 (g − x^n_i).  (4.38)

The above equation is present in line 9 of algorithm 4.11, together with the position update formula (4.34) (line 10). The particle's best position x*_i and the global best position g are updated after every position update (lines 11 and 12). The initial positions of the particles x_i are uniformly distributed (line 2) and the initial velocities are assumed to be zero, v^0_i := 0 (line 3).

Figure 4.7: Evaluations of particle swarm optimisation

Figure 4.7 shows an example realisation of the particle swarm optimisation algorithm for given n_max and N individuals (N · n_max evaluations of the objective function). As usual, trajectories of individuals are represented by means of the solid polylines.

4.4.2 Accelerated particle swarm optimisation

The accelerated particle swarm optimisation [3] does not take advantage of the particle's best position x*_i. The randomisation and diversity it provides are replaced by a random vector ε_1. The simplified velocity update formula is now

v^{n+1}_i := (θ_∞ + θ^n) v^n_i + α ε_1 + β (g − x^n_i).  (4.39)

The random vector ε_1 is drawn from the continuous uniform distribution and is scaled by means of the search space range (U − L):

ε_1 := (U − L) U(−1/2, 1/2).  (4.40)

Moreover, the position update formula (4.34) remains intact. The same concerns the damping function θ (4.37). Another obvious difference between equations (4.38) and (4.39) is the lack of the second random vector ε_2. Consequently, two main components can now be distinguished in equation (4.39), i.e. randomisation via α ε_1 and deterministic attraction towards the global best position, β (g − x^n_i).

Input: α, β, θ_∞, θ, N, n_max, δ, L, U
Output: g
1   for i := 0 to N − 1 do
2       x_i := L + (U − L) U(0, 1);
3       v_i := 0;
4   g := arg min_{x_i} f(x_i);
5   for n := 1 to n_max − 1 do
6       θ := θ δ^{1/n_max};
7       for i := 0 to N − 1 do
8           ε_1 := (U − L) U(−1/2, 1/2);
9           v_i := (θ_∞ + θ) v_i + α ε_1 + β (g − x_i);
10          x_i := x_i + v_i;
11          g := arg min {f(g), f(x_i)};
Algorithm 4.12: Accelerated particle swarm optimisation 1 pseudocode

The new velocity update equation (4.39) is present in line 9 of algorithm 4.12, together with the random vector (4.40) (line 8). The position update formula (4.34) (line 10) is the same. Only the global best position g is updated after every position update (line 11). The initial positions of the particles x_i are uniformly distributed (line 2) and the initial velocities are assumed to be zero, v^0_i := 0 (line 3).

Figure 4.8: Evaluations of accelerated particle swarm optimisation 1
Figure 4.9: Evaluations of accelerated particle swarm optimisation 2

In order to avoid the initialisation of velocities, it is possible to simplify the accelerated particle swarm optimisation even further. Substituting equation (4.39) into (4.34) and removing the velocity, we have

x^{n+1}_i := x^n_i + (α_∞ + α^n) ε_1 + β (g − x^n_i).  (4.41)

Furthermore, the coefficient α is replaced by another damping function α, having the following property:

α^{n+1} ≤ α^n.  (4.42)

The random vector ε_1 remains the same (equation (4.40)). The second version of the accelerated particle swarm optimisation in pseudocode form is shown in listing 4.13. It is shorter in comparison with listing 4.12, as there is no need to initialise and update velocities. The position update equation (4.41) is present in line 8.

Input: α, β, N, n_max, δ, L, U
Output: g
1   for i := 0 to N − 1 do
2       x_i := L + (U − L) U(0, 1);
3   g := arg min_{x_i} f(x_i);
4   for n := 1 to n_max − 1 do
5       α := α δ^{1/n_max};
6       for i := 0 to N − 1 do
7           ε_1 := (U − L) U(−1/2, 1/2);
8           x_i := x_i + (α_∞ + α) ε_1 + β (g − x_i);
9           g := arg min {f(g), f(x_i)};
Algorithm 4.13: Accelerated particle swarm optimisation 2 pseudocode

Figures 4.8 and 4.9 display example realisations of the two versions of the accelerated particle swarm optimisation algorithm for given n_max and N individuals (equivalent to N · n_max evaluations of the objective function). Trajectories of individuals are represented by means of the solid polylines. The acceleration is obvious when compared with the standard particle swarm optimisation in figure 4.7.

4.4.3 Firefly algorithm

The firefly algorithm [31] can be regarded as a variant of particle swarm optimisation. It is inspired by the flashing light of fireflies. Fireflies are attracted to one another, and the attractiveness is proportional to the light intensity (objective function value). Additionally, the light intensity decreases as the distance between two fireflies increases. The structure of the firefly algorithm is shown in listing 4.14. The movement (update formula) of a firefly x_i towards a more attractive firefly x_j is given by an equation similar to (4.41) (line 11 in listing 4.14):

x_i := x_i + α ε + (β_∞ + β e^{−γ ‖x_j − x_i‖²}) (x_j − x_i).  (4.43)

In the above formula, ‖x_j − x_i‖ represents the distance between the two fireflies x_i and x_j, and the whole second term is known as the attraction term. The attractiveness of a firefly x_i is directly related to its brightness I_i,

I(r) = I_0 e^{−γ r²},  (4.44)

which is directly proportional to the objective function value f(x_i). The term β_∞ + β e^{−γ ‖x_j − x_i‖²} combines light absorption, where γ is a light absorption coefficient, with the light intensity variation according to the inverse square law. The attractiveness at ‖x_j − x_i‖ = 0 is indicated here as β_∞ + β.

Input: α, β_∞, β, γ, N, n_max, δ, L, U
Output: g
1   for i := 0 to N − 1 do
2       x_i := L + (U − L) U(0, 1);
3   for n := 0 to n_max − 1 do
4       Sort(x_i) based on f(x_i);
5       g := x_0;
6       y := x;
7       α := α δ^{1/n_max};
8       for i := 0 to N − 1 do
9           for j := 0 to i − 1 do
10              ε := (U − L) U(−1/2, 1/2);
11              x_i := x_i + α ε + (β_∞ + β e^{−γ ‖y_j − x_i‖²}) (y_j − x_i);
Algorithm 4.14: Firefly algorithm pseudocode

The third term in equation (4.43) is randomisation. Obviously, α stands for a randomisation parameter and controls the randomness of the movement. The randomisation parameter α is gradually reduced,

α^{n+1} ≤ α^n,  (4.45)

by means of α := α δ^{1/n_max} (line 7 in listing 4.14) for typical values of δ close to 1. This coefficient controls the step size in order to gradually reduce the motion of the fireflies. By U(−1/2, 1/2) one understands a value sampled from the continuous uniform distribution parametrised by −1/2 and 1/2. The randomisation should be understood as a separate randomisation of each k-th component of the spatial coordinate, i.e. α U(−1/2, 1/2) (U_k − L_k), where L_k and U_k denote the lower and upper box constraints, respectively. In vector notation we have (line 10 in listing 4.14)

ε := (U − L) U(−1/2, 1/2).  (4.46)

There are two main differences between the PSO and FA algorithms. Firstly, the update formula (4.43) includes a light absorption term proportional to the inverse square law. Secondly, a double loop is present in algorithm 4.14 (line 9).
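A compact Python sketch of listing 4.14 follows. It is illustrative only: the sphere objective and all parameter values are assumptions, and β_∞ is denoted beta0 in the code.

import numpy as np

rng = np.random.default_rng(5)

def f(x):                                   # placeholder objective (sphere)
    return np.sum(x**2)

def firefly(D=2, N=15, n_max=50, alpha=0.3, beta0=0.2, beta=0.8,
            gamma=1.0, delta=0.97, L=-5.0, U=5.0):
    x = L + (U - L)*rng.random((N, D))
    for _ in range(n_max):
        order = np.argsort([f(xi) for xi in x])   # line 4: sort by brightness
        x, y = x[order], x[order].copy()          # y is the frozen, sorted copy
        alpha *= delta**(1/n_max)                 # line 7: damp the random step
        for i in range(N):
            for j in range(i):                    # line 9: move towards brighter ones
                eps = (U - L)*(rng.random(D) - 0.5)
                attr = beta0 + beta*np.exp(-gamma*np.sum((y[j] - x[i])**2))
                x[i] = x[i] + alpha*eps + attr*(y[j] - x[i])   # eq. (4.43)
    best = min(x, key=f)
    return best, f(best)

print(firefly())

Note the double loop over pairs of fireflies, which is the second of the two differences with respect to PSO mentioned above.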

Figure 4.10: Evaluations of the firefly algorithm

Figure 4.10 displays an example realisation of the firefly algorithm for given n_max and N individuals (equivalent to N · n_max evaluations of the objective function). Trajectories of individuals are represented by means of the solid polylines.

Several variants of the firefly algorithm exist. One of them, namely the rotational firefly algorithm [], is briefly discussed here. The difference between the FA and the rotational firefly algorithm rests on the generation of C clusters G_i of fireflies. The division into clusters is a matter of convention. An example of a division into disjoint clusters (subsets) G_i is given by

G_i ∩ G_j = ∅, i ≠ j,  ∪_{i=1}^{C} G_i = {x_1, ..., x_N}.  (4.47)

The number of clusters C may be chosen from [1; N]. Each cluster is then rotated by an angle ϑ around a pole x_r. The pole (centre of rotation) is defined as

x_r = x_a + κ (x_b − x_a),  (4.48)

where the average firefly x_a is calculated as

x_a = (1/|G_i|) Σ_{x_i ∈ G_i} x_i.  (4.49)

The coefficient κ is taken from κ ∈ [0; 1]. Two extreme cases are possible: for κ = 0 the cluster rotates around the average firefly (4.49), while for κ = 1 the centre of rotation is the best firefly x_b = arg min_{x_j ∈ G_i} f(x_j) of cluster G_i. An intermediate case, κ = 1/2, can also be assumed. The division into adjacent clusters can be kept as simple as possible: the cluster size is ⌊N/C⌋ and the remaining cluster, if any, is rotated if its size satisfies N mod C > ⌊N/C⌋. Rotations are intuitive and well defined in two- and three-dimensional spaces, but they are not limited only to those cases. This is because the spherical coordinates

are defined in D-dimensional spaces and are analogous to the spherical coordinate system defined for three-dimensional space and the polar system for two-dimensional spaces. The actual rotation in the hyperspherical coordinates (r, φ_1, ..., φ_{D−1}) of any point x = (x_1, ..., x_D) is calculated by means of the following transformation:

x_1 = r cos φ_1,  (4.50a)
x_k = r cos φ_k Π_{i=1}^{k−1} sin φ_i, if 1 < k < D,  (4.50b)
x_D = r Π_{i=1}^{D−1} sin φ_i,  (4.50c)

where the last angle φ_{D−1} is increased by ϑ, i.e. φ_{D−1} := φ_{D−1} + ϑ. Cluster generation and rotation are performed before the two main loops in algorithm 4.14 take place.

Figure 4.11: Evaluations of the bat algorithm

4.4.4 Bat algorithm

The bat algorithm [8] is a bio-inspired and simultaneously swarm intelligence based global optimisation algorithm. The inspiration comes from the echolocation of bats. The properties and simplifications of the simplest bat algorithm are:

– Bats perform a random flight with velocity v_i, based on the pulse rate r.
– A random local search is performed, based on the pulse rate r, around the current global best g.
– In the main, the frequency of the emitted pulse F is adjustable in order to correspond, at least, to the search domain. A random value F ∈ [F_l; F_u] is assumed notwithstanding.
– In general, the rate of the emitted pulse r is adjustable by bats: the closer to a prey, the faster r. However, a constant value of r is assumed.

– In spite of the fact that the loudness A varies (the closer to a prey, the quieter), it is assumed to be constant.

Input: α, A, r, F_l, F_u, N, n_max, L, U
Output: g
1   for i := 0 to N − 1 do
2       x_i := L + (U − L) U(0, 1);
3       v_i := 0;
4   g := arg min_{x_i} f(x_i);
5   for n := 1 to n_max − 1 do
6       for i := 0 to N − 1 do
7           if r < U(0, 1) then
8               ε := (U − L) N(0, 1);
9               y := g + α ε;
10          else
11              F := F_l + (F_u − F_l) U(0, 1);
12              v_i := v_i + F (g − x_i);
13              y := x_i + v_i;
14          if f(y) < f(x_i) and U(0, 1) < A then
15              x_i := y;
16              g := arg min {f(g), f(y)};
Algorithm 4.15: Bat algorithm pseudocode

In other words, the bat algorithm is a combination of two random walks. Switching between the individual walks is controlled by the probability r. Consequently, the intermediate update equation is

y := g + α ε, if r < U(0, 1);
y := x^n_i + v^{n+1}_i, otherwise.  (4.51)

The random vector ε, present in the local search, is drawn from the standard normal distribution scaled by the search domain:

ε := (U − L) N(0, 1).  (4.52)

The velocity of the random flight v_i is adjusted by means of the frequency F:

v^{n+1}_i := v^n_i + F (g − x^n_i).  (4.53)

The random variation of the frequency F is limited by the lower F_l and upper F_u values. This leads to

F := F_l + (F_u − F_l) U(0, 1).  (4.54)

If the intermediate individual y is better than its predecessor x^n_i, then it is passed on to the new population with a probability (loudness) A:

x^{n+1}_i := y, if f(y) < f(x^n_i) and U(0, 1) < A;
x^{n+1}_i := x^n_i, otherwise.  (4.55)
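Equations (4.51)–(4.55) can be gathered into a short illustrative Python sketch; the sphere objective and all parameter values are assumptions, not choices made in the text.

import numpy as np

rng = np.random.default_rng(6)

def f(x):                                   # placeholder objective (sphere)
    return np.sum(x**2)

def bat(D=2, N=20, n_max=100, alpha=0.1, A=0.7, r=0.5,
        Fl=0.0, Fu=1.0, L=-5.0, U=5.0):
    x = L + (U - L)*rng.random((N, D))
    v = np.zeros((N, D))
    fx = np.array([f(xi) for xi in x])
    g = x[np.argmin(fx)].copy()
    for _ in range(n_max):
        for i in range(N):
            if r < rng.random():                      # local search around g, eq. (4.51)
                y = g + alpha*(U - L)*rng.normal(size=D)   # eq. (4.52)
            else:                                     # random flight
                F = Fl + (Fu - Fl)*rng.random()       # eq. (4.54)
                v[i] = v[i] + F*(g - x[i])            # eq. (4.53)
                y = x[i] + v[i]
            fy = f(y)
            if fy < fx[i] and rng.random() < A:       # eq. (4.55): accept with loudness A
                x[i], fx[i] = y, fy
                if fy < f(g):
                    g = y.copy()
    return g, f(g)

print(bat())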

The bat algorithm in pseudocode form is shown in listing 4.15. Moreover, it is regarded as another variant of particle swarm optimisation. Figure 4.11 displays an example realisation of the bat algorithm for given n_max and N individuals (equivalent to N · n_max evaluations of the objective function). Solid polylines correspond to trajectories of individuals.

Figure 4.12: Evaluations of cuckoo search
Figure 4.13: Evaluations of cuckoo search (larger population)

4.4.5 Cuckoo search

Cuckoo search [7] is inspired by the brood parasitism of cuckoos. Cuckoo eggs represent individuals and are evaluated by the objective function. Eggs are dropped by cuckoos randomly and can be discovered by the host with probability p and abandoned. In practice, however, cuckoo search is simply a sequence of global and local random searches. The former is performed by means of a Lévy flight, whereas the latter by a random walk. What is more, the subsequent local random search is controlled by a probability p. The structure of cuckoo search is shown in listing 4.16. The intermediate update equation (line 10) for the global random search is

y := x^n_i + α ε_1,  (4.56)

where the random vector ε_1 (line 9) is drawn from the symmetrical Lévy-stable-like distribution (3.20),

ε_1 := (U − L) σ N(0, 1) / |N(0, 1)|^{1/λ} ε,  (4.57)

and ε is taken from the standard normal distribution (line 8):

ε := N(0, 1).  (4.58)

The better solution of the intermediate individual y and the original individual x^n_i, in terms of the objective function value, is passed on to the next generation x^{n+1}_i (line 11):

x^{n+1}_i := y, if f(y) < f(x^n_i);
x^{n+1}_i := x^n_i, otherwise.  (4.59)

Input: α, λ, N, n_max, p, L, U
Output: g
1   for i := 0 to N − 1 do
2       x_i := L + (U − L) U(0, 1);
3   g := arg min_{x_i} f(x_i);
4   σ := (Γ(1 + λ) sin(πλ/2) / (Γ((1 + λ)/2) λ 2^{(λ−1)/2}))^{1/λ};
5   n := N − 1;
6   repeat
7       for i := 0 to N − 1 do
8           ε := N(0, 1);
9           ε_1 := (U − L) σ N(0, 1) / |N(0, 1)|^{1/λ} ε;
10          y := x_i + α ε_1;
11          x_i := arg min {f(x_i), f(y)};
12          n := n + 1;
13      a, b := RandomPermutation({0, ..., N − 1});
14      for i := 0 to N − 1 do
15          ε_2 := U(0, 1) (x_{a_i} − x_{b_i});
16          y := x_i + α ε_2 H(p − U(0, 1));
17          if x_i ≠ y then
18              x_i := arg min {f(x_i), f(y)};
19              n := n + 1;
20      g := arg min_{x_i} f(x_i);
21  until n ≥ n_max;
Algorithm 4.16: Cuckoo search pseudocode

The local random search update equation includes the abandonment probability p. Consequently, the intermediate individual y is (line 16)

y := x^{n+1}_i + α ε_2 H(p − U(0, 1)).  (4.60)

Two different individuals x^{n+1}_j and x^{n+1}_k are taken randomly from the current population by means of random permutations (line 13). Two random permutation sets a, b of N numbers are generated separately in order to provide the consecutive indexes j and k. The random vector ε_2 is drawn from the continuous uniform distribution and is scaled by means of the difference (x^{n+1}_j − x^{n+1}_k):

ε_2 := U(0, 1) (x^{n+1}_j − x^{n+1}_k).  (4.61)

As a result, this resembles a local random search (line 15). Additionally, the Heaviside step function H is utilised in order to account for the abandonment probability p. For p := 1, equations (4.60) and (4.61) are similar to differential evolution (4.22). If the new individual y is better than its predecessor x^{n+1}_i, then it is passed on to the new population:

x^{n+2}_i := y, if f(y) < f(x^{n+1}_i);
x^{n+2}_i := x^{n+1}_i, otherwise.  (4.62)

Figure 4.12 displays an example realisation of cuckoo search for N individuals and the corresponding number of evaluations of the objective function. Solid polylines correspond to trajectories of individuals. Figure 4.13 presents an interesting property of cuckoo search, namely its ability to localise many local minima at the same time. This is, however, possible only if the population is large enough.
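A compact Python sketch of the scheme follows. It is illustrative only (sphere objective, assumed parameter values) and, for simplicity, it counts generations rather than single evaluations as listing 4.16 does.

import numpy as np
from math import gamma, sin, pi

rng = np.random.default_rng(7)

def f(x):                                    # placeholder objective (sphere)
    return np.sum(x**2)

def cuckoo(D=2, N=20, n_max=100, alpha=0.1, lam=1.5, p=0.25, L=-5.0, U=5.0):
    sigma = (gamma(1 + lam)*sin(pi*lam/2) /
             (gamma((1 + lam)/2)*lam*2**((lam - 1)/2)))**(1/lam)   # line 4
    x = L + (U - L)*rng.random((N, D))
    fx = np.array([f(xi) for xi in x])
    for _ in range(n_max):
        for i in range(N):                   # global random search: Levy flights
            step = sigma*rng.normal(size=D)/np.abs(rng.normal(size=D))**(1/lam)
            y = x[i] + alpha*(U - L)*step*rng.normal(size=D)      # eqs. (4.56)-(4.58)
            if f(y) < fx[i]:
                x[i], fx[i] = y, f(y)
        a = rng.permutation(N)               # local random search with abandonment
        b = rng.permutation(N)
        for i in range(N):
            y = x[i] + alpha*rng.random(D)*(x[a[i]] - x[b[i]])*(p > rng.random())
            if f(y) < fx[i]:                 # eqs. (4.60)-(4.62)
                x[i], fx[i] = y, f(y)
    i = np.argmin(fx)
    return x[i], fx[i]

print(cuckoo())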

Chapter 5

Constraints

5.1 Unconstrained and constrained optimisation

As previously mentioned, the general problem of unconstrained optimisation, i.e. minimisation in this case, is expressed as

min_{x ∈ R^D} f(x) = f*,  (5.1)

where f : R^D → R is the objective function to be minimised and x ∈ R^D is an independent variable. Moreover, the argument x* of the minimum value f* of the objective function f is defined as

x* = arg min_{x ∈ R^D} f(x).  (5.2)

In other words, the general problem of unconstrained optimisation is the process of optimising an objective function f in the absence of constraints on the independent variable. This simply means that the independent variable x ∈ R^D. The constrained optimisation problem, i.e. minimisation in this case, is given by

min_{x ∈ Ω} f(x) = f*,  (5.3)

where Ω is a constraint set, or the so-called optimisation domain. This time, however, the argument x* of the minimum value f* of the objective function f is defined as

x* = arg min_{x ∈ Ω} f(x),  (5.4)

meaning that the constrained optimisation problem is the process of optimising an objective function f in the presence of constraints Ω on the independent variable x, or simply

Ω ⊆ R^D.  (5.5)

We can distinguish:

– Equality constraints. The constraint set is expressed by means of equality constraint functions g_i:

Ω := {x ∈ R^D : g_i(x) = 0}.  (5.6)

The index i belongs to the index set of equality constraints, i ∈ {1, ..., m}.

– Inequality constraints. The constraint set is expressed by means of inequality constraint functions h_j:

Ω := {x ∈ R^D : h_j(x) ≤ 0}.  (5.7)

The index j belongs to the index set of inequality constraints, j ∈ {1, ..., k}.

– Box constraints. The constraint set is expressed by means of simplified inequality constraints:

Ω := {x ∈ R^D : L_i ≤ x_i ≤ U_i}.  (5.8)

This also means that box constraints are a special case of inequality constraints where h_j(x) := L_i − x_i and so on. What is more, this type of constraint is commonly met in optimisation practice. As usual, L_i and U_i are the lower and upper bounds, respectively. If L_i := −∞ and U_i := ∞ for all i ∈ {1, ..., D}, then the box constrained problem becomes unconstrained.

– Equality and inequality constraints. The constraint set is expressed by means of equality g_i and inequality h_j constraint functions:

Ω := {x ∈ R^D : g_i(x) = 0, h_j(x) ≤ 0}.  (5.9)

The above set expresses the most general constrained optimisation problem.

5.2 Lagrange multipliers

5.2.1 The method

The method of Lagrange multipliers is a method for converting constrained optimisation problems to unconstrained problems. Let us assume, without loss of generality, a two-dimensional function f : R² → R to be minimised with an equality constraint g of the form (5.6):

z = f(x, y),  (5.10a)
g(x, y) = 0.  (5.10b)

Assuming further that y can be explicitly expressed from g as a function of x and substituted into equation (5.10a), namely z = f(x, y(x)), we have the necessary condition for optimality dz/dx = 0 and, at the same time, dg/dx = 0. Using the chain rule, the two above conditions are

∂f/∂x + ∂f/∂y dy/dx = 0,  (5.11a)
∂g/∂x + ∂g/∂y dy/dx = 0.  (5.11b)

By virtue of the relations

dy/dx = −(∂f/∂x)/(∂f/∂y) = −(∂g/∂x)/(∂g/∂y),  (5.12)

we have a constant multiplier, i.e. a Lagrange multiplier:

λ := −(∂f/∂x)/(∂g/∂x) = −(∂f/∂y)/(∂g/∂y).  (5.13)

The necessary condition for optimality (5.11) is therefore

∂f/∂x + λ ∂g/∂x = 0,  (5.14a)
∂f/∂y + λ ∂g/∂y = 0.  (5.14b)

Introducing an auxiliary function F, sometimes referred to as the Lagrangian function,

F(x, y, λ) := f(x, y) + λ g(x, y),  (5.15)

we have the necessary condition for optimality

∇F(x, y, λ) = 0.  (5.16)

One has to keep in mind that the Lagrangian function F is now three-dimensional, in comparison with the original two-dimensional function f. Equivalently, the necessary condition for optimality (5.16) is

∂F/∂x = ∂f/∂x + λ ∂g/∂x = 0,  (5.17a)
∂F/∂y = ∂f/∂y + λ ∂g/∂y = 0,  (5.17b)
∂F/∂λ = g = 0.  (5.17c)

It can easily be verified that the above three conditions correspond to equations (5.14) and the equality constraint (5.10b). Typically, the value of λ is of no interest. If possible, the system should first be solved for λ in order to remove it from the system of equations.

5.2.2 Equality constraints

If there are m equality constraints g_i of the form (5.6), the method of Lagrange multipliers can easily be extended. It is enough to introduce the following form of the Lagrangian function:

F(x, λ_1, ..., λ_m) := f(x) + Σ_{i=1}^{m} λ_i g_i(x).  (5.18)

Thus, again, we convert a constrained optimisation problem with m equality constraints to an unconstrained problem. Furthermore, if the following vectors are introduced,

λ := {λ_1, ..., λ_m},  (5.19a)
g := {g_1, ..., g_m},  (5.19b)

then the Lagrangian function (5.18) takes a form similar to the case with one constraint (5.15):

F(x, λ) := f(x) + λ · g(x).  (5.20)

Obviously, the Lagrange multiplier vector λ is of the same size as the constraint vector g, namely m.

Figure 5.1: Equality constraints example

Let us consider the following optimisation example. We have a two-dimensional objective function f : R² → R to be minimised,

f(x, y) := 2x² + 4y,  (5.21)

subject to the following equality constraint (see figure 5.1):

Ω := {(x, y) ∈ R² : g(x, y) := xy − 1 = 0}.  (5.22)

This also means that m := 1. According to equation (5.18) or (5.20), the three-dimensional Lagrangian function F : R³ → R is

F(x, y, λ) := 2x² + 4y + λ (xy − 1).  (5.23)

As a consequence, the Lagrangian function is unconstrained. From the necessary condition for optimality ∇F = 0 it arises that

4x + λy = 0,  (5.24a)
4 + λx = 0,  (5.24b)
xy − 1 = 0.  (5.24c)

The above system of equations now has to be solved. Since we are not interested in the value of λ, we first solve for λ in terms of x and y. For instance, taking into consideration equation (5.24b), we have λ = −4x⁻¹. Substituting λ into equations (5.24a) and (5.24c) results in a system of two equations with two unknowns, x and y. The critical point, i.e. the solution of the necessary condition for optimality ∇F = 0, is (1, 1). Thus we have the minimum

x* = (1, 1) = arg min_{x ∈ Ω} f(x)  (5.25)

and f(1, 1) = 6. Finally, the value of the objective function at x* = (1, 1) yields

min_{x ∈ Ω} f(x) = f* = 6.  (5.26)

5.2.3 Inequality constraints

The method of Lagrange multipliers can be further extended to problems with k inequality constraints h_j of the form (5.7). Firstly, the so-called penalty function p_j has to be introduced:

p_j(x, β_j) := h_j(x) + β_j²,  (5.27)

where β_j ∈ R is a penalty variable. Secondly, the following form of the Lagrangian function is assumed:

F(x, λ̃_1, ..., λ̃_k, β_1, ..., β_k) := f(x) + Σ_{j=1}^{k} λ̃_j (h_j(x) + β_j²).  (5.28)

As previously, we convert the constrained optimisation problem, this time with k inequality constraints, to an unconstrained problem. Given that the following vectors are introduced,

λ̃ := {λ̃_1, ..., λ̃_k},  (5.29a)
h := {h_1, ..., h_k},  (5.29b)
β := {β_1, ..., β_k},  (5.29c)

the Lagrangian function (5.28) is now

F(x, λ̃, β) := f(x) + λ̃ · (h(x) + β ⊙ β).  (5.30)

The Lagrange multiplier vector λ̃ is of the same size as the constraint vector h and the penalty variable vector β, namely k. Let us consider the next optimisation example: a two-dimensional objective function f : R² → R to be minimised,

f(x, y) := x² + y²,  (5.31)

subject to the following inequality constraint (see figure 5.2):

Ω := {(x, y) ∈ R² : h(x, y) := 1 − x ≤ 0}.  (5.32)

The penalty function is p(x, β) := h(x) + β², where β ∈ R. This means that k := 1. According to equation (5.28) or (5.30), the unconstrained four-dimensional Lagrangian function F : R⁴ → R is

F(x, y, λ̃, β) := x² + y² + λ̃ (1 − x + β²).  (5.33)

From the necessary condition for optimality ∇F = 0 we have the following system of four equations with four unknowns:

2x − λ̃ = 0,  (5.34a)
2y = 0,  (5.34b)
1 − x + β² = 0,  (5.34c)
2λ̃β = 0.  (5.34d)

In order to solve the system (5.34), two cases have to be considered (equation (5.34d)). The first case is λ̃ = 0. By equations (5.34a)–(5.34c), this immediately implies x = y = 0 and, most importantly, β² = −1, which is impossible by the assumption β ∈ R. Because of this inconsistency, (0, 0) cannot be a critical point. Also, (0, 0) ∉ Ω, see figure 5.2. The second case is β = 0. By equations (5.34b) and (5.34c), this implies x = 1 and y = 0. The critical point, i.e. the solution of the necessary condition for optimality ∇F = 0, is (1, 0). Thus we have the minimum

x* = (1, 0) = arg min_{x ∈ Ω} f(x),  (5.35)

and f(1, 0) = 1. Finally, the value of the objective function at x* = (1, 0) yields

min_{x ∈ Ω} f(x) = f* = 1.  (5.36)

Figure 5.2: Inequality constraints example
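The above case analysis can be reproduced symbolically. A small sketch using SymPy (assuming the library is available) solves ∇F = 0 for the Lagrangian (5.33):

import sympy as sp

x, y, lam, beta = sp.symbols('x y lambda beta', real=True)

f = x**2 + y**2                       # objective (5.31)
h = 1 - x                             # inequality constraint (5.32), h <= 0
F = f + lam*(h + beta**2)             # Lagrangian (5.33) with slack variable beta

grad = [sp.diff(F, s) for s in (x, y, lam, beta)]
sols = sp.solve(grad, (x, y, lam, beta), dict=True)
real = [s for s in sols if all(v.is_real for v in s.values())]
print(real)

Only the real solution survives, namely x = 1, y = 0, λ̃ = 2, β = 0, in agreement with the discussion above.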

5.2.4 Equality and inequality constraints

We simply combine the two previous cases, namely equations (5.18) and (5.28). Thus the Lagrangian function is

F(x, λ_1, ..., λ_m, λ̃_1, ..., λ̃_k, β_1, ..., β_k) := f(x) + Σ_{i=1}^{m} λ_i g_i(x) + Σ_{j=1}^{k} λ̃_j (h_j(x) + β_j²).  (5.37)

We have m equality constraints g_i of the form (5.6) and k inequality constraints h_j of the form (5.7), together with k penalty functions p_j. Equivalently, in vector notation, the Lagrangian function (5.37) is now

F(x, λ, λ̃, β) := f(x) + λ · g(x) + λ̃ · (h(x) + β ⊙ β).  (5.38)

Definitions (5.19) and (5.29) hold.

Figure 5.3: Equality and inequality constraints example

Let us consider the next optimisation example: a two-dimensional objective function f : R² → R to be minimised,

f(x, y) := x² + y²,  (5.39)

subject to the following equality and inequality constraints (see figure 5.3):

Ω := {(x, y) ∈ R² : g(x, y) := y − x − 1 = 0; h(x, y) := 1 − x ≤ 0}.  (5.40)

Since we deal with equality as well as inequality constraints, the penalty function p(x, β) := h(x) + β² is necessary, where β ∈ R. In this case m := 1 and k := 1. According to equation (5.37) or (5.38), the unconstrained five-dimensional Lagrangian function F : R⁵ → R is

F(x, y, λ, λ̃, β) := x² + y² + λ (y − x − 1) + λ̃ (1 − x + β²).  (5.41)

From the necessary condition for optimality ∇F = 0 we have the following system of five equations with five unknowns:

2x − λ − λ̃ = 0,  (5.42a)
2y + λ = 0,  (5.42b)
y − x − 1 = 0,  (5.42c)
1 − x + β² = 0,  (5.42d)
2λ̃β = 0.  (5.42e)

In order to solve the above system of equations, two cases have to be considered, namely λ̃ = 0 and β = 0. Following the same line of reasoning as before, the critical point, i.e. the solution of the necessary condition for optimality ∇F = 0, is (1, 2). Thus we have the minimum

x* = (1, 2) = arg min_{x ∈ Ω} f(x),  (5.43)

and f(1, 2) = 5. Finally, the value of the objective function at x* = (1, 2) yields

min_{x ∈ Ω} f(x) = f* = 5.  (5.44)

5.2.5 Box constraints

Box constraints are often considered to be natural in numerical optimisation problems. The lower L_i and upper U_i bounds are usually necessary in order to generate the initial population. If an updated individual x^{n+1}_i is out of range, i.e. x^{n+1}_i ∉ Ω, the update can simply be repeated until x^{n+1}_i ∈ Ω. If, however, the constrained optimisation problem has to be converted to an unconstrained problem, penalty functions are necessary for every dimension j ∈ {1, ..., D}:

p_j(x, β_j) := β_j² − (U_j − x_j)(x_j − L_j),  (5.45)

where β_j ∈ R is a penalty variable. The Lagrangian function is

F(x, λ_1, ..., λ_D, β_1, ..., β_D) := f(x) + Σ_{j=1}^{D} λ_j (β_j² − (U_j − x_j)(x_j − L_j)),  (5.46)

or, introducing

λ := {λ_1, ..., λ_D},  (5.47a)
β := {β_1, ..., β_D},  (5.47b)
L := {L_1, ..., L_D},  (5.47c)
U := {U_1, ..., U_D},  (5.47d)

we have

F(x, λ, β) := f(x) + λ · (β ⊙ β − (U − x) ⊙ (x − L)).  (5.48)

The original D-dimensional constrained optimisation problem is thus converted to a 3D-dimensional unconstrained problem.
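In population-based practice, the repetition mentioned above is often complemented by a simple projection onto the box. A minimal Python sketch of both repairs is given below; the bounds and the test vector are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(9)

def resample(x, L, U):
    # redraw the whole individual until it satisfies the box constraints
    while np.any(x < L) or np.any(x > U):
        x = L + (U - L)*rng.random(x.size)
    return x

def project(x, L, U):
    # alternative repair: clip each component onto [L_k, U_k]
    return np.clip(x, L, U)

x = np.array([6.2, -1.0])
print(resample(x, -5.0, 5.0), project(x, -5.0, 5.0))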

5.3 Penalty function method

Similarly to the method of Lagrange multipliers, the penalty function method is used in order to convert the original constrained optimisation problem to a series of unconstrained problems. An auxiliary function F is introduced, consisting of the original function f to be minimised, subject to inequality constraints h_j in the form (5.7), plus the penalty function P:

F(x, γ_k) := f(x) + γ_k P(x),  (5.49)

where γ_k > 0 is a penalty parameter. Most importantly, the penalty function P is regarded as a measure of the infringement of the inequality constraints. The assumed form of the penalty function should be zero if x ∈ Ω and nonzero if x ∉ Ω, i.e.

P(x) = 0, if x ∈ Ω;
P(x) > 0, if x ∉ Ω.  (5.50)

One of the possible forms of the penalty function is

P(x) := Σ_{j=1}^{k} max^r {0, h_j(x)},  (5.51)

where r ∈ {1, 2, ...}. The converted unconstrained optimisation problem (5.49) is then solved for a sequence γ_k → ∞, for which

lim_{k→∞} F(x, γ_k) = f(x), x ∈ Ω,  (5.52)

and the solutions converge to the solution of the original constrained optimisation problem. What is more, the auxiliary function (5.49) still depends on D variables, as γ_k is regarded only as a parameter. In order to illustrate the method, let us first assume r := 2. We wish to minimise function (5.31) subject to the inequality constraint (5.32). Firstly, it is necessary to formulate the auxiliary function F according to equation (5.49):

F(x, y, γ_k) := x² + y² + γ_k max²{0, 1 − x}.  (5.53)

From the necessary condition for optimality ∇F = 0 we have the following two equations with two unknowns:

2x − 2γ_k max{0, 1 − x} = 0,  (5.54a)
2y = 0.  (5.54b)

Equation (5.54b) implies that y = 0, and from equation (5.54a) we have x − γ_k (1 − x) = 0. The optimal solution x* is obtained by letting k → ∞ (γ_k → ∞):

x* = lim_{k→∞} γ_k / (1 + γ_k) = 1.  (5.55)

Finally, x* → (1, 0).
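The limit (5.55) can be observed numerically. The following sketch minimises the auxiliary function (5.53) for a growing sequence of penalty parameters; the use of scipy.optimize.minimize, the starting point and the sequence of γ_k values are illustrative assumptions.

import numpy as np
from scipy.optimize import minimize

def F(z, gk):                              # auxiliary function (5.53), r = 2
    x, y = z
    return x**2 + y**2 + gk*max(0.0, 1.0 - x)**2

for gk in [1.0, 10.0, 100.0, 1000.0]:
    res = minimize(F, x0=[0.0, 0.0], args=(gk,))
    print(gk, res.x)                       # x -> gk/(1 + gk) -> 1, y -> 0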

5.4 Barrier method

Similarly to the penalty function method and the method of Lagrange multipliers, the barrier method converts the original constrained optimisation problem to a series of unconstrained problems. The original function f to be minimised, subject to inequality constraints h_j in the form (5.7), is converted by means of the auxiliary function F given by equation (5.49). This time, however, γ_k > 0 is a barrier parameter and P is a barrier function, preventing the solution from leaving the constraint set Ω. The assumed form of the barrier function should tend to infinity as the solution approaches the boundary of the constraint set. One of the possible forms of the barrier function is the so-called logarithmic barrier function:

P(x) := −Σ_{j=1}^{k} ln(−h_j(x)).  (5.56)

The converted unconstrained optimisation problem,

F(x, γ_k) := f(x) − γ_k Σ_{j=1}^{k} ln(−h_j(x)),  (5.57)

is then solved with the sequence k → ∞ for γ_k → 0. In order to illustrate the barrier method, let us minimise again function (5.31) subject to the inequality constraint (5.32). The auxiliary function F according to equation (5.57) is

F(x, y, γ_k) := x² + y² − γ_k ln(x − 1).  (5.58)

From the necessary condition for optimality ∇F = 0 we have the following two equations with two unknowns:

2x − γ_k / (x − 1) = 0,  (5.59a)
2y = 0.  (5.59b)

From equation (5.59b) it immediately follows that y = 0, and from equation (5.59a) we have 2(1 − x)x + γ_k = 0. Since x > 1,

x = (1 + √(1 + 2γ_k)) / 2.  (5.60)

For k → ∞ we have γ_k → 0; thus the optimal solution is x* = 1 and x* → (1, 0).
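The barrier iteration can be sketched in the same manner. In contrast with the penalty method, every iterate must stay strictly inside Ω, hence the feasible starting point; the solver choice and parameter values are again illustrative assumptions.

import numpy as np
from scipy.optimize import minimize

def F(z, gk):                              # auxiliary function (5.58)
    x, y = z
    if x <= 1.0:                           # outside the barrier: reject
        return np.inf
    return x**2 + y**2 - gk*np.log(x - 1.0)

for gk in [1.0, 0.1, 0.01, 0.001]:
    res = minimize(F, x0=[2.0, 0.5], args=(gk,), method='Nelder-Mead')
    print(gk, res.x, (1 + np.sqrt(1 + 2*gk))/2)   # compare with (5.60)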

Chapter 6

Variational calculus

6.1 Functional and its variation

Informally, a functional is considered to be a function of functions. More formally, it is a mapping from a space of functions to the set of real numbers. For instance,

J[y] = ∫_{x_1}^{x_2} F(x, y, y′, y″) dx  (6.1)

is a functional. Having a specific function y(x) := ..., it is possible to calculate the value of J. Variational calculus involves methods of either minimising or maximising functionals similar to (6.1). Consequently, functions that minimise or maximise a functional are referred to as extremal functions, or simply extremals. The variation δJ of a functional J is given by

δJ := ∂/∂α J[y + α δy] |_{α=0},  (6.2)

where δy is the variation of the argument y of the functional J, i.e. δy := y_1 − y. The variation (6.2) is also referred to as the first variation of the functional J.

6.1.1 Necessary condition for an extremum

A minimum or maximum of a functional is called an extremum. The necessary condition for an extremum of a functional J requires its variation to be zero, i.e. δJ = 0, or by means of equation (6.2):

δJ := ∂/∂α J[y + α δy] |_{α=0} = 0.  (6.3)

Considering the simplest variational problem of finding an extremum of the functional (6.1) and using the above necessary condition (6.3), we have the following

integral:

∫_{x_1}^{x_2} (F_y − d/dx F_{y′} + d²/dx² F_{y″}) δy dx + [(F_{y′} − d/dx F_{y″}) δy]_{x_1}^{x_2} + [F_{y″} δy′]_{x_1}^{x_2} = 0.  (6.4)

It is now possible to distinguish several problems based on the prescribed conditions at the endpoints x_i. At least three cases are possible:

– Both ends constrained. For i = 1 and i = 2, the appropriate variations in equation (6.4) are

δy|_{x_i} = 0,  (6.5a)
δy′|_{x_i} = 0,  (6.5b)

meaning that at x_1, x_2 we have prescribed conditions for y(x_1), y(x_2) and the derivatives y′(x_1), y′(x_2).

– One end constrained. As above, for either i = 1 or i = 2. This also means that the second end is variable, or unconstrained.

– Mixed problems. For i = 1 or i = 2 we have either

δy|_{x_i} = 0,  (6.6a)
δy′|_{x_i} ≠ 0,  (6.6b)

or

δy|_{x_i} ≠ 0,  (6.7a)
δy′|_{x_i} = 0,  (6.7b)

meaning that at x_1 we only have prescribed conditions for y(x_1); additionally, at x_2 we can have y′(x_2) prescribed, or conversely. This situation is referred to as a partly constrained end (or ends).

6.1.2 The Euler equation

If both ends are constrained, i.e. δy|_{x_i} = 0 and δy′|_{x_i} = 0, the necessary condition for an extremum of the functional J (6.1) is the Euler equation

F_y − d/dx F_{y′} + d²/dx² F_{y″} = 0,  (6.8)

where y is a continuously differentiable function. The above equation is also referred to as the Euler-Lagrange equation. In general, equation (6.8) is a fourth-order ordinary differential equation. This also means that a specific solution requires four boundary conditions: y(x_1), y(x_2), y′(x_1), y′(x_2). As mentioned previously, the solutions of the above equation are called extremals. If the integrand F of the functional (6.1) does not depend on y″, then it is possible to consider a simpler form of J:

J[y] = ∫_{x_1}^{x_2} F(x, y, y′) dx.  (6.9)

The Euler equation (6.8) is now also simpler,

F_y − d/dx F_{y′} = 0,  (6.10)

or explicitly

F_y − F_{xy′} − y′ F_{yy′} − y″ F_{y′y′} = 0.  (6.11)

In general, it is a second-order ordinary differential equation. A specific solution requires two boundary conditions, y(x_1), y(x_2). Furthermore, if the integrand F of the functional (6.9) does not depend on x, namely

J[y] = ∫_{x_1}^{x_2} F(y, y′) dx,  (6.12)

the Euler equation (6.11) can be integrated once to the Beltrami identity

F − y′ F_{y′} = C,  (6.13)

where C is an integration constant. The simplest variational problem with functions of two variables is represented by the following double integral:

J[z] = ∫∫_Ω F(x, y, z, ∂z/∂x, ∂z/∂y) dx dy,  (6.14)

or, using shorter notation (z_x := ∂z/∂x and so on),

J[z] = ∫∫_Ω F(x, y, z, z_x, z_y) dx dy.  (6.15)

The Euler equation, expressing the necessary condition for an extremum of the above functional, is

F_z − ∂/∂x F_{z_x} − ∂/∂y F_{z_y} = 0.  (6.16)

A slightly more complicated variational problem, with functions of two variables and higher-order derivatives, is

J[z] = ∫∫_Ω F(x, y, z, ∂z/∂x, ∂z/∂y, ∂²z/∂x², ∂²z/∂x∂y, ∂²z/∂y²) dx dy,  (6.17)

or

J[z] = ∫∫_Ω F(x, y, z, z_x, z_y, z_xx, z_xy, z_yy) dx dy.  (6.18)

The Euler equation is

F_z − ∂/∂x F_{z_x} − ∂/∂y F_{z_y} + ∂²/∂x² F_{z_xx} + ∂²/∂x∂y F_{z_xy} + ∂²/∂y² F_{z_yy} = 0.  (6.19)
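The derivations (6.10)–(6.13) are mechanical enough to be delegated to a computer algebra system. The following SymPy sketch (assuming the library is available) applies them to the arclength integrand √(1 + y′²), anticipating the shortest-path problem of section 6.2.1:

import sympy as sp

x = sp.Symbol('x')
y = sp.Function('y')

F = sp.sqrt(1 + y(x).diff(x)**2)          # integrand of the arclength functional

# Euler equation (6.10): F_y - d/dx F_y' = 0
euler = sp.diff(F, y(x)) - sp.diff(sp.diff(F, y(x).diff(x)), x)
print(sp.simplify(euler))

# Beltrami identity (6.13): F - y' F_y' is constant along extremals
beltrami = sp.simplify(F - y(x).diff(x)*sp.diff(F, y(x).diff(x)))
print(beltrami)

The printed Euler expression is proportional to y″(x), so the extremals satisfy y″ = 0, and the Beltrami expression 1/√(1 + y′²) is indeed constant along them.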

6.1.3 Constraints

The different constraints imposed on a specific variational problem can be divided into several categories:

– Boundary conditions. If there are no boundary conditions, i.e. no prescribed conditions at the endpoints x_i, then the problem can be considered unconstrained. Consequently, the Euler equation (6.8) should be solved with the additional conditions

(F_{y′} − d/dx F_{y″})|_{x_i} = 0  (6.20)

and

F_{y″}|_{x_i} = 0.  (6.21)

These arise due to integral (6.4). If, however, certain conditions are prescribed at the endpoint x_i, then, depending on the specific variations δy|_{x_i} = 0 or δy′|_{x_i} = 0, the additional conditions (6.21) or (6.20) should be solved together with the Euler equation.

– Integral constraint. The additional integral constraint is of the form

∫_{x_1}^{x_2} G(x, y, y′, y″) dx = C,  (6.22)

where G and C are known.

– Non-integral constraint. The additional constraint is of the form

G(x, y, y′, y″) = 0,  (6.23)

where G is known.

6.2 Classic problems

6.2.1 Shortest path on a plane

The problem of finding a path on a plane of shortest length connecting two points is illustrated in figure 6.1. If there are no additional constraints imposed on the function (path), the problem is trivial. The length l of a path, or in fact a smooth curve l, is given by the following line integral:

l = ∫_l dl.  (6.24)

To be more precise, the above integral is the curvilinear integral of a scalar field. In order to evaluate the length, let us first assume that the path l is given in explicit form, namely l := {(x, y) : y(x) = ..., x ∈ [x_1; x_2]}. Since dl² = dx² + dy², we have

l = ∫_{x_1}^{x_2} √(1 + y′²) dx.  (6.25)

Figure 6.1: Shortest path on a plane

It is a functional of the (6.9) type and the integrand F depends only on y′:

F(y′) := √(1 + y′²).  (6.26)

The explicit Euler equation (6.11) is

F_{y′y′} y″ = 0.  (6.27)

Either F_{y′y′} = 0 or y″ = 0. Integrating the latter twice, we have

y = C_1 + C_2 x,  (6.28)

i.e. a family of straight lines. The two constants C_1, C_2 can be determined from the known boundary conditions (x_1, y_1), (x_2, y_2).

6.2.2 Brachistochrone

The problem was originally formulated in 1696 by Bernoulli and solved independently by Bernoulli, Newton and de l'Hôpital. The brachistochrone is the curve joining two points (0, 0) and (x_2, y_2) of fastest descent, see figure 6.2. This means that a material point moving frictionlessly under the influence of gravity along that curve, starting from (0, 0), reaches (x_2, y_2) in the shortest time.

Figure 6.2: Brachistochrone

The velocity of the material point is v = dl/dt and the time required for descent along l is

t = ∫_l dl/v.  (6.29)

In order to evaluate the above line integral, let us assume that the path l is given in explicit form. Since the square of the differential element of arc length is dl² = dx² + dy², we have

t = ∫_0^{x_2} √(1 + y′²)/v dx.  (6.30)

The velocity can be found by means of the law of conservation of energy,

m v²/2 = m g y,  (6.31)

i.e. v = √(2gy). Finally, the time of travel t can be expressed as a certain functional in the following form:

t = 1/√(2g) ∫_0^{x_2} √((1 + y′²)/y) dx.  (6.32)

Moreover, the constant 1/√(2g) can be neglected, as it has no impact on the optimal path l. Consequently, equation (6.32) is a functional of the (6.9) type and the integrand F does not depend on x:

F(y, y′) := √((1 + y′²)/y).  (6.33)

Figure 6.3: Cycloid

The Euler equation (6.11) can now be integrated once to the Beltrami identity (6.13), namely

F − y′ F_{y′} = C.  (6.34)

In the case of the brachistochrone problem, the integrand (6.33) provides the following form of the Euler equation:

√((1 + y′²)/y) − y′²/√(y(1 + y′²)) = C.  (6.35)

This, however, can be reduced to

y (1 + y′²) = C̄.  (6.36)

Eventually, the solution of the above equation is the cycloid. A cycloid (see figure 6.3) is the curve traced by a point on the circumference of a circle of radius R as the circle rolls along a straight line:

x(t) := R (t − sin t),  (6.37a)
y(t) := R (1 − cos t).  (6.37b)

Surprisingly, the extremal is not a straight line.

6.2.3 Minimal surface of revolution

The area S of the surface of revolution S obtained by rotating l around the x axis is given by the following integral:

S = 2π ∫_l y dl.  (6.38)

Assuming that the arc l is given in explicit form, namely l := {(x, y) : y(x) = ..., x ∈ [x_1; x_2]}, it is possible to evaluate the above line integral. Since dl² = dx² + dy², we have

S = 2π ∫_{x_1}^{x_2} y √(1 + y′²) dx.  (6.39)

Again, equation (6.39) is a functional of the (6.9) type and the integrand F does not depend on x:

F(y, y′) := y √(1 + y′²).  (6.40)

Figure 6.4: Catenary

The Euler equation (6.11) can be integrated once to the Beltrami identity (6.13). This results in

F − y′ F_{y′} = C.  (6.41)

In the case of the minimal surface area of revolution, the integrand (6.40) provides the following form of the Euler equation:

y √(1 + y′²) − y y′²/√(1 + y′²) = C_1.  (6.42)

The solution of the above equation is the catenary shown in figure 6.4:

y = C_1 cosh((x − C_2)/C_1).  (6.43)

Figure 6.5 presents the minimal surface of revolution, the catenoid, i.e. the surface of revolution obtained by rotating the catenary around the x axis.

Figure 6.5: Catenoid

6.2.4 Isoperimetric problem

The problem is to find the maximal encircled area among all closed planar curves of constant length (perimeter). Interestingly, the problem has been known since antiquity. For the sake of simplicity let y_1 = y_2 = 0, see figure 6.6. The area of the region below l is defined as

|Ω| = ∫_{x_1}^{x_2} y dx,  (6.44)

being a functional of the (6.9) type whose integrand F depends only on y, namely F(y) := y. The isoperimetric condition

l_0 = ∫_l dl  (6.45)

states that the length of the curve l := {(x, y) : y = ?; x_1 ≤ x ≤ x_2; y_1 = y_2 = 0} is constant and equal to l_0. According to paragraph 6.1.3, equation (6.45) can be classified

as an integral constraint. Since the square of the differential element of arc length is dl² = dx² + dy², we have

l_0 = ∫_{x_1}^{x_2} √(1 + y′²) dx.  (6.46)

It is again a functional of the (6.9) type, and this time the integrand F_1 depends only on y′:

F_1(y′) := √(1 + y′²).  (6.47)

Figure 6.6: Maximal area encircled

In order to find the necessary condition for an extremum (the Euler equation), it is necessary to formulate a new integrand,

F̄(y, y′) := F + λ F_1,  (6.48)

where λ is a Lagrange multiplier. The functional is

J := |Ω| + λ l_0,  (6.49)

or equivalently

J = ∫_{x_1}^{x_2} (y + λ √(1 + y′²)) dx.  (6.50)

Since the integrand F̄ does not depend on x, the Euler equation (6.11) can be integrated once to the Beltrami identity (6.13):

F̄ − y′ F̄_{y′} = C.  (6.51)

In the case of the maximal encircled area, the integrand F̄ provides the following form of the Euler equation:

y + λ √(1 + y′²) − λ y′²/√(1 + y′²) = C_1.  (6.52)

Furthermore, this can be reduced to

y − C_1 = −λ/√(1 + y′²).  (6.53)

The solution of the above equation is a family of circles:

(x − C_2)² + (y − C_1)² = λ².  (6.54)

The three constants, i.e. C_1, C_2, λ, can easily be determined from the known boundary points (x_1, y_1), (x_2, y_2) and the isoperimetric condition (6.46).

6.2.5 Geodesics

The nontrivial problem is to find the shortest path on a surface. Geodesics, in the original sense, are the shortest paths between two points on a nonplanar surface S, see figure 6.7. What is more, this is a constrained variational problem, since the unknown path is constrained to lie on a nonplanar surface.

Figure 6.7: Geodesics on a sphere

The equation for the length of a curve l is given by the curvilinear integral of a scalar field, namely

l = ∫_l dl,  (6.55)

where the curve l itself is given in space by the parametric representation

l := {(x, y, z) : x(t) := ...; y(t) := ...; z(t) := ...; t ∈ [t_1; t_2]}.  (6.56)

Since the square of the differential element of arc length is dl² = dx² + dy² + dz², it is possible to evaluate the curvilinear integral (6.55):

l = ∫_{t_1}^{t_2} √(ẋ² + ẏ² + ż²) dt.  (6.57)

The integrand F is now

F := √(ẋ² + ẏ² + ż²).  (6.58)

The nonplanar surface is S := {(x, y, z) : f(x, y, z) = 0}, where f(x, y, z) = 0 is the constraining equation. In order to find the necessary condition for an extremum, we formulate a new integrand,

F̄ = F − λ(t) f,  (6.59)

or a new functional,

J = ∫_{t_1}^{t_2} (√(ẋ² + ẏ² + ż²) − λ(t) f) dt.  (6.60)

The Euler equations are

F̄_x − d/dt F̄_ẋ = 0,  (6.61a)
F̄_y − d/dt F̄_ẏ = 0,  (6.61b)
F̄_z − d/dt F̄_ż = 0,  (6.61c)

or

λ(t) ∂f/∂x + d/dt (ẋ/√(ẋ² + ẏ² + ż²)) = 0,  (6.62a)
λ(t) ∂f/∂y + d/dt (ẏ/√(ẋ² + ẏ² + ż²)) = 0,  (6.62b)
λ(t) ∂f/∂z + d/dt (ż/√(ẋ² + ẏ² + ż²)) = 0.  (6.62c)

Figure 6.8: Geodesics on a cylinder

Let us consider a circular cylinder of radius R. It can be expressed by the following implicit equation:

f(x, y, z) := x² + y² − R² = 0.  (6.63)

The solutions of the Euler equations are helices, meaning that the geodesics on a circular cylinder are

x(t) := R cos t,  (6.64a)
y(t) := R sin t,  (6.64b)
z(t) := k t.  (6.64c)

6.2.6 Minimal surface passing through a closed curve in space

The nontrivial problem is to find the surface of least total area stretched across a closed curve in space. For a planar curve the problem is again trivial.

Figure 6.9: Helicoid

Let there be a nonplanar surface S passing through a given closed curve ∂S, whose projection onto the xy plane is D, i.e.

S := {(x, y, z) : z(x, y) = ...; (x, y) ∈ D}.  (6.65)

The minimum of the following surface integral of a scalar field,

S = ∫∫_S dS,  (6.66)

is the solution to the problem of the minimal surface in space. Alternatively, the above functional can be expressed by means of the double integral

S = ∫∫_D √(1 + z_x² + z_y²) dx dy.  (6.67)

It is a functional of the (6.15) type, and the Euler equation (6.16) can now be reduced to

∂/∂x (z_x/√(1 + z_x² + z_y²)) + ∂/∂y (z_y/√(1 + z_x² + z_y²)) = 0,  (6.68)

85 8 6. Variational calculus or explicitly ( z x 1 + ( ) ) z z y x z y ( z x y + z y 1 + ( ) ) z =. (6.69) x In order to find the minimum surface S one has to solve the above nonlinear, second order partial differential equation. Obviously, a planar surface is also a solution of equation (6.69). Furthermore, the nontrivial minimal surfaces are catenoid (figure 6.5) and helicoid (figure 6.9) Variational formulation of elliptic partial differential equations The problem of finding the minimum of the functional satisfying certain boundary condition over Ω ( 1 J[z] = z x + z y ) dx dy (6.7) Ω leads to the Laplace equation z x + z =. (6.71) y The harmonic functions, being the solution of equation (6.71), are extremals of the functional (6.7). Furthermore, the necessary condition for an extremum of the following functional ( J[z] = z x + z y zf(x, y) ) dx dy (6.7) results in the Poisson equation Ω z x + z = f(x, y). (6.73) y Finally, the problem of finding the minimum of the functional J[z] = leads to the biharmonic equation Ω ( z xx + z xy + z yy zf(x, y) ) dx dy (6.7) z x + z x y + z y = f(x, y). (6.75) Variational formulation of elliptic partial differential equations forms the foundations for the various approximate method.

86 τ 6.3. Variational method of finding streamlines in ring cascades for creeping flows Variational method of finding streamlines in ring cascades for creeping flows Introduction Creeping, steady state flow is considered here together with the additional assumption of axial symmetry. Creeping flow occurs when the Reynolds number Re 1. This condition, however, is not satisfied for typical technical applications in cascade flows. Therefore, one has to keep in mind that methods presented here are mostly of cognitive values. A very important feature of creeping flows is worth mentioning here, i.e., they are characterised by the minimum possible dissipation. R γ1 γ β γ1 R 1 γ α Figure 6.1: Ring cascade scheme 6.3. Conservation equation in curvilinear coordinate systems Because of the shape of the cascade, see figure 6.1, it is most convenient to express the conservation equations in a coordinate system in which they arrive in the simplest form. The mass conservation equation for the incompressible case does not simplify, however, under the Re 1 assumption. Introducing the incompressibility and constant viscosity µ assumptions as well as steady state character of the flow, the mass conservation equation and the two components of the Stokes equations in cylindrical coordinates (or polar on a plane) read 1 p µ r = 1 r r 1 p µ ϕ = r r (ru r) + U ϕ ϕ =, ( r U ) r + 1 U r r r ϕ ( r U ϕ r U r r r U ϕ ϕ, ) + 1 U ϕ r ϕ U ϕ r + U r r ϕ. (6.76a) (6.76b) (6.76c)

87 86 6. Variational calculus The above system (6.76) is closed. The unknown functions are the velocity components U r, U ϕ and pressure p. The uniqueness of this system with prescribed boundary conditions was first proved by Helmholtz. The concept of the stream function ψ can be introduced according to the following definitions U r = 1 ψ r ϕ and U ϕ = ψ r. The alternative definitions U r = 1 ψ r ϕ and U ϕ = ψ r are also possible. Both definitions satisfy the mass conservation equation (6.76a). After differentiating equation (6.76b) with respect to ϕ and (6.76c) with r and subsequent subtracting one from the other, we obtain ( 1 r ( ( 1 r ψ ))) + ψ r r r r r r r r ϕ + 1 ψ r ϕ 3 ψ r 3 r ϕ + ψ r =, (6.77) ϕ The above equation is the so called biharmonic equation in polar coordinates. A shorter version of this equation reads ψ =. It must be point out that although equation (6.77) corresponds to the system (6.76), it is of fourth order Dissipation function and dissipation power The strain rate tensor in polar, physical coordinates, takes the form ( ) D = 1 U ϕ r U r r + 1 r U r ϕ Uϕ r 1 U ϕ r 1 r + 1 r U ϕ ϕ U r ϕ + Ur r Uϕ r The same tensor expressed in terms of stream function ψ reads ( D = 1 r 1 ψ r ϕ ψ r ϕ 1 ψ r ϕ + 1 ψ r r 1 1 ψ r ϕ + 1 ψ r ψ r 1 r ψ ϕ 1 r r 1 ψ r ϕ ψ r. (6.78) ). (6.79) By means of this tensor it is possible to express the dissipation function φ µ = µd as φ µ = µ ( ) ( ψ ( r ϕ r ψ ( )) ) ψ ψ + r ϕ ϕ + r r r ψ r. (6.8) The dissipated power is defined as N d = φ µ r dr dϕ, (6.81) Ω where the considered flow domain Ω is the following subset of the plane Ω := {(r, ϕ) : r [R 1, R ]; ϕ [, τ]} Analytical solutions An analytical solution of the system (6.76) for an axially symmetric geometry is possible. This case can also be regarded as a cascade composed of an infinite number of infinitely thin blades. Formally, it is the case where all the streamlines are identical with respect to rotation around the symmetry axis. From this arises an additional assumption, i.e., ϕ =.

88 6.3. Variational method of finding streamlines in ring cascades for creeping flows 87 It may be shown that for axial symmetry, there exists a solution of the system (6.76). This system simplifies now to 1 dp µ dr = 1 r d dr (ru r) =, ( d r du r dr dr ) U r r, (6.8a) (6.8b) = r d U ϕ dr + du ϕ dr U ϕ r. (6.8c) We are dealing with ordinary differential equations. The first of them, i.e., (6.8a) can be integrated and gives the analytical solution U r = c 1 r 1. This solution can be substituted into equation (6.8b). This results in dp dr =, which means that pressure is constant inside the entire flow domain p = c. The last equation (6.8c) is simply an ordinary differential equation in terms of U ϕ. Its solution takes the form U ϕ = c 3 r 1 + c r. Finally, the system (6.8) is integrated to U r = c 1 r, p = c, U ϕ = c 3 r + c r. (6.83a) (6.83b) (6.83c) In view of the axial symmetry, the biharmonic equation ψ = (6.77) simplifies to ( 1 d r d ( ( 1 d r dψ ))) =. (6.8) r dr dr r dr dr This is also the case with the strain rate tensor (6.79), which takes the following form ( ) D = 1 U ϕ r U r r Uϕ r 1 U ϕ r Uϕ r U r r. (6.85) Following the same line of reasoning, the dissipation function (6.8) simplifies to φ µ = µ ( ) ( ) ( ) ) ψ ψ ( r + ϕ ϕ r ψ ψ + r r ϕ r r ψ r. (6.86) Dissipation functional The assumption of axial symmetry results in a set of identical streamlines f which depend only on the coordinate r. The following form of stream function ψ may be proposed ψ(r, ϕ) := ϕ f(r). (6.87) τ It cannot be determine whether this function satisfies the biharmonic equation (6.8), since the function f is unknown. The problem is now reduced to the search for the

89 88 6. Variational calculus single variable function f instead of two-variable function ψ. The form of ψ (6.87) is fully determined by f. The dissipation function (6.86) or (6.8) takes the following form by virtue of (6.87) φ µ = µ ( r τ + r (f rf ) ). (6.88) The dissipation power (6.81) may now be rewritten as an iterated integral for any pitch τ (see figure 6.1) τ R N d = φ µ r dr dϕ. (6.89) R 1 What is important, is that the form (6.87) allows us to integrate the dissipation power once. This is because it explicitly depends on ϕ. On the basis of equation (6.88) and (6.89), we have N d = µ R 1 ( τ r 3 + r (f rf ) ) dr. (6.9) R 1 The above integral is a certain functional which depends on the radius r and the function f together with its derivatives (up to the second). Symbolically, it is written as N[f] = R R 1 F (r, f, f, f ) dr. (6.91) The necessary condition for the optimum of this functional, in the general case with unconstrained ends, takes the form (6.8). Therefore the optimisation problem consists in the search for a streamline f which minimises the functional (6.91). The form of the function f results from the necessary condition (6.8). This condition can be simpler, if certain additional assumptions are introduced. This is discussed later Dissipation functional vs. equations of motion The method presented here consists in choosing the function f (streamlines) which would minimise the functional (6.9). However, the essential question is whether the solution obtained by minimising the functional satisfies the equations of motion (6.8). In order to answer this question, we need the functional which yields the Stokes equations as a result of a necessary condition. The general form of this functional is ( J = ρ U ) U ρg U p U + µd dω. (6.9) t Ω The necessary condition δj = yields the Stokes equations ρ U t = ρg p + µ U. In this case we deal with steady state flow t = and we neglect mass forces. If so, then the functional (6.9) simplifies to ( J = p U + µd ) dω. (6.93) Ω

90 6.3. Variational method of finding streamlines in ring cascades for creeping flows 89 From the necessary condition we obtain equation of motion in absolute notation p = µ U. We also know from solution (6.83) of the system (6.8) that pressure is constant and therefore p = and J = Ω µd dω. This means that J = N d where N d is defined by means of (6.88) (6.9). This simply guaranties that the minimisation of the dissipation functional N d, which yields the streamlines f, leads to a solution that satisfies the Stokes equation (for a constant pressure). Additionally, one may take under consideration only the cases with one end unconstrained and with both ends partly constrained (when angles are known). This will be discussed further. The above reasoning does not apply to the Navier-Stokes equations, since they are nonlinear and there is no classical variational formulation such as (6.9). However, there is a non-classical variational formulation which can be used for the nonlinear Navier-Stokes equations. This also means that dissipation is not the only component of the functional and there is no guarantee that the streamlines f, which arise from the minimisation of the functional, satisfy the equations of motion Streamlines Both ends constrained Both ends constrained means that the angle α and position γ are known at the inlet and the angle β and position γ 1 (figure 6.1) are known at the outlet. From the necessary condition (6.8) we obtain the Euler equation in the following form F f d F dr f + d F dr =. (6.9) f Since both ends are constrained, so the appropriate variations δf Ri = and δf Ri =. From the Euler equation (6.9) for functional F, we obtain an ordinary differential equation of the fourth order f r 3 f f + + f IV =. (6.95) r r This equation should be solved together with the following boundary conditions f(r 1 ) = γ 1, f(r ) = γ, f (R 1 ) = tan β, f (R ) = tan α. The general solution of equation (6.95) is the function f (streamline) f(r) := C 1 + C r + C 3 ln r + C r ln r. (6.96) It can be easily verified that the solution (6.96) satisfies the biharmonic equation (6.8). After calculating the stream function (6.87), velocities U r, U ϕ and pressure p, it arises that the second equation of motion (6.8b) gives = C τ 1. This means that the pressure does not satisfies the axial symmetry condition and the problem with both ends constrained it too general (too stiff). In addition, all the following solutions must satisfy the condition C =. Only then, the axial symmetry condition is satisfied for all the variables. Finally, the most general form of the solution of equation (6.95) has the form f(r) := C 1 + C r + C 3 ln r. (6.97)

91 9 6. Variational calculus One end partly constrained Two cases are possible. In the first one we know of the one of angles α or β. In the second, we know position γ 1 or γ. Known angle. To be more precise, we know both angles: the inlet angle α and the outlet angle β. We look for one of the positions γ i. This requires δf Ri = and δf R1 or δf R. From the necessary condition (6.) we obtain an additional equation ( F f d ) F dr f =. (6.98) r=r i The solution must satisfy this condition together with the Euler equation (6.9). It can be shown that the additional condition (6.98) for the functional F can be reduced to C r Ri = which yields C =. Therefore, the solution (6.96) takes the form (6.97). This means that the problem with one end partly constrained (with constrained position) is well formulated. The known position serves as a reference point and its value has no significance, owing to the axial symmetry of the function f. The boundary conditions take the form of f (R 1 ) = tan β, f (R ) = tan α. For a matter of simplicity, the additional reference point may be assumed as f(r ) =. In so doing we deal with two partly constrained ends (constrained position). The solution of equation (6.97) together with the discussed boundary condition has the form f(r) := R 1R (R 1 tan α R tan β) ln r R ( ) r R (R tan α R 1 tan β) (R1 R ). (6.99) By using formula (6.87) and the definition of the stream function it can be shown that velocity U r = τ 1 r 1. This means that the constant c 1 in equation (6.83a) equals c 1 = τ 1. The velocity is then ( 1 U ϕ = τ (R1 R ) r (R 1 tan β R tan α) + R ) 1R (R 1 tan α R tan β), r (6.1) which means that constants in equation (6.83c) takes the form c 3 = R 1R (R 1 tan α R tan β) τ (R 1 R ), (6.11a) c = R 1 tan β R tan α τ (R 1 R ). (6.11b) Exemplary streamlines, obtained from equation (6.99), are shown in figure 6.11, where α = 8. The outlet angles vary from 8 to 8. Known position. More precisely, both positions are known: the inlet γ and the outlet γ 1. We look for either the inlet angle α or the outlet angle β. This requires that

92 6.3. Variational method of finding streamlines in ring cascades for creeping flows 91 variations δf Ri = and δf R1 or δf R. From the necessary condition (6.) we obtain an additional equation in the following form F =, (6.1) r=ri f which must be satisfied together with the Euler equation (6.9). The additional equation (6.1) for the functional F simplifies to C 3 = C r Ri. This lead to the following form of the streamline f(r) := C 1 + C r + C ( R i + r ) ln r. (6.13) The above solution does not posses the admissible form (6.97). This means that pressure is not axially symmetric. Therefore, the case with one end partially constrained (in the form of known angle) is too stiff and hence not well formulated (C 3 = C Ri ) y y x 1 1 x Figure 6.11: Streamlines as a function of β for α = 8 for R R 1 = Figure 6.1: Streamlines as a function of inlet angle α for R R 1 = One end unconstrained Here we know either the inlet position γ and the inlet angle α or the outlet position γ 1 together with the outlet angle β. This requires that variations δf Ri = and δf Ri. Apart from the Euler equations (6.95), additional conditions (6.98) and (6.1) must be fulfilled. This is the combination of the two previously discussed cases, where C 3 = C Ri =. From equation (6.96) follows the general solution f(r) := C 1 + C r. (6.1) The specific solution of (6.1) must satisfy the following boundary conditions f(r i ) = γ i, f (R i ) = tan where R i {R 1, R }, {α, β}. From this conditions we obtain the specific solution f(r) := γ i + r R i R i tan. (6.15)

93 9 6. Variational calculus The solution is valid both for the unconstrained inlet and the unconstrained outlet. From equation (6.87) and the definition of stream function, it follows that velocity U r = τ 1 r 1, which means that c 1 in equation (6.83a) c 1 = τ 1. The velocity τ 1 tan, which means that constants in teh solution (6.83c) take the i τ 1 tan. The dissipation power as a function of the inlet or outlet angle can be calculated on the basis of equations (6.9) and (6.15). In both cases, this power is constant and for a pitch τ = it equals U ϕ = rr 1 i form c 3 = i c = R 1 N d = µ ( 1 R 1 1 R ). (6.16) The streamlines corresponding to the solution (6.15) are shown in figure 6.1. This is the case with the unconstrained outlet. The shortest streamlines are obtained for the angles. The larger the angle the longer the wraparound angle Summary It is possible to find an analytical solution of the Stokes equation for an axially symmetric geometry in terms of velocity field. This can be done by direct integration of the system (6.8). Furthermore, it is even possible to find a solution of the biharmonic equation (6.8) using the proposed decomposition of the stream function (6.87). Basing on these solutions, one cannot determine whether further relaxations of the inlet and outlet are possible, since there are no additional conditions that can be imposed on the solution. A far more general method is presented here that allows to overcome these difficulties. This method consists in the minimisation of a dissipation functional by means of the variational calculus. This allows to formulate additional conditions to be imposed on the solution. Moreover, this method allows to obtain further solutions depending on how the inlet and the outlet are fixed and to find the solutions which are too stiff. 6. Minimum drag shape bodies moving in inviscid fluid 6..1 Problem formulation Figure 6.13 presents a moving object is steady, inviscid and incompressible fluid at a constant speed U. The resistance is considered only on the peripheral of a moving object. Two cases are considered here, namely two-dimensional and three-dimensional. As for the former, the shape of the body is symmetrical with respect to the x axis, while in the latter one deals with an axial-symmetry with respect to the same axis x.

94 6.. Minimum drag shape bodies moving in inviscid fluid 93 U y y α Un y(x) :=? α x (x, y ) x Figure 6.13: Moving object 6.. Fluid Resistance Drag force Drag force is the components, directed towards the body velocity, of the total force R exerted on the moving body by the fluid. Formally, the total force is defined by means of a surface integral of stress vector over a considered body s surface S. In the absence of viscous (tangential) stresses this can be expressed by means of the normal stresses contribution R = (p p )ˆn ds. (6.17) S For further consideration it is necessary to specify a unit normal vector ˆn to the surface S. If the surface S is given explicitly S := {(x, y, z) : z = f(x, y)} or implicitly S := {(x, y, z) : F (x, y, z) := f(x, y) z = } the unit normal vector is ˆn = F F = ( z x, z y, 1). (6.18) 1 + zx + zy The above formula takes simpler form for both: two-dimensional and three-dimensional axisymmetric case ˆn = ( y, 1), (6.19) 1 + y where the unit normal vector ˆn applies to the curve l which is given by l := {(x, y) : y = f(x)} or l := {(x, y) : F (x, y) := f(x) y = } Pressure coefficients and its approximation In order to determine drag force it is necessary to know the distribution of pressure difference p p. The pressure may be determined when, for instance, the pressure coefficient distribution c p is known c p := p p 1 ρu. (6.11) The Newtonian approximation for the distribution of pressure coefficient is given by c p = sin α and together with definition (6.11) yields p p = ρu sin α. This

95 9 6. Variational calculus makes it possible to determine the optimal shape of a body moving in inviscid fluid in the sense of minimum drag Two-dimensional problem In the case of two-dimensional flows equation (6.17) is reduced to the following form R := (p p )ˆn dl = ρu ˆn sin α dl, (6.111) l where the curve l starts from (, ) and ends at (x, y ), see figure From the same figure it arises another geometrical relation, namely sin α = l dy dx + dy = y. (6.11) 1 + y In order to convert curvilinear integral to single integral it is necessary to take advantage of arc differential dl = 1 + y dx y x.6 x x.75 Bézier x + 1 Figure 6.1: Optimal shapes Drag force R x comes directly from equations (6.19) and (6.111) R x := ρu x y 3 dx (6.113) 1 + y and can be interpreted as a certain functional J. The specific value of that functional depends on a curve of interest y. A constant value ρu is regarded here as a multiplier. This leads to the following form of a functional J J[y] := x y 3 x 1 + y dx = F (y, y ) dx. (6.11) The necessary condition for the optimum of the above functional J results in the Euler equation F y d dx F y =. (6.115)

96 6.. Minimum drag shape bodies moving in inviscid fluid 95 From the Euler equation (6.115) for the functional J, we obtain an ordinary differential equation y y (y 3) =. There are four solutions to this equation. The first trivial solution y = C 1 does not fulfil boundary conditions. For a specific case when C 1 = and y(x) := one obtains a solution characterised by zero drag. The second and third solutions, namely y(x) := C 1 ± 3x, do not fulfil one boundary condition in general. For instance, if y() = then C 1 = and it is typically impossible to fulfil y(x ) = y. The fourth solution y(x) := C 1 + C x satisfies both boundary conditions y() =, y(x ) = y. Constants are determined to be C 1 =, C = y x 1. This results in the following solution y = x. (6.116) y x Introducing dimensionless variables x + := x x 1 and y + := y y 1 we have somewhat simpler form y + = x +. The above solutions is shown in figure 6.1. It is simply a straight line. Furthermore, an isosceles triangle is a two-dimensional body of a minimum drag. 6.. Three-dimensional problem Functional and Euler equation In the case of two-dimensional surfaces S of revolutions equation (6.17) is reduced to the following form R := (p p )ˆny dl (6.117) l where the curve l is peripheral of surface S. Equation (6.117) is valid for axisymmetric problems. This results in different drag force equation R x in comparison with equation (6.113) R x := ρu x y 3 y dx. (6.118) 1 + y Neglecting constant multiplier ρu makes it now possible to express the functional J as x y 3 y J[y] := dx. (6.119) 1 + y Following the same line of reasoning we have the necessary condition for the optimum of the above functional J. This results in the same Euler equation (6.115). This time, however, we obtain slightly more complicated ordinary differential equation y ( y y ( y 3 ) y y ) =. (6.1) This is an obvious consequence of a more complex form of a functional J. Solution to this equation has to fulfil the same boundary conditions as previously y() =, y(x ) = y.

97 96 6. Variational calculus 6... Exact pseudo solution A solution to nonlinear equation (6.1) may not be unique. It can, however, be integrated once setting simultaneously y = u(y) and y = u u. This leads to another nonlinear equation C 1 y y 3 = (1 + y ) this time of the first order. This equation can be classified as Lagrange equation and the parametric solution is x(p) := 1 ( 3 C 1 p + 1 ) p ln p + C, (6.11a) ( ) 1 + p y(p) := C 1 p 3, (6.11b) where p := y is a parameter and derivative at the same time. It can be easily demonstrated keeping in mind that dy dx = dy dp / dx dp = p. Unknown constant in equation (6.11) should satisfy boundary conditions. Solution (6.11) is of no practical value. This is because of nonlinear and non-unique character of equation (6.1). It possible to find the optimal solution only within the range of x + [.1, 1]. Consequently, it is impossible to find the most interesting part of the solution around x Approximate solution due to the functional The differential equation for the functional (6.1) has a complicated form. This means that it is extremely difficult to give an explicit solution. The classic approach to this problem is to simplify the form of a functional (6.119). It is assumed that y 1 and hence y This assumption is not true when x + where one would expect y (x) as x +. This is because of smoothness of the axisymmetric solution. However, far from the point of stagnation the discussed simplification is justified. If so, then the form of functional (6.119) can be simplified J[y] := x y 3 y dx. (6.1) The necessary condition for the optimum of the above functional J results in simpler Euler equation. y (y + 3y y ) =. There are two solutions to this equation. The first trivial solution y = C 1 does not fulfil boundary conditions. The second solution is y(x) := C (x 3C 1 ) 3. Applying boundary conditions y() =, y(x ) = y we have ( ) 3 y x =. (6.13) y x The same in dimensionless variables yields y + = (x + ) 3. Solution of (6.13) has the form of a parabola and it is shown in figure Approximate solution due to form of the function It can be easily verified that following curve y + = (x + ) 1 gives even smaller value of the original functional (6.119) in comparison with y + = (x + ) 3 that minimises functional

98 6.. Minimum drag shape bodies moving in inviscid fluid 97 (6.1). This suggest that instead of simplified version of functional (6.119) one can consider certain class of functions. The natural candidate is y + = ( x +) n. (6.1) The unknown exponent n should be n ], + [. In spite of appearances, this is a fairly wide range of solutions, which can take the form of from a thin picket to almost a tube. The class of functions (6.1) satisfies boundary conditions in dimensionless form y + () =, y + (1) = 1. Known form of the function (6.1) allows to transform the variational problem to the classic problem of function optimisation. One looks for the optimal value of the exponent n. This method is somewhat similar to the Ritz method. Its approximate nature is in the fact that the family of assumed function (6.1) does not have to incorporate the exact solution of a functional (6.119). Functional (6.119) now takes the following form 1 J + n 3 (x + ) n 3 := 1 + n (x + ) n dx+ (6.15) and can be only integrated numerically. Figures 6.15 presents values of functional J + as a function of exponent n according to equation (6.15). It is now apparent that exponent 3, being an optimal solution of simplified functional (6.1), is not the best solution when it comes to original functional (6.119). The optimal exponent within the class of functions (6.1) is n.6. This leads to the optimal parabola y + = (x + ).6 shown in figure J n. Figure 6.15: J + values as a function of the exponent n Approximate solution by means of a Bézier curve Another approach to minimisation of functional (6.119) is to discretise the variational problem by means of Bézier curves. This means that the original problem f : C [;1] R is now reduced to single variable optimisation f : R D R where C [;1] := {f : f, f, f : [; 1] R are continuous}. (6.16) Figure 6.16 shows an example of Bézier curve described by means of five points. First and last point are fixed as well as the x coordinate of the second point. The former

99 98 6. Variational calculus geometrical constrain is necessary in order to keep the surface of revolution smooth. The assumed geometrical constraints results in five independent variables, i.e., D = 5. The objective function is subjected to box constraints Ω := { x R D : L i x i U i }. (6.17) Differential Evolution was chosen in order to solve the optimisation problem. Uniform random initialisation within the search space with random seed based on time was considered. The algorithms stopped when the number of generations n max = 3 was reached. The total number of solution per generation (population size) was N =. The scale parameter F of DE and the crossover probability C are listed in table 6.1 together with other optimisation parameters such as lower L and upper U box constraints. Table 6.1: Basic parameters of DE Value D 5 N n max 3 F.9 C.7 L (,.1,,.5, ) U (1,.5, 1, 1, 1) The optimal Bézier curve, resulting in the lowest value of functional J, is shown in figure 6.16 which can be compared with other solutions in figure y x + Figure 6.16: Optimal Bézier curve 6..5 Summary Two new approaches and solutions to minimum drag shape body problem are introduced. Both transform the original variational problem to the classic problem of function optimisation. First approach is possible due to assumption of certain class

100 6.. Minimum drag shape bodies moving in inviscid fluid 99 of function, namely power law shapes. This makes it possible to find the optimal value of the exponent n =.6 being better than the classic n = 3. Even better solution can be accomplished by means of a Bézier curve leading to discretised functional optimisation. All solutions can be compared and ordered by means of the drag coefficient c d = R x 1 ρu A (6.18) where A = y is the reference area. Table 6. presents the drag coefficient ratios where the reference drag c dc has been calculated for cone. It is clear that the classic power law shape with the exponent n = 3 is one of the worst. Modern global optimisation methods such as DE can produce better solutions characterised by lower drag. Table 6.: Drag coefficients ratios Curve c d c dc y + = x + 1. y + = (x + ) 3.88 y + = (x + ) 1.85 y + = (x + ).6.83 Bézier.769

101 Chapter 7 Multi-objective optimisation Multi-objective optimisation problems rely on simultaneous minimisation or maximisation more than one objective function. In the case of single objective optimisation we deal with only one function. Typically, real problems are multi-objective. Multiobjective optimisation gives as a results set of solution whereas single objective gives only one. 7.1 Definitions The vector f of n scalar functions f i is denoted as f := (f 1,..., f n ). If a point x in D-dimensional space R D is denoted by x := (x 1,..., x D ) then the n-dimensional objective fitness function value is f(x) := (f 1 (x),..., f n (x)). (7.1) From this it arises that the function f maps from D- to n-dimensional space Individual fitness functions f i, being components of f, map f : R D R n. (7.) f i : R D R. (7.3) 7. Domination Let us also introduce a subset N of natural numbers set N := {1,..., n} N of n numbers. Domination is a key concept for multi-objective optimisation. We say in the case of minimisation of function f that point x 1 dominates over point (or solution) x if f i (x 1 ) f i (x ) f i (x 1 ) < f i (x ) (7.) i N i N where x 1, x Ω R D. Here the set of all admissible solutions is denoted as Ω.

102 7.3. Scalarisation The Pareto set We say, by contradiction of definition (7.), that x 1 does not dominate (for minimisation) over x (or x is not dominated by x 1 ) if i N f i (x 1 ) > f i (x ) i N f i (x 1 ) f i (x ). (7.5) The Pareto set Π of the solution is a set of those points which are not dominated by others from the set of admissible solutions Ω. This can be denoted as { ( )} Π := x j Ω : x Ω f i (x) > f i (x j ) f i (x) f i (x j ). (7.6) i N i N 7.. The Pareto front The Pareto set Π is a set of points which are not dominated. The Pareto front P is a set of values with coordinates corresponding to Pareto set elements. It is denoted as 7.3 Scalarisation P := {f(x) R n : x Π}. (7.7) The basic idea relies on reducing (scalarising) a multi-objective optimisation function to a single objective function in order to use standard optimisation methods. discus briefly: method of weighted-sum, method of target vector, method of minimax Method of weighted-sum f : R D R n (7.8) f : R D R (7.9) Here a few popular methods are The most popular method is the method of weighted-sum. The n-dimensional vector of weights w := (w 1,..., w n ) is composed of individual weights w i [, 1] which can be optionally selected, provided that n w i = 1. (7.1) i=1 Values of individual weights represent the importance of a given function f i. Function f is obtained from f by the dot product of function f and vector of weights w in the form of n f(x) := w f(x) = w i f i (x). (7.11) i=1

103 1 7. Multi-objective optimisation Proper selection of weights produces convergence for individual elements of the Pareto set Method of target vector There are also target vector g := (g 1,..., g n ) or minimax methods. Both methods reduce the function f to f by means of f(x) := (f(x) g) W 1 α. (7.1) The symbol W represents a matrix of weights and it is usually diagonal of size n n. Vector g represents imaginary optimal values to where an algorithms tries to converge. For the target vector method the norm α is replaced by the generalised Euclidean one in the form ( n ) 1/α a α := a i α. (7.13) i=1 Usually α := and W := δ where δ represents the Kronecker delta. Equation (7.1) reduces to f(x) := n (f i (x) g i ). (7.1) Method of minimax i=1 It is a special case of the target vector method. For the minimax method we take so called maximum norm a α := max a i. (7.15) i N Reducing the matrix W to diagonal form we obtain f(x) := max i N f i (x) g i w i. (7.16) This methods relies on minimisation of maximum norm of difference between objective fitness function and target vector. Results are similar to those from the target vector method. 7. SPEA The algorithm SPEA [33] maintains dominated solutions in a separate set which is Pareto set Π i. New solutions (if any) are added to this set in each generation from the current population P i which are not dominated by others from this population of solutions P i. Subsequently, dominated solutions (if they exist) are excluded from Pareto set Π i which could change their status due to a new solution added to it from P i. The selection process of solutions is similar to an ordinary genetic algorithm with the difference that parents are selected from the current i th population P i which is

104 7.5. Examples 13 increased by individuals from Pareto set Π i. Crossover and mutation have identical form as in an ordinary genetic algorithm. The single value fitness function of an individual, which represents its adaptation, is calculated differently for Π i and P i. For a Pareto set Π i the fitness function for the j th individual has the form f(x j ) := Z i [, 1[ (7.17) P i + 1 where Z i denotes the number of elements of individuals from P i which are dominated by j th individual x j from Π i. For an individual x k from population P i we have f(x k ) := D i + 1 [1, Π i + 1] (7.18) where D i denotes the number of elements of Π i that dominate over x k. This means that mutual domination of individuals from P i is of no importance. 7.5 Examples Two objective fitness functions of a single variable Let us consider first a case of minimisation of simple test function. This function is composed of two single variable functions These functions are given by definitions f(x) := (f 1 (x), f (x)). (7.19) f 1 (x) := x +, f (x) := (x + 1) (7.a) (7.b) where x [, ] =: Ω, see figure f1, f 1 1 x Figure 7.1: Two objective functions of a single variable

105 1 7. Multi-objective optimisation Analytical solution It is possible to find analytical solution by means of weighted-sum method. This reduces the multi-objective problem to single objective. According to equation (7.11) we have f(x) = w 1 f 1 (x) + w f (x). The weights w i are not independent. They must fulfil equation (7.1). If we denote first weight as λ then the second may be expressed as 1 λ by means of (7.1). Thanks to the definitions (7.) the above equation may be rewritten as f(x) = λ(x + ) + (1 λ)(x + 1). The necessary condition for optimality f (x) = gives x = λ 1 which is the parametric representation of the Pareto set Π, see figure 7.1 (dots) Π := {(x) : x(λ) = λ 1; λ [, 1]}. (7.1) The independent variable x (i.e. the Pareto set) is limited to x [ 1, ] = Π. This is because equation (7.1) must fulfil the condition (7.1). The representation of the Pareto set Π = [ 1, ] through transformations f 1 and f take forms f 1 ([ 1, ]) = [, 3], f ([ 1, ]) = [, 1]. (7.a) (7.b) Now we can obtain a parametric description of the Pareto set front on a plane f 1 f by means of equations (7.1) and (7.), see figure 7. (solid line) P := {(f 1, f ) : f 1 (λ) := λ λ + 3, f (λ) := λ, λ [, 1]}. (7.3) Single objective reconstruction of Pareto set Choosing weights from the interval [, 1] and performing single objective optimisation for selected weights it is possible to reconstruct the Pareto set. For the 11 different values of weights listed in table 7.1 one may easily reconstruct the Pareto set. This set is shown in figure 7. (dots). Very good agreement between analytical and numerical solutions can be observed Multi-objective SPEA Parameters of binary representation genetic algorithms are listed in table 7.. Figure 7.3 shows all the solutions from the whole multi-objective optimisation process (left) and the Pareto set (right). Comparing figures 7. and 7.3 we can observe again good agreement between numerical and analytical solutions Two objective fitness functions of two variables Let us consider now a case of minimisation of another test function. This function is composed of two two variable functions f(x, y) := (f 1 (x, y), f (x, y)). These are given by definitions f 1 (x, y) := (x ) + (y ), f (x, y) := x + (y + ) where x := (x, y) [, ] =: Ω, see figure 7.. (7.a) (7.b)

106 7.5. Examples 15 Table 7.1: Pareto set reconstruction λ f(x) x f 1 (x) f (x) f f 1 Figure 7.: Reconstructed Pareto set of two objective functions of a single variable Table 7.: Basic parameters for Pareto set reconstruction for two functions of a single variable Value Chromosome length Bits per variable Tournament size Population size Crossover probability.9 Mutation probability.5 Number of generations 3

107 16 7. Multi-objective optimisation f f f 1 f 1 Figure 7.3: Reconstructed Pareto front of two objective functions of a single variable y x Figure 7.: Two objective functions of two variables Analytical solution Following the same logic as for optimisation of a single variable function described previously, we have f(x, y) = λ ( (x ) + (y ) ) + (1 λ) ( x + (y + ) ). (7.5) The necessary condition for optimality f = gives x = λ and y = λ, which is the parametric representation of the Pareto set Π, see figure 7. (dots) Π := {(x, y) : x(λ) = λ, y(λ) = λ ; λ [, 1]}. (7.6)

108 7.5. Examples 17 Due to the condition λ [, 1] we obtain the following sets for independent variables x and y x [, ], y [, ]. (7.7a) (7.7b) The representation of these sets through transformations f 1 and f take exactly the same form f 1 ([, ] [, ]) = f ([, ] [, ]) = [, ]. (7.8) The parametric description of the Pareto set front P on a plane f 1 f may be obtain by means of equations (7.) and (7.6), see figure 7.5 (solid line) P := {(f 1, f ) : f 1 (λ) = (λ 1), f (λ) = λ, λ [, 1]}. (7.9) f f 1 Figure 7.5: Reconstructed Pareto set of two objective functions of two variables Single objective reconstruction of Pareto set Choosing weights from the interval [, 1] and performing single objective optimisation for selected weights one can reconstruct the Pareto set. For the 11 different values of weights listed in table 7.3 one may reconstruct the Pareto set which is shown in figure 7.5 (dots). As previously very good agreement between analytical and numerical solutions may be noticed Multi-objective SPEA Parameters of binary representation genetic algorithms are listed in table 7.. Figure 7.6 shows all the solutions from the whole multi-objective optimisation process (left) and the Pareto set (right). Comparing figures 7.5 and 7.6 we can observe again good agreement between numerical and analytical solutions. In a case where we deal with more than one independent variable the chromosome length is larger as well. This forces the population size to be larger, respectively.

109 18 7. Multi-objective optimisation f 3 3 f f 1 f 1 Figure 7.6: Reconstructed Pareto front of two objective functions of two variables Table 7.3: Pareto set reconstruction λ f(x, y) x y f 1 (x, y) f (x, y) Table 7.: Basic parameters for Pareto set reconstruction for two functions of two variable Value Chromosome length 3 Bits per variable 15 Tournament size Population size 3 Crossover probability.9 Mutation probability.1 Number of generations 5

110 7.6. Multi-objective description of Murray s law Multi-objective description of Murray s law Introduction The two Murray s laws describe the pattern of large to small or conversely small to large artery bifurcation. This is due to the optimal configuration of arteries that allows for fastest transport with minimal work involved. Murray s laws are valid for the tree structure of arteries, see figure 7.7. One of Murray s laws gives a formula for the radii, whereas the second law specifies the angles of bifurcation. Both laws were formulated for arteries but they are not limited to this venue. Some technical applications also exist. Single criterion minimisation methods were used for the derivation. A generalisation of Murray s law for multi-objective formulation is proposed here. It is shown that the original formulation of the optimal condition is a particular case of multi-objective formulation. Figure 7.7: Tree structure The original Murray s reasoning takes into consideration two energy terms that contribute to maintaining the blood flow. These are the energy necessary for overcoming the viscous drag (dissipation energy) and the metabolic power necessary for maintaining the volume of blood within an artery. For steady state flows it is more convenient to use dissipation power N instead of energy E. These quantities are explicitly related as E = t N dt = N t. The dissipation power is given by N d = V p. By means of Hagen-Poiseuille s law and the definition of constant A := 8µL 1 it is possible to express this power as N d = A V R. The metabolic power is expressed as N m = m V where m is a metabolic coefficient and volume V is given by V = R L. Introducing another constant B := Lm we can rewrite the metabolic power N m in the following form N m = B R. The equation for N d suggests that dissipation power N d is related inversely, and metabolic power N m directly to radius R. It suggest that there exists an intermediate radius which minimises the total power N = N d + N m. This total power may be expressed as N = A V R + BR. For a given V the total power N is a function of R. The stationary point can be found from the condition N (R) = which gives R = V 1/3 C 1/3 or V = CR 3, (7.3) where constant C is combined of A and B as C := 1/ A 1/ B 1/. The sign of the second derivative is N ( V 1/3 C 1/3 ) = 1B. This is because A and B are

111 11 7. Multi-objective optimisation positive. The solution (7.3) represents a constant relation between volumetric flow rate and the radius in every cross-section of an artery. This is also a condition for minimal energy requirement. The mass conservation equation gives us information that the flow rate before any bifurcation equals the sum of individual flow rates after that bifurcation V = V i i = CR. 3 This is true for incompressible and steady state flows. Equation (7.3) allows us to write V i i = C i R3 i. This is true because before and after bifurcation we deal with the same fluid which means that we have the same constant C. The above two equations give us Murray s law which states that the cube of the radius of the parent artery equals the sum of the cubes of the radii of the daughter arteries. This is written as R 3 = Ri 3. (7.31) i In the case of bifurcation this simplifies to R 3 = R R 3. It is the most widespread form of Murray s law and it is known that a large part of the branching of the mammalian circulatory and respiratory systems obeys it. Assuming that Murray s law is valid it is possible to evaluate the metabolic coefficient m. Using the definitions of A, B and C one can show that this coefficient may be written as m = 16 µc. Since equation (7.3) is valid for every branch it allows us to determine the value of constant C. Eventually, the metabolic coefficient is given by means of the following equation m = 7.6. Multi-objective description 16µ V R 6. (7.3) Since the Murray reasoning takes into consideration two powers (objective functions) it is then a natural multi-objective optimisation problem. A whole set of optimal solutions known as the Pareto set is obtained as a solution of such a problem. The problem considered by Murray (sum of two powers) is just one particular scalarisation method. A simultaneous optimisation of two powers (objective functions) in the form N := (N d, N m ) results in non-dominated set of solutions (Pareto set). It is possible to obtain an analytical solution describing the Pareto front P. Taking advantage of the weighted-sum method we can obtain a parametric representation of the solution where λ is a parameter. The scalarised form of the objective function is obtained as N := w N and takes the shape of N := λn d + (1 λ)n m. The necessary condition for optimality N (R) = gives us a formula which is analogous to (7.3) V = ( λ 1 1 ) CR 3. (7.33) According to Murray the dissipation power N d, and the metabolic power N m, have the identical contribution in the total power N. This corresponds to the situation where the weight in equation (7.33) equals λ = 1. In the multi-objective description it means that both powers are equally important. However, it follows from equation (7.33) that it does not have to be so. If we incorporate equation (7.33) into mass

112 7.6. Multi-objective description of Murray s law 111 conservation equation V = i V i we obtain the well known form of Murray s law (7.31). This means that one of the powers can have a larger share than the other. A share means weights λ and 1 λ for any λ ]; 1[. The above reasoning generalises Murray s law. Radius R may be found from equation (7.33) and incorporated into equations N d = A V R i N m = B R. Rearranging and introducing the dimensionless powers N + d, N + m we have N + d := N d (AB V ) 1 3 = ( 3 λ 1 1 ) 3, (7.3a) N m + N m ( := (AB V = 1 ) 1 3 λ 1 1 ) 3. (7.3b) 3 The Pareto front P takes the following form { P = (N + d, N m) + : N + d := ( 3 λ 1 1 ) 3, N m + ( := 1 3 λ 1 1 ) } 3. (7.35) The Pareto front is shown in figure 7.8. The point with weight λ = 1 is marked N + m 3 1 (, ) λ =.57 λ = N + d 3 1 Figure 7.8: The Pareto front Instead of choosing a solution that treats both powers as equally important (λ = 1 ) it is also possible to apply the target vector method. The target vector is chosen as an imaginary optimum. A natural choice for this optimum is such a vector for which both powers equal zero. This is really an imaginary vector because the dissipation power for viscous flows does not equal zero. However, it is assumed the target vector g = (, ). The vector is localised in the centre of coordinates system in figure 7.8. The scalarisation N to N is done by means of norm N := N g because W := δ. The norm is just Euclidian norm N = (N d + N m) 1/. It is assumed that the target vector g = simplifies also calculations. The necessary condition for optimality N (R) = gives similar equation to (7.33) V = 1 CR 3. (7.36)

113 11 7. Multi-objective optimisation The weight of this solution equals λ.57 and is shown in figure 7.8. What is interesting, the point (N + d, N + m) for this weight is not placed closer to the target vector (, ). This is because the Pareto front converges faster to its own vertical asymptote rather than to the horizontal one. It is well visible in figure 7.8. Again, using the formula (7.36) and the mass conservation equation, it is possible to show that Murray s law takes the standard form (7.31). What is more, one can observe that the solution (7.36) obtained by means of the target-vector method is another particular case of that obtained by means of weighted sum method (7.33). The same concerns the original Murray s solution for which λ = 1.

114 Chapter 8 Statistical analysis 8.1 Distributions The initial population is crucial in order for many multi-point algorithms to perform properly. Typically, uniform random distribution of points x i within the search space with random seed based on time is considered, namely x i := L + (U L) U(, 1). (8.1) The above distribution was used exclusively as an initialisation for all algorithms in this chapter. Example realisation of random sequence (8.1) is shown in figure 8.1 for 1 points. What is more, uniform distributions should be considered when there are no information of the optimum location. Halton sequence [11] generates deterministic distribution of points x i that look random and more uniform in comparison with uniform random distributions according to equation (8.1), see figure 8.. Halton sequence, in fact, is referred to as a quasirandom sequence. Listing 8.1 presents the pseudocode of Halton sequence. Input: i, p Output: h 1 h := ; f := 1; 3 j := i; while j > do 5 f := f p ; 6 h := h + f (j mod p); 7 j := j p ; Algorithm 8.1: Halton sequence pesudocode If there are some information about the possible location of the optimum, other distributions than uniform may provide better convergence of a considered algorithm.

115 11 8. Statistical analysis For instance normal distribution with a scale parameter α is an obvious choice x i := 1 (L + U) + a (U L) N (, 1) (8.) Example realisation of uniform random sequence is shown in figure 8.3 (α = 1 5 ) and figure 8. (α = 1 1 ) for 1 points. The lower the parameter α the more concentrated normal distribution around 1 (L + U) y y x 3 x 3 Figure 8.1: Uniform distribution Figure 8.: Halton sequence y y x 3 x 3 Figure 8.3: Normal distribution α = 1 5 Figure 8.: Normal distribution α = 1 1

116 8.. Discrepancy Discrepancy Discrepancy is a measure of irregularity and it is helpful to inspect similarity or regularity of the population. Discrepancy for a population P of n points in an m-dimensional unit cube [, 1[ m is defined P = {x 1,..., x n } (8.3) D = sup {D(J, P ) : J T } (8.) where the local discrepancy, D is D(J, F) = 1 N {x i P : x i J} Vol J. (8.5) Vol J is a measure of subinterval J of the form J := m [, j i [ (8.6) and T is the family of all (discrete) subintervals of unit cube [, 1[ m. We have i=1 < D 1. (8.7) The value close to represents a random population and is typical of first generations. Regular populations possess values close to Single-problem statistical analysis The test suite [15] includes 8 functions f i : R D R that are typically used as benchmarks where x = (x 1,..., x D ). They appear in Special Session & Competition on Real-Parameter Single Objective Optimization at CEC-13. All test functions are shifted and scalable and the same search ranges are defined for all test functions, namely [ 1, 1] D which means that L k = 1, U k = 1. By means of 8 functions the calculations are executed 3 times for each algorithm and the average of error of the best individuals of the population is computed. For a solution x the error measure is defined as for Error := f i (x) f i (x ), (8.8) x = min x f i (x) (8.9) being the optimum of the particular function f i. All algorithms stop when the number of generations n max is reached. The total number of generations n max is related to the maximal function evaluations number. Different cases in terms of D and the maximal evaluations number are considered. The total number of individuals is N = for all

117 Statistical analysis algorithms. The individual values of algorithms parameters are the same and shown in listings in appendix A. Statistical analysis over the test suit follows the method given in [8, 9]. The problem is referred to as a single-problem analysis and considers a comparison of several algorithms over a single function. The method comprise the use of both the parametric and non-parametric statistical tests. Because of the fact that the required conditions for using parametric test such as paired t-test are not fulfilled these tests are not considered here. Non-parametric tests such as Wilcoxon test are utilised instead. The required conditions in order to use parametric tests [8, 3] are: independence, normality and heteroscedasticity. As for independence it is fulfilled because we deal will independent runs of algorithms starting with randomly generated populations. Normality is never fulfilled because the results do not follow a normal distribution. This is also confirmed by means of Kolmogorov-Smirnov, Shapiro-Wilk and D Agostino-Pearson tests. The last condition, i.e., heteroscedasticity, is related to the hypothesis of equality of variances and is also not fulfilled according to Levene test. For instance, table 8. presents rankings based on means analogous to the Friedman ranks that correspond directly to the positions of algorithms P. It is evident that best solutions is obtained here by PSO. Comparing p-values from non-parametric Wilcoxon test in Table 8. it is obvious that PSO outperforms remaining algorithms because the all the p-values are below.5. The same is confirmed by means of box and whisker plot 8.7. However, table 8. shows that the best performing algorithms is DE/B as it outperforms all algorithms except DE/R.

118 8.3. Single-problem statistical analysis 117 Test function 1/8. Sphere function where f 1 (z) = D zi (8.1) i=1 z = x o. (8.11) The shifted global optimum o is randomly distributed in [ 8, 8] D. Properties: unimodal separable Figure 8.5: f 1 : R R

119 Statistical analysis GA DE/R DE/B PSO APSO1 APSO FA CS BA FPA/R FPA/B..5 Error Table 8.1: Position and Wilcoxon test p-value as a function of an algorithm. Algorithm P p-value GA DE/R 1 DE/B 1 PSO APSO APSO FA CS BA FPA/R FPA/B.6 GSA GSA Figure 8.6: Test function 1/8, D =, evaluations GA Error 5 1 Table 8.: Position and Wilcoxon test p-value as a function of an algorithm. Algorithm P p-value DE/R DE/B PSO APSO1 APSO FA CS BA FPA/R FPA/B GA DE/R DE/B PSO 1 APSO APSO FA CS BA FPA/R FPA/B GSA GSA Figure 8.7: Test function 1/8, D = 1, 1 evaluations

120 8.3. Single-problem statistical analysis 119 Test function /8. Rotated high conditioned elliptic function where f (z) = D i=1 1 6i 6 D 1 z i (8.1) z = T osz (M 1 (x o)). (8.13) Orthogonal matrices M 1, M,..., M 1 are generated from standard normally distributed entries by Gram-Schmidt orthonormalisation. T osz for x i is defined as T osz (x i ) := eˆxi+.9(sin c1 ˆxi+sin c ˆxi) sgn x i (8.1) where Properties: unimodal non-separable quadratic ill-conditioned smooth local irregularities { ln x i if x i, ˆx i = otherwise, { 1 if x i >, c 1 = 5.5 otherwise, { 7.9 if x i >, c = 3.1 otherwise. (8.15) (8.16) (8.17) Figure 8.8: f : R R

121 1 8. Statistical analysis GA DE/R DE/B PSO APSO1 APSO FA CS BA FPA/R Error 6 8 Table 8.3: Position and Wilcoxon test p-value as a function of an algorithm. Algorithm P p-value GA DE/R 1 DE/B PSO APSO APSO FA CS BA FPA/R FPA/B GSA FPA/B GSA Figure 8.9: Test function /8, D =, evaluations GA DE/R DE/B PSO APSO1 APSO FA CS BA FPA/R FPA/B.5 Error Table 8.: Position and Wilcoxon test p-value as a function of an algorithm. Algorithm P p-value GA DE/R.1 DE/B 1 PSO APSO APSO FA CS BA FPA/R FPA/B GSA GSA Figure 8.1: Test function /8, D = 1, 1 evaluations

122 8.3. Single-problem statistical analysis 11 Test function 3/8. Rotated bent cigar function. where and f 3 (z) = z D i= z i (8.18) z = M T.5 asy (M 1 (x o)) (8.19) T β asy(x i ) := Properties: unimodal non-separable smooth but narrow ridge {x 1+β i 1 x i D 1 i if x i >, otherwise. x i (8.) Figure 8.11: f 3 : R R

123 1 8. Statistical analysis GA DE/R DE/B PSO APSO1 APSO FA CS BA FPA/R Error 6 8 Table 8.5: Position and Wilcoxon test p-value as a function of an algorithm. Algorithm P p-value GA DE/R DE/B 1 PSO APSO APSO FA CS BA FPA/R FPA/B GSA FPA/B GSA Figure 8.1: Test function 3/8, D =, evaluations GA DE/R DE/B PSO APSO1 APSO FA CS BA FPA/R FPA/B.5 Error Table 8.6: Position and Wilcoxon test p-value as a function of an algorithm. Algorithm P p-value GA 3.9 DE/R 1 DE/B PSO APSO APSO FA.3 CS BA FPA/R FPA/B GSA.95 1 GSA Figure 8.13: Test function 3/8, D = 1, 1 evaluations

124 8.3. Single-problem statistical analysis 13 Test function /8. Rotated discus function where f (z) = 1 6 z 1 + D zi (8.1) i= z = T osz (M 1 (x o)). (8.) Properties: unimodal non-separable asymmetrical smooth local irregularities with one sensitive direction Figure 8.1: f : R R

125 1 8. Statistical analysis GA DE/R DE/B PSO APSO1 APSO FA CS BA FPA/R Error 6 8 Table 8.7: Position and Wilcoxon test p-value as a function of an algorithm. Algorithm P p-value GA DE/R DE/B 1 PSO APSO APSO FA CS BA FPA/R FPA/B GSA FPA/B GSA Figure 8.15: Test function /8, D =, evaluations GA DE/R DE/B PSO APSO1 APSO FA CS BA FPA/R FPA/B GSA Error 6 1 Table 8.8: Position and Wilcoxon test p-value as a function of an algorithm. Algorithm P p-value GA DE/R.6 DE/B 1 PSO APSO1 3.1 APSO FA CS BA.3 FPA/R FPA/B GSA Figure 8.16: Test function /8, D = 1, 1 evaluations

126 8.3. Single-problem statistical analysis 15 Test function 5/8. Different powers function f 5 (z) = D i + z i D 1 (8.3) i=1 where Properties: unimodal separable z = x o. (8.) Figure 8.17: f 5 : R R

127 16 8. Statistical analysis GA DE/R DE/B PSO APSO1 APSO FA CS BA FPA/R FPA/B. Error.. Table 8.9: Position and Wilcoxon test p-value as a function of an algorithm. Algorithm P p-value GA DE/R DE/B 1 PSO APSO APSO FA CS BA FPA/R FPA/B GSA GSA Figure 8.18: Test function 5/8, D =, evaluations GA Error 5 1 Table 8.1: Position and Wilcoxon test p-value as a function of an algorithm. Algorithm P p-value DE/R DE/B PSO APSO1 APSO FA CS BA FPA/R FPA/B GA DE/R DE/B PSO 1 APSO APSO FA CS BA FPA/R FPA/B GSA GSA Figure 8.19: Test function 5/8, D = 1, 1 evaluations

128 8.3. Single-problem statistical analysis 17 Test function 6/8. Rotated Rosenbrock s function where f 6 (z) = D 1 i=1 ( 1(z i z i+1 ) + (z i 1) ) (8.5) ( ).8 z = M 1 (x o) + 1. (8.6) 1 Properties: multi-modal non-separable very narrow valley from local to global optimum Figure 8.: f 6 : R R

129 18 8. Statistical analysis GA DE/R DE/B PSO APSO1 APSO FA CS BA FPA/R FPA/B GSA Error 6 1 Table 8.11: Position and Wilcoxon test p-value as a function of an algorithm. Algorithm P p-value GA DE/R 3.1 DE/B 1 PSO APSO APSO FA CS BA FPA/R FPA/B.97 GSA Figure 8.1: Test function 6/8, D =, evaluations GA DE/R DE/B PSO APSO1 APSO FA CS BA FPA/R FPA/B Error Table 8.1: Position and Wilcoxon test p-value as a function of an algorithm. Algorithm P p-value GA DE/R.95 1 DE/B PSO 1 APSO APSO FA CS BA FPA/R FPA/B GSA GSA Figure 8.: Test function 6/8, D = 1, 1 evaluations

Test function 7/28. Rotated Schaffers F7 function

f_7(z) = \left(\frac{1}{D-1}\sum_{i=1}^{D-1}\left(\sqrt{z_i} + \sqrt{z_i}\sin^2(50 z_i^{0.2})\right)\right)^2,    (8.27a)

z_i = \sqrt{y_i^2 + y_{i+1}^2}    (8.27b)

where

y = M_2 \Lambda^{10} T_{asy}^{0.5}(M_1(x - o)).    (8.28)

The D-dimensional diagonal matrix Λ^α is given by the i-th diagonal element

\Lambda^{\alpha}_{ii} = \alpha^{\frac{i-1}{2(D-1)}}.    (8.29)

Properties: multi-modal, non-separable, asymmetrical, large number of local optima.

Figure 8.23: f_7: R^2 → R
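The ill-conditioning matrix (8.29) is one line of Mathematica; a sketch (the name Λmat is illustrative):

Λmat[α_, n_] := DiagonalMatrix[Table[α^((i - 1)/(2 (n - 1))), {i, n}]];

so that, for instance, Λmat[10, 5] // MatrixForm displays the five-dimensional Λ^10.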

[Tables 8.13 and 8.14: Position and Wilcoxon test p-value as a function of an algorithm. Figures 8.24 and 8.25: error box plots for test function 7/28 at D = 2 and D = 10.]

Test function 8/28. Rotated Ackley's function

f_8(z) = -20\exp\left(-0.2\sqrt{\frac{1}{D}\sum_{i=1}^{D} z_i^2}\right) - \exp\left(\frac{1}{D}\sum_{i=1}^{D}\cos 2\pi z_i\right) + 20 + e    (8.30)

where

z = M_2 \Lambda^{10} T_{asy}^{0.5}(M_1(x - o)).    (8.31)

Properties: multi-modal, non-separable, asymmetrical.

Figure 8.26: f_8: R^2 → R

[Tables 8.15 and 8.16: Position and Wilcoxon test p-value as a function of an algorithm. Figures 8.27 and 8.28: error box plots for test function 8/28 at D = 2 and D = 10.]

Test function 9/28. Rotated Weierstrass function

f_9(z) = \sum_{i=1}^{D}\sum_{k=0}^{20} 0.5^k \cos\left(2\pi 3^k (z_i + 0.5)\right) - D\sum_{k=0}^{20} 0.5^k \cos(\pi 3^k)    (8.32)

where

z = M_2 \Lambda^{10} T_{asy}^{0.5}\left(M_1\frac{0.5(x - o)}{100}\right).    (8.33)

Properties: multi-modal, non-separable, asymmetrical.

Figure 8.29: f_9: R^2 → R

[Tables 8.17 and 8.18: Position and Wilcoxon test p-value as a function of an algorithm. Figures 8.30 and 8.31: error box plots for test function 9/28 at D = 2 and D = 10.]

Test function 10/28. Rotated Griewank's function

f_{10}(z) = 1 + \sum_{i=1}^{D}\frac{z_i^2}{4000} - \prod_{i=1}^{D}\cos\frac{z_i}{\sqrt{i}}    (8.34)

where

z = \Lambda^{100} M_1(6(x - o)).    (8.35)

Properties: multi-modal, non-separable, rotated.

Figure 8.32: f_10: R^2 → R

[Tables 8.19 and 8.20: Position and Wilcoxon test p-value as a function of an algorithm. Figures 8.33 and 8.34: error box plots for test function 10/28 at D = 2 and D = 10.]

Test function 11/28. Rastrigin's function

f_{11}(z) = \sum_{i=1}^{D}\left(10 + z_i^2 - 10\cos 2\pi z_i\right)    (8.36)

where

z = \Lambda^{10} T_{asy}^{0.2}\left(T_{osz}\left(\frac{5.12(x - o)}{100}\right)\right).    (8.37)

Properties: multi-modal, separable, asymmetrical, large number of local optima.

Figure 8.35: f_11: R^2 → R

[Tables 8.21 and 8.22: Position and Wilcoxon test p-value as a function of an algorithm. Figures 8.36 and 8.37: error box plots for test function 11/28 at D = 2 and D = 10.]

Test function 12/28. Rotated Rastrigin's function

f_{12}(z) = \sum_{i=1}^{D}\left(10 + z_i^2 - 10\cos 2\pi z_i\right)    (8.38)

where

z = M_1 \Lambda^{10} M_2 T_{asy}^{0.2}\left(T_{osz}\left(M_1\frac{5.12(x - o)}{100}\right)\right).    (8.39)

Properties: multi-modal, non-separable, asymmetrical, large number of local optima.

Figure 8.38: f_12: R^2 → R

[Tables 8.23 and 8.24: Position and Wilcoxon test p-value as a function of an algorithm. Figures 8.39 and 8.40: error box plots for test function 12/28 at D = 2 and D = 10.]

Test function 13/28. Non-continuous rotated Rastrigin's function

f_{13}(z) = \sum_{i=1}^{D}\left(10 + z_i^2 - 10\cos 2\pi z_i\right)    (8.40)

where

z = M_1 \Lambda^{10} M_2 T_{asy}^{0.2}(T_{osz}(y)),    (8.41)

y_i = \begin{cases} w_i & \text{if } |w_i| \le 0.5, \\ \mathrm{round}(2 w_i)/2 & \text{if } |w_i| > 0.5, \end{cases}    (8.42)

w = M_1\frac{5.12(x - o)}{100}.    (8.43)

Properties: multi-modal, rotated, non-separable, asymmetrical, large number of local optima.

Figure 8.41: f_13: R^2 → R
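The rounding step (8.42), which produces the discontinuities, can be sketched in one line of Mathematica (the name roundHalf is illustrative):

roundHalf[w_List] := Map[If[Abs[#] <= 0.5, #, Round[2 #]/2] &, w];

For example, roundHalf[{0.3, 0.74, -1.2}] yields {0.3, 1/2, -1}: coordinates farther than 0.5 from zero are snapped to the nearest half-integer.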

[Tables 8.25 and 8.26: Position and Wilcoxon test p-value as a function of an algorithm. Figures 8.42 and 8.43: error box plots for test function 13/28 at D = 2 and D = 10.]

Test function 14/28. Schwefel's function

f_{14}(z) = 418.9829 D - \sum_{i=1}^{D} g(z_i)    (8.44)

where

z = \Lambda^{10}(10(x - o)) + 4.209687462275036 \cdot 10^2,    (8.45)

g(z_i) = \begin{cases} z_i \sin |z_i|^{1/2} & \text{if } |z_i| \le 500, \\ (500 - \mathrm{mod}(z_i, 500))\sin\sqrt{|500 - \mathrm{mod}(z_i, 500)|} - \frac{(z_i - 500)^2}{10^4 D} & \text{if } z_i > 500, \\ (\mathrm{mod}(|z_i|, 500) - 500)\sin\sqrt{|\mathrm{mod}(|z_i|, 500) - 500|} - \frac{(z_i + 500)^2}{10^4 D} & \text{if } z_i < -500. \end{cases}    (8.46)

Properties: multi-modal, non-separable, asymmetrical, large number of local optima.

Figure 8.44: f_14: R^2 → R
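The piecewise branch (8.46) may be transcribed directly; a sketch (the name gS and the explicit dimension argument dim are illustrative):

gS[z_?NumericQ, dim_] := Which[
  Abs[z] <= 500, z Sin[Sqrt[Abs[z]]],
  z > 500, (500 - Mod[z, 500]) Sin[Sqrt[Abs[500 - Mod[z, 500]]]] - (z - 500)^2/(10^4 dim),
  True, (Mod[Abs[z], 500] - 500) Sin[Sqrt[Abs[Mod[Abs[z], 500] - 500]]] - (z + 500)^2/(10^4 dim)];

The two outer branches merely fold points that leave the |z_i| ≤ 500 region back into it, with a small quadratic penalty.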

[Tables 8.27 and 8.28: Position and Wilcoxon test p-value as a function of an algorithm. Figures 8.45 and 8.46: error box plots for test function 14/28 at D = 2 and D = 10.]

Test function 15/28. Rotated Schwefel's function

f_{15}(z) = 418.9829 D - \sum_{i=1}^{D} g(z_i)    (8.47)

where

z = \Lambda^{10} M_1(10(x - o)) + 4.209687462275036 \cdot 10^2    (8.48)

and g(z_i) is the same piecewise function as in (8.46):

g(z_i) = \begin{cases} z_i \sin |z_i|^{1/2} & \text{if } |z_i| \le 500, \\ (500 - \mathrm{mod}(z_i, 500))\sin\sqrt{|500 - \mathrm{mod}(z_i, 500)|} - \frac{(z_i - 500)^2}{10^4 D} & \text{if } z_i > 500, \\ (\mathrm{mod}(|z_i|, 500) - 500)\sin\sqrt{|\mathrm{mod}(|z_i|, 500) - 500|} - \frac{(z_i + 500)^2}{10^4 D} & \text{if } z_i < -500. \end{cases}    (8.49)

Properties: multi-modal, rotated, non-separable, asymmetrical, large number of local optima.

Figure 8.47: f_15: R^2 → R

[Tables 8.29 and 8.30: Position and Wilcoxon test p-value as a function of an algorithm. Figures 8.48 and 8.49: error box plots for test function 15/28 at D = 2 and D = 10.]

Test function 16/28. Rotated Katsuura function

f_{16}(z) = \frac{10}{D^2}\prod_{i=1}^{D}\left(1 + i\sum_{j=1}^{32}\frac{|2^j z_i - \mathrm{round}(2^j z_i)|}{2^j}\right)^{10/D^{1.2}} - \frac{10}{D^2}    (8.50)

where

z = M_2 \Lambda^{100}\left(M_1\frac{5(x - o)}{100}\right).    (8.51)

Properties: multi-modal, continuous, non-separable, asymmetrical, non-differentiable.

Figure 8.50: f_16: R^2 → R
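A direct Mathematica transcription of (8.50) reads as follows (the name f16 is illustrative; z is the already transformed point):

f16[z_List] := With[{n = Length[z]},
  (10/n^2) Product[
     (1 + i Sum[Abs[2^j z[[i]] - Round[2^j z[[i]]]]/2^j, {j, 1, 32}])^(10/n^1.2),
     {i, n}] - 10/n^2];

The inner sum measures how far each coordinate is from a dyadic rational at 32 scales, which is what makes the function continuous everywhere yet nowhere differentiable.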

[Tables 8.31 and 8.32: Position and Wilcoxon test p-value as a function of an algorithm. Figures 8.51 and 8.52: error box plots for test function 16/28 at D = 2 and D = 10.]

Test function 17/28. Lunacek bi-Rastrigin function

f_{17}(z) = \min\left(\sum_{i=1}^{D}(\hat{x}_i - \mu_0)^2,\; D + s\sum_{i=1}^{D}(\hat{x}_i - \mu_1)^2\right) + 10\left(D - \sum_{i=1}^{D}\cos 2\pi z_i\right)    (8.52)

where

z = \Lambda^{100}(\hat{x} - \mu_0),    (8.53)

\hat{x}_i = \mu_0 + 2 y_i \,\mathrm{sgn}\, o_i,    (8.54)

y = \frac{10(x - o)}{100},    (8.55)

\mu_0 = 2.5,    (8.56)

\mu_1 = -\sqrt{\frac{\mu_0^2 - 1}{s}},    (8.57)

s = 1 - \frac{1}{2\sqrt{D + 20} - 8.2}.    (8.58)

Figure 8.53: f_17: R^2 → R

[Tables 8.33 and 8.34: Position and Wilcoxon test p-value as a function of an algorithm. Figures 8.54 and 8.55: error box plots for test function 17/28 at D = 2 and D = 10.]

Test function 18/28. Rotated Lunacek bi-Rastrigin function

f_{18}(z) = \min\left(\sum_{i=1}^{D}(\hat{x}_i - \mu_0)^2,\; D + s\sum_{i=1}^{D}(\hat{x}_i - \mu_1)^2\right) + 10\left(D - \sum_{i=1}^{D}\cos 2\pi z_i\right)    (8.59)

where

z = M_2 \Lambda^{100} M_1(\hat{x} - \mu_0),    (8.60)

\hat{x}_i = \mu_0 + 2 y_i \,\mathrm{sgn}\, o_i,    (8.61)

y = \frac{10(x - o)}{100},    (8.62)

\mu_0 = 2.5,    (8.63)

\mu_1 = -\sqrt{\frac{\mu_0^2 - 1}{s}},    (8.64)

s = 1 - \frac{1}{2\sqrt{D + 20} - 8.2}.    (8.65)

Properties: multi-modal, continuous, non-separable, asymmetrical, non-differentiable.

Figure 8.56: f_18: R^2 → R

[Tables 8.35 and 8.36: Position and Wilcoxon test p-value as a function of an algorithm. Figures 8.57 and 8.58: error box plots for test function 18/28 at D = 2 and D = 10.]

Test function 19/28. Rotated expanded Griewank's plus Rosenbrock's function

f_{19}(z) = f_{10}(f_6(z_D, z_1)) + \sum_{i=1}^{D-1} f_{10}(f_6(z_i, z_{i+1}))    (8.66)

where

z = M_1\frac{5(x - o)}{100} + 1.    (8.67)

Properties: multi-modal, non-separable.

Figure 8.59: f_19: R^2 → R

[Tables 8.37 and 8.38: Position and Wilcoxon test p-value as a function of an algorithm. Figures 8.60 and 8.61: error box plots for test function 19/28 at D = 2 and D = 10.]

Test function 20/28. Rotated expanded Schaffer's F6 function

f_{20}(z) = g(z_D, z_1) + \sum_{i=1}^{D-1} g(z_i, z_{i+1}),    (8.68a)

g(x, y) = 0.5 + \frac{\sin^2\sqrt{x^2 + y^2} - 0.5}{\left(1 + 0.001(x^2 + y^2)\right)^2}    (8.68b)

where

z = M_2 T_{asy}^{0.5}(M_1(x - o)).    (8.69)

Properties: multi-modal, non-separable, asymmetrical.

Figure 8.62: f_20: R^2 → R
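The pair function (8.68b) is a one-liner in Mathematica; a sketch (the name gF6 is illustrative):

gF6[x_, y_] := 0.5 + (Sin[Sqrt[x^2 + y^2]]^2 - 0.5)/(1 + 0.001 (x^2 + y^2))^2;

Summing gF6 over consecutive coordinate pairs, including the wrap-around pair (z_D, z_1), reproduces (8.68a).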

[Tables 8.39 and 8.40: Position and Wilcoxon test p-value as a function of an algorithm. Figures 8.63 and 8.64: error box plots for test function 20/28 at D = 2 and D = 10.]

Test function 21/28. Composition function

f_{21}(z) = \sum_{i\in\{1,2,3,4,5\}} \omega_i\left(\lambda_i f_i(z) + b_i\right)    (8.70)

where

\omega_i = \frac{w_i}{\sum_{i\in\{1,2,3,4,5\}} w_i},    (8.71)

w_i = \frac{1}{\sqrt{\sum_{j=1}^{D}(x_j - o_{ij})^2}}\exp\left(-\frac{\sum_{j=1}^{D}(x_j - o_{ij})^2}{2 D \sigma_i^2}\right).    (8.72)

Properties: multi-modal, non-separable, asymmetrical.

Figure 8.65: f_21: R^2 → R

[Table 8.41: f_21 coefficients σ_i, λ_i, b_i.]
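The normalised weights (8.71)-(8.72) can be sketched as follows, where oi is the list of shift vectors o_i and σ the list of spreads σ_i (both names are illustrative, and x ≠ o_i is assumed so that the division is well defined):

compWeights[x_List, oi_List, σ_List] := Module[{w},
  w = MapThread[
    (Exp[-Total[(x - #1)^2]/(2 Length[x] #2^2)]/Sqrt[Total[(x - #1)^2]]) &,
    {oi, σ}];
  w/Total[w]];

Each component function thus dominates near its own optimum o_i and fades away at a rate controlled by σ_i.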

[Tables 8.42 and 8.43: Position and Wilcoxon test p-value as a function of an algorithm. Figures 8.66 and 8.67: error box plots for test function 21/28 at D = 2 and D = 10.]

Test function 22/28. Composition function

f_{22}(z) = \sum_{i=1}^{3} \omega_i\left(\lambda_i f_{14}(z) + b_i\right)    (8.73)

where

\omega_i = \frac{w_i}{\sum_{i=1}^{3} w_i}.    (8.74)

Properties: multi-modal, separable, asymmetrical.

Figure 8.68: f_22: R^2 → R

[Table 8.44: f_22 coefficients σ_i, λ_i, b_i.]

[Tables 8.45 and 8.46: Position and Wilcoxon test p-value as a function of an algorithm. Figures 8.69 and 8.70: error box plots for test function 22/28 at D = 2 and D = 10.]

Test function 23/28. Composition function

f_{23}(z) = \sum_{i=1}^{3} \omega_i\left(\lambda_i f_{15}(z) + b_i\right)    (8.75)

where

\omega_i = \frac{w_i}{\sum_{i=1}^{3} w_i}.    (8.76)

Properties: multi-modal, non-separable, asymmetrical.

Figure 8.71: f_23: R^2 → R

[Table 8.47: f_23 coefficients σ_i, λ_i, b_i.]

[Tables 8.48 and 8.49: Position and Wilcoxon test p-value as a function of an algorithm. Figures 8.72 and 8.73: error box plots for test function 23/28 at D = 2 and D = 10.]

Test function 24/28. Composition function

f_{24}(z) = \sum_{i\in\{9,12,15\}} \omega_i\left(\lambda_i f_i(z) + b_i\right)    (8.77)

where

\omega_i = \frac{w_i}{\sum_{i\in\{9,12,15\}} w_i}.    (8.78)

Properties: multi-modal, non-separable, asymmetrical.

Figure 8.74: f_24: R^2 → R

[Table 8.50: f_24 coefficients σ_i, λ_i, b_i.]

[Tables 8.51 and 8.52: Position and Wilcoxon test p-value as a function of an algorithm. Figures 8.75 and 8.76: error box plots for test function 24/28 at D = 2 and D = 10.]

Test function 25/28. Composition function

f_{25}(z) = \sum_{i\in\{9,12,15\}} \omega_i\left(\lambda_i f_i(z) + b_i\right)    (8.79)

where

\omega_i = \frac{w_i}{\sum_{i\in\{9,12,15\}} w_i}.    (8.80)

Properties: multi-modal, non-separable, asymmetrical.

Figure 8.77: f_25: R^2 → R

[Table 8.53: f_25 coefficients σ_i, λ_i, b_i.]

[Tables 8.54 and 8.55: Position and Wilcoxon test p-value as a function of an algorithm. Figures 8.78 and 8.79: error box plots for test function 25/28 at D = 2 and D = 10.]

Test function 26/28. Composition function

f_{26}(z) = \sum_{i\in\{2,9,10,12,15\}} \omega_i\left(\lambda_i f_i(z) + b_i\right)    (8.81)

where

\omega_i = \frac{w_i}{\sum_{i\in\{2,9,10,12,15\}} w_i}.    (8.82)

Properties: multi-modal, non-separable, asymmetrical.

Figure 8.80: f_26: R^2 → R

[Table 8.56: f_26 coefficients σ_i, λ_i, b_i.]

[Tables 8.57 and 8.58: Position and Wilcoxon test p-value as a function of an algorithm. Figures 8.81 and 8.82: error box plots for test function 26/28 at D = 2 and D = 10.]

Test function 27/28. Composition function

f_{27}(z) = \sum_{i\in\{1,9,10,12,15\}} \omega_i\left(\lambda_i f_i(z) + b_i\right)    (8.83)

where

\omega_i = \frac{w_i}{\sum_{i\in\{1,9,10,12,15\}} w_i}.    (8.84)

Properties: multi-modal, non-separable, asymmetrical.

Figure 8.83: f_27: R^2 → R

[Table 8.59: f_27 coefficients σ_i, λ_i, b_i.]

[Tables 8.60 and 8.61: Position and Wilcoxon test p-value as a function of an algorithm. Figures 8.84 and 8.85: error box plots for test function 27/28 at D = 2 and D = 10.]

Test function 28/28. Composition function

f_{28}(z) = \sum_{i\in\{1,7,15,19,20\}} \omega_i\left(\lambda_i f_i(z) + b_i\right)    (8.85)

where

\omega_i = \frac{w_i}{\sum_{i\in\{1,7,15,19,20\}} w_i}.    (8.86)

Properties: multi-modal, non-separable, asymmetrical.

Figure 8.86: f_28: R^2 → R

[Table 8.62: f_28 coefficients σ_i, λ_i, b_i.]

[Tables 8.63 and 8.64: Position and Wilcoxon test p-value as a function of an algorithm. Figures 8.87 and 8.88: error box plots for test function 28/28 at D = 2 and D = 10.]

8.4 Multiple-problem statistical analysis

Statistical analysis of the algorithms over the optimisation problems follows the method given in [8, 9]. The method comprises the use of non-parametric statistical tests. The problem is referred to as a multiple-problem analysis and considers a comparison of several algorithms over more than one problem (function) simultaneously. As previously, the test suite [15] includes all 28 functions f_i : R^D → R.

In general, parametric statistical tests might reach conclusions similar to those of non-parametric tests, but the former can also lead to incorrect conclusions. This is because of dissimilarities in the results and the small size of the analysed sample. What is more, non-parametric tests do not require explicit conditions, whereas the conditions required by parametric tests are typically not satisfied here.

For instance, the p-value of the Friedman test for D = 2 and 10^4 evaluations makes it possible to see whether there are global differences in the results. Indeed, the p-value shows significant differences among the algorithms when it is lower than the level of significance α = 0.05. The differences may then be revealed by means of a post-hoc statistical analysis. Table 8.65 presents rankings coming from the Friedman test and the corresponding positions of the algorithms. It is evident that the best solutions are reached by the APSO1 algorithm. However, the differences among APSO1, DE/B, PSO and GSA are insignificant for D = 2 and 10^4 evaluations. This fact is also displayed in figure 8.89, which presents the ranking from table 8.65 (the lower the better). The horizontal lines for α = 0.05 and α = 0.1 represent the threshold for the best performing algorithms. The threshold height equals the lowest rank increased by the corresponding critical difference CD_α calculated by the Bonferroni-Dunn method

CD_\alpha = q_\alpha \sqrt{\frac{k(k+1)}{6h}}.    (8.87)

In the above, k = 12 stands for the number of algorithms and h = 28 for the number of test functions. The critical value q_α for a multiple non-parametric comparison is taken from statistical tables [32]. If bars exceed these lines, this simply means that the associated algorithms perform significantly worse in comparison with the algorithm associated with the lowest bar [8]. According to figure 8.89, the APSO1 algorithm significantly outperforms GA, DE/R, APSO2, FA, CS, BA, FPA/R and FPA/B (D = 2, 10^4 evaluations). APSO1, however, cannot outperform DE/B, PSO and GSA. Similar conclusions can be drawn from the Wilcoxon signed-rank test, whose p-values are presented in table 8.66. It conducts individual comparisons between two algorithms rather than multiple comparisons. APSO1 outperforms GA, DE/R, APSO2, FA, CS, BA, FPA/R and FPA/B; however, it cannot outperform DE/B, PSO and GSA.
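Equation (8.87) is easily evaluated; a minimal Mathematica sketch for k = 12 algorithms and h = 28 functions (the value of q_α below is purely illustrative and should in practice be read from the tables in [32]):

cd[qα_, k_, h_] := qα Sqrt[k (k + 1)/(6 h)];
cd[2.773, 12, 28]  (* illustrative qα only *)

The threshold line in figure 8.89 is then the lowest Friedman rank plus this critical difference.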

8.4.1 D = 2, 10^4 evaluations

As mentioned earlier, Friedman's rank test p-value for the multiple-problem analysis is below the level of significance. According to figure 8.89 and table 8.65, the APSO1 algorithm significantly outperforms GA, DE/R, APSO2, FA, CS, BA, FPA/R and FPA/B, but it cannot outperform DE/B, PSO and GSA. According to table 8.66, APSO1 outperforms GA, DE/R, APSO2, FA, CS, BA, FPA/R and FPA/B; however, APSO1 cannot outperform DE/B, PSO and GSA.

[Table 8.65: Friedman's ranks R and positions P as a function of an algorithm.]
[Table 8.66: Wilcoxon test p-values as a function of an algorithm.]

Figure 8.89: D = 2, 10^4 evaluations
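The Friedman average ranks of table 8.65 can be reproduced from an error matrix in one line; a sketch, where errors[[p, a]] holds the final error of algorithm a on problem p (the name is illustrative and ties are ignored for brevity):

friedmanRanks[errors_?MatrixQ] := N[Mean[Map[Ordering[Ordering[#]] &, errors]]];

Each row of the matrix is ranked from best (rank 1) to worst, and the ranks are averaged over the 28 problems; the lower the average rank, the better the algorithm.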

8.4.2 D = 2, 2·10^4 evaluations

Friedman's rank test p-value is of the order of 10^-6 for the multiple-problem analysis. According to figure 8.90 and table 8.67, the PSO algorithm significantly outperforms GA, DE/R, FA, CS, BA, FPA/R and FPA/B, but it cannot outperform DE/B, APSO1, APSO2 and GSA. According to table 8.68, PSO outperforms GA, DE/R, FA, CS, BA, FPA/R and FPA/B; however, PSO cannot outperform DE/B, APSO1, APSO2 and GSA.

[Table 8.67: Friedman's ranks R and positions P as a function of an algorithm.]
[Table 8.68: Wilcoxon test p-values as a function of an algorithm.]

Figure 8.90: D = 2, 2·10^4 evaluations

8.4.3 D = 2, 4·10^4 evaluations

According to figure 8.91 and table 8.69, the DE/R algorithm significantly outperforms GA, DE/B, APSO1, APSO2, FA, BA, FPA/R, FPA/B and GSA, but it cannot outperform PSO and CS. According to table 8.70, DE/R outperforms GA, DE/B, APSO1, FA, BA, FPA/R, FPA/B and GSA; however, DE/R cannot outperform PSO, APSO2 and CS.

[Table 8.69: Friedman's ranks R and positions P as a function of an algorithm.]
[Table 8.70: Wilcoxon test p-values as a function of an algorithm.]

Figure 8.91: D = 2, 4·10^4 evaluations

8.4.4 D = 10, 10^5 evaluations

According to figure 8.92 and table 8.71, the FA algorithm significantly outperforms DE/R, DE/B, APSO1, APSO2, CS, BA, FPA/R and FPA/B, but it cannot outperform GA, PSO and GSA. According to table 8.72, FA outperforms DE/R, DE/B, PSO, APSO1, APSO2, CS, BA, FPA/R, FPA/B and GSA; however, FA cannot outperform GA.

[Table 8.71: Friedman's ranks R and positions P as a function of an algorithm.]
[Table 8.72: Wilcoxon test p-values as a function of an algorithm.]

Figure 8.92: D = 10, 10^5 evaluations
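The Wilcoxon signed-rank p-values of tables 8.66, 8.68, 8.70 and 8.72 can be obtained with Mathematica's built-in SignedRankTest applied to the paired differences of the best-error samples of two algorithms over the 28 functions; a sketch on illustrative random data:

e1 = RandomReal[{0, 1}, 28]; e2 = RandomReal[{0, 1}, 28]; (* illustrative samples *)
SignedRankTest[e1 - e2, 0, "PValue"]

A p-value below the chosen significance level indicates that the median of the paired differences is significantly different from zero, i.e. that one algorithm outperforms the other.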

Bibliography

[1] Bäck, T., Fogel, D. B., Michalewicz, Z. (Eds.) 2000. Evolutionary Computation 2: Advanced Algorithms and Operators. Bristol and Philadelphia: Institute of Physics Publishing
[2] Bronshtein, I. N., Semendyayev, K. A., Musiol, G., Mühlig, H. 2007. Handbook of Mathematics. Berlin, Heidelberg: Springer-Verlag
[3] Eiben, A. E., Smith, J. E. 2003. Introduction to Evolutionary Computing. Berlin: Springer-Verlag
[4] Eiben, A. E., Smit, S. K. 2011. Parameter tuning for configuring and analyzing evolutionary algorithms. Swarm and Evolutionary Computation 1 (1): 19-31
[5] Elsgolc, L. D. 2007. Calculus of Variations. New York: Dover Publications, Inc.
[6] Fister, I., Fister, I. Jr., Yang, X. S., Brest, J. 2013. A comprehensive review of firefly algorithms. Swarm and Evolutionary Computation 13 (1): 34-46
[7] Fister, I. Jr., Yang, X. S., Fister, I., Brest, J., Fister, D. 2013. A brief review of nature-inspired algorithms for optimization. Elektrotehniški Vestnik 80 (3): 1-7
[8] García, S., Fernández, A., Benítez, A. D., Herrera, F. 2007. Statistical comparisons by means of non-parametric tests: a case study on genetic based machine learning. Proceedings of the II Congreso Español de Informática (CEDI 2007)
[9] García, S., Molina, D., Lozano, M., Herrera, F. 2009. A study on the use of non-parametric tests for analyzing the evolutionary algorithms' behaviour: a case study on the CEC'2005 Special Session on Real Parameter Optimization. Journal of Heuristics 15: 617-644
[10] Goldberg, D. E. 1989. Genetic Algorithms in Search, Optimization and Machine Learning. Boston, MA: Addison-Wesley
[11] Halton, J. H. 1960. On the efficiency of certain quasi-random sequences of points in evaluating multi-dimensional integrals. Numerische Mathematik 2: 84-90
[12] Holland, J. H. 1962. Outline for a logical theory of adaptive systems. Journal of the Association for Computing Machinery 9 (3): 297-314
[13] Kennedy, J., Eberhart, R. 1995. Particle swarm optimization. Proceedings of the IEEE International Conference on Neural Networks IV: 1942-1948
[14] Kirkpatrick, S., Gelatt Jr., C. D., Vecchi, M. P. 1983. Optimization by simulated annealing. Science 220 (4598): 671-680
[15] Liang, J. J., Qu, B. Y., Suganthan, P. N., Hernández-Díaz, A. G. 2013. Problem definitions and evaluation criteria for the CEC 2013 special session on real-parameter optimization. Technical Report 201212, Zhengzhou University, China, and Nanyang Technological University, Singapore
[16] Mantegna, R. N. 1994. Fast, accurate algorithm for numerical simulation of Lévy stable stochastic processes. Physical Review E 49 (5): 4677-4683
[17] Michalewicz, Z. 1996. Genetic Algorithms + Data Structures = Evolution Programs. 3rd ed. Berlin, Heidelberg, New York: Springer
[18] Price, K. V., Storn, R., Lampinen, J. 2005. Differential Evolution: A Practical Approach to Global Optimization. Berlin: Springer-Verlag
[19] Rashedi, E., Nezamabadi-pour, H., Saryazdi, S. 2009. GSA: a gravitational search algorithm. Information Sciences 179: 2232-2248
[20] Storn, R., Price, K. 1997. Differential evolution: a simple and efficient heuristic for global optimization over continuous spaces. Journal of Global Optimization 11: 341-359
[21] Tesch, K., Atherton, M. A., Karayiannis, T. G., Collins, M. W., Edwards, P. 2009. Determining heat transfer coefficients using evolutionary algorithms. Engineering Optimization 41 (9)
[22] Tesch, K. 2010. On some extensions of Murray's law. TASK Quarterly 14 (3)
[23] Tesch, K., Banaszek, M. 2011. A variational method of finding streamlines in ring cascades for creeping flows. TASK Quarterly 15 (1): 71-84
[24] Tesch, K., Kaczorowska, K. 2016. Arterial cannula shape optimization by means of the rotational firefly algorithm. Engineering Optimization 48 (3)
[25] Thiémard, E. 2001. An algorithm to compute bounds for the star discrepancy. Journal of Complexity 17: 850-880
[26] Yang, X. S. 2008. Nature-Inspired Metaheuristic Algorithms. Frome, UK: Luniver Press
[27] Yang, X. S., Deb, S. 2009. Cuckoo search via Lévy flights. In: Proceedings of the World Congress on Nature & Biologically Inspired Computing (NaBIC 2009). USA: IEEE Publications, 210-214
[28] Yang, X. S. 2010. A new metaheuristic bat-inspired algorithm. In: Cruz, C., González, J. R., Pelta, D. A., Terrazas, G. (Eds.) Nature Inspired Cooperative Strategies for Optimization (NICSO 2010). Studies in Computational Intelligence 284. Berlin: Springer, 65-74
[29] Yang, X. S. 2012. Flower pollination algorithm for global optimization. In: Unconventional Computation and Natural Computation. Lecture Notes in Computer Science 7445, 240-249
[30] Yang, X. S., Deb, S., Fong, S. 2011. Accelerated particle swarm optimization and support vector machine for business optimization and applications. In: Networked Digital Technologies. Communications in Computer and Information Science 136. Berlin: Springer, 53-66
[31] Yang, X. S. 2009. Firefly algorithm, stochastic test functions and design optimisation. International Journal of Bio-Inspired Computation 2 (2): 78-84
[32] Zar, J. H. Biostatistical Analysis. New Jersey: Prentice Hall
[33] Zitzler, E., Thiele, L. 1999. Multiobjective evolutionary algorithms: a comparative study and the strength Pareto approach. IEEE Transactions on Evolutionary Computation 3 (4): 257-271

Appendix A

Codes

This appendix contains working examples of single-point and multi-point, derivative-free algorithm codes. Implementations are provided in the Mathematica programming language. It has to be pointed out that the readability and comprehensibility of the implementations is preferred to efficiency and brevity.

A.1 Single-point, derivative-free algorithms

d = 2; L = Table[0, {d}]; U = Table[π, {d}];
f = -5 Product[Sin[#1[[i]]], {i, 1, d}] - Product[Sin[7 #1[[i]]], {i, 1, d}] &;
n = 2000; (* number of steps; assumed *)
α = 1/Sqrt[d n];
x = Table[{0, Table[0, {d}]}, {n}];
x[[1, 2]] = L + (U - L)*RandomVariate[UniformDistribution[{0, 1}], d];
x[[1, 1]] = f[x[[1, 2]]];
g = x[[1]];
For[i = 2, i <= n, i++,
  (* random step scaled to the domain size *)
  x[[i, 2]] = x[[i - 1, 2]] + α (U - L)*RandomVariate[NormalDistribution[0, 1], d];
  x[[i, 1]] = f[x[[i, 2]]];
  If[x[[i, 1]] <= g[[1]], g = x[[i]]];
];
Print["f_b(", g[[2]], ") = ", g[[1]]]

Figure A.1: Uncontrolled random walk code

d = 2; L = Table[0, {d}]; U = Table[π, {d}];
f = -5 Product[Sin[#1[[i]]], {i, 1, d}] - Product[Sin[7 #1[[i]]], {i, 1, d}] &;
n = 2000; (* assumed *)
α = 1/Sqrt[d n];
x = Table[{0, Table[0, {d}]}, {n}];
x[[1, 2]] = L + (U - L)*RandomVariate[UniformDistribution[{0, 1}], d];
x[[1, 1]] = f[x[[1, 2]]];
g = x[[1]];
For[i = 2, i <= n, i++,
  within = False;
  While[! within, (* repeat the step until it stays inside the domain *)
    x[[i, 2]] = x[[i - 1, 2]] + α (U - L)*RandomVariate[NormalDistribution[0, 1], d];
    within = IsWithin[x[[i]]];
  ];
  x[[i, 1]] = f[x[[i, 2]]];
  If[x[[i, 1]] <= g[[1]], g = x[[i]]];
];
Print["f_b(", g[[2]], ") = ", g[[1]]]

Figure A.2: Domain controlled random walk code

IsWithin = Function[{x1},
  b = True;
  For[j = 1, j <= d, j++, b = And[b, L[[j]] <= x1[[2, j]] <= U[[j]]]];
  b];

Figure A.3: Within or without?

d = 2; L = Table[0, {d}]; U = Table[π, {d}];
f = -5 Product[Sin[#1[[i]]], {i, 1, d}] - Product[Sin[7 #1[[i]]], {i, 1, d}] &;
n = 2000; (* assumed *)
α = 1/(10 Sqrt[d]); (* step scale; partly assumed *)
x = Table[{0, Table[0, {d}]}, {n}];
x[[1, 2]] = L + (U - L)*RandomVariate[UniformDistribution[{0, 1}], d];
x[[1, 1]] = f[x[[1, 2]]];
g = x[[1]];
For[i = 2, i <= n, i++,
  (* each step starts from the best position found so far *)
  x[[i, 2]] = g[[2]] + α (U - L)*RandomVariate[NormalDistribution[0, 1], d];
  x[[i, 1]] = f[x[[i, 2]]];
  If[x[[i, 1]] <= g[[1]], g = x[[i]]];
];
Print["f_b(", g[[2]], ") = ", g[[1]]]

Figure A.4: Position controlled random walk code

d = 2; L = Table[0, {d}]; U = Table[π, {d}];
f = -5 Product[Sin[#1[[i]]], {i, 1, d}] - Product[Sin[7 #1[[i]]], {i, 1, d}] &;
n = 2000; (* assumed *)
δ = 10^-2; T = 1; α = 1/10;
x = Table[{0, Table[0, {d}]}, {n}];
x[[1, 2]] = L + (U - L)*RandomVariate[UniformDistribution[{0, 1}], d];
x[[1, 1]] = f[x[[1, 2]]];
g = l = x[[1]];
For[i = 2, i <= n, i++,
  T *= δ^(1/n); (* geometric cooling schedule *)
  x[[i, 2]] = l[[2]] + α (U - L)*RandomVariate[NormalDistribution[0, 1], d];
  x[[i, 1]] = f[x[[i, 2]]];
  Δ = x[[i, 1]] - l[[1]];
  If[Δ < 0 || Exp[-Δ/T] > Random[], l = x[[i]]]; (* Metropolis acceptance *)
  If[l[[1]] < g[[1]], g = l];
];
Print["f_b(", g[[2]], ") = ", g[[1]]]

Figure A.5: Simulated annealing code

A.2 Multi-point, derivative-free algorithms

d = 2; L = Table[0, {d}]; U = Table[π, {d}];
f = -5 Product[Sin[#1[[i]]], {i, 1, d}] - Product[Sin[7 #1[[i]]], {i, 1, d}] &;
Nn = 40; n = 200; (* population size and iteration count; assumed *)
α = 10; G0 = 100; (* partly lost in the source; assumed *)
v = a = Table[Table[0, {d}], {Nn}]; M = Table[0, {Nn}];
x = Table[{0, L + (U - L)*RandomVariate[UniformDistribution[{0, 1}], d]}, {Nn}];
Do[
  For[j = 1, j <= Nn, j++, x[[j, 1]] = f[x[[j, 2]]]];
  {b, w} = Sort[x, #1[[1]] < #2[[1]] &][[{1, Nn}]]; (* best and worst individual *)
  If[k == 1, g = b];
  If[b[[1]] < g[[1]], g = b];
  M = (x[[All, 1]] - w[[1]])/(b[[1]] - w[[1]]); (* normalised masses *)
  M = M/Total[M];
  G = G0 Exp[-α k/n]; (* decaying gravitational constant *)
  e = Table[Table[0, {d}], {Nn}];
  For[i = 1, i <= Nn, i++,
    For[j = 1, j <= Nn, j++,
      If[i != j,
        e[[i]] += M[[j]] RandomReal[1, d]*(x[[j, 2]] - x[[i, 2]])/Norm[x[[i, 2]] - x[[j, 2]]]];
    ];
  ];
  v = RandomReal[1, {Nn, d}]*v + e G;
  x[[All, 2]] += v;
  , {k, 1, n}];
Print["f_b(", g[[2]], ") = ", g[[1]]]

Figure A.6: Gravitational search algorithm code

d = 2; L = Table[0, {d}]; U = Table[π, {d}];
f = -5 Product[Sin[#1[[i]]], {i, 1, d}] - Product[Sin[7 #1[[i]]], {i, 1, d}] &;
Nn = 40; n = 200; (* assumed *)
pc = 0.7; pm = 0.15; sT = 3;
x = y = Table[{0, L + (U - L)*RandomVariate[UniformDistribution[{0, 1}], d]}, {Nn}];
For[i = 1, i <= Nn, i++, x[[i, 1]] = f[x[[i, 2]]]];
g = Sort[x, #1[[1]] < #2[[1]] &][[1]];
Do[
  For[j = 1, j <= Nn, j += 2,
    p1 = x[[Tournament[sT]]];
    p2 = x[[Tournament[sT]]];
    {c1, c2} = Crossover[p1, p2];
    y[[j]] = Mutation[c1, i];
    y[[j + 1]] = Mutation[c2, i];
  ];
  For[j = 1, j <= Nn, j++, y[[j, 1]] = f[y[[j, 2]]]];
  l = Sort[y, #1[[1]] < #2[[1]] &][[1]];
  If[l[[1]] < g[[1]], g = l];
  x = SurvSelection[x, y];
  , {i, 2, n}];
Print["f_b(", g[[2]], ") = ", g[[1]]]

Figure A.7: Genetic algorithm code

Tournament = Function[{tour},
  k = RandomInteger[{1, Nn}];
  For[l = 1, l < tour, l++,
    m = RandomInteger[{1, Nn}];
    If[x[[m, 1]] < x[[k, 1]], k = m];
  ];
  k];

Figure A.8: GA parent selection (tournament) code

Crossover = Function[{x1, x2},
  y1 = x1; y2 = x2;
  If[RandomReal[] < pc,
    a = RandomReal[];
    y1 = a x1 + (1 - a) x2; (* arithmetical blend; fitness is recomputed later *)
    y2 = a x2 + (1 - a) x1;
  ];
  {y1, y2}];

Figure A.9: GA crossover code

Mutation = Function[{x1, i},
  y1 = x1;
  For[k = 1, k <= d, k++,
    If[RandomReal[] < pm,
      y1[[2, k]] = L[[k]] + RandomReal[] (U[[k]] - L[[k]])];
  ];
  y1];

Figure A.10: GA uniform mutation code

Mutation = Function[{x1, i},
  y1 = x1;
  For[k = 1, k <= d, k++,
    If[RandomReal[] < pm,
      Δ = 1 - RandomReal[]^(1 - i/n); (* the later the generation, the smaller the step *)
      If[RandomInteger[] == 1,
        y1[[2, k]] += (U[[k]] - y1[[2, k]]) Δ,
        y1[[2, k]] -= (y1[[2, k]] - L[[k]]) Δ];
    ];
  ];
  y1];

Figure A.11: GA nonuniform mutation code

SurvSelection = Function[{x, y},
  x1 = Join[x, y];
  x1 = Sort[x1, #1[[1]] < #2[[1]] &];
  x1 = x1[[1 ;; Nn]];
  x1];

Figure A.12: GA (µ + λ) strategy code

SurvSelection = Function[{x, y}, y];

Figure A.13: GA (µ, λ) strategy code

d = 2; L = Table[0, {d}]; U = Table[π, {d}];
f = -5 Product[Sin[#1[[i]]], {i, 1, d}] - Product[Sin[7 #1[[i]]], {i, 1, d}] &;
Nn = 40; n = 200; (* assumed *)
CR = 0.9; F = 0.7;
x = y = Table[{0, L + (U - L)*RandomVariate[UniformDistribution[{0, 1}], d]}, {Nn}];
For[i = 1, i <= Nn, i++, x[[i, 1]] = f[x[[i, 2]]]];
g = Sort[x, #1[[1]] < #2[[1]] &][[1]];
For[i = 2, i <= n, i++,
  For[j = 1, j <= Nn, j++,
    R = RandomSample[Complement[Range[Nn], {j}]];
    K = Table[HeavisideTheta[CR - RandomReal[]], {d}]; (* binomial crossover mask *)
    K[[RandomInteger[{1, d}]]] = 1;
    y[[j, 2]] = K*(x[[R[[3]], 2]] + F (x[[R[[1]], 2]] - x[[R[[2]], 2]])) + (1 - K)*x[[j, 2]];
  ];
  For[j = 1, j <= Nn, j++,
    y[[j, 1]] = f[y[[j, 2]]];
    If[y[[j, 1]] <= x[[j, 1]], x[[j]] = y[[j]]];
    If[y[[j, 1]] <= g[[1]], g = y[[j]]];
  ];
];
Print["f_b(", g[[2]], ") = ", g[[1]]]

Figure A.14: Differential evolution code

d = 2; L = Table[0, {d}]; U = Table[π, {d}];
f = -5 Product[Sin[#1[[i]]], {i, 1, d}] - Product[Sin[7 #1[[i]]], {i, 1, d}] &;
Nn = 40; n = 200; (* assumed *)
p = 0.8; λ = 1.5; α = 0.1;
σu = (Gamma[1 + λ] Sin[π λ/2]/(Gamma[(1 + λ)/2] λ 2^((λ - 1)/2)))^(1/λ); (* Mantegna [16] *)
x = Table[{0, L + (U - L)*RandomVariate[UniformDistribution[{0, 1}], d]}, {Nn}];
For[i = 1, i <= Nn, i++, x[[i, 1]] = f[x[[i, 2]]]];
g = y = Sort[x, #1[[1]] < #2[[1]] &][[1]];
For[i = 2, i <= n, i++,
  For[j = 1, j <= Nn, j++,
    If[p < RandomReal[],
      (* global pollination: Lévy flight towards the best flower *)
      u = σu RandomVariate[NormalDistribution[0, 1], d];
      v = RandomVariate[NormalDistribution[0, 1], d];
      y[[2]] = x[[j, 2]] + α (u/Abs[v]^(1/λ))*(g[[2]] - x[[j, 2]]),
      (* local pollination *)
      R = RandomSample[Range[Nn]];
      ϵ = RandomVariate[UniformDistribution[{0, 1}]];
      y[[2]] = x[[j, 2]] + ϵ (x[[R[[1]], 2]] - x[[R[[2]], 2]])
    ];
    y[[2]] = CheckRange[y[[2]]];
    y[[1]] = f[y[[2]]];
    If[y[[1]] < x[[j, 1]], x[[j]] = y];
    If[y[[1]] < g[[1]], g = y];
  ];
];
Print["f_b(", g[[2]], ") = ", g[[1]]]

Figure A.15: Flower pollination algorithm code

CheckRange = Function[{x1},
  y1 = x1;
  For[l = 1, l <= d, l++,
    If[y1[[l]] < L[[l]], y1[[l]] = L[[l]]];
    If[y1[[l]] > U[[l]], y1[[l]] = U[[l]]];
  ];
  y1];

Figure A.16: Check range code

d = 2; L = Table[0, {d}]; U = Table[π, {d}];
f = -5 Product[Sin[#1[[i]]], {i, 1, d}] - Product[Sin[7 #1[[i]]], {i, 1, d}] &;
Nn = 40; n = 200; (* assumed *)
α = 1; β = 1; θ0 = 0.3; θ = 0.7; δ = 10^-2; (* θ0 denotes the constant part of the inertia weight *)
v = Table[Table[0, {d}], {Nn}];
x = Table[{0, L + (U - L)*RandomVariate[UniformDistribution[{0, 1}], d]}, {Nn}];
For[i = 1, i <= Nn, i++, x[[i, 1]] = f[x[[i, 2]]]];
l = x; (* personal bests *)
g = Sort[l, #1[[1]] < #2[[1]] &][[1]];
For[i = 2, i <= n, i++,
  θ *= δ^(1/n); (* decaying part of the inertia weight *)
  For[j = 1, j <= Nn, j++,
    v[[j]] = (θ0 + θ) v[[j]] + β RandomReal[1, d]*(g[[2]] - x[[j, 2]]) +
      α RandomReal[1, d]*(l[[j, 2]] - x[[j, 2]]);
    x[[j, 2]] += v[[j]];
    x[[j, 1]] = f[x[[j, 2]]];
    If[x[[j, 1]] < l[[j, 1]], l[[j]] = x[[j]]];
    If[l[[j, 1]] < g[[1]], g = l[[j]]];
  ];
];
Print["f_b(", g[[2]], ") = ", g[[1]]]

Figure A.17: Particle swarm optimisation code

d = 2; L = Table[0, {d}]; U = Table[π, {d}];
f = -5 Product[Sin[#1[[i]]], {i, 1, d}] - Product[Sin[7 #1[[i]]], {i, 1, d}] &;
Nn = 40; n = 200; (* assumed *)
α = 0.1; β = 0.3; θ0 = 0.5; θ = 0.1; δ = 10^-2;
v = Table[Table[0, {d}], {Nn}];
x = Table[{0, L + (U - L)*RandomVariate[UniformDistribution[{0, 1}], d]}, {Nn}];
For[i = 1, i <= Nn, i++, x[[i, 1]] = f[x[[i, 2]]]];
g = Sort[x, #1[[1]] < #2[[1]] &][[1]];
For[i = 2, i <= n, i++,
  θ *= δ^(1/n);
  For[j = 1, j <= Nn, j++,
    v[[j]] = (θ0 + θ) v[[j]] + α (U - L)*RandomReal[1, d] + β (g[[2]] - x[[j, 2]]);
    x[[j, 2]] += v[[j]];
    x[[j, 1]] = f[x[[j, 2]]];
    If[x[[j, 1]] < g[[1]], g = x[[j]]];
  ];
];
Print["f_b(", g[[2]], ") = ", g[[1]]]

Figure A.18: Accelerated particle swarm optimisation 1 code

d = 2; L = Table[0, {d}]; U = Table[π, {d}];
f = -5 Product[Sin[#1[[i]]], {i, 1, d}] - Product[Sin[7 #1[[i]]], {i, 1, d}] &;
Nn = 40; n = 200; (* assumed *)
β = 0.2; α0 = 0.5; α = 0.1; δ = 10^-2; (* β partly lost; assumed *)
x = Table[{0, L + (U - L)*RandomVariate[UniformDistribution[{0, 1}], d]}, {Nn}];
For[i = 1, i <= Nn, i++, x[[i, 1]] = f[x[[i, 2]]]];
g = Sort[x, #1[[1]] < #2[[1]] &][[1]];
For[i = 2, i <= n, i++,
  α *= δ^(1/n);
  For[j = 1, j <= Nn, j++,
    x[[j, 2]] += (α0 + α) (U - L)*RandomReal[1, d] + β (g[[2]] - x[[j, 2]]);
    x[[j, 1]] = f[x[[j, 2]]];
    If[x[[j, 1]] < g[[1]], g = x[[j]]];
  ];
];
Print["f_b(", g[[2]], ") = ", g[[1]]]

Figure A.19: Accelerated particle swarm optimisation 2 code

d = 2; L = Table[0, {d}]; U = Table[π, {d}];
f = -5 Product[Sin[#1[[i]]], {i, 1, d}] - Product[Sin[7 #1[[i]]], {i, 1, d}] &;
Nn = 40; n = 200; (* assumed *)
α = 0.5; βmin = 0.2; β = 0.8; γ = 1; δ = 10^-3; (* βmin partly lost; assumed *)
x = Table[{0, L + (U - L)*RandomVariate[UniformDistribution[{0, 1}], d]}, {Nn}];
Do[
  For[i = 1, i <= Nn, i++, x[[i, 1]] = f[x[[i, 2]]]];
  x = y = Sort[x, #1[[1]] < #2[[1]] &];
  g = x[[1]];
  α *= δ^(1/n);
  For[i = 1, i <= Nn, i++,
    For[j = 1, j < i, j++, (* move towards every brighter firefly *)
      x[[i, 2]] += (βmin + β Exp[-γ Norm[y[[j, 2]] - x[[i, 2]]]^2]) (y[[j, 2]] - x[[i, 2]]) +
        α (U - L)*RandomVariate[UniformDistribution[{-1, 1}], d];
    ];
  ];
  , {k, 1, n}];
Print["f_b(", g[[2]], ") = ", g[[1]]]

Figure A.20: Firefly algorithm code

d = 2; L = Table[0, {d}]; U = Table[π, {d}];
f = -5 Product[Sin[#1[[i]]], {i, 1, d}] - Product[Sin[7 #1[[i]]], {i, 1, d}] &;
Nn = 40; n = 200; (* assumed *)
A = 0.5; r = 0.5; Fl = 0; Fu = 1; α = 0.2; (* α partly lost; assumed *)
v = Table[Table[0, {d}], {Nn}];
x = Table[{0, L + (U - L)*RandomVariate[UniformDistribution[{0, 1}], d]}, {Nn}];
For[i = 1, i <= Nn, i++, x[[i, 1]] = f[x[[i, 2]]]];
y = g = Sort[x, #1[[1]] < #2[[1]] &][[1]];
For[i = 2, i <= n, i++,
  For[j = 1, j <= Nn, j++,
    If[r < RandomReal[],
      (* local random walk around the current best *)
      y[[2]] = g[[2]] + α (U - L)*RandomVariate[NormalDistribution[0, 1], d],
      F = Fl + (Fu - Fl) RandomReal[]; (* random frequency *)
      v[[j]] += (g[[2]] - x[[j, 2]]) F;
      y[[2]] = x[[j, 2]] + v[[j]];
    ];
    y[[1]] = f[y[[2]]];
    If[y[[1]] <= x[[j, 1]] && RandomReal[] < A, x[[j]] = y];
    If[y[[1]] < g[[1]], g = y];
  ];
];
Print["f_b(", g[[2]], ") = ", g[[1]]]

Figure A.21: Bat algorithm code

d = 2; L = Table[0, {d}]; U = Table[π, {d}];
f = -5 Product[Sin[#1[[i]]], {i, 1, d}] - Product[Sin[7 #1[[i]]], {i, 1, d}] &;
Nn = 40; n = 20000; (* assumed; n limits the number of function evaluations *)
α = 0.2; λ = 1.5; pa = 0.2; F = Nn; (* α and pa partly lost; assumed *)
σu = (Gamma[1 + λ] Sin[π λ/2]/(Gamma[(1 + λ)/2] λ 2^((λ - 1)/2)))^(1/λ);
x = y = Table[{0, L + (U - L)*RandomVariate[UniformDistribution[{0, 1}], d]}, {Nn}];
For[i = 1, i <= Nn, i++, x[[i, 1]] = f[x[[i, 2]]]];
g = Sort[x, #1[[1]] < #2[[1]] &][[1]];
While[F < n,
  For[i = 1, i <= Nn, i++,
    (* Lévy flight *)
    u = σu RandomVariate[NormalDistribution[0, 1]];
    v = RandomVariate[NormalDistribution[0, 1]];
    ϵ = RandomVariate[NormalDistribution[0, 1], d];
    y[[i, 2]] = x[[i, 2]] + α (U - L)*(u/Abs[v]^(1/λ)) ϵ/Norm[ϵ];
    F++;
    y[[i, 1]] = f[y[[i, 2]]];
    If[y[[i, 1]] < x[[i, 1]], x[[i]] = y[[i]]];
  ];
  (* abandon a fraction pa of the nests *)
  R1 = RandomSample[Range[Nn]];
  R2 = RandomSample[Range[Nn]];
  For[i = 1, i <= Nn, i++,
    ϵ = α RandomVariate[UniformDistribution[{0, 1}], d];
    y[[i, 2]] = x[[i, 2]] + ϵ*(x[[R1[[i]], 2]] - x[[R2[[i]], 2]]) HeavisideTheta[pa - Random[]];
    If[x[[i, 2]] =!= y[[i, 2]],
      F++;
      y[[i, 1]] = f[y[[i, 2]]];
      If[y[[i, 1]] < x[[i, 1]], x[[i]] = y[[i]]];
    ];
  ];
  g = Sort[x, #1[[1]] < #2[[1]] &][[1]];
];
Print["f_b(", g[[2]], ") = ", g[[1]]]

Figure A.22: Cuckoo search code

A.3 Miscellaneous

halton = Function[{i, p},
  h = 0; ff = 1; j = i;
  While[j > 0,
    ff /= p;
    h += ff Mod[j, p]; (* next digit of i in base p, scaled *)
    j = Floor[j/p];
  ];
  h];

Figure A.23: Halton sequence code
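For instance, the first five points of a two-dimensional Halton sequence with bases 2 and 3 are obtained as

Table[{halton[i, 2], halton[i, 3]}, {i, 1, 5}]

which yields {{1/2, 1/3}, {1/4, 2/3}, {3/4, 1/9}, {1/8, 4/9}, {5/8, 7/9}}.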

Appendix B

AGA: Advanced Genetic Algorithm

B.1 Brief introduction

The program AGA 1) (Advanced Genetic Algorithms) can be used to solve the great majority of single-objective optimisation problems. It is also possible to solve multi-objective problems, provided that they can be reduced to a single objective. A short description of the program is presented below on the basis of the optimisation of a function of two variables f : [0, π]² → R given by equation (1.17) and shown in figure 1.1. The minimum value is

min_{[0,π]²} f = f(π/2, π/2) = −6.    (B.1)

The easiest way to find that minimum is to write a script file in a text editor in the following format:

[VARIABLES]
x = real [0; 3.1415]
y = real [0; 3.1415]
[FUNCTION]
analytical = 1
[EQUATIONS]
f = -5.*sin(x)*sin(y) - sin(5*x)*sin(5*y)
[SHOW]
arrange = 1
[OPTIONS]
maximisation = 0
loop_counter = 1
max_iteration = 1
[SELECTION]

1) krzyte/ga/aga.zip

tournament = 1
tournament_size = 3
[CROSSOVER]
probability = .7
arithmetical = 1
[MUTATION]
probability = .15
nonuniform = 1
[POPULATION]
size = 30
constant = 1
[PLOTS]
Convergence = Average;CurrentMin
D = Discrepancy
Different_vs_^Entropy = Different;^Entropy

The next step is to save the file as test.txt; it can then be run by writing aga test.txt in the command line. Finally, a window similar to figure B.1 will appear.

Figure B.1: Main window

AGA found the minimum, called GlobalMin (see the window named Statistics), f(1.5778, …) = −6, which is very close to the exact value. If one does not want to use the command line, then it is necessary to run AGA directly. A window like the one in figure B.2 can be seen. Next, the option Open from the menu File should be chosen, the file test.txt highlighted and the Open button pressed.

B.2 Detailed introduction

Instead of writing a script file, it is possible to define the optimisation problem manually. In order to do this, the menu File (figure B.3) should be chosen.

New... defines a new problem
Open... opens an existing problem definition from file. There are two types of file:
*.txt script files (text)
*.aga AGA internal files (binary)
Close all closes all the windows and finishes the optimisation
Save saves the current problem
Save as... saves the current problem with a different name
Run runs the optimisation process
Terminate... terminates the running process
About... a short note about the author
Exit exits the application

Figure B.2: Empty main window
Figure B.3: Menu File
Figure B.4: I/O console

When the option New... from the menu File is chosen, a dialog box called Initialise appears. It consists of five pages. The first page, Variables (figure B.5), is used to declare the optimisation variables (in the case of function optimisation, the independent variables).

Figure B.5: Dialog box Initialise, first tab
Figure B.6: Dialog box Initialise, second tab

There are three types of variables:
Boolean a logical variable. Two values are possible: 0 (false) and 1 (true).
Integer an integer variable.

Values from the subset of integer numbers [−2^31; 2^31 − 1] are possible. If an integer variable has been chosen, it is necessary to specify the range of the desired subset. There are two edit windows: from and to.
Real a floating-point variable. Values from a subset of the floating-point numbers are possible. In this case it is also necessary to specify the range of the desired subset. There are two edit windows: from and to. After filling them, the range of the subset is given by [From; To].

When the edit windows from and to are filled, the next step is to add the variables to the list, using the button Add. If one wants to remove a variable from the list, it should be highlighted and the button Remove pushed. In order to name a variable, it is enough to click on the empty place in the column Name. If the name is not specified, the program will assume a default name (x1, x2, ...). In our case we can name the variables x and y.

The second page of the dialog box Initialise, Function (figure B.6), determines the entry method for the fitness function value. There are three possibilities:
Manual individual values are entered by hand
Analytical an analytical equation can be provided
External values are supplied automatically by an external program (*.exe or *.bat file)

Manual entry of the fitness function value is required for all the values that have not yet been calculated during a given step of the optimisation process. We decided to deliver the fitness function value by an analytical equation. In order to do this, the following equation should be provided in the edit box f(...) =:

-5.*sin(x)*sin(y) - sin(5*x)*sin(5*y)

If the description of the function is more complicated, then one has to click the button I/O, which opens the I/O console (figure B.4); for a more detailed description see the I/O Console section B.3. The function must be called f. If the fitness function value is delivered by an external program, the name of the text file must be specified into which AGA will send the independent variables generated by the GA. The external program must calculate the value of the function for these variables (Data for external program (variables from AGA)). The default name of this file is input.txt. The name of the executable (*.exe) program or *.bat file which is going to calculate these values must also be specified (External program). The external program has to save the calculated value to another text file, which will be read back by AGA in order to continue the optimisation process (Data from external program (function value for AGA)). The default name of this file is output.txt. This makes it possible to create a fully automated optimisation process. In our example, the source code looks as follows:

#include <fstream>
#include <cmath>

using namespace std;

int main(int /*argc*/, char* argv[])
{
    ifstream in(argv[1]);   // variables written by AGA
    ofstream out(argv[2]);  // fitness value read back by AGA
    double x1, x2, f;
    in >> x1 >> x2;
    f = -5.*sin(x1)*sin(x2) - sin(5.*x1)*sin(5.*x2);
    out << f << endl;
    return 0;
}

It calculates the value of the function given by equation (1.17). The source code listed above should be compiled and the executable placed in the same folder as AGA. Input data are received from the file specified as the first argument (argv[1]) and the result is saved into the file specified as the second argument (argv[2]). The AGA program will call the external program with command-line arguments, using the syntax:

External.exe input.txt output.txt
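For illustration, a single exchange between AGA and the external program might look as follows (the file contents below are hypothetical):

input.txt (written by AGA):   1.5708 1.5708
output.txt (written by External.exe):   -6

The protocol is therefore a plain-text handshake: AGA writes the variables, calls the program and reads the objective value back.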

The third page, General (figure B.7), allows one to choose the population size and the windows that will be seen during the optimisation process:
Statistics history statistics for every iteration, or only for the last (current) one if it is unchecked
Buffer individuals fitness function values that have already been calculated can be buffered in order to save computation time; this works very well for integer and boolean variables
Show buffered shows buffered individuals with calculated fitness function values (if Buffer individuals has been chosen)
Show population shows the individuals of the last generation; this is important for manual delivery of the fitness function, as those values have to be input
Population history as above (if Show population has been chosen), but it shows all the generations
Show variables displays basic information about the optimisation variables
(Initial) population size the size of the population can be specified here; for the variable population size method it is the initial population size only (first step)

In our case we set (Initial) population size to 30.

The fourth page, Plots (figure B.8), makes it possible to specify the plots we want to see during the optimisation process. One plot, called Convergence, is predefined (for minimisation). The name we choose is inconsequential; if we do not name a plot, AGA will do it for us. In order to plot something, we must input it in the window Plot and click the button Add. If we change our mind, we can remove it using the button Remove. We can plot variables or even equations. To remind ourselves of the standard variable names, we click the button ? and we will see the window in figure B.9. In order to input our equation, the variable names specified in the first column of this window should be used. If we want to see more than one plot in the same window, then it is necessary to separate the equations using ;. In our case we can add two additional plots. To do so, we input Discrepancy in the edit window Plot, then click the button Add and finally change the plot's name to D. The second plot consists of two graphs that show the number of different individuals and the function Entropy. In order to create it, we input Different;^Entropy and optionally change the name to Different_vs_^Entropy.

Figure B.7: Dialog box Initialise, third tab
Figure B.8: Dialog box Initialise, fourth tab
Figure B.9: I/O console
Figure B.10: Dialog box Initialise, fifth tab

The fifth page, Export (figure B.10), allows us to export variables or equations to a file. The rules of exporting are the same as on the page Plots. There are two differences: we have to specify a single equation (we do not use ; to separate), and we can export vector variables (instead of scalars, as for plotting). In order to choose the name of the export file, we input it in Export file name. The check box Every iteration allows us to export results from each iteration, or only from the last one if it is left unchecked.

It is also possible to modify the optimisation process options. In order to do this, we choose either the option Options from the menu File or the relevant speed button. The dialog box Options consists of five pages. The first of them, labelled General (figure B.11), allows us to choose the maximal number of iterations (Max iteration) and change the number of generations (Loop counter). The number of generations is important only for the external and analytical methods of supplying the function value. We can also decide here whether we maximise or minimise our problem (Maximisation). AGA can optionally be closed when the optimisation is finished; to do this, we choose Close AGA when finished. This is useful when we call AGA from another program or want to fully automate our computation.

Figure B.11: Dialog box Options, first tab
Figure B.12: Dialog box Options, second tab

The second page, Options (figure B.12), allows us to choose the selection method. There are two methods so far:
Tournament random tournament selection (default). We can specify the size of the tournaments (Tournament size); the default tournament size equals 3. Due to its virtues this method is recommended.
Roulette roulette wheel selection. The roulette wheel method is a classical method of individual selection. It possesses more defects than virtues.
Both methods fit either maximisation or minimisation problems. This means that the roulette method is a modified version of the classical roulette.

The third page, Crossover (figure B.13), contains the options that are necessary in order to change the crossover method. One can also change the Crossover probability.

There are three crossover methods:
Multipoint the multipoint method; Points indicates the number of crossover points, and the default number is 1
Arithmetical the arithmetical crossover method; this method is suitable for numerical optimisation
Uniform exchanges all the genes with the Uniform cross probability

Figure B.13: Dialog box Options, third tab
Figure B.14: Dialog box Options, fourth tab

The fourth page, Mutation (figure B.14), allows us to specify the Mutation probability and to choose among the mutation methods:
Uniform this method does not depend on the current generation number (the stage of optimisation)
Nonuniform this method depends on the current generation number; the later the generation, the smaller the influence of the mutation. We can also change the Uniformity coefficient

The fifth page, Population (figure B.15), lets us choose the population size. There are two possibilities:
Constant the population size does not depend on the generation number and is constant during the optimisation process
Variable the population size varies during the optimisation process

The menu Window (figure B.16) consists of at least five options for window manipulation:
Cascade
Tile Horizontally
Tile Vertically
Minimize All
Arrange All

The speed toolbar (figure B.17) consists of the following buttons:
New same as the option New from the menu File
Open same as the option Open from the menu File
Save same as the option Save from the menu File

Cascade same as the option Cascade from the menu Window
Tile Horizontally same as the option Tile Horizontally from the menu Window
Tile Vertically same as the option Tile Vertically from the menu Window
Run same as the option Run from the menu File
Terminate same as the option Terminate from the menu File
Options same as the option Options from the menu File

Figure B.15: Dialog box Options, fifth tab
Figure B.16: Menu Window
Figure B.17: Speed toolbar

If we have decided to input the fitness function values manually, then we have to supply these values in the window Population (figure B.18) for each set of independent variables listed by the program. First we click on the desired line, and then we click again on the ?. Finally we can input a value.

Figure B.18: Population window
Figure B.19: Chart window

A right mouse button click on a chart window displays a popup menu (figure B.19). This menu allows us to modify the chart:
Save to file saves the chart as a bitmap to file
Copy to clipboard as... copies the chart to the clipboard as...


More information

September Math Course: First Order Derivative

September Math Course: First Order Derivative September Math Course: First Order Derivative Arina Nikandrova Functions Function y = f (x), where x is either be a scalar or a vector of several variables (x,..., x n ), can be thought of as a rule which

More information

MATH 4211/6211 Optimization Basics of Optimization Problems

MATH 4211/6211 Optimization Basics of Optimization Problems MATH 4211/6211 Optimization Basics of Optimization Problems Xiaojing Ye Department of Mathematics & Statistics Georgia State University Xiaojing Ye, Math & Stat, Georgia State University 0 A standard minimization

More information

Optimization II: Unconstrained Multivariable

Optimization II: Unconstrained Multivariable Optimization II: Unconstrained Multivariable CS 205A: Mathematical Methods for Robotics, Vision, and Graphics Justin Solomon CS 205A: Mathematical Methods Optimization II: Unconstrained Multivariable 1

More information

ECS550NFB Introduction to Numerical Methods using Matlab Day 2

ECS550NFB Introduction to Numerical Methods using Matlab Day 2 ECS550NFB Introduction to Numerical Methods using Matlab Day 2 Lukas Laffers lukas.laffers@umb.sk Department of Mathematics, University of Matej Bel June 9, 2015 Today Root-finding: find x that solves

More information

Optimization and Root Finding. Kurt Hornik

Optimization and Root Finding. Kurt Hornik Optimization and Root Finding Kurt Hornik Basics Root finding and unconstrained smooth optimization are closely related: Solving ƒ () = 0 can be accomplished via minimizing ƒ () 2 Slide 2 Basics Root finding

More information

Minimization of Static! Cost Functions!

Minimization of Static! Cost Functions! Minimization of Static Cost Functions Robert Stengel Optimal Control and Estimation, MAE 546, Princeton University, 2017 J = Static cost function with constant control parameter vector, u Conditions for

More information

Chapter 4. Unconstrained optimization

Chapter 4. Unconstrained optimization Chapter 4. Unconstrained optimization Version: 28-10-2012 Material: (for details see) Chapter 11 in [FKS] (pp.251-276) A reference e.g. L.11.2 refers to the corresponding Lemma in the book [FKS] PDF-file

More information

Convex Optimization CMU-10725

Convex Optimization CMU-10725 Convex Optimization CMU-10725 Quasi Newton Methods Barnabás Póczos & Ryan Tibshirani Quasi Newton Methods 2 Outline Modified Newton Method Rank one correction of the inverse Rank two correction of the

More information

Statistics 580 Optimization Methods

Statistics 580 Optimization Methods Statistics 580 Optimization Methods Introduction Let fx be a given real-valued function on R p. The general optimization problem is to find an x ɛ R p at which fx attain a maximum or a minimum. It is of

More information

Optimisation in Higher Dimensions

Optimisation in Higher Dimensions CHAPTER 6 Optimisation in Higher Dimensions Beyond optimisation in 1D, we will study two directions. First, the equivalent in nth dimension, x R n such that f(x ) f(x) for all x R n. Second, constrained

More information

Shiqian Ma, MAT-258A: Numerical Optimization 1. Chapter 3. Gradient Method

Shiqian Ma, MAT-258A: Numerical Optimization 1. Chapter 3. Gradient Method Shiqian Ma, MAT-258A: Numerical Optimization 1 Chapter 3 Gradient Method Shiqian Ma, MAT-258A: Numerical Optimization 2 3.1. Gradient method Classical gradient method: to minimize a differentiable convex

More information

Scientific Computing: Optimization

Scientific Computing: Optimization Scientific Computing: Optimization Aleksandar Donev Courant Institute, NYU 1 donev@courant.nyu.edu 1 Course MATH-GA.2043 or CSCI-GA.2112, Spring 2012 March 8th, 2011 A. Donev (Courant Institute) Lecture

More information

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings Structural and Multidisciplinary Optimization P. Duysinx and P. Tossings 2018-2019 CONTACTS Pierre Duysinx Institut de Mécanique et du Génie Civil (B52/3) Phone number: 04/366.91.94 Email: P.Duysinx@uliege.be

More information

Numerical Optimization Professor Horst Cerjak, Horst Bischof, Thomas Pock Mat Vis-Gra SS09

Numerical Optimization Professor Horst Cerjak, Horst Bischof, Thomas Pock Mat Vis-Gra SS09 Numerical Optimization 1 Working Horse in Computer Vision Variational Methods Shape Analysis Machine Learning Markov Random Fields Geometry Common denominator: optimization problems 2 Overview of Methods

More information

AM 205: lecture 19. Last time: Conditions for optimality Today: Newton s method for optimization, survey of optimization methods

AM 205: lecture 19. Last time: Conditions for optimality Today: Newton s method for optimization, survey of optimization methods AM 205: lecture 19 Last time: Conditions for optimality Today: Newton s method for optimization, survey of optimization methods Optimality Conditions: Equality Constrained Case As another example of equality

More information

Appendix A Taylor Approximations and Definite Matrices

Appendix A Taylor Approximations and Definite Matrices Appendix A Taylor Approximations and Definite Matrices Taylor approximations provide an easy way to approximate a function as a polynomial, using the derivatives of the function. We know, from elementary

More information

8 Numerical methods for unconstrained problems

8 Numerical methods for unconstrained problems 8 Numerical methods for unconstrained problems Optimization is one of the important fields in numerical computation, beside solving differential equations and linear systems. We can see that these fields

More information

Optimization II: Unconstrained Multivariable

Optimization II: Unconstrained Multivariable Optimization II: Unconstrained Multivariable CS 205A: Mathematical Methods for Robotics, Vision, and Graphics Doug James (and Justin Solomon) CS 205A: Mathematical Methods Optimization II: Unconstrained

More information

Optimization. Escuela de Ingeniería Informática de Oviedo. (Dpto. de Matemáticas-UniOvi) Numerical Computation Optimization 1 / 30

Optimization. Escuela de Ingeniería Informática de Oviedo. (Dpto. de Matemáticas-UniOvi) Numerical Computation Optimization 1 / 30 Optimization Escuela de Ingeniería Informática de Oviedo (Dpto. de Matemáticas-UniOvi) Numerical Computation Optimization 1 / 30 Unconstrained optimization Outline 1 Unconstrained optimization 2 Constrained

More information

LECTURE 22: SWARM INTELLIGENCE 3 / CLASSICAL OPTIMIZATION

LECTURE 22: SWARM INTELLIGENCE 3 / CLASSICAL OPTIMIZATION 15-382 COLLECTIVE INTELLIGENCE - S19 LECTURE 22: SWARM INTELLIGENCE 3 / CLASSICAL OPTIMIZATION TEACHER: GIANNI A. DI CARO WHAT IF WE HAVE ONE SINGLE AGENT PSO leverages the presence of a swarm: the outcome

More information

Scientific Computing: An Introductory Survey

Scientific Computing: An Introductory Survey Scientific Computing: An Introductory Survey Chapter 6 Optimization Prof. Michael T. Heath Department of Computer Science University of Illinois at Urbana-Champaign Copyright c 2002. Reproduction permitted

More information

Scientific Computing: An Introductory Survey

Scientific Computing: An Introductory Survey Scientific Computing: An Introductory Survey Chapter 6 Optimization Prof. Michael T. Heath Department of Computer Science University of Illinois at Urbana-Champaign Copyright c 2002. Reproduction permitted

More information

Unconstrained Multivariate Optimization

Unconstrained Multivariate Optimization Unconstrained Multivariate Optimization Multivariate optimization means optimization of a scalar function of a several variables: and has the general form: y = () min ( ) where () is a nonlinear scalar-valued

More information

Neural Network Training

Neural Network Training Neural Network Training Sargur Srihari Topics in Network Training 0. Neural network parameters Probabilistic problem formulation Specifying the activation and error functions for Regression Binary classification

More information

MATHEMATICS FOR COMPUTER VISION WEEK 8 OPTIMISATION PART 2. Dr Fabio Cuzzolin MSc in Computer Vision Oxford Brookes University Year

MATHEMATICS FOR COMPUTER VISION WEEK 8 OPTIMISATION PART 2. Dr Fabio Cuzzolin MSc in Computer Vision Oxford Brookes University Year MATHEMATICS FOR COMPUTER VISION WEEK 8 OPTIMISATION PART 2 1 Dr Fabio Cuzzolin MSc in Computer Vision Oxford Brookes University Year 2013-14 OUTLINE OF WEEK 8 topics: quadratic optimisation, least squares,

More information

5 Quasi-Newton Methods

5 Quasi-Newton Methods Unconstrained Convex Optimization 26 5 Quasi-Newton Methods If the Hessian is unavailable... Notation: H = Hessian matrix. B is the approximation of H. C is the approximation of H 1. Problem: Solve min

More information

Lecture Notes: Geometric Considerations in Unconstrained Optimization

Lecture Notes: Geometric Considerations in Unconstrained Optimization Lecture Notes: Geometric Considerations in Unconstrained Optimization James T. Allison February 15, 2006 The primary objectives of this lecture on unconstrained optimization are to: Establish connections

More information

Convex Optimization. Problem set 2. Due Monday April 26th

Convex Optimization. Problem set 2. Due Monday April 26th Convex Optimization Problem set 2 Due Monday April 26th 1 Gradient Decent without Line-search In this problem we will consider gradient descent with predetermined step sizes. That is, instead of determining

More information

CHAPTER 2: QUADRATIC PROGRAMMING

CHAPTER 2: QUADRATIC PROGRAMMING CHAPTER 2: QUADRATIC PROGRAMMING Overview Quadratic programming (QP) problems are characterized by objective functions that are quadratic in the design variables, and linear constraints. In this sense,

More information

CE 191: Civil and Environmental Engineering Systems Analysis. LEC 05 : Optimality Conditions

CE 191: Civil and Environmental Engineering Systems Analysis. LEC 05 : Optimality Conditions CE 191: Civil and Environmental Engineering Systems Analysis LEC : Optimality Conditions Professor Scott Moura Civil & Environmental Engineering University of California, Berkeley Fall 214 Prof. Moura

More information

Numerisches Rechnen. (für Informatiker) M. Grepl P. Esser & G. Welper & L. Zhang. Institut für Geometrie und Praktische Mathematik RWTH Aachen

Numerisches Rechnen. (für Informatiker) M. Grepl P. Esser & G. Welper & L. Zhang. Institut für Geometrie und Praktische Mathematik RWTH Aachen Numerisches Rechnen (für Informatiker) M. Grepl P. Esser & G. Welper & L. Zhang Institut für Geometrie und Praktische Mathematik RWTH Aachen Wintersemester 2011/12 IGPM, RWTH Aachen Numerisches Rechnen

More information

Static unconstrained optimization

Static unconstrained optimization Static unconstrained optimization 2 In unconstrained optimization an objective function is minimized without any additional restriction on the decision variables, i.e. min f(x) x X ad (2.) with X ad R

More information

Numerical Optimization Algorithms

Numerical Optimization Algorithms Numerical Optimization Algorithms 1. Overview. Calculus of Variations 3. Linearized Supersonic Flow 4. Steepest Descent 5. Smoothed Steepest Descent Overview 1 Two Main Categories of Optimization Algorithms

More information

5 Handling Constraints

5 Handling Constraints 5 Handling Constraints Engineering design optimization problems are very rarely unconstrained. Moreover, the constraints that appear in these problems are typically nonlinear. This motivates our interest

More information

1 Newton s Method. Suppose we want to solve: x R. At x = x, f (x) can be approximated by:

1 Newton s Method. Suppose we want to solve: x R. At x = x, f (x) can be approximated by: Newton s Method Suppose we want to solve: (P:) min f (x) At x = x, f (x) can be approximated by: n x R. f (x) h(x) := f ( x)+ f ( x) T (x x)+ (x x) t H ( x)(x x), 2 which is the quadratic Taylor expansion

More information

CS 542G: Robustifying Newton, Constraints, Nonlinear Least Squares

CS 542G: Robustifying Newton, Constraints, Nonlinear Least Squares CS 542G: Robustifying Newton, Constraints, Nonlinear Least Squares Robert Bridson October 29, 2008 1 Hessian Problems in Newton Last time we fixed one of plain Newton s problems by introducing line search

More information

minimize x subject to (x 2)(x 4) u,

minimize x subject to (x 2)(x 4) u, Math 6366/6367: Optimization and Variational Methods Sample Preliminary Exam Questions 1. Suppose that f : [, L] R is a C 2 -function with f () on (, L) and that you have explicit formulae for

More information

Optimization Methods for Machine Learning

Optimization Methods for Machine Learning Optimization Methods for Machine Learning Sathiya Keerthi Microsoft Talks given at UC Santa Cruz February 21-23, 2017 The slides for the talks will be made available at: http://www.keerthis.com/ Introduction

More information

Mathematical optimization

Mathematical optimization Optimization Mathematical optimization Determine the best solutions to certain mathematically defined problems that are under constrained determine optimality criteria determine the convergence of the

More information

Methods for Unconstrained Optimization Numerical Optimization Lectures 1-2

Methods for Unconstrained Optimization Numerical Optimization Lectures 1-2 Methods for Unconstrained Optimization Numerical Optimization Lectures 1-2 Coralia Cartis, University of Oxford INFOMM CDT: Modelling, Analysis and Computation of Continuous Real-World Problems Methods

More information

AM 205: lecture 19. Last time: Conditions for optimality, Newton s method for optimization Today: survey of optimization methods

AM 205: lecture 19. Last time: Conditions for optimality, Newton s method for optimization Today: survey of optimization methods AM 205: lecture 19 Last time: Conditions for optimality, Newton s method for optimization Today: survey of optimization methods Quasi-Newton Methods General form of quasi-newton methods: x k+1 = x k α

More information

Outline. Scientific Computing: An Introductory Survey. Optimization. Optimization Problems. Examples: Optimization Problems

Outline. Scientific Computing: An Introductory Survey. Optimization. Optimization Problems. Examples: Optimization Problems Outline Scientific Computing: An Introductory Survey Chapter 6 Optimization 1 Prof. Michael. Heath Department of Computer Science University of Illinois at Urbana-Champaign Copyright c 2002. Reproduction

More information

Quasi-Newton methods for minimization

Quasi-Newton methods for minimization Quasi-Newton methods for minimization Lectures for PHD course on Numerical optimization Enrico Bertolazzi DIMS Universitá di Trento November 21 December 14, 2011 Quasi-Newton methods for minimization 1

More information

Optimization Methods

Optimization Methods Optimization Methods Categorization of Optimization Problems Continuous Optimization Discrete Optimization Combinatorial Optimization Variational Optimization Common Optimization Concepts in Computer Vision

More information

Introduction to gradient descent

Introduction to gradient descent 6-1: Introduction to gradient descent Prof. J.C. Kao, UCLA Introduction to gradient descent Derivation and intuitions Hessian 6-2: Introduction to gradient descent Prof. J.C. Kao, UCLA Introduction Our

More information

Higher-Order Methods

Higher-Order Methods Higher-Order Methods Stephen J. Wright 1 2 Computer Sciences Department, University of Wisconsin-Madison. PCMI, July 2016 Stephen Wright (UW-Madison) Higher-Order Methods PCMI, July 2016 1 / 25 Smooth

More information

Research Article A Novel Differential Evolution Invasive Weed Optimization Algorithm for Solving Nonlinear Equations Systems

Research Article A Novel Differential Evolution Invasive Weed Optimization Algorithm for Solving Nonlinear Equations Systems Journal of Applied Mathematics Volume 2013, Article ID 757391, 18 pages http://dx.doi.org/10.1155/2013/757391 Research Article A Novel Differential Evolution Invasive Weed Optimization for Solving Nonlinear

More information

Data Mining (Mineria de Dades)

Data Mining (Mineria de Dades) Data Mining (Mineria de Dades) Lluís A. Belanche belanche@lsi.upc.edu Soft Computing Research Group Dept. de Llenguatges i Sistemes Informàtics (Software department) Universitat Politècnica de Catalunya

More information

Gradient Descent. Dr. Xiaowei Huang

Gradient Descent. Dr. Xiaowei Huang Gradient Descent Dr. Xiaowei Huang https://cgi.csc.liv.ac.uk/~xiaowei/ Up to now, Three machine learning algorithms: decision tree learning k-nn linear regression only optimization objectives are discussed,

More information

2. Quasi-Newton methods

2. Quasi-Newton methods L. Vandenberghe EE236C (Spring 2016) 2. Quasi-Newton methods variable metric methods quasi-newton methods BFGS update limited-memory quasi-newton methods 2-1 Newton method for unconstrained minimization

More information

Methods that avoid calculating the Hessian. Nonlinear Optimization; Steepest Descent, Quasi-Newton. Steepest Descent

Methods that avoid calculating the Hessian. Nonlinear Optimization; Steepest Descent, Quasi-Newton. Steepest Descent Nonlinear Optimization Steepest Descent and Niclas Börlin Department of Computing Science Umeå University niclas.borlin@cs.umu.se A disadvantage with the Newton method is that the Hessian has to be derived

More information

Numerical optimization

Numerical optimization Numerical optimization Lecture 4 Alexander & Michael Bronstein tosca.cs.technion.ac.il/book Numerical geometry of non-rigid shapes Stanford University, Winter 2009 2 Longest Slowest Shortest Minimal Maximal

More information

min f(x). (2.1) Objectives consisting of a smooth convex term plus a nonconvex regularization term;

min f(x). (2.1) Objectives consisting of a smooth convex term plus a nonconvex regularization term; Chapter 2 Gradient Methods The gradient method forms the foundation of all of the schemes studied in this book. We will provide several complementary perspectives on this algorithm that highlight the many

More information

Programming, numerics and optimization

Programming, numerics and optimization Programming, numerics and optimization Lecture C-3: Unconstrained optimization II Łukasz Jankowski ljank@ippt.pan.pl Institute of Fundamental Technological Research Room 4.32, Phone +22.8261281 ext. 428

More information

Examination paper for TMA4180 Optimization I

Examination paper for TMA4180 Optimization I Department of Mathematical Sciences Examination paper for TMA4180 Optimization I Academic contact during examination: Phone: Examination date: 26th May 2016 Examination time (from to): 09:00 13:00 Permitted

More information

The Conjugate Gradient Method

The Conjugate Gradient Method The Conjugate Gradient Method Lecture 5, Continuous Optimisation Oxford University Computing Laboratory, HT 2006 Notes by Dr Raphael Hauser (hauser@comlab.ox.ac.uk) The notion of complexity (per iteration)

More information

ECE580 Fall 2015 Solution to Midterm Exam 1 October 23, Please leave fractions as fractions, but simplify them, etc.

ECE580 Fall 2015 Solution to Midterm Exam 1 October 23, Please leave fractions as fractions, but simplify them, etc. ECE580 Fall 2015 Solution to Midterm Exam 1 October 23, 2015 1 Name: Solution Score: /100 This exam is closed-book. You must show ALL of your work for full credit. Please read the questions carefully.

More information

Some definitions. Math 1080: Numerical Linear Algebra Chapter 5, Solving Ax = b by Optimization. A-inner product. Important facts

Some definitions. Math 1080: Numerical Linear Algebra Chapter 5, Solving Ax = b by Optimization. A-inner product. Important facts Some definitions Math 1080: Numerical Linear Algebra Chapter 5, Solving Ax = b by Optimization M. M. Sussman sussmanm@math.pitt.edu Office Hours: MW 1:45PM-2:45PM, Thack 622 A matrix A is SPD (Symmetric

More information

Chapter 2: Unconstrained Extrema

Chapter 2: Unconstrained Extrema Chapter 2: Unconstrained Extrema Math 368 c Copyright 2012, 2013 R Clark Robinson May 22, 2013 Chapter 2: Unconstrained Extrema 1 Types of Sets Definition For p R n and r > 0, the open ball about p of

More information

Part 3: Trust-region methods for unconstrained optimization. Nick Gould (RAL)

Part 3: Trust-region methods for unconstrained optimization. Nick Gould (RAL) Part 3: Trust-region methods for unconstrained optimization Nick Gould (RAL) minimize x IR n f(x) MSc course on nonlinear optimization UNCONSTRAINED MINIMIZATION minimize x IR n f(x) where the objective

More information

Computational Optimization. Augmented Lagrangian NW 17.3

Computational Optimization. Augmented Lagrangian NW 17.3 Computational Optimization Augmented Lagrangian NW 17.3 Upcoming Schedule No class April 18 Friday, April 25, in class presentations. Projects due unless you present April 25 (free extension until Monday

More information

Numerical optimization. Numerical optimization. Longest Shortest where Maximal Minimal. Fastest. Largest. Optimization problems

Numerical optimization. Numerical optimization. Longest Shortest where Maximal Minimal. Fastest. Largest. Optimization problems 1 Numerical optimization Alexander & Michael Bronstein, 2006-2009 Michael Bronstein, 2010 tosca.cs.technion.ac.il/book Numerical optimization 048921 Advanced topics in vision Processing and Analysis of

More information

Quasi-Newton Methods. Zico Kolter (notes by Ryan Tibshirani, Javier Peña, Zico Kolter) Convex Optimization

Quasi-Newton Methods. Zico Kolter (notes by Ryan Tibshirani, Javier Peña, Zico Kolter) Convex Optimization Quasi-Newton Methods Zico Kolter (notes by Ryan Tibshirani, Javier Peña, Zico Kolter) Convex Optimization 10-725 Last time: primal-dual interior-point methods Given the problem min x f(x) subject to h(x)

More information

SECTION: CONTINUOUS OPTIMISATION LECTURE 4: QUASI-NEWTON METHODS

SECTION: CONTINUOUS OPTIMISATION LECTURE 4: QUASI-NEWTON METHODS SECTION: CONTINUOUS OPTIMISATION LECTURE 4: QUASI-NEWTON METHODS HONOUR SCHOOL OF MATHEMATICS, OXFORD UNIVERSITY HILARY TERM 2005, DR RAPHAEL HAUSER 1. The Quasi-Newton Idea. In this lecture we will discuss

More information

Improving the Convergence of Back-Propogation Learning with Second Order Methods

Improving the Convergence of Back-Propogation Learning with Second Order Methods the of Back-Propogation Learning with Second Order Methods Sue Becker and Yann le Cun, Sept 1988 Kasey Bray, October 2017 Table of Contents 1 with Back-Propagation 2 the of BP 3 A Computationally Feasible

More information

Optimization Tutorial 1. Basic Gradient Descent

Optimization Tutorial 1. Basic Gradient Descent E0 270 Machine Learning Jan 16, 2015 Optimization Tutorial 1 Basic Gradient Descent Lecture by Harikrishna Narasimhan Note: This tutorial shall assume background in elementary calculus and linear algebra.

More information

Math 302 Outcome Statements Winter 2013

Math 302 Outcome Statements Winter 2013 Math 302 Outcome Statements Winter 2013 1 Rectangular Space Coordinates; Vectors in the Three-Dimensional Space (a) Cartesian coordinates of a point (b) sphere (c) symmetry about a point, a line, and a

More information

Fundamentals of Unconstrained Optimization

Fundamentals of Unconstrained Optimization dalmau@cimat.mx Centro de Investigación en Matemáticas CIMAT A.C. Mexico Enero 2016 Outline Introduction 1 Introduction 2 3 4 Optimization Problem min f (x) x Ω where f (x) is a real-valued function The

More information

Lecture 14: October 17

Lecture 14: October 17 1-725/36-725: Convex Optimization Fall 218 Lecture 14: October 17 Lecturer: Lecturer: Ryan Tibshirani Scribes: Pengsheng Guo, Xian Zhou Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer:

More information

Core A-level mathematics reproduced from the QCA s Subject criteria for Mathematics document

Core A-level mathematics reproduced from the QCA s Subject criteria for Mathematics document Core A-level mathematics reproduced from the QCA s Subject criteria for Mathematics document Background knowledge: (a) The arithmetic of integers (including HCFs and LCMs), of fractions, and of real numbers.

More information

Part 4: IIR Filters Optimization Approach. Tutorial ISCAS 2007

Part 4: IIR Filters Optimization Approach. Tutorial ISCAS 2007 Part 4: IIR Filters Optimization Approach Tutorial ISCAS 2007 Copyright 2007 Andreas Antoniou Victoria, BC, Canada Email: aantoniou@ieee.org July 24, 2007 Frame # 1 Slide # 1 A. Antoniou Part4: IIR Filters

More information

Chapter 3 Numerical Methods

Chapter 3 Numerical Methods Chapter 3 Numerical Methods Part 2 3.2 Systems of Equations 3.3 Nonlinear and Constrained Optimization 1 Outline 3.2 Systems of Equations 3.3 Nonlinear and Constrained Optimization Summary 2 Outline 3.2

More information

Determination of Feasible Directions by Successive Quadratic Programming and Zoutendijk Algorithms: A Comparative Study

Determination of Feasible Directions by Successive Quadratic Programming and Zoutendijk Algorithms: A Comparative Study International Journal of Mathematics And Its Applications Vol.2 No.4 (2014), pp.47-56. ISSN: 2347-1557(online) Determination of Feasible Directions by Successive Quadratic Programming and Zoutendijk Algorithms:

More information

Gradient Descent. Sargur Srihari

Gradient Descent. Sargur Srihari Gradient Descent Sargur srihari@cedar.buffalo.edu 1 Topics Simple Gradient Descent/Ascent Difficulties with Simple Gradient Descent Line Search Brent s Method Conjugate Gradient Descent Weight vectors

More information

Newton s Method. Ryan Tibshirani Convex Optimization /36-725

Newton s Method. Ryan Tibshirani Convex Optimization /36-725 Newton s Method Ryan Tibshirani Convex Optimization 10-725/36-725 1 Last time: dual correspondences Given a function f : R n R, we define its conjugate f : R n R, Properties and examples: f (y) = max x

More information

FALL 2018 MATH 4211/6211 Optimization Homework 4

FALL 2018 MATH 4211/6211 Optimization Homework 4 FALL 2018 MATH 4211/6211 Optimization Homework 4 This homework assignment is open to textbook, reference books, slides, and online resources, excluding any direct solution to the problem (such as solution

More information

Optimization Methods for Circuit Design

Optimization Methods for Circuit Design Technische Universität München Department of Electrical Engineering and Information Technology Institute for Electronic Design Automation Optimization Methods for Circuit Design Compendium H. Graeb Version

More information

Optimality Conditions

Optimality Conditions Chapter 2 Optimality Conditions 2.1 Global and Local Minima for Unconstrained Problems When a minimization problem does not have any constraints, the problem is to find the minimum of the objective function.

More information

CONSTRAINED NONLINEAR PROGRAMMING

CONSTRAINED NONLINEAR PROGRAMMING 149 CONSTRAINED NONLINEAR PROGRAMMING We now turn to methods for general constrained nonlinear programming. These may be broadly classified into two categories: 1. TRANSFORMATION METHODS: In this approach

More information

Lecture 7 Unconstrained nonlinear programming

Lecture 7 Unconstrained nonlinear programming Lecture 7 Unconstrained nonlinear programming Weinan E 1,2 and Tiejun Li 2 1 Department of Mathematics, Princeton University, weinan@princeton.edu 2 School of Mathematical Sciences, Peking University,

More information

Vasil Khalidov & Miles Hansard. C.M. Bishop s PRML: Chapter 5; Neural Networks

Vasil Khalidov & Miles Hansard. C.M. Bishop s PRML: Chapter 5; Neural Networks C.M. Bishop s PRML: Chapter 5; Neural Networks Introduction The aim is, as before, to find useful decompositions of the target variable; t(x) = y(x, w) + ɛ(x) (3.7) t(x n ) and x n are the observations,

More information

Numerical solutions of nonlinear systems of equations

Numerical solutions of nonlinear systems of equations Numerical solutions of nonlinear systems of equations Tsung-Ming Huang Department of Mathematics National Taiwan Normal University, Taiwan E-mail: min@math.ntnu.edu.tw August 28, 2011 Outline 1 Fixed points

More information

Numerical Optimization

Numerical Optimization Unconstrained Optimization Computer Science and Automation Indian Institute of Science Bangalore 560 01, India. NPTEL Course on Unconstrained Minimization Let f : R n R. Consider the optimization problem:

More information