GLOBAL SEARCH AS A SEQUENCE OF RATIONAL DECISIONS UNDER UNCERTAINTY Antanas ŽILINSKAS
1 GLOBAL SEARCH AS A SEQUENCE OF RATIONAL DECISIONS UNDER UNCERTAINTY Antanas ŽILINSKAS Vilnius University Institute of Mathematics and Informatics, Lithuania Research Council of Lithuania, grant No. MIP-051/
2 PLENARY LECTURE AT NUMTA 2016 Numerical Computations: Theory and Algorithms Pizzo Calabro, Calabria, Italy June
3 Abstract. Statistical models for global optimization are substantiated and constructed. Global optimization algorithms are constructed implementing principles of the theory of rational decisions under uncertainty. An algorithm for global optimization of expensive black box functions is considered as a sequence of decisions under uncertainty where the rationality of decisions is postulated as the invariance of search with respect to the scaling of data used by the algorithm. 3
4 Contents
- Introductory remarks
- Single-objective global optimization
- Multi-objective global optimization
- Open problems 4
5 Introductory remarks 5
6 Heuristic methods prevail in global optimization. This conference is an exception. 6
12 As a general rule, publication of papers on metaphor-based metaheuristics has been limited to second-tier journals and conferences, but some recent exceptions to this rule can be found. Sörensen (2013) states that research in this direction is fundamentally flawed. Most importantly, the author contends that the novelty of the underlying metaphor does not automatically render the resulting framework "novel". On the contrary, there is increasing evidence that very few of the metaphor-based methods are new in any interesting sense. Weyland (2013) demonstrates convincingly that Harmony Search (Geem, 2001) is nothing more than a special case of Evolution Strategies (Beyer, 2002) in which each of the concepts of Evolution Strategies has been relabeled from an evolution-inspired term to a term inspired by musicians playing together. Even though the development of Evolution Strategies precedes that of Harmony Search by at least 30 years, the latter is proposed as an innovation and has by now attracted an impressively long list of follow-up research. 12
13 Why does Homo sapiens imitate animals? Why not behave rationally? 13
14 RATIONAL DECISION MAKING: Choosing among alternatives in a way that "properly" accords with the preferences and beliefs of an individual decision maker. The theory may be developed in axiomatic fashion, in which philosophical justifications are given for each of the axioms. This approach, pioneered by Ramsey and de Finetti and developed into a useful mathematical and practical method by von Neumann and Savage, uses real or hypothesized sets of actions to construct probability and utility functions that would give rise to the set of actions. 14
15 A class of problems $F$; a class of algorithms $D$, consisting of sequences of functions $d = (d_i(\cdot),\ i = 1,\dots,N)$ which define the current point of computation of the objective function value depending on the information used at the current step; the notion of uncertainty $U$; and the error $\varepsilon(f, d)$, $f \in F$, $d \in D$. An algorithm which corresponds to the notion of rationality is defined as a solution of the following optimization problem: $\hat d = \arg\min_{d \in D} U_{f \in F}\big(\varepsilon(f, d)\big)$. 15
16 If you optimize everything, you will always be unhappy. (D. Knuth) 16
17 Single-objective global optimization 17
18 Black box optimization. Mathematically the problem of optimization is formulated as $f^* = \min_{x \in A} f(x)$, where $f(x)$ is a nonlinear objective function of continuous variables, $f : \mathbb{R}^d \to \mathbb{R}$, $A \subseteq \mathbb{R}^d$. The global minimum and a global minimizer should be found. Frequently the objective functions of practical problems are available as computer programs, and the properties of the objective function are difficult to elicit. We assume that the objective function values are given by a black box. 18
19 Deterministic models 19
20 $\hat d = \arg\min_{d \in D} U_{f \in F}\big(\varepsilon(f, d)\big), \qquad U_{f \in F}(\cdot) = \max_{f \in F}(\cdot)$ 20
21 Statistical model 21
22 A standard model of functions under uncertainty considered in probability theory is a stochastic function. Frequently the terms stochastic process and random field are used for stochastic functions of one and several variables, respectively. $U_{f \in F}(\cdot) = \mathbb{E}_{f \in F;\,\mu}(\cdot)$ 22
23 Since theoretical properties of stochastic functions, explicitly presented in probabilistic textbooks, normally are not sufficient to substantiate the selection of a suitable statistical model, the visual analysis of sample functions of a candidate model can be helpful. 23
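The visual analysis described above can be sketched in a few lines: draw sample paths of a candidate model on a grid and inspect them. The stationary Gaussian model with exponential correlation and the parameter values below are illustrative assumptions, not the lecture's choices.

```python
import numpy as np

def sample_paths(n_paths=3, n_grid=200, rho=0.1, seed=0):
    """Sample paths of a Gaussian process with correlation exp(-|t-s|/rho)."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, n_grid)
    cov = np.exp(-np.abs(t[:, None] - t[None, :]) / rho)
    # a small jitter stabilizes the Cholesky factorization of the covariance
    L = np.linalg.cholesky(cov + 1e-10 * np.eye(n_grid))
    return t, L @ rng.standard_normal((n_grid, n_paths))

t, paths = sample_paths()
print(paths.shape)  # (200, 3)
```

Plotting `paths` against `t` for several values of `rho` shows how the correlation length controls the "wiggliness" of the sample functions, which is exactly the property one compares against typical objective functions of the application.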
24–32 [Plots of sample functions (axes $x$, $y$) of candidate statistical models.]
33 The algorithm is defined by the functions $d_i(\cdot)$ for computing the point of the current computation of an objective function value, $x_i = d_i(x_j, y_j,\ j = 1,\dots,i-1),\ i = 2,\dots,n$, (1) together with a function for computing the point of the first computation $x_1$, which depends on the considered model and on $n$, and a function for computing the approximation $x_{on}$ of the global minimizer, which depends on the complete information extracted during the search. 33
34 An optimal (Bayesian) algorithm with respect to a stochastic function selected as a model of objective functions can be defined similarly to an optimal algorithm with respect to a deterministic model, replacing the criterion of worst-case error with the criterion of average error: $d^* = \arg\min_d \mathbb{E}\big(\xi(x_{on}) - \min_{x \in A} \xi(x)\big)$, (2) where the expectation is defined with respect to the probability measure defining the stochastic function in question, and $x_{on}$ is the random point in $A$ found as an approximation to the minimizer of $\xi(x)$ by the algorithm $d$ after the number $n$ of computations of function values fixed in advance. 34
35 The implementation of the optimal Bayesian algorithm is difficult; therefore various simplified versions were proposed. One of the most popular simplifications is named the one-step Bayesian algorithm, whose current step is defined by the following formula: $x_i = \arg\min_{x \in A} \mathbb{E}\big(\min\{\xi(x),\, y_{o,i-1}\} \mid x_j, y_j,\ j = 1,\dots,i-1\big)$, (3) where $\mathbb{E}\{\cdot \mid \cdot\}$ denotes the conditional expectation, and $y_{o,i} = \min_{j=1,\dots,i} y_j$. 35
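Under the common Gaussian assumption on the conditional distribution of $\xi(x)$ (mean $m$, standard deviation $s$), the one-step criterion $\mathbb{E}(\min\{\xi(x), y\})$ has a closed form: minimizing it is equivalent to maximizing the expected improvement $s\,(u\Phi(u) + \phi(u))$ with $u = (y - m)/s$. A minimal sketch under that assumption (not the lecture's code):

```python
import math

def phi(u):   # standard normal density
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def Phi(u):   # standard normal cumulative distribution function
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

def expected_min(m, s, y):
    """E[min{xi, y}] for xi ~ N(m, s^2): equals y minus expected improvement."""
    u = (y - m) / s
    return y - s * (u * Phi(u) + phi(u))

# the criterion never exceeds the current record y, and approaches y
# when the model mean lies far above y (no improvement expected)
print(expected_min(m=2.0, s=0.5, y=1.0))
```

The point of the current step is then found by minimizing `expected_min(m(x), s(x), y_best)` over the feasible region with some auxiliary optimizer.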
36 The first global optimization algorithm based on a statistical model selects the current point for computation of a function value according to the following formula: $x_i = \arg\max_{x \in A} \mathbf{P}\big(\xi(x) \le y_{o,i-1} - \varepsilon_i \mid x_j, y_j,\ j = 1,\dots,i-1\big)$, (4) $y_{o,i} = \min_{j=1,\dots,i} y_j, \quad \varepsilon_i > 0$, (5) meaning that it aims at maximal probability of improving the current estimate of the global minimum. Later this algorithm was substantiated axiomatically and was named the P-algorithm. 36
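For a Gaussian statistical model the improvement probability in (4) has the closed form $\mathbf{P}(\xi(x) \le y_o - \varepsilon) = \Phi\big((y_o - \varepsilon - m(x))/s(x)\big)$. A hypothetical sketch with illustrative numbers, assuming the Gaussian model:

```python
import math

def prob_improvement(m, s, y_o, eps):
    """P(xi <= y_o - eps) for xi ~ N(m, s^2)."""
    u = (y_o - eps - m) / s
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

# a low predicted mean m and a large spread s both raise the criterion
print(prob_improvement(m=0.2, s=1.0, y_o=0.0, eps=0.1))
```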
37 The rationality of the P-algorithm was defined by the formalization of the following informal assumptions:
- The choice of the current point is not rational in the close vicinities of the points of previous computations of objective function values.
- The priority of points where large function values are expected with large probabilities is low.
- Aspirations for elicitation of global information and for local improvement do not dominate each other. 37
38 Any characterization of a random variable normally includes a location parameter (e.g., mean) and a spread parameter (e.g., standard deviation); we use a minimal description of $\xi(x)$ by these two parameters, which will be denoted by $m(x)$ and $s(x)$. The dependence of both parameters on the information available at the current optimization step, $(x_j, y_j,\ j = 1,\dots,i-1)$, will be included in the notation where needed. 38
39 Let us assume that the average utility $u_i(x)$ of computing the $i$-th objective function value at the point $x$ depends on $x$ via $m(x)$ and $s(x)$. A desired target value $\hat y_{o,i-1}$, $\hat y_{o,i-1} < \min_{1 \le j \le i-1} y_j$, is also assumed as a parameter which defines $u_i(x)$: $u_i(x) = U\big(m(x), s(x), \hat y_{o,i-1}\big)$, (6) and the point of the current computation is defined as the maximizer of $u_i(x)$. 39
40 The following assumptions on $U(\cdot)$ express the rationality of the utility as invariance with respect to the scales of the objective function values: $U(m + c,\, s,\, z + c) = U(m, s, z)$, $U(mC,\, sC,\, zC) = U(m, s, z)$, $C > 0$. 40
41 Since the utility function is used for the construction of a minimization algorithm, a smaller function value obtained at the current optimization step is considered more desirable than a larger one; therefore we postulate that $m < \mu$ implies $U(m, s, z) > U(\mu, s, z)$. (7) 41
42 The analysis above suggests the construction of a global optimization algorithm, the $i$-th iteration of which is defined by the solution of the following maximization problem: $x_i = \arg\max_{x \in A} \mathbf{P}\left(\dfrac{\hat y_{o,i-1} - m(x \mid x_j, y_j,\ j = 1,\dots,i-1)}{s(x \mid x_j, y_j,\ j = 1,\dots,i-1)}\right)$. (8) 42
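The invariances postulated above can be checked directly for the criterion $(z - m)/s$ in (8); a small numerical sanity check with illustrative values, assuming $s$ scales like the function values:

```python
def criterion(m, s, z):
    """The P-algorithm selection criterion (z - m) / s."""
    return (z - m) / s

m, s, z, c, C = 1.3, 0.4, 0.9, 5.0, 2.5
# shift invariance: adding a constant to the function values changes nothing
assert abs(criterion(m + c, s, z + c) - criterion(m, s, z)) < 1e-12
# scale invariance: multiplying all values by C > 0 changes nothing
assert abs(criterion(m * C, s * C, z * C) - criterion(m, s, z)) < 1e-12
print("invariance holds")
```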
43 An objective function intended to be minimized normally is not a completely black box, since some information on $f(\cdot)$ is available, e.g. from similar problems solved in the past. Let values of the considered function be observed at some points $x_i$, but $f(x)$ be not yet observed. The weakest assumption on the available information seems to be the comparability of the likelihoods of the inclusions $f(x) \in Y_1$ and $f(x) \in Y_2$, where $Y_1, Y_2$ are arbitrary intervals. If it seems more likely that $f(x) \in Y_1$ than $f(x) \in Y_2$, such a subjective assessment will be denoted $Y_1 \succ_x Y_2$. 43
44 The axioms on the rationality of comparative (subjective) likelihood imply the existence of a probability density compatible with the axioms, i.e. the existence of a probability density $p_x(t)$ such that $Y_1 \succ_x Y_2 \iff \int_{Y_1} p_x(t)\,dt \ge \int_{Y_2} p_x(t)\,dt$. Correspondingly, the family of random variables $\xi(x)$, $x \in A$, with probability densities $p_x(\cdot)$ is acceptable as a statistical model of the objective functions. Generally speaking, the distribution of $\xi(x)$ is not necessarily Gaussian. 44
45 Examples 45
46 We consider the planning of the $i$-th computation of an objective function value. The feasible region is partitioned using the Delaunay triangulation. The optimal triangle is selected, where optimality is defined by the assumptions of the rationality of search. The value of the objective function is computed at the center of the selected triangle. 46
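One iteration of such a simplicial scheme can be sketched as follows. The triangulation is assumed given (in practice it could come from `scipy.spatial.Delaunay`), and the "optimality" of a triangle is scored here by a hypothetical P-type criterion built from the vertex values; the exact criterion of the lecture is not reproduced.

```python
import numpy as np

def select_triangle(vertices, values, triangles, y_best, eps=1e-3):
    """Pick the triangle with the best score and return its centroid."""
    best_score, best_center = -np.inf, None
    for tri in triangles:
        pts = vertices[list(tri)]           # 3 x 2 array of vertex coords
        vals = values[list(tri)]
        m = vals.mean()                     # crude local mean over the triangle
        size = max(np.linalg.norm(pts[i] - pts[j])
                   for i in range(3) for j in range(i + 1, 3))
        score = (y_best - eps - m) / size   # prefer low mean relative to size
        if score > best_score:
            best_score, best_center = score, pts.mean(axis=0)
    return best_center                      # next evaluation point

verts = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.]])
vals = np.array([3.0, 1.0, 2.0, 0.5])
tris = [(0, 1, 2), (1, 3, 2)]
print(select_triangle(verts, vals, tris, y_best=vals.min()))
```

Here the second triangle wins because its vertex values are lower on average, so the next evaluation lands at its centroid.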
48 Some results needed for the further development are presented in A. Žilinskas, G. Gimbutienė, On an Asymptotic Property of a Simplicial Statistical Model of Global Optimization: $\rho_k = \dfrac{y_{o,i-1} - m_k - \varepsilon_i}{\|T_k\|}$, (9) 48
49 The main theorem basically says that, for large enough $i$, $\Delta_i = y_{o,i} - \min_{x \in A} f(x) \sim \exp(-iC)$, (10) where $C$ depends on the condition number of $f(x)$ at the global minimum point, but not on $i$. Thus the convergence rate is, roughly speaking, exponential in $i$. 49
50 A quadratic function: $f(x_1, x_2) = (x_1 + x_2 - 1)^2 + (x_1 - x_2)^2$, $-1 \le x_1, x_2 \le 1$. The Rastrigin function: $f(x_1, x_2) = x_1^2 + x_2^2 - \cos(18 x_1) - \cos(18 x_2)$, $-2 \le x_1, x_2 \le 2$. The Branin function: $f(x_1, x_2) = \left(x_2 - \frac{5.1}{4\pi^2} x_1^2 + \frac{5}{\pi} x_1 - 6\right)^2 + 10\left(1 - \frac{1}{8\pi}\right)\cos(x_1) + 10$, $-5 \le x_1 \le 10$, $0 \le x_2 \le 15$. The six-hump camel back function: $f(x_1, x_2) = \left(4 - 2.1 x_1^2 + x_1^4/3\right) x_1^2 + x_1 x_2 + \left(-4 + 4 x_2^2\right) x_2^2$, $-3 \le x_1 \le 3$, $-2 \le x_2 \le 2$. 50
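The four test functions above, written out explicitly (these are the standard forms, which the legible parts of the slide match; bounds as on the slide):

```python
import math

def quadratic(x1, x2):            # -1 <= x1, x2 <= 1
    return (x1 + x2 - 1) ** 2 + (x1 - x2) ** 2

def rastrigin(x1, x2):            # -2 <= x1, x2 <= 2
    return x1 ** 2 + x2 ** 2 - math.cos(18 * x1) - math.cos(18 * x2)

def branin(x1, x2):               # -5 <= x1 <= 10, 0 <= x2 <= 15
    return ((x2 - 5.1 / (4 * math.pi ** 2) * x1 ** 2
             + 5 / math.pi * x1 - 6) ** 2
            + 10 * (1 - 1 / (8 * math.pi)) * math.cos(x1) + 10)

def camel(x1, x2):                # -3 <= x1 <= 3, -2 <= x2 <= 2
    return ((4 - 2.1 * x1 ** 2 + x1 ** 4 / 3) * x1 ** 2
            + x1 * x2 + (-4 + 4 * x2 ** 2) * x2 ** 2)

print(rastrigin(0.0, 0.0))  # global minimum value -2.0 at the origin
```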
60 To exclude the influence of coincidences between specific properties of the algorithm and a test problem, each problem has been optimized 100 times with bounds perturbed by random numbers uniformly distributed in an interval whose length is 10% of the range of the corresponding variable. 60
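This perturbation protocol can be sketched as follows; the centering of the perturbation interval is an assumption for illustration, not a detail stated on the slide.

```python
import random

def perturb_bounds(lo, hi, rng, frac=0.1):
    """Shift each bound by a uniform number from an interval of width frac*(hi-lo)."""
    width = frac * (hi - lo)
    return (lo + rng.uniform(-width / 2, width / 2),
            hi + rng.uniform(-width / 2, width / 2))

rng = random.Random(42)
# 100 randomized instances of the Rastrigin box [-2, 2]
bounds = [perturb_bounds(-2.0, 2.0, rng) for _ in range(100)]
print(bounds[0])
```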
61 The dependence of the averaged errors (in the base-10 logarithmic scale, $\Delta_n = \log_{10}(y_{on} - f_{\min})$) on the number of function evaluations performed is presented on the next slide. For comparison, these test functions have also been minimized by the well-known global optimization code DIRECT and by the simplicial partition algorithm based on a statistical model. 61
62 Figure 1: Comparison of normalized errors of the proposed algorithm, DIRECT, and the simplicial algorithm for different test functions; the quadratic function (top left), the Rastrigin function (top right), the Branin function (lower left), and the camel back function (lower right). 62
63 The figures corroborate the high convergence rate of the proposed algorithm predicted by the theoretical analysis. 63
64 Multi-objective global optimization 64
65 Nonlinear multi-objective optimization is a very active research area. Depending on the properties of a multi-objective optimization problem, different approaches to its solution can be applied. The best-developed direction is the optimization of convex problems; for the latter, methods that generalize the ideas of classical mathematical programming suit well. For problems with objectives not satisfying the assumption of convexity, metaheuristic methods are frequently favored. However, there remains a class of important problems without sufficient attention from researchers: problems with black-box, multimodal, and expensive objectives. 65
66 Black-box multi-objective optimization. Mathematically the problem of optimization is formulated as $f^* = \min_{x \in D} f(x)$, where $f(x)$ is a nonlinear vector objective function of continuous variables, $f : \mathbb{R}^n \to \mathbb{R}^m$, $D \subset \mathbb{R}^n$. Pareto optimal solutions should be found. Frequently the objective functions of practical problems are available as computer programs, and their properties are difficult to elicit. We assume that the objective function values are given by a black box. 66
68 The mathematical model of the weighted sum method takes the form: $\min_x f(x),\quad f(x) = \sum_{i=1}^m \omega_i f_i(x)$, (11) where $\omega_i$ is the weight of the $i$-th criterion, $0 \le \omega_i \le 1$, $i = 1,\dots,m$, and $\sum_{i=1}^m \omega_i = 1$. 68
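The scalarization (11) in a few lines; the two objectives and the weights below are placeholders for illustration only.

```python
def weighted_sum(objectives, weights):
    """Combine objectives f_i with weights w_i into one scalar objective."""
    assert abs(sum(weights) - 1.0) < 1e-12 and all(w >= 0 for w in weights)
    return lambda x: sum(w * f(x) for w, f in zip(weights, objectives))

f1 = lambda x: (x - 1.0) ** 2
f2 = lambda x: (x + 1.0) ** 2
g = weighted_sum([f1, f2], [0.5, 0.5])
# for equal weights of these two convex objectives the minimizer is x = 0
print(g(0.0))  # 1.0
```

Minimizing `g` with any single-objective solver yields one Pareto optimal decision; sweeping the weights traces out (for convex problems) the whole Pareto front.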
69 Weighted sum scalarization is theoretically validated for convex problems. Other versions of scalarization, e.g. the lexicographic goal programming algorithm, were developed to generate solutions in the Pareto set of non-convex problems. For problems with objectives not satisfying the assumption of convexity, metaheuristic methods are frequently favored. 69
70 Worst-case optimal multi-objective optimization algorithm for Lipschitzian problems. 70
71 The limitation in collecting general information about the function, and particularly about its minima, strongly requires rationality in distributing the points at which the objective function values are computed. Therefore the algorithms justified by the principles of rational decision theory are of special interest here. 71
72 Theoretically, the solution to a multi-objective optimization problem consists of two sets: $P(f)_O$, the Pareto optimal solutions in the space of objectives, and $P(f)_D$, the Pareto optimal decisions in $D$. Analytically these sets can be found only in very specific cases. We are interested in the approximation of $P(f)_O$, although most frequently it suffices to find several desirable points in the mentioned sets. 72
73 A class of Lipschitz objective functions is considered, i.e. $f_k(x) \in \Phi(L_k)$ iff $|f_k(x) - f_k(z)| \le L_k \|x - z\|_D$, where $x \in D$, $z \in D$, $L = (L_1,\dots,L_m)^T$, $L_k > 0$, $k = 1,\dots,m$, and $\|\cdot\|_D$ is the Euclidean norm in $\mathbb{R}^d$. The feasible region $D$ is supposed to be compact. To simplify the formulae below, let us rescale the function values so that $L_k = 1/\sqrt{m}$, $k = 1,\dots,m$. 73
74 An algorithm consists of two sub-algorithms. The first sub-algorithm selects the points $x_i \in D$ and computes $f(x_i)$, $i = 1,\dots,n$. The second sub-algorithm computes an approximation of $P(f)_O$ using the output of the first sub-algorithm. The number of points $n$ is supposed to be chosen in advance. Such an assumption seems reasonable in the case of expensive objectives, where the lengthy computation of a single value of the vector of objectives essentially restricts the permissible value of $n$. To rationally exploit an essentially restricted resource, it is important to know its limit in advance. 74
75 Two versions of the first sub-algorithm are considered. The passive (non-adaptive) algorithm selects $x_i$, $i = 1,\dots,n$, taking into account only information on the problem, i.e. on $(n, D)$. The sequential (adaptive) algorithm consists of algorithmic steps which compute $x_i$ taking into account the points and function values computed at the previous steps: $x_1 = x_1(n, D)$, $x_i = x_i(n, D, x_j, f(x_j),\ j = 1,\dots,i-1)$, $i = 2,\dots,n$. The result of the first sub-algorithm is $(X_n, Y_n)$, where $X_n = (x_1,\dots,x_n)$ and $Y_n = (f(x_1),\dots,f(x_n))$. The second sub-algorithm computes an approximation of $P(f)_O$ using $(X_n, Y_n)$ as input. 75
76 Informally, a solution is defined by a lower bound and an upper bound for $P(f)_O$. The trivial upper bound for $P(f)_O$ is the south-west boundary of the set $W(Y_n)$, the subset of $\mathbb{R}^m$ dominated by at least one solution belonging to $Y_n$: $W(Y_n) = \bigcup_{i=1}^n \{y : y \in \mathbb{R}^m,\ y > f(x_i)\}$, where $y > z$ holds iff $y_i > z_i$, $i = 1,\dots,m$. 76
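The dominated set $W(Y_n)$ can be probed pointwise: $y \in W(Y_n)$ iff some observed vector is componentwise strictly smaller. A minimal sketch for minimization problems (the sample vectors are illustrative):

```python
def dominated(y, Y):
    """True iff y is strictly dominated (componentwise) by some vector in Y."""
    return any(all(fi < yi for fi, yi in zip(f, y)) for f in Y)

Y = [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0)]
print(dominated((3.0, 3.0), Y))   # True: dominated by (2, 2)
print(dominated((1.5, 1.5), Y))   # False: no observed vector dominates it
```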
77 We aim at minimizing the gap between the lower and upper bounds. The lower bound is constructed by excluding a possibly large set of vectors which can be proved to be non-realizable. As a discrete realizable support of the lower bound, a subset of $Y_n$ is selected. 77
78 The approximation of $P_O(f)$ based on $Y_n$ consists of a lower bound for $P_O(f)$, denoted by $\check P_O(f)$, which consists of vectors not dominated by $P_O(f)$, and a discrete realizable support, denoted by $\hat P_O(f)$, constituted of those $f(x_i)$ whose maximum feasible vicinity intersects the trivial lower bound. 78
79 Let us define an approximation, taking into account $(X_n, Y_n)$ and assuming that the objective functions belong to the sub-class of Lipschitz functions with the Lipschitz constants defined above. We define $\check P_O(f)$ as follows: $\check P_O(f) = \{z : \min_{1 \le i \le n} \|z - f(x_i)\| = r(X_n)\} \cap (\mathbb{R}^m \setminus W(Y_n))$, $r(X_n) = \max_{x \in A} \min_{1 \le i \le n} \|x - x_i\|_D$. The set $\hat P_O(f)$ is defined as the subset of $\{f(x_i),\ i = 1,\dots,n\}$ such that $\{z : \|z - f(x_i)\| \le r(X_n)\} \cap (\mathbb{R}^m \setminus W(Y_n)) \ne \emptyset$. 79
80 The approximation error is bounded by $\Delta(f) = \max_{y \in P_O(f)} \max\left\{\min_{z \in \hat P_O(f)} \|y - z\|_O,\ \min_{z \in \check P_O(f)} \|y - z\|_O\right\}$, (12) where $\|\cdot\|_O$ denotes the Euclidean norm in $\mathbb{R}^m$. The optimal algorithm minimizes $\Delta(f)$ for the worst-case function $f$. 80
81 Theorem. The worst-case optimal (passive/adaptive) multi-objective optimization algorithm selects the points $x_1,\dots,x_n$ at the centers of $n$ balls of minimum radius which cover the feasible region $D$. The minimum worst-case error is equal to the radius $r$ of the balls of the optimal cover. The worst-case objective functions $\varphi(\cdot)$ and $\phi(\cdot)$ are defined by the following formulae: $\varphi_k(x) = c_k$, $x \in D$, $k = 1,\dots,m$; $\phi_k(x) = \max_{1 \le i \le n} \big(\phi_k(x_i) - \|x - x_i\|_D\big)$, $k = 1,\dots,m$. 81
82 A surprising result is that the components of the worst-case vector objective function coincide up to linear scaling. Such a coincidence means that the worst-case model of multi-objective global optimization degrades to the single objective model. 82
83 The idea of the one-step optimal algorithm is to tighten the Lipschitz lower bounds for the non-dominated solutions, and to indicate the subsets of $D$ with dominated objective vectors. Let us consider the $(n+1)$-st optimization step: $\varepsilon_n = \max_{Y \in \check P_O(f)} \min_{Z \in \hat P_O(f)} \|Y - Z\|$, $x_{n+1} = \arg\min_{x \in D} \varepsilon_{n+1}$, where $\check P_O(f)$ and $\hat P_O(f)$ are the lower and upper bounds of the Pareto set. 83
84 Rational decision theory based multi-objective optimization algorithm. 84
85 Similarly to the case of single-objective optimization, we assume that the utility of the choice of the point for the current computation of the vector value $f(x)$ has the following structure: $u_{n+1}(x) = U\big(m(x), \Sigma(x), y_{o,n}\big)$, (13) where $m(x) = (m_1(x),\dots,m_m(x))^T$, and $y_{o,n}$ denotes a vector desired to improve upon. 85
86 At the current optimization step, a point for computing the value of $f(x)$ is sought by an optimization algorithm which maximizes $u_{n+1}(x)$; the utility should be invariant with respect to the scales of the data. Such a rationality assumption can be expressed by the following properties of $U(\cdot)$: $U(m + c, \Sigma, z + c) = U(m, \Sigma, z)$, $c = (c_1,\dots,c_m)^T$; $U(Cm, C\Sigma C, Cz) = U(m, \Sigma, z)$, $C = \operatorname{diag}(C_1,\dots,C_m)$, $C_i > 0$. 86
87 Theorem. A function which satisfies the invariance assumptions above has the following structure: $U(m, \Sigma, z) = \pi\left(\dfrac{z_1 - m_1}{s_1},\dots,\dfrac{z_m - m_m}{s_m}\right)$. (14) If, moreover, the assumption of monotonicity is satisfied, then $\pi(\cdot)$ is an increasing function of all its variables. 87
88 As an example, the bi-objective π-algorithm has been implemented. A product of two arctangents was used for $\pi(\cdot)$. Then the $(n+1)$-st step of the π-algorithm is defined as the following optimization problem: $x_{n+1} = \arg\max_{x \in D} \left(\arctan\Big(\dfrac{y_{1,on} - m_1(x)}{s_1(x)}\Big) + \dfrac{\pi}{2}\right)\left(\arctan\Big(\dfrac{y_{2,on} - m_2(x)}{s_2(x)}\Big) + \dfrac{\pi}{2}\right)$, 88
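The step above can be sketched as follows. The model predictions $m_k(x)$, $s_k(x)$ are faked by simple stand-in functions purely to make the sketch runnable, and grid search replaces the inner optimization; neither is the lecture's implementation.

```python
import math

def pi_criterion(m1, s1, m2, s2, y1, y2):
    """Product of shifted arctangents used for pi(.) in the bi-objective case."""
    return ((math.atan((y1 - m1) / s1) + math.pi / 2)
            * (math.atan((y2 - m2) / s2) + math.pi / 2))

m1 = lambda x: (x - 0.5) ** 2    # stand-in posterior mean of objective 1
m2 = lambda x: (x + 0.5) ** 2    # stand-in posterior mean of objective 2
s = lambda x: 0.3                # stand-in posterior spread
y1_on, y2_on = 0.0, 0.0          # components of the vector to improve upon

grid = [i / 200 - 1.0 for i in range(401)]   # grid on [-1, 1]
x_next = max(grid, key=lambda x: pi_criterion(
    m1(x), s(x), m2(x), s(x), y1_on, y2_on))
print(round(x_next, 3))
```

With these symmetric stand-ins the criterion balances the two objectives and the maximizer lands midway between the two single-objective minimizers.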
89 Examples 89
90 The vector objective function is composed of two shifted Rastrigin functions: $f_1(x) = (x + 0.5)^2 - \cos(18(x + 0.5))$, $f_2(x) = (x - 0.5)^2 - \cos(18(x - 0.5))$, $-1 \le x \le 1$. The Rastrigin function is frequently used for testing global optimization algorithms. 90
91 The feasible objective region of the Rastrigin problem, and the points generated by MC. 91
92 Test problem proposed by Fonseca and Fleming (1995): $f_1(x) = 1 - \exp\left(-\sum_{i=1}^d (x_i - 1/\sqrt{d})^2\right)$, $f_2(x) = 1 - \exp\left(-\sum_{i=1}^d (x_i + 1/\sqrt{d})^2\right)$, $-4 \le x_i \le 4$, $i = 1,\dots,d$. (15) 92
93 Rastrigin problem. The points generated by the P-algorithm and the π-algorithm. 93
94 The points generated by the optimal one-step algorithm in the feasible objective regions of Ras (left) and Fon (right), respectively. The red lines show the local Lipschitzian lower bound for the Pareto front. 94
95 The vector objective function is composed of two Shekel functions: $f(x) = (f_1(x), f_2(x))^T$, $x = (x_1, x_2)^T$, $0 \le x_1 \le 1$, $0 \le x_2 \le 1$, $f_1(x) = \dfrac{1}{(x_1 - 0.1)^2 + 2(x_2 - 0.1)^2 + \,\cdot\,} + \dfrac{1}{(x_1 - \,\cdot\,)^2 + (x_2 - \,\cdot\,)^2 + \,\cdot\,}$, $f_2(x) = \dfrac{1}{(x_1 - \,\cdot\,)^2 + (x_2 - \,\cdot\,)^2 + \,\cdot\,} + \dfrac{1}{(x_1 - 0.3)^2 + (x_2 - \,\cdot\,)^2 + \,\cdot\,}$. Such test functions are frequently used for testing global optimization algorithms. 95
96 Contours of the objective functions. 96
97 Feasible objective region (Shekel), and points generated by MC. 97
98 Shekel problem. The points generated by the P-algorithm and the π-algorithm. 98
99 [Table: values of the performance criteria NP, NN, MD, and DP for the P-algorithm, the modified P-algorithm, the one-step optimal algorithm, and uniform search on problems Ras and Fon.] 99
100 Results of search by the π-algorithm, $z_o = (-9, -9)$. 100
101 Results of search by the π-algorithm, $z_o = (-6, -6)$. 101
102 Figure 2: The feasible region of problem Fon (first objective vs. second objective). 102
103 Figure 3: The points generated by the RUS in the objective feasible region of the problem Fon. 103
104 Figure 4: The points generated by the π-algorithm in the objective feasible region of problem Fon (first objective vs. second objective). 104
105 Visualization: a case study of an optimal process design. Pressure swing adsorption (PSA) is a cyclic adsorption process for gas separation and purification. PSA systems have the potential to achieve a higher productivity for CO2 capture than alternative separation processes, such as absorption. However, an efficient and cost-competitive PSA unit is one that achieves high levels of purity and recovery of the product: two competing objectives should be taken into account. This PSA system is considered for the recovery of CO2 from the flue gas in a power plant. With the pressure equalization step the CO2 purity can be enriched at the price of a small increase in power consumption. 105
106 Figure 5: 2-bed/6-step PSA Skarstrom cycle (beds 1 and 2, feed and vent tanks, vacuum tank, valves V1–V7). 106
107 Figure 6: 2-bed/6-step PSA Skarstrom cycle: step sequence (F, FP, CnD, LR, LEE, purge) for beds 1 and 2. 107
108 The computational model used relies on a finite volume scheme with 40 volume elements and a Van Leer flux limiter. The simulation time for a single design configuration ranged from 10 minutes to an hour, depending on the configuration. 108
109 The described optimal design problem can be mathematically formulated as $\max_{x \in A} f(x)$, $f(\cdot) = (f_1(\cdot), f_2(\cdot))^T$, $x = (x_1,\dots,x_6)^T$, $A = \{x : 0 \le x_j \le 1\}$, where $x_j$, $j = 1,\dots,6$, denote optimization variables (decision parameters) re-scaled to the unit interval, $f(x)$ is the vector objective function whose first component $f_1(x)$ is CO2 purity and whose second component $f_2(x)$ is CO2 recovery, and the optimization result is the Pareto front of the formulated bi-objective optimization problem. 109
110 The application of the statistical multi-objective algorithm to the problem considered resulted in computing $N = 1584$ two-dimensional vectors of objectives $y_i$ at the points $x_i$, $i = 1,\dots,N$, whose components belong to the unit interval: $0 \le x_{ij} \le 1$, where the index $j$ ($1 \le j \le 6$) denotes the component number of the $i$-th vector. The subset of $y_i$, $i = 1,\dots,N$, constituted of the non-dominated points represents the Pareto front of the considered problem, $F_P$; it consists of $N_P = 179$ two-dimensional vectors; the corresponding subset of $x_i$, $i = 1,\dots,N$, is denoted by $X_P$. 110
111 Figure 7: a) Pareto front of the considered problem; b) the graphs of $f_1(x_i)$ and $f_2(x_i)$, $x_i \in X_P$, with respect to the reordered indices. 111
112 By a visual analysis of the graphs in Figure 7, a decision $X \in X_P$ can be selected which corresponds to a favorable point of the Pareto front. However, such a choice is not always satisfactory, since it does not take into account such properties of the corresponding decision as, e.g., the location of the selected decision vector in the feasible region $A$. The analysis of the location of the set of efficient points in $A$ can be especially valuable when structural properties of the considered set are important for decision making. For example, some subsets of $A$ might not be forbidden but may be unfavorable, and that property may not be easy to introduce into the mathematical model. The analysis of the properties of the set of efficient points can enable the discovery of latent variables, a relation between which essentially defines the Pareto front. 112
113 Figure 8: Two-dimensional image of vertices of the six-dimensional hypercube 113
114 Figure 9: The image of the discrete representation of the Pareto set and of the vertices of the six-dimensional hypercube; the markers 0 and 1 show the locations of the minimizers of $f_2(\cdot)$ and $f_1(\cdot)$, respectively. 114
115 Figure 10: a) The dependency of $x_2$ on the indices of the ordered points in the Pareto front; b) the graphs of $f_1(x(t))$ and $f_2(x(t))$, where the values of $t$ correspond to the kink of the Pareto front. 115
116 Figure 11: a) The dependency of $x_2$ (increasing curve) and of $x_4$ on $t$, $0.15 \le t \le 0.4$; b) image of the discrete representation of a part of the set of Pareto optimal decisions; the markers 0 and 1 show the locations of the vertices $(0, 1, 1, 0, 0, 1)^T$ and $(0, 0, 1, 0, 0, 1)^T$, respectively. 116
117 Before the analysis presented above, we expected that a high-temperature adsorbent bed would be advantageous for achieving a high CO2 content in the product stream. We also expected that a higher purge-to-feed ratio would reduce the CO2 content of the heavy product. We also knew that the amount of feed gas entering the system is linear with respect to the feed/purge time, $x_2$, which explains why $x_2$ displayed such a substantial impact on the performance. For example, long cycle times typically diminish the recovery but lead to a higher purity of the heavy product as the adsorbent beds approach the saturation limit. The feed flow rate, $x_3$, which also affects the amount of feed gas that enters, is not as influential, but this is partly because of the design constraint imposed on the flow rate in the design problem. 117
118 It is, however, quite surprising in this case that, when comparing the optimal design with respect to purity against that with respect to recovery on the most interesting segment of the Pareto curve, they disagree only on the feed/purge time, whereas for the other design parameters they concur. Hence, the visualization method revealed that purity and recovery are only weakly conflicting; in this case the two objectives contested each other only on the feed/purge time. 118
119 Open problems 119
120 
- Choice of a stochastic function adequate to the available information on the problem in question.
- Estimation of parameters of the selected (Gaussian) stochastic function.
- Difficulties in numerical computations (ill-defined covariance matrices).
- Absence of a well-substantiated testing methodology for algorithms aimed at expensive problems. 120
More information1. Nonlinear Equations. This lecture note excerpted parts from Michael Heath and Max Gunzburger. f(x) = 0
Numerical Analysis 1 1. Nonlinear Equations This lecture note excerpted parts from Michael Heath and Max Gunzburger. Given function f, we seek value x for which where f : D R n R n is nonlinear. f(x) =
More informationA Globally Stabilizing Receding Horizon Controller for Neutrally Stable Linear Systems with Input Constraints 1
A Globally Stabilizing Receding Horizon Controller for Neutrally Stable Linear Systems with Input Constraints 1 Ali Jadbabaie, Claudio De Persis, and Tae-Woong Yoon 2 Department of Electrical Engineering
More information1.1 Basis of Statistical Decision Theory
ECE598: Information-theoretic methods in high-dimensional statistics Spring 2016 Lecture 1: Introduction Lecturer: Yihong Wu Scribe: AmirEmad Ghassami, Jan 21, 2016 [Ed. Jan 31] Outline: Introduction of
More informationChapter 1. Preliminaries. The purpose of this chapter is to provide some basic background information. Linear Space. Hilbert Space.
Chapter 1 Preliminaries The purpose of this chapter is to provide some basic background information. Linear Space Hilbert Space Basic Principles 1 2 Preliminaries Linear Space The notion of linear space
More informationLecture notes on statistical decision theory Econ 2110, fall 2013
Lecture notes on statistical decision theory Econ 2110, fall 2013 Maximilian Kasy March 10, 2014 These lecture notes are roughly based on Robert, C. (2007). The Bayesian choice: from decision-theoretic
More informationNewtonian Mechanics. Chapter Classical space-time
Chapter 1 Newtonian Mechanics In these notes classical mechanics will be viewed as a mathematical model for the description of physical systems consisting of a certain (generally finite) number of particles
More informationOn the Power of Robust Solutions in Two-Stage Stochastic and Adaptive Optimization Problems
MATHEMATICS OF OPERATIONS RESEARCH Vol. xx, No. x, Xxxxxxx 00x, pp. xxx xxx ISSN 0364-765X EISSN 156-5471 0x xx0x 0xxx informs DOI 10.187/moor.xxxx.xxxx c 00x INFORMS On the Power of Robust Solutions in
More informationTechnical Results on Regular Preferences and Demand
Division of the Humanities and Social Sciences Technical Results on Regular Preferences and Demand KC Border Revised Fall 2011; Winter 2017 Preferences For the purposes of this note, a preference relation
More informationOn the Power of Robust Solutions in Two-Stage Stochastic and Adaptive Optimization Problems
MATHEMATICS OF OPERATIONS RESEARCH Vol. 35, No., May 010, pp. 84 305 issn 0364-765X eissn 156-5471 10 350 084 informs doi 10.187/moor.1090.0440 010 INFORMS On the Power of Robust Solutions in Two-Stage
More informationDiscussion of Hypothesis testing by convex optimization
Electronic Journal of Statistics Vol. 9 (2015) 1 6 ISSN: 1935-7524 DOI: 10.1214/15-EJS990 Discussion of Hypothesis testing by convex optimization Fabienne Comte, Céline Duval and Valentine Genon-Catalot
More informationMetric Spaces and Topology
Chapter 2 Metric Spaces and Topology From an engineering perspective, the most important way to construct a topology on a set is to define the topology in terms of a metric on the set. This approach underlies
More informationMachine Learning for Signal Processing Bayes Classification and Regression
Machine Learning for Signal Processing Bayes Classification and Regression Instructor: Bhiksha Raj 11755/18797 1 Recap: KNN A very effective and simple way of performing classification Simple model: For
More information1 The Observability Canonical Form
NONLINEAR OBSERVERS AND SEPARATION PRINCIPLE 1 The Observability Canonical Form In this Chapter we discuss the design of observers for nonlinear systems modelled by equations of the form ẋ = f(x, u) (1)
More informationLecture 35: December The fundamental statistical distances
36-705: Intermediate Statistics Fall 207 Lecturer: Siva Balakrishnan Lecture 35: December 4 Today we will discuss distances and metrics between distributions that are useful in statistics. I will be lose
More informationExistence and Uniqueness
Chapter 3 Existence and Uniqueness An intellect which at a certain moment would know all forces that set nature in motion, and all positions of all items of which nature is composed, if this intellect
More informationSimultaneous Multi-frame MAP Super-Resolution Video Enhancement using Spatio-temporal Priors
Simultaneous Multi-frame MAP Super-Resolution Video Enhancement using Spatio-temporal Priors Sean Borman and Robert L. Stevenson Department of Electrical Engineering, University of Notre Dame Notre Dame,
More informationOn the Optimal Scaling of the Modified Metropolis-Hastings algorithm
On the Optimal Scaling of the Modified Metropolis-Hastings algorithm K. M. Zuev & J. L. Beck Division of Engineering and Applied Science California Institute of Technology, MC 4-44, Pasadena, CA 925, USA
More informationLebesgue Measure on R n
CHAPTER 2 Lebesgue Measure on R n Our goal is to construct a notion of the volume, or Lebesgue measure, of rather general subsets of R n that reduces to the usual volume of elementary geometrical sets
More informationNew Reference-Neighbourhood Scalarization Problem for Multiobjective Integer Programming
BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 3 No Sofia 3 Print ISSN: 3-97; Online ISSN: 34-48 DOI:.478/cait-3- New Reference-Neighbourhood Scalariation Problem for Multiobjective
More informationGaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008
Gaussian processes Chuong B Do (updated by Honglak Lee) November 22, 2008 Many of the classical machine learning algorithms that we talked about during the first half of this course fit the following pattern:
More informationExtreme Value Analysis and Spatial Extremes
Extreme Value Analysis and Department of Statistics Purdue University 11/07/2013 Outline Motivation 1 Motivation 2 Extreme Value Theorem and 3 Bayesian Hierarchical Models Copula Models Max-stable Models
More informationStochastic Spectral Approaches to Bayesian Inference
Stochastic Spectral Approaches to Bayesian Inference Prof. Nathan L. Gibson Department of Mathematics Applied Mathematics and Computation Seminar March 4, 2011 Prof. Gibson (OSU) Spectral Approaches to
More informationDeep Linear Networks with Arbitrary Loss: All Local Minima Are Global
homas Laurent * 1 James H. von Brecht * 2 Abstract We consider deep linear networks with arbitrary convex differentiable loss. We provide a short and elementary proof of the fact that all local minima
More informationBayesian consistent prior selection
Bayesian consistent prior selection Christopher P. Chambers and Takashi Hayashi August 2005 Abstract A subjective expected utility agent is given information about the state of the world in the form of
More informationEXAMPLES OF CLASSICAL ITERATIVE METHODS
EXAMPLES OF CLASSICAL ITERATIVE METHODS In these lecture notes we revisit a few classical fixpoint iterations for the solution of the linear systems of equations. We focus on the algebraic and algorithmic
More information4. Convex optimization problems
Convex Optimization Boyd & Vandenberghe 4. Convex optimization problems optimization problem in standard form convex optimization problems quasiconvex optimization linear optimization quadratic optimization
More informationmin f(x). (2.1) Objectives consisting of a smooth convex term plus a nonconvex regularization term;
Chapter 2 Gradient Methods The gradient method forms the foundation of all of the schemes studied in this book. We will provide several complementary perspectives on this algorithm that highlight the many
More informationIntroduction to Empirical Processes and Semiparametric Inference Lecture 08: Stochastic Convergence
Introduction to Empirical Processes and Semiparametric Inference Lecture 08: Stochastic Convergence Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and Operations
More informationGENERALIZED CONVEXITY AND OPTIMALITY CONDITIONS IN SCALAR AND VECTOR OPTIMIZATION
Chapter 4 GENERALIZED CONVEXITY AND OPTIMALITY CONDITIONS IN SCALAR AND VECTOR OPTIMIZATION Alberto Cambini Department of Statistics and Applied Mathematics University of Pisa, Via Cosmo Ridolfi 10 56124
More informationLecture 1: Entropy, convexity, and matrix scaling CSE 599S: Entropy optimality, Winter 2016 Instructor: James R. Lee Last updated: January 24, 2016
Lecture 1: Entropy, convexity, and matrix scaling CSE 599S: Entropy optimality, Winter 2016 Instructor: James R. Lee Last updated: January 24, 2016 1 Entropy Since this course is about entropy maximization,
More informationIntroduction to graphical models: Lecture III
Introduction to graphical models: Lecture III Martin Wainwright UC Berkeley Departments of Statistics, and EECS Martin Wainwright (UC Berkeley) Some introductory lectures January 2013 1 / 25 Introduction
More informationb 0 + b 1 z b d z d
I. Introduction Definition 1. For z C, a rational function of degree d is any with a d, b d not both equal to 0. R(z) = P (z) Q(z) = a 0 + a 1 z +... + a d z d b 0 + b 1 z +... + b d z d It is exactly
More informationDistributed Optimization. Song Chong EE, KAIST
Distributed Optimization Song Chong EE, KAIST songchong@kaist.edu Dynamic Programming for Path Planning A path-planning problem consists of a weighted directed graph with a set of n nodes N, directed links
More informationNonlinear Control. Nonlinear Control Lecture # 3 Stability of Equilibrium Points
Nonlinear Control Lecture # 3 Stability of Equilibrium Points The Invariance Principle Definitions Let x(t) be a solution of ẋ = f(x) A point p is a positive limit point of x(t) if there is a sequence
More informationCopyrighted Material. 1.1 Large-Scale Interconnected Dynamical Systems
Chapter One Introduction 1.1 Large-Scale Interconnected Dynamical Systems Modern complex dynamical systems 1 are highly interconnected and mutually interdependent, both physically and through a multitude
More informationDecision Theory Intro: Preferences and Utility
Decision Theory Intro: Preferences and Utility CPSC 322 Lecture 29 March 22, 2006 Textbook 9.5 Decision Theory Intro: Preferences and Utility CPSC 322 Lecture 29, Slide 1 Lecture Overview Recap Decision
More informationMaximum Likelihood Estimation. only training data is available to design a classifier
Introduction to Pattern Recognition [ Part 5 ] Mahdi Vasighi Introduction Bayesian Decision Theory shows that we could design an optimal classifier if we knew: P( i ) : priors p(x i ) : class-conditional
More informationBest Guaranteed Result Principle and Decision Making in Operations with Stochastic Factors and Uncertainty
Stochastics and uncertainty underlie all the processes of the Universe. N.N.Moiseev Best Guaranteed Result Principle and Decision Making in Operations with Stochastic Factors and Uncertainty by Iouldouz
More informationEvolution & Learning in Games
1 / 28 Evolution & Learning in Games Econ 243B Jean-Paul Carvalho Lecture 5. Revision Protocols and Evolutionary Dynamics 2 / 28 Population Games played by Boundedly Rational Agents We have introduced
More informationComputer Vision Group Prof. Daniel Cremers. 9. Gaussian Processes - Regression
Group Prof. Daniel Cremers 9. Gaussian Processes - Regression Repetition: Regularized Regression Before, we solved for w using the pseudoinverse. But: we can kernelize this problem as well! First step:
More information27. Topological classification of complex linear foliations
27. Topological classification of complex linear foliations 545 H. Find the expression of the corresponding element [Γ ε ] H 1 (L ε, Z) through [Γ 1 ε], [Γ 2 ε], [δ ε ]. Problem 26.24. Prove that for any
More informationVISCOSITY SOLUTIONS. We follow Han and Lin, Elliptic Partial Differential Equations, 5.
VISCOSITY SOLUTIONS PETER HINTZ We follow Han and Lin, Elliptic Partial Differential Equations, 5. 1. Motivation Throughout, we will assume that Ω R n is a bounded and connected domain and that a ij C(Ω)
More informationCourse 212: Academic Year Section 1: Metric Spaces
Course 212: Academic Year 1991-2 Section 1: Metric Spaces D. R. Wilkins Contents 1 Metric Spaces 3 1.1 Distance Functions and Metric Spaces............. 3 1.2 Convergence and Continuity in Metric Spaces.........
More informationLecture 2: Linear Algebra Review
EE 227A: Convex Optimization and Applications January 19 Lecture 2: Linear Algebra Review Lecturer: Mert Pilanci Reading assignment: Appendix C of BV. Sections 2-6 of the web textbook 1 2.1 Vectors 2.1.1
More informationMulti-Attribute Bayesian Optimization under Utility Uncertainty
Multi-Attribute Bayesian Optimization under Utility Uncertainty Raul Astudillo Cornell University Ithaca, NY 14853 ra598@cornell.edu Peter I. Frazier Cornell University Ithaca, NY 14853 pf98@cornell.edu
More informationStochastic Realization of Binary Exchangeable Processes
Stochastic Realization of Binary Exchangeable Processes Lorenzo Finesso and Cecilia Prosdocimi Abstract A discrete time stochastic process is called exchangeable if its n-dimensional distributions are,
More informationLexicographic Choice under Variable Capacity Constraints
Lexicographic Choice under Variable Capacity Constraints Battal Doğan Serhat Doğan Kemal Yıldız May 14, 2017 Abstract In several matching markets, in order to achieve diversity, agents priorities are allowed
More informationMultivariate Measures of Positive Dependence
Int. J. Contemp. Math. Sciences, Vol. 4, 2009, no. 4, 191-200 Multivariate Measures of Positive Dependence Marta Cardin Department of Applied Mathematics University of Venice, Italy mcardin@unive.it Abstract
More informationAssignment 1: From the Definition of Convexity to Helley Theorem
Assignment 1: From the Definition of Convexity to Helley Theorem Exercise 1 Mark in the following list the sets which are convex: 1. {x R 2 : x 1 + i 2 x 2 1, i = 1,..., 10} 2. {x R 2 : x 2 1 + 2ix 1x
More informationRandom sets. Distributions, capacities and their applications. Ilya Molchanov. University of Bern, Switzerland
Random sets Distributions, capacities and their applications Ilya Molchanov University of Bern, Switzerland Molchanov Random sets - Lecture 1. Winter School Sandbjerg, Jan 2007 1 E = R d ) Definitions
More informationConstrained Consensus and Optimization in Multi-Agent Networks
Constrained Consensus Optimization in Multi-Agent Networks The MIT Faculty has made this article openly available. Please share how this access benefits you. Your story matters. Citation As Published Publisher
More informationReliability Monitoring Using Log Gaussian Process Regression
COPYRIGHT 013, M. Modarres Reliability Monitoring Using Log Gaussian Process Regression Martin Wayne Mohammad Modarres PSA 013 Center for Risk and Reliability University of Maryland Department of Mechanical
More informationAn introduction to Mathematical Theory of Control
An introduction to Mathematical Theory of Control Vasile Staicu University of Aveiro UNICA, May 2018 Vasile Staicu (University of Aveiro) An introduction to Mathematical Theory of Control UNICA, May 2018
More informationRuntime Analysis of Evolutionary Algorithms for the Knapsack Problem with Favorably Correlated Weights
Runtime Analysis of Evolutionary Algorithms for the Knapsack Problem with Favorably Correlated Weights Frank Neumann 1 and Andrew M. Sutton 2 1 Optimisation and Logistics, School of Computer Science, The
More informationMarch 25, 2010 CHAPTER 2: LIMITS AND CONTINUITY OF FUNCTIONS IN EUCLIDEAN SPACE
March 25, 2010 CHAPTER 2: LIMIT AND CONTINUITY OF FUNCTION IN EUCLIDEAN PACE 1. calar product in R n Definition 1.1. Given x = (x 1,..., x n ), y = (y 1,..., y n ) R n,we define their scalar product as
More information1. Stochastic Processes and Stationarity
Massachusetts Institute of Technology Department of Economics Time Series 14.384 Guido Kuersteiner Lecture Note 1 - Introduction This course provides the basic tools needed to analyze data that is observed
More informationProbability and statistics; Rehearsal for pattern recognition
Probability and statistics; Rehearsal for pattern recognition Václav Hlaváč Czech Technical University in Prague Czech Institute of Informatics, Robotics and Cybernetics 166 36 Prague 6, Jugoslávských
More informationEcon 2148, spring 2019 Statistical decision theory
Econ 2148, spring 2019 Statistical decision theory Maximilian Kasy Department of Economics, Harvard University 1 / 53 Takeaways for this part of class 1. A general framework to think about what makes a
More informationBayesian Persuasion Online Appendix
Bayesian Persuasion Online Appendix Emir Kamenica and Matthew Gentzkow University of Chicago June 2010 1 Persuasion mechanisms In this paper we study a particular game where Sender chooses a signal π whose
More informationUniversity of Houston, Department of Mathematics Numerical Analysis, Fall 2005
3 Numerical Solution of Nonlinear Equations and Systems 3.1 Fixed point iteration Reamrk 3.1 Problem Given a function F : lr n lr n, compute x lr n such that ( ) F(x ) = 0. In this chapter, we consider
More informationLinear Regression and Its Applications
Linear Regression and Its Applications Predrag Radivojac October 13, 2014 Given a data set D = {(x i, y i )} n the objective is to learn the relationship between features and the target. We usually start
More informationA survey of the theory of coherent lower previsions
A survey of the theory of coherent lower previsions Enrique Miranda Abstract This paper presents a summary of Peter Walley s theory of coherent lower previsions. We introduce three representations of coherent
More informationAn Analysis on Recombination in Multi-Objective Evolutionary Optimization
An Analysis on Recombination in Multi-Objective Evolutionary Optimization Chao Qian, Yang Yu, Zhi-Hua Zhou National Key Laboratory for Novel Software Technology Nanjing University, Nanjing 20023, China
More informationSolution Homework 1 - EconS 501
Solution Homework 1 - EconS 501 1. [Checking properties of preference relations-i]. Moana and Maui need to find the magical fish hook. Maui lost this weapon after stealing the heart of Te Fiti and his
More informationBasic Game Theory. Kate Larson. January 7, University of Waterloo. Kate Larson. What is Game Theory? Normal Form Games. Computing Equilibria
Basic Game Theory University of Waterloo January 7, 2013 Outline 1 2 3 What is game theory? The study of games! Bluffing in poker What move to make in chess How to play Rock-Scissors-Paper Also study of
More informationReproducing Kernel Hilbert Spaces
9.520: Statistical Learning Theory and Applications February 10th, 2010 Reproducing Kernel Hilbert Spaces Lecturer: Lorenzo Rosasco Scribe: Greg Durrett 1 Introduction In the previous two lectures, we
More informationMA651 Topology. Lecture 10. Metric Spaces.
MA65 Topology. Lecture 0. Metric Spaces. This text is based on the following books: Topology by James Dugundgji Fundamental concepts of topology by Peter O Neil Linear Algebra and Analysis by Marc Zamansky
More informationB. Appendix B. Topological vector spaces
B.1 B. Appendix B. Topological vector spaces B.1. Fréchet spaces. In this appendix we go through the definition of Fréchet spaces and their inductive limits, such as they are used for definitions of function
More informationLecture 9 Monotone VIs/CPs Properties of cones and some existence results. October 6, 2008
Lecture 9 Monotone VIs/CPs Properties of cones and some existence results October 6, 2008 Outline Properties of cones Existence results for monotone CPs/VIs Polyhedrality of solution sets Game theory:
More informationAxiomatic Decision Theory
Decision Theory Decision theory is about making choices It has a normative aspect what rational people should do... and a descriptive aspect what people do do Not surprisingly, it s been studied by economists,
More informationSolving Classification Problems By Knowledge Sets
Solving Classification Problems By Knowledge Sets Marcin Orchel a, a Department of Computer Science, AGH University of Science and Technology, Al. A. Mickiewicza 30, 30-059 Kraków, Poland Abstract We propose
More informationCHAPTER 2: CONVEX SETS AND CONCAVE FUNCTIONS. W. Erwin Diewert January 31, 2008.
1 ECONOMICS 594: LECTURE NOTES CHAPTER 2: CONVEX SETS AND CONCAVE FUNCTIONS W. Erwin Diewert January 31, 2008. 1. Introduction Many economic problems have the following structure: (i) a linear function
More informationChapter 11. Approximation Algorithms. Slides by Kevin Wayne Pearson-Addison Wesley. All rights reserved.
Chapter 11 Approximation Algorithms Slides by Kevin Wayne. Copyright @ 2005 Pearson-Addison Wesley. All rights reserved. 1 Approximation Algorithms Q. Suppose I need to solve an NP-hard problem. What should
More informationLECTURE 25: REVIEW/EPILOGUE LECTURE OUTLINE
LECTURE 25: REVIEW/EPILOGUE LECTURE OUTLINE CONVEX ANALYSIS AND DUALITY Basic concepts of convex analysis Basic concepts of convex optimization Geometric duality framework - MC/MC Constrained optimization
More informationAs Soon As Probable. O. Maler, J.-F. Kempf, M. Bozga. March 15, VERIMAG Grenoble, France
As Soon As Probable O. Maler, J.-F. Kempf, M. Bozga VERIMAG Grenoble, France March 15, 2013 O. Maler, J.-F. Kempf, M. Bozga (VERIMAG Grenoble, France) As Soon As Probable March 15, 2013 1 / 42 Executive
More informationA Geometric Framework for Nonconvex Optimization Duality using Augmented Lagrangian Functions
A Geometric Framework for Nonconvex Optimization Duality using Augmented Lagrangian Functions Angelia Nedić and Asuman Ozdaglar April 15, 2006 Abstract We provide a unifying geometric framework for the
More informationDesign and Analysis of Algorithms Lecture Notes on Convex Optimization CS 6820, Fall Nov 2 Dec 2016
Design and Analysis of Algorithms Lecture Notes on Convex Optimization CS 6820, Fall 206 2 Nov 2 Dec 206 Let D be a convex subset of R n. A function f : D R is convex if it satisfies f(tx + ( t)y) tf(x)
More informationECON 5111 Mathematical Economics
Test 1 October 1, 2010 1. Construct a truth table for the following statement: [p (p q)] q. 2. A prime number is a natural number that is divisible by 1 and itself only. Let P be the set of all prime numbers
More informationProf. M. Saha Professor of Mathematics The University of Burdwan West Bengal, India
CHAPTER 9 BY Prof. M. Saha Professor of Mathematics The University of Burdwan West Bengal, India E-mail : mantusaha.bu@gmail.com Introduction and Objectives In the preceding chapters, we discussed normed
More informationData Structures for Efficient Inference and Optimization
Data Structures for Efficient Inference and Optimization in Expressive Continuous Domains Scott Sanner Ehsan Abbasnejad Zahra Zamani Karina Valdivia Delgado Leliane Nunes de Barros Cheng Fang Discrete
More information