Experiments with classification-based scalarizing functions in interactive multiobjective optimization


European Journal of Operational Research 175 (2006) 931-947
Decision Support
www.elsevier.com/locate/ejor

Kaisa Miettinen (a,*), Marko M. Mäkelä (b), Katja Kaario (b)

(a) Helsinki School of Economics, P.O. Box 1210, FI-00101 Helsinki, Finland
(b) Department of Mathematical Information Technology, P.O. Box 35 (Agora), FI-40014 University of Jyväskylä, Finland

Received 13 July 2004; accepted 15 June 2005. Available online 24 August 2005.

* Corresponding author. Fax: +358 9 431 38 535. E-mail addresses: miettine@hse.fi (K. Miettinen), makela@mit.jyu.fi (M.M. Mäkelä), kaario@mit.jyu.fi (K. Kaario).

0377-2217/$ - see front matter © 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.ejor.2005.06.019

Abstract

In multiobjective optimization methods, the multiple conflicting objectives are typically converted into a single objective optimization problem with the help of scalarizing functions, and such functions may be constructed in many ways. We compare both theoretically and numerically the performance of three classification-based scalarizing functions and pay attention to how well they obey the classification information. In particular, we devote special interest to the differences the scalarizing functions have in the computational cost of guaranteeing Pareto optimality. It turns out that scalarizing functions with or without so-called augmentation terms have significant differences in this respect. We also collect a set of mostly nonlinear benchmark test problems that we use in the numerical comparisons.
© 2005 Elsevier B.V. All rights reserved.

Keywords: Multiple objective programming; Classification; Interactive methods; Test problems; Guaranteeing Pareto optimality

1. Introduction

The aim in solving multiobjective optimization problems is to find a solution that simultaneously gives the best possible value for several conflicting objectives; see, for example, [2,6,9,21] for further information and further references with special emphasis on nonlinear problems. In multiobjective optimization problems, we can identify with mathematical tools a set of compromise solutions, so-called Pareto optimal solutions, but a human decision maker and her or his preference information is typically needed for identifying the most satisfactory compromise among them. A widely-used class of multiobjective optimization methods

is that of interactive methods. In interactive methods, the decision maker takes an active part in the solution process and directs the search according to her or his preferences. Interactive methods are useful in solving many types of multiobjective optimization problems because the decision maker has a possibility to learn about the problem considered during the solution process. In addition, they are computationally efficient because only those Pareto optimal solutions are generated that are of interest to the decision maker. Some interactive methods even allow the decision maker to change her or his mind as (s)he learns more about the behaviour of the problem during the solution process. Many different types of interactive methods have been developed; see, for example, [2,6,9,21] for collections of methods and further references. The methods differ in the way the decision maker specifies preference information, receives information from the method, and how preference information is used. Typically, preference information is used in converting the multiple objectives into a nonlinear single objective optimization problem, where the objective function is called a scalarizing function. These functions may be constructed in many ways; see, for example, [1,9,19,22-24].

Because the human decision maker plays a crucial role in interactive methods, it is important that (s)he feels comfortable using them. The dialogue between the method and its user should be easily understandable and should not place a cognitive burden on the decision maker. Classifying objective functions is a psychologically acceptable task for the decision maker [7]. In classification, the decision maker is shown the current values of each objective function and (s)he is asked to say which objective functions have acceptable values, which should be improved and which could be impaired. In this way, the decision maker can move around the Pareto optimal set towards more satisfactory solutions. By using classification we can avoid, for example, the well-known drawbacks related to expressing preference information in the form of weighting coefficients [9]. A reference point is a point consisting of values that are desirable or acceptable for the decision maker. It is shown in [14] that classification is related to reference points; in other words, a reference point can be formed once objective functions have been classified. Earlier studies [1,14] have shown how different reference point based scalarizing functions may produce very different solutions based on the same input. This is not a desirable property because the functioning of the method becomes dependent on which scalarizing function is used.

The aim of this paper is to study three classification-based scalarizing functions and their behaviour in nonlinear multiobjective optimization. On one hand, we are interested in how they differ theoretically as well as how well they obey the classification information, that is, how well they satisfy the hopes of the decision maker. On the other hand, we compare computational costs. This opens up an interesting topic not widely studied in the literature, namely the difference between guaranteeing weak Pareto optimality or Pareto optimality. Many scalarizing functions produce only weakly Pareto optimal solutions, but such solutions are not necessarily interesting for the decision maker because some objective function may be improved without impairing any of the others.
Guaranteeing Pareto optimality often needs solving another optimization problem. Alternatively, many scalarizing functions may be augmented so that weakly Pareto optimal solutions can be avoided. In this case, the functions generate solutions with bounded trade-offs. Usually, in the literature these two are considered as alternatives, but less is said about their computational differences. That is why it is time to study this aspect as well, in other words, how guaranteeing Pareto optimality is reflected in the computational costs.

Sometimes, when testing multiobjective optimization methods or for some other reasons, it would be useful to have a collection of benchmark test problems available where the objective functions are guaranteed to be conflicting. Well-known benchmarks exist in many optimization areas, like linear programming or global optimization, but not in multiobjective optimization. Naturally, comparing the solutions of multiobjective optimization problems is not trivial because of the important role of the decision maker, but a suite of test problems would still be valuable. A nice collection of such problems is given in [6], but it is out of print. That is why we here, besides comparing the scalarizing functions, also collect summaries of 12 small

test problems from the literature involving three to seven objective functions and two to ten variables. Naturally, our numerical experiments are based on using these test problems.

The rest of this paper is organized as follows. We present some concepts and notations in Section 2. In Section 3, we introduce three classification-based scalarizing functions and the related single objective subproblems tested, together with some theoretical results for them. Then we introduce the numerical experiments realized and summarize their results in Section 4. The paper is concluded in Section 5.

2. Concepts and notations

We handle multiobjective optimization problems of the form

    minimize  $\{f_1(x), f_2(x), \ldots, f_k(x)\}$
    subject to  $x \in S$,        (2.1)

involving $k$ ($\geq 2$) conflicting lower semicontinuous objective functions $f_i : \mathbb{R}^n \to \mathbb{R}$ that we want to minimize simultaneously. The decision (variable) vectors $x = (x_1, x_2, \ldots, x_n)^T$ belong to the nonempty compact feasible region $S \subset \mathbb{R}^n$. Objective (function) values form so-called objective vectors $f(x) = (f_1(x), f_2(x), \ldots, f_k(x))^T$ and the image of the feasible region is denoted by $Z = f(S) \subset \mathbb{R}^k$.

In multiobjective optimization, objective vectors are regarded as optimal if none of their components can be improved without deterioration to at least one of the other components.

Definition 2.1. A decision vector $x^* \in S$ is Pareto optimal if there does not exist another $x \in S$ such that $f_i(x) \leq f_i(x^*)$ for all $i = 1, \ldots, k$ and $f_j(x) < f_j(x^*)$ for at least one index $j$. On the other hand, $x^*$ is weakly Pareto optimal if there does not exist another $x \in S$ such that $f_i(x) < f_i(x^*)$ for all $i = 1, \ldots, k$.

An objective vector is (weakly) Pareto optimal if the corresponding decision vector is (weakly) Pareto optimal. Under the assumptions mentioned in the problem formulation, we know that Pareto optimal solutions exist (see [21], Corollary 3.2.1) and the set of Pareto optimal solutions is a subset of the weakly Pareto optimal solutions. Pareto optimality corresponds to the intuitive idea of a compromise. However, we deal with both definitions because weakly Pareto optimal solutions are often computationally more convenient to produce than Pareto optimal solutions.

Note that Definition 2.1 introduces global optimality. Computationally, it may, however, be difficult to generate globally optimal solutions if the problem is not convex (that is, if some of the objective functions or the feasible region are not convex). In that case, the solutions obtained may be only locally optimal unless a global solver is available.

The ranges of the objective functions among Pareto optimal solutions provide valuable information if the objective functions are bounded over the feasible region. The components $z^*_i$ of the ideal objective vector $z^* \in \mathbb{R}^k$ are obtained by minimizing each of the objective functions individually subject to the feasible region. Lower bounds of the Pareto optimal set are, thus, available in $z^*$. Sometimes, a vector strictly better than $z^*$ is required. This vector is called a utopian objective vector and denoted by $z^{**}$. In practice, the components of the utopian objective vector are calculated by subtracting some small positive scalar from the components of the ideal objective vector [9]. The upper bounds of the Pareto optimal set, that is, the components of a nadir objective vector $z^{\mathrm{nad}}$, are usually difficult to obtain. Unfortunately, there exists no constructive way to obtain the exact nadir objective vector for nonlinear problems. It can, however, be estimated using a payoff table, but the estimate is not necessarily good (see, e.g., [9]).

In multiobjective optimization, we need a human expert, a decision maker, to find the best compromise solution according to her/his preferences. The decision maker is expected to know the problem domain and to be able to express preference information related to the problem. In our studies, we assume that less is preferred to more in the decision maker's opinion.
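To make Definition 2.1 concrete, the following small sketch (our own Python code, not part of the paper; all names are ours) checks Pareto optimality and weak Pareto optimality within a finite sample of objective vectors:

```python
import numpy as np

def dominates(za, zb):
    """za Pareto dominates zb (minimization): za <= zb componentwise,
    strictly better in at least one component (cf. Definition 2.1)."""
    return bool(np.all(za <= zb) and np.any(za < zb))

def strictly_dominates(za, zb):
    """za < zb in every component; the relation behind weak Pareto optimality."""
    return bool(np.all(za < zb))

def pareto_flags(Z):
    """For each row of Z, return (Pareto optimal, weakly Pareto optimal)
    relative to the finite sample Z itself."""
    out = []
    for i, zi in enumerate(Z):
        po = not any(dominates(zj, zi) for j, zj in enumerate(Z) if j != i)
        wpo = not any(strictly_dominates(zj, zi) for j, zj in enumerate(Z) if j != i)
        out.append((po, wpo))
    return out

# The second vector is dominated; the third is weakly Pareto optimal but
# not Pareto optimal (it ties the first vector in one component).
Z = np.array([[1.0, 2.0], [2.0, 3.0], [1.0, 2.5]])
print(pareto_flags(Z))   # [(True, True), (False, False), (False, True)]
```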

3. Scalarizing functions and related subproblems

In this section, we give some background for the classification-based scalarizing functions to be used in this paper as well as introduce some of their theoretical properties. We start with the general idea of classification in multiobjective optimization.

The idea of interactive classification-based multiobjective optimization methods is that the decision maker examines the values of the objective functions calculated at a current Pareto optimal decision vector $x^c$ and classifies the objective functions into different classes. This means that the decision maker is asked to indicate (by means of classification) what kind of a solution would be more satisfactory than the current one. In this paper, we use up to five different classes, and they are for functions $f_i$ whose values
- should be decreased ($i \in I^<$),
- should be decreased down till some aspiration level ($i \in I^{\leq}$),
- are satisfactory at the moment ($i \in I^=$),
- are allowed to increase up till some upper bound ($i \in I^{\geq}$), and
- are allowed to change freely ($i \in I^{\diamond}$).

The decision maker is asked to specify the aspiration levels $\bar{z}_i$ for $i \in I^{\leq}$ satisfying $\bar{z}_i < f_i(x^c)$ and the upper bounds $\varepsilon_i$ for $i \in I^{\geq}$ such that $\varepsilon_i > f_i(x^c)$. The difference between the classes $I^<$ and $I^{\leq}$ is that functions in $I^<$ are to be minimized as far as possible but functions in $I^{\leq}$ only till the aspiration level. The idea is that with classification, the decision maker directs the solution process in order to find the most preferred Pareto optimal solution. (Note that, as mentioned in Section 2, problem (2.1) has at least one Pareto optimal solution.) That is why a classification is feasible only if at least one of the objective functions should improve and at least one is allowed to increase from its current level, that is, $I^< \cup I^{\leq} \neq \emptyset$ and $I^{\geq} \cup I^{\diamond} \neq \emptyset$. We can also assume that in a feasible classification, an objective function that has reached its ideal value cannot be assigned to either of the classes $I^<$ or $I^{\leq}$.

Based on the classification information expressed by the decision maker, we can form different scalarizing functions consisting of the original objective functions and the preference information specified. Here we treat three different scalarizing functions and the related single objective subproblems. The first of them, to be called subproblem A, is of the form

    minimize  $\max_{i \in I^<,\ j \in I^{\leq}} \left[ \frac{f_i(x) - z^*_i}{|z^*_i|},\ \frac{\max[f_j(x) - \bar{z}_j,\ 0]}{|z^*_j|} \right]$
    subject to  $f_i(x) \leq f_i(x^c)$ for all $i \in I^< \cup I^{\leq} \cup I^=$,
                $f_i(x) \leq \varepsilon_i$ for all $i \in I^{\geq}$,
                $x \in S$.        (3.1)

If $|z^*_i| \leq \delta$ with some small $\delta > 0$, we replace $|z^*_i|$ by 1 in the denominator.

Theorem 3.1. The solution of (3.1) is weakly Pareto optimal if $I^< \neq \emptyset$.

Proof. The result is proved in [9, p. 202]. □

Subproblem A has been used in the interactive classification-based NIMBUS method [9,13,14]. However, due to Theorem 3.1, we cannot guarantee the Pareto optimality of the solutions generated.
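Before moving on, here is a minimal sketch (our own Python, not from the paper) of the scalarizing function in (3.1); the constraints of the subproblem are left to whatever single objective solver is applied, and all names are ours:

```python
import numpy as np

def scalarize_A(f, z_star, I_lt, I_le, z_bar, delta=1e-6):
    """Objective of subproblem A at a point with objective values f.
    z_star: ideal objective vector; I_lt, I_le: index sets I^< and I^<=;
    z_bar: aspiration levels (only entries with index in I_le are used)."""
    def denom(i):
        # |z*_i| is replaced by 1 whenever it is at most delta, as stated above
        d = abs(z_star[i])
        return d if d > delta else 1.0
    terms = [(f[i] - z_star[i]) / denom(i) for i in I_lt]
    terms += [max(f[j] - z_bar[j], 0.0) / denom(j) for j in I_le]
    return max(terms)

# With the two-objective example used later in this section
# (f(x) = (x1 + 1, -x1 - x2 + 3), z* = (1, 1), I^< = {0}), this function
# reduces to x1; e.g. at x = (0.4, 1.0) the objective vector is (1.4, 1.6):
print(scalarize_A(np.array([1.4, 1.6]), np.array([1.0, 1.0]),
                  I_lt=[0], I_le=[], z_bar=np.array([])))   # 0.4
```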

Next, we use subproblem A as the starting point for developing a new subproblem. In this subproblem B (to be used in the synchronous version of the interactive NIMBUS method [15]), we change the formulation so that Pareto optimality can be guaranteed. This is realized by removing the latter max-term and adding a so-called augmentation term to the scalarizing function. Subproblem B is of the form

    minimize  $\max_{i \in I^<,\ j \in I^{\leq}} \left[ \frac{f_i(x) - z^*_i}{z^{\mathrm{nad}}_i - z^{**}_i},\ \frac{f_j(x) - \bar{z}_j}{z^{\mathrm{nad}}_j - z^{**}_j} \right] + \rho \sum_{i=1}^{k} \frac{f_i(x)}{z^{\mathrm{nad}}_i - z^{**}_i}$
    subject to  $f_i(x) \leq f_i(x^c)$ for all $i \in I^< \cup I^{\leq} \cup I^=$,
                $f_i(x) \leq \varepsilon_i$ for all $i \in I^{\geq}$,
                $x \in S$,        (3.2)

where $\rho > 0$ is a small scalar. Besides the augmentation term, we have also changed the denominators. The new denominators are based on the (approximated) ranges of the original objective functions and they scale each term in the objective function of (3.2) to be of a similar magnitude. This aims at capturing the preferences of the decision maker better. Note that by the definitions of the utopian and nadir objective vectors we have $z^{\mathrm{nad}}_i - z^{**}_i > 0$ for all $i = 1, \ldots, k$.

Theorem 3.2. The solution of (3.2) is Pareto optimal.

Proof. Let $x^\ast \in S$ be the optimal solution of problem (3.2). Let us assume that it is not Pareto optimal. In this case there exists another vector $\hat{x} \in S$ such that $f_i(\hat{x}) \leq f_i(x^\ast)$ for all $i = 1, \ldots, k$ and $f_j(\hat{x}) < f_j(x^\ast)$ for at least one index $j$. Because $x^\ast$ is feasible for (3.2), we have $f_i(\hat{x}) \leq f_i(x^\ast) \leq f_i(x^c)$ for $i \in I^< \cup I^{\leq} \cup I^=$ and $f_i(\hat{x}) \leq f_i(x^\ast) \leq \varepsilon_i$ for $i \in I^{\geq}$. This means that $\hat{x}$ is feasible for (3.2). Let us, for simplicity, ignore the positive denominators $z^{\mathrm{nad}}_i - z^{**}_i$ in what follows. Now we have $f_i(\hat{x}) - z^*_i \leq f_i(x^\ast) - z^*_i$ for $i \in I^<$ and $f_i(\hat{x}) - \bar{z}_i \leq f_i(x^\ast) - \bar{z}_i$ for $i \in I^{\leq}$. This implies the relation $\max_{i \in I^< \cup I^{\leq}} [(f_i(\hat{x}) - z^*_i), (f_i(\hat{x}) - \bar{z}_i)] \leq \max_{i \in I^< \cup I^{\leq}} [(f_i(x^\ast) - z^*_i), (f_i(x^\ast) - \bar{z}_i)]$. On the other hand, we have $\rho > 0$ and, thus, $\rho \sum_{i=1}^{k} f_i(\hat{x}) < \rho \sum_{i=1}^{k} f_i(x^\ast)$. This means that $x^\ast$ cannot be the optimal solution of problem (3.2). This contradiction implies that $x^\ast$ has to be Pareto optimal. □

When we formulated subproblem B to guarantee Pareto optimality, we had to somewhat weaken the way aspiration levels are taken into account. In other words, the max-term hindering the solution from getting beyond the aspiration level for functions in the class $I^{\leq}$ in (3.1) had to be removed. On the other hand, in both subproblems A and B the solution generated may easily have a better value than the corresponding aspiration level because this direction is not bounded in any way. Thus, it could be interesting to consider aspiration levels in an even stricter way so that objective function values would not exceed aspiration levels in class $I^{\leq}$. In what follows, we shall refer to this idea as obeying aspiration levels. This is realized in subproblem C, which is of the form

    minimize  $\max_{i \in I^<,\ j \in I^{\leq}} \left[ \frac{f_i(x) - z^*_i}{f_i(x^c) - z^*_i},\ \frac{|f_j(x) - \bar{z}_j|}{f_j(x^c) - \bar{z}_j} \right]$
    subject to  $f_i(x) \leq f_i(x^c)$ for all $i \in I^< \cup I^{\leq} \cup I^=$,
                $f_i(x) \leq \varepsilon_i$ for all $i \in I^{\geq}$,
                $x \in S$.        (3.3)
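The corresponding sketches for the scalarizing functions of subproblems B and C may clarify how the three formulations differ. Again this is our own illustrative Python with the constraints left to the solver; the remark on the denominators of (3.3) follows right after.

```python
import numpy as np

def scalarize_B(f, z_star, z_utop, z_nad, I_lt, I_le, z_bar, rho=1e-4):
    """Objective of subproblem B: scaled max-term plus augmentation term.
    z_utop and z_nad are the utopian and nadir vectors; rho > 0 is small."""
    rng = z_nad - z_utop                       # > 0 by definition
    terms = [(f[i] - z_star[i]) / rng[i] for i in I_lt]
    terms += [(f[j] - z_bar[j]) / rng[j] for j in I_le]
    return max(terms) + rho * float(np.sum(f / rng))

def scalarize_C(f, f_c, z_star, I_lt, I_le, z_bar):
    """Objective of subproblem C; f_c holds the objective values at the
    current point x^c. The absolute value makes C obey aspiration levels."""
    terms = [(f[i] - z_star[i]) / (f_c[i] - z_star[i]) for i in I_lt]
    terms += [abs(f[j] - z_bar[j]) / (f_c[j] - z_bar[j]) for j in I_le]
    return max(terms)
```

The augmentation term in scalarize_B is exactly what rules out weakly Pareto optimal minimizers, at the price of no longer penalizing overshooting of the aspiration levels.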

Note that because of the definition of a feasible classification, none of the denominators can be equal to zero. The unfortunate drawback of subproblem C is that it cannot guarantee even weak Pareto optimality if aspiration levels are used.

Theorem 3.3. The solution of (3.3) is weakly Pareto optimal if $I^{\leq} = \emptyset$.

Proof. Let $x^\ast \in S$ be the optimal solution of problem (3.3) when $I^{\leq} = \emptyset$. Let us assume that $x^\ast$ is not weakly Pareto optimal. Then there exists a vector $\hat{x} \in S$ such that $f_i(\hat{x}) < f_i(x^\ast)$ for all $i = 1, \ldots, k$. Since $x^\ast$ is feasible for (3.3), we have $f_i(\hat{x}) < f_i(x^\ast) \leq f_i(x^c)$ for $i \in I^< \cup I^=$ and $f_i(\hat{x}) < f_i(x^\ast) \leq \varepsilon_i$ for $i \in I^{\geq}$. In other words, $\hat{x}$ is feasible for (3.3). Since the classification has to be feasible, the assumption $I^{\leq} = \emptyset$ implies that $I^< \neq \emptyset$ and we have

    $\frac{f_i(\hat{x}) - z^*_i}{f_i(x^c) - z^*_i} < \frac{f_i(x^\ast) - z^*_i}{f_i(x^c) - z^*_i}$

for $i \in I^<$ (note that the denominator cannot be negative). This means that $x^\ast$ cannot be the optimal solution of problem (3.3) and, thus, $x^\ast$ has to be weakly Pareto optimal. □

The next simple example briefly illustrates the role of the assumptions in Theorems 3.1 and 3.3.

Example. Let us consider a problem with $n = k = 2$, $f_1(x) = x_1 + 1$, $f_2(x) = -x_1 - x_2 + 3$ and $S = \{x \in \mathbb{R}^2 \mid 0 \leq x_1, x_2 \leq 1\}$. Then we have $z^* = (1, 1)$ and $z^{\mathrm{nad}} = (2, 2)$. Let us start from a Pareto optimal point $x^c = (1, 1)$, where $f(x^c) = (2, 1)$, with the classification $I^< = \{1\}$ and $I^{\diamond} = \{2\}$. Note that this classification fulfills the assumption of Theorem 3.1. Then subproblem A reduces to minimizing $x_1$ subject to $x \in S$, which has minima at $x_1 = 0$ and $x_2 \in [0, 1]$. The solution $x = (0, 1)$ is Pareto optimal and the others are weakly Pareto optimal. Thus, it is up to the single objective optimization method which solution is generated. On the other hand, the feasible classification $I^{\leq} = \{1\}$, with $\bar{z}_1 = 1.5$, and $I^{\diamond} = \{2\}$ does not fulfill the assumption in Theorem 3.3. In this case, subproblem C reduces to minimizing $2|x_1 - \frac{1}{2}|$ subject to $x \in S$, which has minima at $x_1 = \frac{1}{2}$ and $x_2 \in [0, 1]$. However, of these minima only $x = (\frac{1}{2}, 1)$ is weakly Pareto optimal; for any $x_2 < 1$ the minimizer $(\frac{1}{2}, x_2)$ is not even weakly Pareto optimal, so the subproblem may well return such a solution (in line with the assumption of the theorem being violated).

Based on Theorem 3.3 (and demonstrated by the example) we can say that if we want to obey aspiration levels in class $I^{\leq}$, we must take the risk of getting solutions that are not even weakly Pareto optimal. In other words, obeying aspiration levels and getting weakly Pareto optimal solutions conflict with each other.

What all three subproblems have in common is that objective functions in classes $I^<$ and $I^{\leq}$ are not allowed to increase in value. This means that the constraints in all the subproblems are the same. We can say that each of these three subproblems tries to take the classification information into account as well as possible, but they each do it in a slightly different way. As far as the optimality of their solutions is concerned, the weak Pareto optimality of (3.1) requires the class $I^<$ to be nonempty. On the other hand, the solution of (3.3) is weakly Pareto optimal only if the class $I^{\leq}$ is empty. These facts are consequences of the maximum and absolute value terms, respectively. The idea of these terms is to take into account the special nature of the second class, in other words, to stop minimizing the objective function whenever the aspiration level is reached. Because we cannot guarantee that subproblems A and C produce Pareto optimal solutions, we need to solve an additional problem for that purpose.
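A quick numeric companion (our own code) to the example above confirms that a minimizer of the reduced subproblem C can fail weak Pareto optimality:

```python
import numpy as np

# f1 = x1 + 1, f2 = -x1 - x2 + 3 on S = [0,1]^2, as in the example.
f = lambda x: np.array([x[0] + 1.0, -x[0] - x[1] + 3.0])

x_c = np.array([0.5, 0.0])   # a minimizer of 2|x1 - 1/2| over S
x_d = np.array([0.4, 1.0])   # a feasible point chosen to dominate it

print(f(x_c))                         # [1.5 2.5]
print(f(x_d))                         # [1.4 1.6]
print(bool(np.all(f(x_d) < f(x_c))))  # True: x_c is not weakly Pareto optimal
```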
Let $x^c$ be the solution of subproblem A or C. Then we solve the problem

    maximize  $\sum_{i=1}^{k} \mu_i$
    subject to  $f_i(x) + \mu_i \leq f_i(x^c)$, $i = 1, \ldots, k$,
                $\mu_i \geq 0$, $i = 1, \ldots, k$,
                $x \in S$        (3.4)

with $n + k$ variables $x \in \mathbb{R}^n$ and $\mu_i \in \mathbb{R}$ for $i = 1, \ldots, k$.
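For concreteness, here is a sketch of the check (3.4); this is our own code using scipy's general-purpose SLSQP solver on the toy example of Section 3, whereas a production implementation would use, for example, the bundle method applied in the paper:

```python
import numpy as np
from scipy.optimize import minimize

def pareto_check(fs, x_c, bounds_x):
    """Solve (3.4) for objective callables fs starting from x^c.
    Decision variables are v = (x, mu) with len(x_c) + len(fs) entries."""
    n, k = len(x_c), len(fs)
    f_c = np.array([fi(x_c) for fi in fs])
    cons = [{"type": "ineq",                  # f_i(x^c) - f_i(x) - mu_i >= 0
             "fun": (lambda v, i=i: f_c[i] - fs[i](v[:n]) - v[n + i])}
            for i in range(k)]
    res = minimize(lambda v: -np.sum(v[n:]),  # maximize the slack sum
                   np.concatenate([x_c, np.zeros(k)]),
                   method="SLSQP",
                   bounds=list(bounds_x) + [(0.0, None)] * k,
                   constraints=cons)
    return res.x[:n], -res.fun                # improved x and total slack

# Starting from the weakly but not Pareto optimal point (0, 0) of the
# example problem, the check moves x2 up to 1 with total slack about 1.
fs = [lambda x: x[0] + 1.0, lambda x: -x[0] - x[1] + 3.0]
x_new, slack = pareto_check(fs, np.array([0.0, 0.0]), [(0.0, 1.0)] * 2)
print(np.round(x_new, 3), round(slack, 3))    # ~[0. 1.] 1.0
```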

The solution of this additional problem is guaranteed to be Pareto optimal (see, e.g., [9, pp. 34-35]). Because we assume that less is preferred to more in the mind of the decision maker, the solution of problem (3.4) is better than $f(x^c)$.

4. Numerical comparison

In this section, we briefly describe the test problems used in our studies for comparing our three subproblems. After that, we present the experiments carried out to test the differences among the subproblems and the related scalarizing functions, and report the results of the experiments. To be more specific, we compare both the computational costs related to the subproblems and how well the solutions of the subproblems manage to satisfy the decision makers in terms of how well the solutions follow the preference information specified.

4.1. Test examples

Numerical experiments were carried out to test the performance of the three scalarizing functions and the subproblems. Twelve test problems of different types were selected from the literature. The subproblems were solved by a proximal bundle method [8], which is an efficient local optimization method for nonsmooth problems. The ideal objective vector values were calculated using a real-coded genetic algorithm with the method of parameter free penalties for constraint handling [16]. All the test runs were performed on an AMD Athlon MP 1900+, 1600 MHz dual processor SMP system.

A summary of the 12 test problems used is given in Table 1. In the table, after the number of the problem, the number of variables is denoted by n and the number of objective functions by k. The next columns indicate the numbers of linear constraints (lc) and nonlinear constraints (nc), respectively. The problems are classified to be of a linear (lin), quadratic (quad), nonlinear (nonl) or nonsmooth (nons) type. A problem is regarded as linear if all the functions involved are linear and quadratic if at least one of the objective functions is quadratic. In the same way, problems are classified as nonlinear or nonsmooth if at least one of the functions is nonlinear or nonsmooth, respectively. The next column specifies whether the problem is convex (conv) or not (nonc). Finally, the reference of the problem is given. Details of the test problems are given in Appendix A.

Table 1
Summary of test problems

Prob   n    k   lc   nc   Type   Convexity   Ref.
1      3    3   0    1    nonl   nonc        [19]
2      2    5   0    0    nons   nonc        [12]
3      2    3   0    0    nonl   nonc        [2,5]
4      2    3   1    0    quad   conv        [2]
5      3    5   1    1    nonl   nonc        [4]
6      2    7   0    0    nons   nonc        [11]
7      3    3   0    1    quad   conv        [18]
8      3    6   0    1    nonl   nonc        [6]
9      3    3   3    0    lin    conv        [6]
10     10   3   3    5    nonl   nonc        [20]
11     2    3   0    1    nonl   conv        [3]
12     2    5   0    0    nonl   nonc        [10]

4.2. Experiments

In order to compare the computational performances of the three different subproblems, we carried out two experiments. Six test persons participated in these experiments and all of them knew the basics of multiobjective optimization. When solving multiobjective optimization problems, it is up to the decision maker which of the different (Pareto optimal) solutions is selected as the final one. In order to justify the comparison of the final solutions obtained by the different test persons, we gave them some pre-defined preference information. To be more specific, the test persons were given desirable ranges for every objective function and they were also told that the functions were in the order of importance, that is, the first objective function was the most important in each problem, etc. In addition, they obtained information about the ranges of the objective functions in the Pareto optimal set in the form of an ideal objective vector and an approximated nadir objective vector.

In the first experiment, test persons were supposed to solve six multiobjective optimization problems (Problems 1-6 in Table 1) by specifying their preferences in the form of classifications. The solution process was started from a so-called neutral compromise solution [15,24], which is Pareto optimal. Test persons solved every test problem three times so that in each of the three solution processes, a different subproblem was used. Actually, the test persons did not seem to notice that they solved the same problems more than once because the identical problems did not follow each other; but every sixth problem was always the same. After each classification, test persons were shown the solution of the current subproblem beside the previous solution (i.e. the starting point of the classification) and they had to choose either of them as the starting point of the next classification or as the final solution.

Test persons were supposed to make as many classifications as needed to find a solution within the desirable ranges or one that was, in their own minds, close enough. They were given enough time to familiarize themselves with the problems, and when they learned about the possibilities and the conflicting nature of the objective functions, they were able to decide what could be a satisfactory solution. In other words, test persons stopped solution processes when they thought they could not improve the solution any more, in spite of the fact that not all the objective function values were necessarily within the corresponding pre-defined desirable ranges.

After every solution process, test persons were asked how well the solution process proceeded when compared to their personal hopes and desires and how satisfied they were with the final solution. Test persons were asked to evaluate both the process and the satisfaction on a scale from 1 to 5, where 1 represented a very poor and 5 a very good grade. The time used for the total solution process per problem was recorded as well as the number of classifications used. In addition, the number of function evaluations used while solving the subproblem formed after each classification was recorded.
In the second experiment, test persons had to again solve six problems (Problems 7-12 in Table 1) by specifying their preferences using classifications starting from the neutral compromise solution, but this time all three subproblems were solved after each classification and their solutions were shown to the test persons at the same time. In other words, test persons could get three new solutions after each classification if the solutions were different from each other. The solutions were considered to be the same if the values of all the corresponding components differed by less than $1.0 \times 10^{-3}$. Test persons were supposed to choose the best of the solutions available, just by subjective preferences, as the starting point of the next classification or as the final solution. After each solution process, the number of classifications used as well as the number of function evaluations per each subproblem were recorded. Afterwards, we could also find out which subproblem produced the solutions the test persons selected. Then we could count the scores of each subproblem in the following way: the subproblem that produced the selected solution was given one point, and the total score

of points for each subproblem in every problem was calculated as the sum of these points. The order of showing the solutions produced by the different subproblems to the test persons varied. Therefore, we could avoid the possibility that test persons would have, for example, always selected the first new solution automatically and, thus, the first subproblem would have become the most popular just because it happened to be the first one.

4.3. Results

Table 2 summarizes the results of the first experiment. In Table 2, we can see the comparison among the three subproblems when considering the average values for the time used per problem, the number of classifications per problem, the time used per classification as well as the personal opinions, that is, the satisfaction with the solution process and the goodness of the final solution.

In the first experiment, the average times per problem and per classification as well as the average numbers of classifications did not differ remarkably among the subproblems, as can be seen in Table 2. Similarly, there was no remarkable difference in the subjective opinions concerning the satisfaction with the solution process and the goodness of the final solution. We can say that in these respects, the subproblems worked quite identically. What is important, all the subproblems were able to find solutions that satisfied the test persons. In other words, the test persons considered the solutions produced by the three subproblems almost equally good, even though they were mathematically at least slightly different from each other. If we wish to draw some conclusions, subproblem A was slightly preferred to the others.

Thus far, we can say that we found no practical differences between the subproblems. However, when we consider the number of function evaluations needed for solving them, the results between the subproblems differ really significantly (see Table 3). Table 3 presents the average numbers of function evaluations per problem and per classification for each subproblem. The numbers in Table 3 were computed as follows: first, the number of function evaluations per problem for each test person was calculated. Then, the results per problem were averaged. The number of function evaluations per classification was calculated by dividing the number of function evaluations per problem by the number of classifications used.

We can see that subproblem B required considerably fewer function evaluations than subproblems A and C. As a matter of fact, subproblems A and C needed, roughly speaking, three to four times more function evaluations than subproblem B. The numbers of function evaluations among the subproblems are comparable because all the solutions were Pareto optimal (the numbers include the function evaluations used to check the Pareto optimality of solutions produced by subproblems A and C).

Table 2
Average performance times and subjective opinions of the test persons in the first experiment

                                            A                      B                      C
Time per problem                            4 minutes 18 seconds   4 minutes 35 seconds   4 minutes 53 seconds
Number of classifications per problem       4.0                    4.2                    4.4
Time per classification                     1 minute 10 seconds    1 minute 7 seconds     1 minute 8 seconds
Satisfaction with the solution process      4.0                    3.8                    3.8
Goodness of the final solution              4.1                    3.8                    4.0

Table 3
Average numbers of function evaluations in the first experiment

                                                      A       B      C
Number of function evaluations per problem            222.6   60.6   292.7
Number of function evaluations per classification     77.3    24.4   78.0

By combining the summaries in Tables 2 and 3, we can say that the numbers of function evaluations among the subproblems differed significantly but the performance times, the numbers of classifications as well as the subjective opinions of the test persons were close to each other. We may wonder why the longer computational time for subproblems A and C was not reflected in the satisfaction grades in Table 2. One reason for this might be that solving the problems was fast because the problems were relatively small and, therefore, the test persons did not feel frustrated while waiting for a new solution.

As far as the second experiment is concerned, we could not compare subproblems as in the first experiment because all the subproblems were solved for every test problem. However, the number of function evaluations was comparable among the subproblems also in this experiment. In addition, we could compare the number of times the test persons selected solutions produced by the different subproblems, that is, their scores (as described earlier).

The results obtained in the second experiment are consistent with the results of the first experiment when considering the numbers of function evaluations. Table 4 summarizes the average numbers of function evaluations per problem and per classification as well as the scores of the subproblems. The average numbers of function evaluations were calculated similarly as in the first experiment. Again, subproblems A and C used remarkably (about three or four times) more function evaluations than subproblem B. Because of its highly nonconvex nature, Problem 12 had to be solved using a global solver (i.e., a genetic algorithm) and for this reason the solution process required substantially more function evaluations than for the other problems. That is why the average numbers of function evaluations are considerably larger than in the first experiment. However, the same trend between the subproblems as in the first experiment is still valid.

The last row in Table 4 refers to the choices of the test persons. As previously mentioned, test persons were shown up to three new solutions after each classification and they were supposed to choose one of them to stop or to continue with. The subproblem that had produced the selected solution was given one point, and the sum of the points can be seen in Table 4. Sometimes, two or three of the subproblems produced exactly the same solution (within the tolerance $1.0 \times 10^{-3}$) after the classification (and then, only one or two new solutions were shown to the test persons). Then each of the subproblems that produced the selected solution was given a point. This happened in 15% of all the classifications. If some of the shown solutions were so similar that test persons were not able to tell which of them was the best one, all such subproblems were also given one point regardless of the choices that the test persons made. This was the case with 7% of the solutions.

Table 4 indicates that the scores of subproblems B and C are clearly larger than the score of subproblem A. As mentioned in Section 3, in subproblems A and C we can only guarantee the weak Pareto optimality of the solutions obtained (under some additional assumptions), and in the experiments we solved an additional problem to guarantee Pareto optimality.
Based on the results concerning the numbers of function evaluations, we wanted to study how many function evaluations were needed to check the Pareto optimality of the solutions of subproblems A and C. We studied half of the problems used in the experiments. In those randomly selected cases, checking the Pareto optimality needed a significant amount of function evaluations when compared to the total number of function evaluations.

Table 4
Average numbers of function evaluations and the scores of subproblems in the second experiment

                                                      A          B          C
Number of function evaluations per problem            50,930.2   16,176.7   62,120.7
Number of function evaluations per classification     6376.4     1804.1     6223.3
Scores of subproblems                                 42         49         53

The average numbers of function evaluations in total were 88.7 for subproblem A, 27.2 for B and 88.5 for C. The checking of Pareto optimality required on the average 47.7 function evaluations for subproblem A and 52.5 function evaluations for C. Let us mention that in the set of problems tested, the checking of Pareto optimality needed a considerable amount of function evaluations also in the cases when the solutions did not necessarily change, that is, even though the solution already happened to be Pareto optimal. We must emphasize that when the additional subproblem was not used, subproblem A used on the average 41.0 and C 36.0 function evaluations to produce a solution that was only weakly Pareto optimal, whereas B used as few as 27.2 function evaluations to produce a solution that was guaranteed to be Pareto optimal. Thus, even if we had not checked Pareto optimality, subproblem B with an augmentation coefficient would still have been more efficient when considering computational costs. In other words, the fact that subproblems A and C involved solving an additional subproblem does not completely explain the higher computational costs; there was also some other reason why subproblem B was more efficient than the other two. This reason could be the scaling used in subproblem B.

When considering the second experiment, the results clearly favour subproblem B because of its superior computational efficiency and the fact that its scores in Table 4 are good. In all, based on the two experiments we can say that using augmentation terms as parts of scalarizing functions can be strongly supported, because user satisfaction was comparable to the others but computational efficiency was significantly higher.

Let us finally study how different the solutions produced by the three subproblems were, irrespective of the preferences of the individual test persons. For each test problem, a classification was made according to the pre-defined preference information and the three subproblems were solved. For each objective function, standardized scores (i.e., the difference of the objective function value and the average of the objective function values divided by the standard deviation) were calculated. Then, these scores were averaged over each subproblem and these averaged values are displayed in Fig. 1 for the 12 test problems; a small sketch of the standardization is given below. A negative value indicates that the solution was better (i.e., had lower objective function values) than the average and vice versa. The average standardized scores over all the test problems were -0.079, 0.011 and 0.091 for subproblems A, B and C, respectively.

[Fig. 1. Standardized scores for 12 test problems.]
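The promised sketch of the standardization (our own code; e.g. the choice of a sample rather than a population standard deviation is our assumption, as the paper does not specify it):

```python
import numpy as np

def standardized_scores(F):
    """F has shape (3, k): row s holds the objective vector produced by
    subproblem s (A, B, C) for one classification. Each objective is
    standardized over the three subproblems and then averaged per row."""
    z = (F - F.mean(axis=0)) / F.std(axis=0, ddof=1)   # ddof is an assumption
    return z.mean(axis=1)

# Hypothetical objective values for subproblems A, B and C on one problem:
F = np.array([[1.0, 4.0, 2.0],
              [1.2, 4.5, 2.1],
              [1.5, 4.2, 2.6]])
print(np.round(standardized_scores(F), 3))   # negative = better than average
```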

This means that, on the average, the solutions generated using subproblem A were a little better than those generated using the other subproblems. This kind of a trend can also be seen in the figure. On the other hand, subproblem C had an opposite trend, whereas the performance of subproblem B was between these two. However, the overall differences were not that significant.

5. Conclusions

We have compared three different classification-based scalarizing functions and related subproblems, focusing on their computational costs and how well they obey the classification information. Based on the results, we can say that all the subproblems studied produced satisfactory solutions in a reasonable time and the solutions did not differ from each other remarkably. Test persons valued the solutions as well as the solution processes almost equally. Furthermore, the times used per problem and per classification as well as the numbers of classifications used per problem were close to each other. Surprisingly large differences were found in the number of function evaluations used for solving the subproblems. In our studies, the additional checking of Pareto optimality used a considerable amount of function evaluations, and using augmentation terms in scalarizing functions was clearly the most efficient approach. Using some other additional problem for guaranteeing Pareto optimality could have had some effect on the results but would not have changed the final outcome, because subproblem B produced Pareto optimal results with lower computational costs than what subproblems A and C needed for producing solutions that were only weakly Pareto optimal.

In the literature, there have been no studies related to the computational costs of guaranteeing Pareto optimality. However, this issue is important because showing only weakly Pareto optimal or non-Pareto optimal solutions to the decision maker is usually not justifiable. Typically, solving an additional subproblem or using augmentation terms are regarded as alternatives without any implication about their computational costs. In any case, our experiences show that the difference in computational costs is really significant. Classification-based multiobjective optimization methods are interactive, which emphasizes the significance of the time used in computing; the decision maker may feel frustrated if (s)he has to wait for a long time to get results. Thus, it is important to use efficient approaches. This aspect is even more important with larger and more complicated (i.e., computationally demanding) real-life problems.

Acknowledgements

This research was partly supported by Tekes, the National Technology Agency of Finland, and CoMaS, the Jyväskylä Graduate School in Computing and Mathematical Sciences at the University of Jyväskylä. The authors wish to thank Mr. Vesa Ojalehto for his efforts with the experiments as well as Mr. Kari Nissinen and Mr. Markku Könkkölä for helping with the standardized scores and Fig. 1, respectively.

Appendix A. Test problems

Here we present the 12 test problems used, with a citation to the reference where further information can be found. We also give a brief description of the meaning of the functions in the problem, if available. The feasible regions S in the problems consist of simple lower and upper bounds for the variables together with linear (or affine) and nonlinear inequality constraints.
For clarity, linear constraints are presented in the form

    $h_i(x) \leq b, \quad i = 1, \ldots, lc,$

where $b \in \mathbb{R}$ and lc is the number of linear constraints. Furthermore, nonlinear constraints are given in the form

    $g_j(x) \leq 0, \quad j = 1, \ldots, nc,$

where nc is the number of nonlinear constraints.

Problem 1. Sakawa-Mori [19]

    minimize  $f_1(x) = 2x_1^2 + 2(x_2 + 5)^2 + (x_3 - 3)(x_3 - 4)(x_3 - 5.5)(x_3 - 12) + 50{,}000$
              $f_2(x) = (x_1 + 40)^2 + (x_2 - 224)^2 + (x_3 + 40)^2$
              $f_3(x) = (x_1 - 224)^2 + (x_2 + 40)^2 + (x_3 + 40)^2$
    subject to  $g_1(x) = x_1^2 + x_2^2 + x_3^2 - 100 \leq 0$
                $0 \leq x_1, x_2, x_3 \leq 10$.

Problem 2. River Pollution Problem [12]

    minimize  $f_1(x) = -4.07 - 2.27x_1$
              $f_2(x) = -2.60 - 0.03x_1 - 0.02x_2 - \dfrac{0.01}{1.39 - x_1^2} - \dfrac{0.30}{1.39 - x_2^2}$
              $f_3(x) = -8.21 + \dfrac{0.71}{1.09 - x_1^2}$
              $f_4(x) = -0.96 + \dfrac{0.96}{1.09 - x_2^2}$
              $f_5(x) = \max[\,|x_1 - 0.65|,\ |x_2 - 0.65|\,]$
    subject to  $0.3 \leq x_1, x_2 \leq 1.0$.

The problem was originally presented in [17] and modified in [12], where the fifth (nonsmooth) objective function was included. The problem describes a (hypothetical) pollution problem of a river, where a fishery and a city are polluting the water. The decision variables represent the proportional amounts of biochemical oxygen demanding material removed from the water in two treatment plants located after the fishery and after the city. Here $f_1$ and $f_2$ describe the quality of water after the fishery and after the city, respectively, while $f_3$ and $f_4$ represent the percent return on investment at the fishery and the addition to the tax rate in the city, respectively. Finally, $f_5$ describes the functionality of the treatment plants.

Problem 3. Water Resources Planning [2,5]

    minimize  $f_1(x) = e^{0.01x_1} x_1^{0.02} x_2^2$
              $f_2(x) = 0.5x_2^2$
              $f_3(x) = -e^{0.005x_1} x_1^{0.001} x_2^2$
    subject to  $0.01 \leq x_1 \leq 1.3$
                $0.01 \leq x_2 \leq 10$.

This problem was presented in [2,5] with lower bounds 0.0 and no upper bounds for the variables. The lower bounds are modified here in order to guarantee the existence of gradients for the objective functions. The problem describes a simplified water resources planning problem. A multipurpose dam is to be constructed such that the cost of construction ($f_1$) and the water loss ($f_2$) (volume/year) are minimized, and the total storage capacity ($f_3$) of the reservoir is maximized. The decision variables are the total man-hours devoted to building the dam and the mean radius of the lake impounded (in miles), respectively.
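To show how these benchmark descriptions translate into code, here is Problem 2 written as a Python callable (our own wrapper, not part of the paper; the remaining problems can be coded the same way and fed to the scalarizing functions of Section 3):

```python
import numpy as np

def river_pollution(x):
    """Objective vector of Problem 2 for x = (x1, x2), 0.3 <= xi <= 1.0."""
    x1, x2 = float(x[0]), float(x[1])
    return np.array([
        -4.07 - 2.27 * x1,
        -2.60 - 0.03 * x1 - 0.02 * x2
              - 0.01 / (1.39 - x1**2) - 0.30 / (1.39 - x2**2),
        -8.21 + 0.71 / (1.09 - x1**2),
        -0.96 + 0.96 / (1.09 - x2**2),
        max(abs(x1 - 0.65), abs(x2 - 0.65)),   # the nonsmooth objective f5
    ])

print(np.round(river_pollution([0.65, 0.65]), 3))
```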

Problem 4. Chankong-Haimes [2]

    minimize  $f_1(x) = (x_1 - 1)^2 + (x_2 - 1)^2$
              $f_2(x) = (x_1 - 2)^2 + (x_2 - 3)^2$
              $f_3(x) = (x_1 - 4)^2 + (x_2 - 2)^2$
    subject to  $h_1(x) = x_1 + 2x_2 \leq 10$
                $0 \leq x_1 \leq 10$
                $0 \leq x_2 \leq 4$.

The natural upper bound (i.e. not restrictive) for variable $x_1$ was here included.

Problem 5. Caballero-Rey-Ruiz [4]

    minimize  $f_1(x) = (x_1 + x_2 + x_3)^{5/2}$
              $f_2(x) = (x_1 - 4)^4 + (x_2 - 3)^4 + x_3^4$
              $f_3(x) = 1/(x_1 + x_2 + x_3)$
              $f_4(x) = 3x_1^3 + (x_2 - 1)^4 + 2(x_3 - 20)^4$
              $f_5(x) = (x_1 - 2)^2 - \ln(x_2 + x_3)$
    subject to  $h_1(x) = x_1 + 2x_2 + x_3 \leq 15$
                $g_1(x) = (x_1 - 3)^2 + (x_2 - 3)^2 + (x_3 - 3)^2 - 4 \leq 0$
                $0.5 \leq x_1, x_2, x_3 \leq 6$.

Originally in [4], the problem was presented without bounds for the variables.

Problem 6. Seven Hard Functions [11]

    minimize  $f_1(x) = 100(x_2 - x_1^2)^2 + (1 - x_1)^2$
              $f_2(x) = \max[\,x_1^2 + (x_2 - 1)^2 + x_2 - 1,\ -x_1^2 - (x_2 - 1)^2 + x_2 + 1\,]$
              $f_3(x) = \max[\,x_1^2 + x_2^4,\ (2 - x_1)^2 + (2 - x_2)^2,\ 2e^{-x_1 + x_2}\,]$
              $f_4(x) = \max[\,5x_1 + x_2,\ -5x_1 + x_2,\ x_1^2 + x_2^2 + 4x_2\,]$
              $f_5(x) = \max[\,x_1^2 + x_2^2,\ x_1^2 + x_2^2 + 10(-4x_1 - x_2 + 4),\ x_1^2 + x_2^2 + 10(-x_1 - 2x_2 + 6)\,]$
              $f_6(x) = \max[\,-x_1 - x_2,\ -x_1 - x_2 + (x_1^2 + x_2^2 - 1)\,]$
              $f_7(x) = -x_1 + 20 \max[\,x_1^2 + x_2^2 - 1,\ 0\,]$
    subject to  $-100 \leq x_1, x_2 \leq 100$.

This is a collection of standard test functions (Rosenbrock, Crescent, CB2, Dem, QL, LQ, and Mifflin, respectively) in nonsmooth optimization (see, for example, [8]). Originally in [11], the problem was presented without bounds for the variables.

Problem 7. Sakawa [18]

    minimize  $f_1(x) = x_1^2 + (x_2 + 5)^2 + (x_3 - 60)^2$
              $f_2(x) = (x_1 + 40)^2 + (x_2 - 224)^2 + (x_3 + 40)^2$
              $f_3(x) = (x_1 - 224)^2 + (x_2 + 40)^2 + (x_3 + 40)^2$
    subject to  $g_1(x) = x_1^2 + x_2^2 + x_3^2 - 100 \leq 0$
                $0 \leq x_1, x_2, x_3 \leq 10$.

Problem 8. Bow River Valley [6]

    minimize  $f_1(x) = -4.75 - 2.27(x_1 - 0.3)$
              $f_2(x) = -2.0 - 0.524(x_1 - 0.3) - 2.79(x_2 - 0.3) - 0.882(w_1 - 0.3) - 2.65(w_2 - 0.3)$
              $f_3(x) = -5.1 - 0.177(x_1 - 0.3) - 0.978(x_2 - 0.3) - 0.216(w_1 - 0.3) - 0.768(w_2 - 0.3)$
              $f_4(x) = -7.5 + 0.708(u_1 - 1)$
              $f_5(x) = 0.9576(u_2 - 1)$
              $f_6(x) = 1.125(u_3 - 1)$
    subject to  $g_1(x) = -0.0332(x_1 - 0.3) - 0.0186(x_2 - 0.3) - 3.34(x_3 - 0.3) - 0.0204(w_1 - 0.3) - 0.778(w_2 - 0.3) - 2.62(w_3 - 0.3) + 2.5 \leq 0$
                $0.3 \leq x_1, x_2, x_3 \leq 1$,

where

    $u_i = \dfrac{1}{1.09 - x_i^2}, \quad w_i = \dfrac{0.39}{1.39 - x_i^2}, \quad i = 1, 2, 3.$

Originally in [6, pp. 187-196], the first four objectives were to be maximized. Like Problem 2, this too is a water quality management problem, describing pollution of an artificial river basin where a cannery and two cities are polluting the water. The decision variables represent, again, the treatment levels of waste discharges at the cannery and the two cities, respectively. The first three objective functions ($f_1$, $f_2$ and $f_3$) describe the quality of water in the cities and at a park located between them. Then, $f_4$ represents the percent return on investment at the cannery, and $f_5$ as well as $f_6$ represent the addition to the tax rate in the two cities, respectively. The nonlinear constraint describes the quality of water at the end of the river.

Problem 9. Ace Electronic Inc. [6]

    minimize  $f_1(x) = -10x_1 - 30x_2 - 50x_3 - 100x_4$
              $f_2(x) = -x_1 - x_2$
              $f_3(x) = -x_1 - 4x_2 - 6x_3 - 2x_4$
    subject to  $h_1(x) = 5x_1 + 3x_2 + 2x_3 - 240 \leq 0$
                $h_2(x) = 3x_3 + 8x_4 - 320 \leq 0$
                $h_3(x) = 2x_1 + 3x_2 + 4x_3 + 6x_4 - 180 \leq 0$
                $0 \leq x_1, x_2, x_3, x_4 \leq 4000$.

The natural (i.e. not restrictive) upper bound 4000 for all the variables was here included. In spite of the name, the possible practical meaning of the problem was not presented in [6, p. 214].

Problem 10. Sakawa-Yauchi [20]

    minimize  $f_1(x) = 7x_1^2 - x_2^2 + x_1x_2 - 14x_1 - 16x_2 + 8(x_3 - 10)^2 + 4(x_4 - 5)^2 + (x_5 - 3)^2 + 2(x_6 - 1)^2 + 5x_7^2 + 7(x_8 - 11)^2 + 2(x_9 - 10)^2 + x_{10}^2 + 45$
              $f_2(x) = (x_1 - 5)^2 + 5(x_2 - 12)^2 + 0.5x_3^4 + 3(x_4 - 11)^2 + 0.2x_5^5 + 7x_6^2 + 0.1x_7^4 - 4x_6x_7 - 10x_6 - 8x_7 + x_8^2 + 3(x_9 - 5)^2 + (x_{10} - 5)^2$
              $f_3(x) = x_1^3 + (x_2 - 5)^2 + 3(x_3 - 9)^2 - 12x_3 + 2x_4^3 + 4x_5^2 + (x_6 - 5)^2 + 6x_7^2 + 3(x_7 - 2)x_8^2 - x_9x_{10} + 4x_9^3 + 5x_1 - 8x_1x_7$
    subject to  $h_1(x) = 4x_1 + 5x_2 - 3x_7 + 9x_8 \leq 105$
                $h_2(x) = 10x_1 - 8x_2 - 17x_7 + 2x_8 \leq 0$
                $h_3(x) = -8x_1 + 2x_2 + 5x_9 - 2x_{10} \leq 12$
                $g_1(x) = 3(x_1 - 2)^2 + 4(x_2 - 3)^2 + 2x_3^2 - 7x_4 + 2x_5 - x_6 - x_8 - 120 \leq 0$
                $g_2(x) = 5x_1^2 + 8x_2 + (x_3 - 6)^2 - 2x_4 - 40 \leq 0$
                $g_3(x) = x_1^2 + 2(x_2 - 2)^2 - 2x_1x_2 + 14x_5 + 6x_5x_6 \leq 0$
                $g_4(x) = 0.5(x_1 - 8)^2 + 2(x_2 - 4)^2 + 3x_5^2 - x_5x_8 - 30 \leq 0$
                $g_5(x) = -3x_1 + 6x_2 + 12(x_9 - 8)^2 - 7x_{10} \leq 0$
                $-5 \leq x_i \leq 10, \quad i = 1, \ldots, 10$.

Problem 11. Caballero-Rey-Ruiz 2 [3]

    minimize  $f_1(x) = 50x_1^4 + 10x_2^4$
              $f_2(x) = 30(x_1 - 5)^4 + 100(x_2 - 3)^4$
              $f_3(x) = 70(x_1 - 2)^4 + 20(x_2 - 4)^4$
    subject to  $g_1(x) = (x_1 - 2)^2 + (x_2 - 2)^2 - 1 \leq 0$
                $1 \leq x_1, x_2 \leq 3$.

Originally in [3], the problem was presented without bounds for the variables.

Problem 12. Peak Functions [10]

    minimize  $f_1(x) = \varphi(x_1, x_2)$
              $f_2(x) = \varphi(x_1 - 1.2,\ x_2 - 1.5)$
              $f_3(x) = \varphi(x_1 + 0.3,\ x_2 - 3.0)$
              $f_4(x) = \varphi(x_1 - 1.0,\ x_2 + 0.5)$
              $f_5(x) = \varphi(x_1 - 0.5,\ x_2 - 1.7)$
    subject to  $-4.9 \leq x_1 \leq 3.2$
                $-3.5 \leq x_2 \leq 6.0$;