Permutation distance measures for memetic algorithms with population management

MIC2005: The Sixth Metaheuristics International Conference

Marc Sevaux
University of Valenciennes, CNRS, UMR 8530, LAMIH-SP
Le Mont Houy - Bat Jonas 2, F 59313 Valenciennes cedex 9, France
marc.sevaux@univ-valenciennes.fr

Kenneth Sörensen
University of Antwerp, Faculty of Applied Economics
Prinsstraat 13, B 2000 Antwerp, Belgium
kenneth.sorensen@ua.ac.be

1 Introduction

Most researchers agree that the balance between diversification and intensification strategies is one of the determining factors of the quality of a metaheuristic [3]. One possible way of controlling this balance is to apply population management, such as in the memetic algorithm with population management approach (MA PM) [9]. The basic idea of MA PM is to only allow a new solution into the population if it sufficiently contributes to the diversity of the population. Population management strategies can be developed to actively monitor and control this diversity.

MA PM requires a distance measure adapted to the representation of a solution of the problem that is being solved. This distance measure should reflect the difference between a given pair of solutions. Although difficult to prove, it seems natural to conjecture that a distance measure that more accurately reflects the distance between two solutions is preferable to one that does not.

For problems of which the solutions are most naturally represented as a binary string, the Hamming distance seems to be the only distance measure to use. For problems represented as real-valued vectors, Euclidean, Manhattan, or Chebyshev distance metrics can be used. The Mahalanobis distance is scale-invariant and takes the dispersion of the solutions into account. For permutation problems, a myriad of distance measures have been developed, originating in different domains (statistics, computer science, molecular biology, ...).
In this paper, we look at nine of them: the exact match distance, the deviation distance (Spearman's footrule), the squared deviation distance (Spearman's rank correlation coefficient), the A distance, the R distance, the edit distance, the longest common subsequence, the reversal distance and Kendall's tau. We determine the effort required to calculate them and develop a normalized version of each distance measure that produces a value between 0 and 1. Some of the distance measures can be calculated in approximately linear time, whereas others require approximately quadratic time. We then study experimentally which distance measure works best in a simple MA PM for a single-machine scheduling problem.

2 Assumptions and notation

In the following section, we discuss several distance measures that are able to calculate a distance between two given strings of integers. We assume that both strings are permutations without repetition of a given set of integers. As a consequence, both strings are assumed to be of equal length. This underuses the potential of some distance measures: the edit distance, for example, can calculate the distance between any two strings composed of characters of a given alphabet. Without loss of generality, we assume that both strings are permutations of the set $\{1, \ldots, n\}$. The $i$-th character in a permutation $P$ is represented as $P(i)$.

3 Distance metrics

3.1 The exact match distance

The exact match distance [7], also called the generalized Hamming distance, is defined as the number of positions at which the two strings contain different characters.

$$d_{em}(S, T) = \sum_{i=1}^{n} x_i \quad \text{where} \quad x_i = \begin{cases} 0 & \text{if } S(i) = T(i) \\ 1 & \text{otherwise} \end{cases}$$

A normalized version of the exact match distance can be obtained by dividing the unnormalized version by the maximum exact match distance. This maximum is reached when the strings differ in each of the $n$ positions and is therefore equal to $n$. The normalized exact match distance is thus

$$\hat{d}_{em}(S, T) = \frac{1}{n} \, d_{em}(S, T).$$

3.2 The deviation distance

The deviation of item $k$ between strings $S$ and $T$ is the absolute difference between the position of the item in string $S$ and its position in $T$. The deviation distance is the sum of the deviations of all characters; unlike [7], we do not divide by $n - 1$.

$$d_{dev}(S, T) = \sum_{k=1}^{n} |i - j| \quad \text{where } T(j) = S(i) = k$$

Ronald [7] proves that the maximum deviation distance occurs between two inverted strings. For example, the distance between $S = [1, 2, 3, 4, 5]$ and $T = [5, 4, 3, 2, 1]$ is maximal. The value of the maximal distance can be calculated as a function of $n$. When $n$ is even, the maximal distance is equal to $\frac{n^2}{2}$. When $n$ is odd, it is equal to $\frac{n^2 - 1}{2}$.
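The two definitions above can be illustrated with a minimal Python sketch. The function names are hypothetical (they do not come from the paper), permutations are passed as Python lists, and positions are counted from 0, which leaves every position difference unchanged.

```python
def exact_match_distance(s, t):
    """Number of positions at which s and t hold different items."""
    return sum(1 for a, b in zip(s, t) if a != b)


def deviation_distance(s, t):
    """Sum over all items of |position in s - position in t|."""
    # Assumed helper: lookup table from item to its position in t.
    pos_in_t = {item: j for j, item in enumerate(t)}
    return sum(abs(i - pos_in_t[item]) for i, item in enumerate(s))


def max_deviation(n):
    """Maximum deviation distance as stated in the text."""
    return n * n // 2 if n % 2 == 0 else (n * n - 1) // 2


s = [1, 2, 3, 4, 5]
t = [5, 4, 3, 2, 1]
print(exact_match_distance(s, t) / len(s))              # normalized exact match: 0.8
print(deviation_distance(s, t), max_deviation(len(s)))  # 12 12
```

For the inverted pair of the example, the deviation distance equals the stated maximum of $(n^2 - 1)/2 = 12$ for $n = 5$, while the exact match distance is 4 because the middle item stays in place.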

Dividing by this maximum gives the normalized deviation distance:

$$\hat{d}_{dev}(S, T) = \begin{cases} \dfrac{2}{n^2} \, d_{dev}(S, T) & \text{if } n \text{ is even} \\[1ex] \dfrac{2}{n^2 - 1} \, d_{dev}(S, T) & \text{if } n \text{ is odd} \end{cases}$$

In statistics, this measure is known as Spearman's footrule.

3.3 The squared deviation distance

The squared deviation distance assigns a larger distance when deviations are larger. It is defined as follows:

$$d_{sdev}(S, T) = \sum_{k=1}^{n} (i - j)^2 \quad \text{where } T(j) = S(i) = k$$

It is easy to show that the maximum squared deviation distance is equal to $\frac{n^3 - n}{3}$, so that

$$\hat{d}_{sdev}(S, T) = \frac{3}{n^3 - n} \, d_{sdev}(S, T).$$

This distance measure is similar to Spearman's rank correlation coefficient, a metric often used in statistics to calculate the correlation between two rankings. Usually, it is normalized between $-1$ (strong negative correlation) and $1$ (strong positive correlation) [8].

3.4 The A distance

Campos et al. [1] propose the A distance measure, equal to the sum of the absolute differences between the items that occupy the same position in strings $S$ and $T$.

$$d_A(S, T) = \sum_{i=1}^{n} |S(i) - T(i)|$$

This distance measure has the (undesirable) property of being dependent on the labeling of the items. Consider e.g. the following two pairs, where the second pair is the first one with labels 1 and 2 exchanged and labels 3 and 4 exchanged:

- the distance between [1, 2, 3, 4] and [1, 3, 2, 4] is 2;
- the distance between [2, 1, 4, 3] and [2, 4, 1, 3] is 6.

3.5 The R distance

For so-called R-type permutation problems, i.e. problems for which the relative position of the elements is more important than their absolute position, Campos et al. [1] propose the R distance measure. This distance measure is equal to the number of times $S(i+1)$ does not immediately follow $S(i)$ in $T$. More formally,

$$d_R(S, T) = \sum_{i=1}^{n-1} y_i \quad \text{where} \quad y_i = \begin{cases} 1 & \text{if } \nexists j : S(i) = T(j) \text{ and } S(i+1) = T(j+1) \\ 0 & \text{otherwise} \end{cases}$$

The maximal value of this distance measure is $n - 1$. Therefore

$$\hat{d}_R(S, T) = \frac{1}{n - 1} \, d_R(S, T).$$

3.6 The edit distance

The most complex of the distances used in this paper is the edit distance. In contrast to the other distance measures, the edit distance can be extensively parametrized. The edit distance between strings $S$ and $T$ is defined as the minimum total cost of the elementary edit operations required to transform $S$ into $T$. Three different elementary edit operations are defined: (1) insertion, (2) deletion and (3) substitution. Each edit operation can be assigned a different cost. If all costs are equal to 1, the edit distance is equal to the minimum number of edit operations required to transform $S$ into $T$.

Example. Assuming all edit operations to have a cost of 1, the edit distance between $S = [5, 2, 6, 4, 3, 1, 8, 7]$ and $T = [6, 5, 3, 2, 1, 4, 8, 7]$ is

$$d_{edit}(S, T) = 5 \qquad (1)$$

To transform $S$ into $T$ in 5 edit operations, remove 3 and 1 from $S$, change 6 into 1 and then add 6 and 3 in the appropriate locations.

3.7 The longest common subsequence distance

A subsequence of a string is a subset of its characters that appear in the same order as in the string.

Example. [3, 5, 7] is a subsequence of [1, 2, 3, 4, 5, 6, 7].

The length of the longest subsequence that is shared by two strings is a measure of the similarity between these two strings. The maximum length of the longest common subsequence is equal to $n$. The minimum length is one. We define the longest common subsequence distance as $n$ minus the length of the longest common subsequence. The longest common subsequence distance lies between 0 and $n - 1$. The longest common subsequence can be calculated in $O(n^2)$ by a simple dynamic programming algorithm [4].
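Both the edit distance and the longest common subsequence can be computed with a quadratic-time dynamic program. The sketch below uses the textbook row-by-row recurrences, not the linear-space algorithm of [4]; the function names and the three cost parameters are illustrative assumptions.

```python
def edit_distance(s, t, ins_cost=1, del_cost=1, sub_cost=1):
    """Minimum total cost of insertions, deletions and substitutions turning s into t."""
    # prev[j] = cost of turning the empty prefix of s into the first j items of t.
    prev = [j * ins_cost for j in range(len(t) + 1)]
    for i, a in enumerate(s, start=1):
        curr = [i * del_cost]
        for j, b in enumerate(t, start=1):
            curr.append(min(
                prev[j] + del_cost,                         # delete a from s
                curr[j - 1] + ins_cost,                     # insert b
                prev[j - 1] + (0 if a == b else sub_cost),  # keep or substitute
            ))
        prev = curr
    return prev[-1]


def lcs_length(s, t):
    """Length of the longest common subsequence of s and t."""
    prev = [0] * (len(t) + 1)
    for a in s:
        curr = [0]
        for j, b in enumerate(t, start=1):
            if a == b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_distance(s, t):
    """n minus the length of the longest common subsequence."""
    return len(s) - lcs_length(s, t)


S = [5, 2, 6, 4, 3, 1, 8, 7]
T = [6, 5, 3, 2, 1, 4, 8, 7]
print(edit_distance(S, T))  # 5, matching the example of Section 3.6
print(lcs_distance(S, T))   # 3
```

Each function keeps only two rows of the dynamic programming table, so memory use is linear in $n$ while the running time stays quadratic.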
The normalized longest common subsequence distance can be calculated by dividing by $n - 1$:

$$\hat{d}_{lcs} = \frac{d_{lcs}}{n - 1}$$

A similar distance metric could be defined by using the longest common substring (in which all characters need to be adjacent).
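Looking back at Sections 3.3 to 3.5, the squared deviation, A and R distances each reduce to a single pass over the strings. The following Python sketch is one possible reading of those definitions; the function names and the lookup tables are assumptions made for illustration.

```python
def squared_deviation_distance(s, t):
    """Sum over all items of (position in s - position in t) squared."""
    pos_in_t = {item: j for j, item in enumerate(t)}
    return sum((i - pos_in_t[item]) ** 2 for i, item in enumerate(s))


def a_distance(s, t):
    """Sum of absolute differences between the items at equal positions."""
    return sum(abs(a - b) for a, b in zip(s, t))


def r_distance(s, t):
    """Number of times S(i+1) does not immediately follow S(i) in t."""
    # Assumed helper: maps each item of t to the item directly after it.
    successor_in_t = {t[j]: t[j + 1] for j in range(len(t) - 1)}
    return sum(1 for i in range(len(s) - 1)
               if successor_in_t.get(s[i]) != s[i + 1])


# Labeling dependence of the A distance (example of Section 3.4).
print(a_distance([1, 2, 3, 4], [1, 3, 2, 4]))  # 2
print(a_distance([2, 1, 4, 3], [2, 4, 1, 3]))  # 6

# Inverted strings reach the stated maximum (n^3 - n) / 3 = 40 for n = 5.
print(squared_deviation_distance([1, 2, 3, 4, 5], [5, 4, 3, 2, 1]))  # 40

# No adjacency of the first string survives in the second one.
print(r_distance([1, 2, 3, 4], [1, 3, 2, 4]))  # 3
```

The normalized versions follow by dividing by the maxima given in the text, $(n^3 - n)/3$ for the squared deviation distance and $n - 1$ for the R distance.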

3.8 The reversal distance

The reversal distance is defined as the minimum number of reversals required to transform a permutation into another one. A reversal is defined as the operation of reversing a substring of the given string, e.g. 123456789 → 123876549. Calculation of the reversal distance is very important in molecular biology, but has been shown to be NP-hard [2]. Because of this, we exclude it from further research, although it has some obvious merits, the most important one being the fact that the well-known 2-opt move is nothing else than a substring reversal.

3.9 Kendall's tau

The value of this distance metric [5] is given by

$$d_{\tau}(S, T) = \sum_{i=1}^{n} \sum_{j=1}^{n} z_{ij} \quad \text{where} \quad z_{ij} = \begin{cases} 1 & \text{if } S(i) < S(j) \text{ and } T(i) > T(j) \\ 0 & \text{otherwise} \end{cases}$$

It can be shown that this value is equal to the number of adjacent pairwise interchanges required to transform $S$ into $T$. A normalized version of this distance metric is given by

$$\hat{d}_{\tau}(S, T) = \frac{2 \, d_{\tau}(S, T)}{n^2 - n}$$

4 Preliminary experiment

4.1 The total weighted tardiness single machine scheduling problem

A set of $n$ jobs has to be sequenced on a single machine. Preemption is not allowed. Jobs are not available before a release date $r_j$ and are processed for $p_j$ units of time. For each job, a due date $d_j$ and a weight $w_j$ are given. If $C_j$ denotes the completion time of job $j$, the tardiness is defined by $T_j = \max(0, C_j - d_j)$. The objective is to minimize the total weighted tardiness ($1 \mid r_j \mid \sum w_j T_j$). This problem is NP-hard in the strong sense. In [9], we develop an MA PM for this problem and show that it outperforms a similar memetic algorithm without population management.

4.2 An MA PM for the total weighted tardiness problem

The basic idea of MA PM is to not allow any new solutions into the population unless they sufficiently contribute to the diversity. This contribution is measured as the distance of the candidate solution $x_c$ to the population $P$, i.e. the minimal distance to any solution in the population:

$$d_P(x_c) = \min_{x_i \in P} d(x_c, x_i)$$

A candidate solution is allowed into the population if it satisfies the diversity criterion, i.e. if its distance to the population is greater than or equal to the diversity parameter. The value of this parameter can be used in a population management strategy to control the diversity of the population. A large value will increase the diversity, a small value will decrease it.
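The diversity criterion can be sketched in Python as follows, here with the normalized Kendall's tau of Section 3.9 as the distance. All names, including the parameter name `delta` for the diversity parameter, are hypothetical, and the sketch assumes permutations of length at least 2.

```python
from itertools import combinations


def kendall_tau_distance(s, t):
    """Count position pairs whose items are ordered differently in s and t."""
    # Each discordant unordered pair {i, j} corresponds to exactly one
    # ordered pair (i, j) with S(i) < S(j) and T(i) > T(j).
    return sum(1 for i, j in combinations(range(len(s)), 2)
               if (s[i] - s[j]) * (t[i] - t[j]) < 0)


def normalized_kendall_tau(s, t):
    """Kendall's tau divided by its maximum n(n - 1) / 2."""
    n = len(s)
    return 2 * kendall_tau_distance(s, t) / (n * n - n)


def distance_to_population(candidate, population, distance):
    """Minimal distance from the candidate to any member of the population."""
    return min(distance(candidate, member) for member in population)


def satisfies_diversity(candidate, population, distance, delta):
    """True if the candidate is at least delta away from every member."""
    return distance_to_population(candidate, population, distance) >= delta


population = [[1, 2, 3, 4, 5], [5, 4, 3, 2, 1]]
candidate = [2, 1, 3, 4, 5]
print(distance_to_population(candidate, population, normalized_kendall_tau))
print(satisfies_diversity(candidate, population, normalized_kendall_tau, 0.05))
print(satisfies_diversity(candidate, population, normalized_kendall_tau, 0.2))
```

Passing the distance as a function argument keeps the criterion independent of the measure, so any of the normalized distances of Section 3 could be substituted.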

Solutions are selected from the population using biased random selection [6]. After crossover using a linear ordered crossover (LOX) operator, they are subjected to local search using adjacent pairwise interchange (API) steepest descent. Solutions that do not satisfy the diversity criterion are mutated using a general pairwise interchange (GPI) mutation operator until their distance to the population reaches the diversity parameter. A new solution replaces a solution randomly chosen from the worst half of the population.

The MA PM uses a simple sawtooth population management strategy, alternating periods of intensification with periods of diversification. Initially, the diversity parameter is equal to 0.1. Every 3 generations, it is multiplied by 1.05. When it reaches the maximal value of 0.5, it is reset to its initial value of 0.1.

4.3 Results

The MA PM with the eight different distance measures was tested on twenty 100-job instances of the OR-Library. For each distance measure, the MA PM was given a maximum time of 5 minutes to find a good solution. Table 1 shows each of the eight distance metrics used and their approximate complexity. We have divided the distance metrics into two classes: those that can be calculated in (approximately) linear time and those that require (approximately) quadratic time. The final column shows the average number of iterations performed by the MA PM in the given 5 minutes.

Table 1: Summary of distance metrics

Distance measure              Complexity   Avg. iterations
Exact match                   O(n)         6719.3
A                             O(n)         3924.4
R                             O(n)         2617.9
Deviation                     O(n^2)        437.9
Squared deviation             O(n^2)        232.5
Edit                          O(n^2)        362.7
Longest common subsequence    O(n^2)        418.8
Kendall's tau                 O(n^2)        329.3

Figure 1 shows a summary of the results for the different distance measures. For each problem set, we record the objective function value $f(x^*)$ of the best solution found using any distance measure.
We then calculate how much worse the other distance measures perform by computing, for each distance measure $i$, the value $\frac{f(x_i) - f(x^*)}{f(x^*)}$, where $x_i$ is the best solution found using distance measure $i$.

Although the results are very preliminary, they seem to show that the linear distance measures outperform the quadratic distance measures. This can be explained by the fact that the local search operator is very fast, but not very powerful. As a result, population management using the quadratic distance measures takes up a relatively large portion of the time. The number of iterations performed when using the linear distance measures is therefore dramatically higher than when using the quadratic distance measures. The iterations lost with the quadratic distance measures cannot be compensated by any potential gain achieved by using a more accurate distance measure. This raises the interesting question of whether these results can be generalized, or whether an MA PM that uses a more intensive local search requires a better distance measure. Another issue we have not touched upon in this paper is the interplay between the distance measure used and the population management strategy.

[Figure 1: Comparison of the MA PM with different distance measures. Vertical axis: fraction over best-found (0 to 0.8). Horizontal axis: data file (1 to 20). One series per distance measure: edit distance, A distance, R distance, exact match distance, deviation distance, squared deviation distance, longest common subsequence, Kendall's tau.]

References

[1] V. Campos, M. Laguna, and R. Martí. Context-independent scatter search and tabu search for permutation problems. INFORMS Journal on Computing, 17:111-122, 2005.
[2] A. Caprara. Sorting permutations by reversals and Eulerian cycle decompositions. SIAM Journal on Discrete Mathematics, 12(1):91-110, 1999.
[3] A. Hertz and M. Widmer. Guidelines for the use of meta-heuristics in combinatorial optimization. European Journal of Operational Research, 151(2):247-252, 2003.
[4] D. Hirschberg. A linear space algorithm for computing maximal common subsequences. Communications of the ACM, 18:341-343, 1975.
[5] M. Kendall and J. Dickinson Gibbons. Rank Correlation Methods. Oxford University Press, New York, 1990.
[6] C.R. Reeves. A genetic algorithm for flowshop sequencing. Computers and Operations Research, 22:5-13, 1995.
[7] S. Ronald. More distance functions for order-based encodings. In Proceedings of the IEEE Conference on Evolutionary Computation, pages 558-563, IEEE Press, New York, 1998.
[8] S. Siegel and N. J. Castellan. Nonparametric Statistics for the Behavioral Sciences. McGraw-Hill, 1988.
[9] K. Sörensen and M. Sevaux. MA PM: memetic algorithms with population management. Computers and Operations Research, To appear, 2004.