Interaction-Detection Metric with Differential Mutual Complement for Dependency Structure Matrix Genetic Algorithm


Kai-Chun Fan, Jui-Ting Lee, Tian-Li Yu, Tsung-Yu Ho

TEIL Technical Report, April 2010
Taiwan Evolutionary Intelligence Laboratory (TEIL)
Department of Electrical Engineering, National Taiwan University
No. 1, Sec. 4, Roosevelt Rd., Taipei, Taiwan

Abstract

The dependency structure matrix genetic algorithm (DSMGA), one of the estimation of distribution algorithms (EDAs), builds its models via dependency structure matrix clustering techniques. Previous research has shown that DSMGA can effectively solve nearly decomposable problems. DSMGA utilizes an entropy-based metric to detect the interactions among genes. The efficiency of DSMGA and other model-building GAs greatly depends on their interaction-detection metrics. This paper investigates several commonly used metrics and proposes a new interaction-detection metric which aims at what GAs really need. The proposed metric, namely the differential mutual complement, is based on both the disruption and reproduction effects of the crossover operator on significant schemata. Empirical results show that DSMGA with the proposed metric outperforms the existing metrics in terms of population size and sensitivity to the threshold. The new metric is shown to perform well with DSMGA and can be expected to work with other EDAs.

1 Introduction

Genetic algorithms (GAs) (Holland, 1975; Goldberg, 1989) are stochastic search and optimization methods which require only the quality information of solution candidates. The simple GA (SGA) is a specific class of GAs and has been widely used for optimization in many different fields. However, SGA does not learn problem structures and can be deceived; the trap function (Goldberg, 1987) is an example where SGA fails. It has been shown that linkage learning (Goldberg, Korb, & Deb, 1989; Harik, 1996) is essential for GA success on problems with strong linkages. Linkage-learning algorithms can identify building blocks (BBs) among genes, where BBs are, roughly speaking, groups of interacting genes. Further discussion of BBs can be found in (Holland, 1975; Goldberg, 1989; Goldberg, 2002).

Since the importance of BBs was recognized, many different linkage-learning techniques have been developed, including uni-metric and multi-metric methods. Uni-metric methods utilize fitness alone as the metric to detect interactions and solve problems. The most typical example is the linkage learning genetic algorithm (LLGA) (Harik, 1997). However, to date, LLGA can only solve problems with fewer than 80 BBs. The limited success of LLGA raises the question of whether there is an intrinsic limitation to uni-metric linkage learning; the answer is yet unknown. Due to the difficulty of developing uni-metric methods, many GAs utilize another metric for linkage learning. One fast-growing development in this category is the model-building GAs, or estimation of distribution algorithms (EDAs). Because of their powerful problem-solving abilities, many EDAs have been developed, such as ecGA (Harik, 1999), BMDA (Pelikan & Mühlenbein, 1999), EBNA (Etxeberria & Larrañaga, 1999), BOA (Pelikan, Goldberg, & Cantú-Paz, 1999), hBOA (Pelikan & Goldberg, 2001), and D5 (Tsuji, Munetomo, & Akama, 2004). EDAs utilize interaction-detection metrics, metrics that detect the interactions among genes, to construct a model and recognize BBs within problems.

These common interaction-detection metrics are nonlinearity (Munetomo & Goldberg, 1999), simultaneity (Aporntewan & Chongstitvatana, 2003), and entropy-based metrics (Shannon, 1948). This paper adopts the dependency structure matrix genetic algorithm (DSMGA) (Yu, Goldberg, Sastry, Lima, & Pelikan, 2009), one of the EDAs, to discuss interaction-detection metrics for two reasons. First, most EDAs adopt a designated metric that cannot be replaced with another; in contrast, DSMGA easily accommodates different interaction-detection metrics, which makes it a convenient platform for comparing metrics. Second, DSMGA adds the concept of a threshold onto these existing metrics. The threshold in DSMGA is used to determine whether genes interact with each other: a pair of genes is considered interacting if and only if the calculated metric is greater than the threshold. Given an appropriate threshold, DSMGA can decompose problems into sub-problems efficiently. Even though the threshold greatly influences the performance of DSMGA, its value is problem-specific and difficult to calculate. Hence, to improve the efficiency of DSMGA, the purpose of this paper is to construct a new interaction-detection metric that is less sensitive to the threshold and is expected to be closer to what GAs really need.

The proposed metric, namely the differential mutual complement (DMC), detects interactions among genes and builds a linkage-learning model for DSMGA. The idea is to keep good schemata from being disrupted by the crossover operator and to ensure the growth of good schemata. This paper compares the proposed metric with the other common interaction-detection metrics: nonlinearity, simultaneity, and entropy-based. However, nonlinearity is not included in the experiments due to its poor performance in many cases; the reason is detailed in Section 2.2.1. Hence, the paper mainly compares DMC with the simultaneity and entropy-based metrics. Among these metrics, entropy is the most commonly used due to the satisfactory performance it provides. However, entropy originally represents an absolute limit on the best possible lossless compression of any communication, and has less meaning in GAs. Therefore, it is questionable whether the entropy-based metric is what GAs really need. DMC, the new metric proposed in this paper, is more meaningful for GAs than entropy, because it is based on both the disruption and reproduction effects of the crossover operator on significant schemata. This paper compares the DMC and mutual information (Cover & Thomas, 1991) metrics, where mutual information is the loss of entropy in the case of two genes, in terms of population size and sensitivity to the threshold.

The rest of this paper is structured as follows. Section 2 introduces the construction of DSMGA and shows its clustering mechanism. Section 3 presents the idea of the new interaction-detection metric. Section 4 uses empirical results to show the advantage of the proposed metric. Section 5 concludes this paper.

2 Background

This section discusses related work on DSMGA: an introduction to DSMGA, common interaction-detection metrics, and the importance of the threshold. Section 2.1 discusses DSM clustering briefly; a more detailed discussion can be found in (Yu, Goldberg, Sastry, Lima, & Pelikan, 2009). Section 2.2 lists three common metrics: nonlinearity, simultaneity, and entropy-based.
Finally, choosing an appropriate threshold plays a role in DSMGA, and its difficulty depends on the adopted metric. This issue is discussed in Section 2.3.

2.1 Introduction to DSMGA

DSMGA uses dependency structure matrix (DSM) techniques (McCord & Eppinger, 1993; Pimmler & Eppinger, 1994) to decompose problems. A DSM is a matrix that contains the pairwise interaction information of every pair of components in a system. The objective of DSM clustering is to transform this pairwise interaction information into higher-order interaction information. Figure 1 is an example of a DSM. The sign X in the first row and third column marks the interaction between node A and node C. The diagonal entries have no meaning, since a node's relation with itself carries no information. DSM clustering reorders the nodes of Figure 1, and the reordering result is shown in Figure 2. It is clear that DSM clustering decomposes the problem into three parts: {B, D, G}, {A, C, E, H}, and {F}; this result is one possible clustering into decomposed groups.

To implement DSM clustering, DSMGA uses the minimal description length (MDL) principle (Barron, Rissanen, & Yu, 1998; Lutz, 2002; Rissanen, 1978; Rissanen, 1998) to decide a suitable clustering. The MDL-based clustering objective function is written as

$f_{DSM}(M) = \left( n_M \log n_c + \log n_c \sum_{i=1}^{n_M} |M_i| \right) + (|S_1| + |S_2|)(2 \log n_c + 1)$. (1)

In the above equation, $f_{DSM}(M)$ is the description length of the specific model $M$ given by DSM clustering; $n_M$ is the total number of modules; $n_c$ is the total number of components; $|M_i|$ is the number of components within the $i$-th module; and $S_1$ and $S_2$ are the two mismatch sets, i.e., the interactions described incorrectly by the clustering. Briefly speaking, the MDL-based clustering objective function is the sum of the model description length and the mismatched-data description length.
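Equation (1) is straightforward to evaluate for a candidate clustering. Below is a minimal sketch, assuming modules are given as lists of component indices and that the mismatch-set sizes |S_1| and |S_2| have already been counted against the DSM; the function name and input format are illustrative, not DSMGA's actual implementation.

```python
import math

def mdl_description_length(clusters, n_c, s1_size, s2_size):
    """Description length f_DSM(M) of a DSM clustering, following Eq. (1).

    clusters: list of modules, each a list of component indices.
    n_c:      total number of components in the DSM.
    s1_size:  |S_1|, interactions in the DSM not described by the clustering.
    s2_size:  |S_2|, interactions implied by the clustering but absent from the DSM.
    """
    n_m = len(clusters)
    log_nc = math.log2(n_c)            # each component identifier costs log2(n_c) bits
    model_bits = n_m * log_nc + log_nc * sum(len(m) for m in clusters)
    mismatch_bits = (s1_size + s2_size) * (2 * log_nc + 1)
    return model_bits + mismatch_bits
```

A clustering with a lower description length is preferred, so a search over candidate clusterings can use this value directly as its objective.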

Figure 1: An example of a DSM. The X signs show the dependence between two components, such as A and C; a blank entry means the two components are independent, such as A and D; the black diagonal entries are meaningless, relating each component to itself. The example shows the matrix before clustering.

Figure 2: The example DSM after clustering by reordering. The eight components A to H are clustered into three groups. Component F is fully independent of the other components.

2.2 Interaction-Detection Metrics

Interaction-detection metrics are essential for constructing DSMs. Many different metrics have been used for interaction detection in GAs. This section describes three commonly used metrics: nonlinearity, simultaneity, and entropy.

2.2.1 Nonlinearity

Nonlinearity is adopted by LINC and LIMD (Munetomo & Goldberg, 1999). When focusing on only two genes $x_i$ and $x_j$, as in the case of DSMGA, the detection metric can be written as

$|f_{x_i=0,x_j=0} + f_{x_i=1,x_j=1} - f_{x_i=1,x_j=0} - f_{x_i=0,x_j=1}|$. (2)

The basic idea is that if $x_i$ and $x_j$ do not interact with each other, the fitness difference from $x_i = 0$ to $x_i = 1$ should not be affected by the value of $x_j$. However, this paper does not experiment with the nonlinearity metric due to its poor performance: nonlinearity differs the most from the other metrics because it emphasizes disruptions more than needed when the proportion of the correct schema is much greater than those of the other three schemata. To show how this can happen, define the fitness as a power-law of the OneMax problem, $f(x) = (\sum_i x_i)^t$, where $x_i$ is binary and $t$ is an order parameter. The problem can be solved by a bit-wise hill climber; however, the nonlinearity metric detects strong interactions among all genes when $t$ is large. Nonlinearity is far from what GAs really need and also depends on the selection scheme. Hence this paper does not investigate this metric further.
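To make the power-law OneMax argument concrete, the sketch below computes the nonlinearity of Equation (2) using schema-average fitness, which is one reasonable instantiation of the f values (the original LINC/LIMD formulation perturbs individual strings; the function names here are illustrative):

```python
from itertools import product

def nonlinearity(f00, f01, f10, f11):
    # Eq. (2): zero iff the fitness change from x_i = 0 to x_i = 1
    # does not depend on the value of x_j.
    return abs(f00 + f11 - f01 - f10)

def powerlaw_onemax_nonlinearity(l, t):
    # Schema-average fitness of f(x) = (sum_i x_i)^t with two loci fixed,
    # averaged over all settings of the remaining l - 2 bits.
    def avg_fitness(a, b):
        values = [(a + b + sum(bits)) ** t
                  for bits in product((0, 1), repeat=l - 2)]
        return sum(values) / len(values)
    return nonlinearity(avg_fitness(0, 0), avg_fitness(0, 1),
                        avg_fitness(1, 0), avg_fitness(1, 1))

# Although a bit-wise hill climber solves this problem, the detected
# "interaction" between any two genes grows with the order t:
for t in (1, 2, 3, 4):
    print(t, powerlaw_onemax_nonlinearity(l=8, t=t))
```

For t = 1 (plain OneMax) the metric is exactly zero, as expected; for larger t it grows steadily even though no gene pair needs to be transferred together.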

2.2.2 Simultaneity

Simultaneity is adopted in the work of Aporntewan and Chongstitvatana (2003). The idea is to consider the schema disruptions when 00 crosses with 11 or 10 crosses with 01. The detection metric can be expressed as

$P_{x_i=0,x_j=0} P_{x_i=1,x_j=1} + P_{x_i=0,x_j=1} P_{x_i=1,x_j=0}$, (3)

where $P$ is the proportion of the event in the current population. For simplicity, $P_{x_i=a,x_j=b}$ is abbreviated as $P_{ab}$, so Equation (3) can be written as

$P_{00} P_{11} + P_{01} P_{10}$, (4)

whose form is similar to that of the DMC metric discussed in Section 3. However, simultaneity and DMC have completely different meanings in GAs. The empirical results in Section 4 show that simultaneity is not suitable for DSMGA.

2.2.3 Entropy-based

Entropy-based metrics (Shannon, 1948) are the most commonly used for interaction detection in GAs; ecGA (Harik, 1999), BMDA (Pelikan & Mühlenbein, 1999), EBNA (Etxeberria & Larrañaga, 1999), and BOA (Pelikan, Goldberg, & Cantú-Paz, 1999) are typical examples. The idea is to measure the certainty of $x_j$ given the information of $x_i$. The metric can be written as

$\sum_{x_i, x_j} p_{x_i,x_j} \log \frac{p_{x_i,x_j}}{p_{x_i} p_{x_j}}$. (5)

In the case of two genes, the loss in entropy is the mutual information. Hence we can use the mutual information metric to detect interactions among genes. Mutual information is defined as the Kullback-Leibler distance (Kullback & Leibler, 1951) between the joint distribution and the product distribution:

$I(X;Y) = D(p(x,y) \,\|\, p(x)p(y)) = \sum_x \sum_y p(x,y) \log \frac{p(x,y)}{p(x)p(y)}$, (6)

where $D$ is the Kullback-Leibler distance, $X$ and $Y$ are two random variables, and $x$ and $y$ are the outcomes of these two random variables, respectively. For two binary genes, the mutual information ranges between 0 and 1: the value 1 means that the pair is completely dependent, and the value 0 means that the pair is completely independent. Note that if $X$ and $Y$ are independent, $p(x,y) = p(x)p(y)$, and hence $I(X;Y) = 0$. More detailed discussions can be found in (Cover & Thomas, 1991; Yu, Goldberg, Sastry, Lima, & Pelikan, 2009).

Previous research (Yu, Goldberg, Sastry, Lima, & Pelikan, 2009) adopts the entropy-based metric to solve problems efficiently. However, whether there exists a more meaningful metric for GAs remains an open question, which is discussed in Section 3.
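Both Equation (4) and Equation (6) can be estimated directly from the current population. Below is a minimal sketch, assuming individuals are 0/1 sequences and using log base 2 so that two binary genes yield a mutual information in [0, 1]; the helper names are illustrative:

```python
import math

def pairwise_proportions(population, i, j):
    # Joint proportions P_ab of the allele pair (x_i, x_j) in the population.
    n = len(population)
    p = {(a, b): 0.0 for a in (0, 1) for b in (0, 1)}
    for x in population:
        p[(x[i], x[j])] += 1.0 / n
    return p

def simultaneity(p):
    # Eq. (4): P00*P11 + P01*P10
    return p[(0, 0)] * p[(1, 1)] + p[(0, 1)] * p[(1, 0)]

def mutual_information(p):
    # Eq. (6) with log base 2; marginals are obtained by summing the joint.
    px = {a: p[(a, 0)] + p[(a, 1)] for a in (0, 1)}
    py = {b: p[(0, b)] + p[(1, b)] for b in (0, 1)}
    mi = 0.0
    for (a, b), pab in p.items():
        if pab > 0:                     # 0 * log(0) is treated as 0
            mi += pab * math.log2(pab / (px[a] * py[b]))
    return mi
```

Either value can then be compared against the threshold to produce the binary interaction entry of the DSM.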

2.3 Interaction-Detection Threshold

DSMGA utilizes the mutual information metric to detect interactions and aims to divide problems into separate parts with an appropriate threshold. Because problems can hardly be decomposed without a given threshold, an appropriate threshold is needed for DSMGA to decompose problems properly. To alleviate the computational burden, after the sampled mutual information is calculated, it is transferred into the binary domain, where the metric indicates whether or not a pair of genes interact. Once an appropriate threshold is chosen, a pair of genes is considered interacting if and only if the calculated mutual information is greater than the threshold. An appropriate threshold can be decided by investigating the distribution of mutual information. However, it is difficult to evaluate because it is problem-specific: the appropriate threshold changes from problem to problem. The usual method of finding a threshold is to guess its value and test whether it is appropriate for the given problem. Hence, the purpose of this paper is to construct a new interaction-detection metric with a larger range of appropriate thresholds, making it more suitable for solving problems. The difficulty of choosing appropriate thresholds for the mutual information metric, together with its weak meaning for GAs, motivates the new metric discussed in the next section.

Figure 3: The outcomes of crossover when two genes are crossed. Take 11 as an example: 11 is disrupted only when crossed with its complement 00; it is not disrupted when crossed with 01, 10, or 11. On the other hand, more 11 is reproduced when 01 is crossed with its complement 10.

3 Differential Mutual Complement

Although DSMGA with the entropy-based metric works well in previous research, the meaning of the entropy-based metric in GAs is still unclear. This section proposes DMC as a new metric for pairwise interaction detection and explains the concept behind it.

3.1 Mutual: Pairwise Interaction Detection

The computational cost of determining the exact dependency model scales exponentially with problem size, so a proper mechanism is required for building an approximate model. Since it is impractical to explore the whole search space of multivariate interactions, most EDAs construct their interaction models starting from pairwise interactions between only two variables. It takes $O(\ell^k)$ to compute all dependencies among $k$ variables in a problem of size $\ell$, so it makes sense that most EDAs start with only pairwise interactions, limiting the cost to $O(\ell^2)$. For example, ecGA builds its marginal product models by merging subsets of variables according to the minimum description length metric, and it needs to detect pairwise dependencies at the very beginning of model building. Therefore, to keep the computational cost reasonable, DMC detects interactions between two variables mutually.

3.2 Complement: Effects Between Complements

The main idea of DMC focuses on how crossover operates on two schemata which both have two defined alleles. Define $H_{i=x,j=y}$ as the schema where the $i$-th allele is $x$ and the $j$-th allele is $y$, abbreviated as $H_{xy}$ for simplicity. Suppose that the $i$-th gene and the $j$-th gene are crossed over during recombination. Take $H_{11}$ as an example: $H_{11}$ is disrupted only when crossed with its complement $H_{00}$; it is not disrupted when crossed with $H_{01}$, $H_{10}$, or $H_{11}$. On the other hand, more $H_{11}$ is reproduced when $H_{01}$ is crossed with its complement $H_{10}$. Figure 3 shows all behaviors between the four different schemata with two defined alleles.

3.3 Differential: Disruption and Reproduction

Without loss of generality, we focus on the behavior of crossing the schema $H_{11}$ and assume $H_{11}$ to be a local optimum. By observing the behavior of crossing the schema $H_{11}$, we infer the proportion of $H_{11}$ in the population after crossover:

$P'_{11} = P_{11} - \frac{1}{2} P_{00} P_{11} + \frac{1}{2} P_{01} P_{10}$, (7)

where $P_{11}$ is the proportion of $H_{11}$ in the population and $P'_{11}$ is the proportion of $H_{11}$ after crossover. This equation is deduced by treating the crossover operator as uniform crossover with exchange probability $\frac{1}{2}$: the second term, $\frac{1}{2} P_{00} P_{11}$, is the disruption of $H_{11}$ made by crossover with probability $\frac{1}{2}$, and likewise the third term, $\frac{1}{2} P_{01} P_{10}$, is the reproduction of $H_{11}$ made by crossover with probability $\frac{1}{2}$.
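Equation (7) can be checked numerically. The sketch below draws parent pairs from an assumed two-locus schema distribution, applies uniform crossover with exchange probability 1/2, and compares the measured proportion of $H_{11}$ with the prediction; the distribution is made up for illustration:

```python
import random

def simulate_p11(p, n=200_000, seed=1):
    """Monte Carlo check of Eq. (7): sample two parents from the schema
    distribution p = {(a, b): proportion} and apply uniform crossover
    to the two loci, then measure the resulting proportion of H_11."""
    rng = random.Random(seed)
    schemata, weights = zip(*p.items())
    hits = 0
    for _ in range(n):
        (a1, b1), (a2, b2) = rng.choices(schemata, weights, k=2)
        # each gene of the child comes from either parent with prob. 1/2
        child = (a1 if rng.random() < 0.5 else a2,
                 b1 if rng.random() < 0.5 else b2)
        hits += child == (1, 1)
    return hits / n

p = {(0, 0): 0.4, (0, 1): 0.15, (1, 0): 0.15, (1, 1): 0.3}
predicted = p[(1, 1)] - 0.5 * p[(0, 0)] * p[(1, 1)] + 0.5 * p[(0, 1)] * p[(1, 0)]
print(predicted, simulate_p11(p))  # the two values should agree closely
```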

To combine the disruption and reproduction effects between complements, Equation (7) is rewritten as

$P'_{11} = P_{11} - \frac{1}{2} (P_{00} P_{11} - P_{01} P_{10})$. (8)

To ensure enough growth of the assumed local optimum $H_{11}$, $H_{11}$ has to be protected whenever $P_{00} P_{11} - P_{01} P_{10}$ is large. Therefore, this paper takes $P_{00} P_{11} - P_{01} P_{10}$ from Equation (8) as a metric, namely DMC, for interaction detection. The metric considers the disruption and reproduction effects of mutual complements, and it is a differential form: the value of $P_{00} P_{11}$ is the disruption quantity of $H_{11}$ (as well as $H_{00}$); likewise, the value of $P_{01} P_{10}$ is the reproduction quantity of $H_{11}$ (as well as $H_{00}$). The difference of these two terms can be regarded as the overall disruption of $H_{11}$ made by crossover, and a threshold on this value detects whether $H_{11}$ will be disrupted severely. When the value of DMC exceeds the threshold, DSMGA binds the genes together into a BB and utilizes BB-wise crossover; once the BB is identified, the local optimum $H_{11}$ is protected from disruption. In other words, if DSMGA did not protect the potentially good schema when the DMC metric exceeds the threshold, the schema would be disrupted severely by crossover. As a result, by using the DMC metric with a specific threshold, DSMGA can preserve potentially good schemata from disruption.

3.4 Generalization

To further generalize the DMC metric rather than only assuming $H_{11}$ to be a local optimum (it can be $H_{01}$ or $H_{10}$, too), the protection of $H_{01}$ and $H_{10}$ is taken into consideration in a similar way. The proportion of $H_{01}$ after crossover is

$P'_{01} = P_{01} - \frac{1}{2} (P_{01} P_{10} - P_{00} P_{11})$. (9)

To protect the best schema from disruption in any case, we modify the DMC metric as

$\mathrm{DMC} = \begin{cases} P_{00} P_{11} - P_{01} P_{10}, & \text{if } P_{00} \text{ or } P_{11} \text{ is the largest;} \\ P_{01} P_{10} - P_{00} P_{11}, & \text{if } P_{01} \text{ or } P_{10} \text{ is the largest.} \end{cases}$ (10)

Because the best schema is affected by the most frequently observed schema, denoted $H$, each case of $H$ has to be considered. There are three cases of $H$ after selection. In the case of generally hill-climbable problems such as the OneMax problem, $H$ after selection is the best schema; problems in this case can be solved even without interaction detection. In the case of deceptive problems such as the trap problem, $H$ after selection is the complement of the best schema; the best schema in this case will be disrupted severely by crossover, so it is necessary to bind $H$ into a BB. In the third case, $H$ after selection differs from the best schema by one bit; for example, while the best schema is $H_{11}$, $H$ is $H_{01}$ or $H_{10}$. The best schema in this case is not easily disrupted by $H$, as Figure 3 shows; even if $H$ is identified as a BB, the best schema will not be disrupted severely. Considering each case, it makes sense to use $P_{00} P_{11} - P_{01} P_{10}$ when $P_{00}$ or $P_{11}$ is the largest, to protect $H_{11}$ and $H_{00}$; likewise, it makes sense to use $P_{01} P_{10} - P_{00} P_{11}$ when $P_{01}$ or $P_{10}$ is the largest, to protect $H_{01}$ and $H_{10}$.

3.5 Range of DMC

Without loss of generality, this paper takes $P_{00} P_{11} - P_{01} P_{10}$ to calculate the range of DMC. The maximum of DMC is trivial: when $P_{11}$ and $P_{00}$ are both equal to 0.5, DMC attains its maximum, 0.25. To obtain the minimum of DMC, $P_{01} P_{10}$ is maximized first, so $P_{01}$ is set equal to $P_{10}$. The condition for using $P_{00} P_{11} - P_{01} P_{10}$ is that $P_{00}$ or $P_{11}$ is the largest; assume $P_{00}$ is the largest. To obtain the minimum of DMC, $P_{00}$ then has to be minimized. From the above, $P_{00} = P_{01} = P_{10} = \frac{1}{3}$ is derived, so the minimum of DMC is $-\frac{1}{9}$. Therefore, the range of DMC is $[-\frac{1}{9}, 0.25]$.
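Equation (10) reduces to a few lines of code. A minimal sketch, reusing the pairwise proportions from Section 2.2 (function names are illustrative):

```python
def dmc(p):
    """Differential mutual complement, Eq. (10).

    p maps the four two-locus schemata (a, b) to their proportions after
    selection. The sign of the protected term follows whichever schema
    is most frequent."""
    d = p[(0, 0)] * p[(1, 1)] - p[(0, 1)] * p[(1, 0)]
    largest = max(p, key=p.get)
    return d if largest in ((0, 0), (1, 1)) else -d

# A deceptive, trap-like distribution after selection: the complement 00
# of the best schema 11 dominates, so DMC is large and the gene pair
# should be bound into one building block.
print(dmc({(0, 0): 0.55, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.25}))  # 0.1275
```

At $P_{00} = P_{11} = 0.5$ the function returns the maximum 0.25, and at $P_{00} = P_{01} = P_{10} = \frac{1}{3}$ it returns the minimum $-\frac{1}{9}$, matching the range derived above.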

Figure 4: Problem size $\ell = m \times k$, where $k = 3$ and $m = 60$, for MI. $(\theta_{best}, n_{best}) = (0.017, 1194)$.

Figure 5: Problem size $\ell = m \times k$, where $k = 3$ and $m = 60$, for Simultaneity. $(\theta_{best}, n_{best}) = (0.1257, 3553)$.

With this modification, the DMC metric can be utilized to protect the best schema from disruption after crossover. Moreover, the DMC metric is physically meaningful for GAs, and how it works is clear. So far, this paper has proposed a new metric designed for GAs that is practical to use with them. The next section compares the DMC metric with the MI metric.

4 Empirical Results

This section compares the performance of DSMGA with three interaction-detection metrics: mutual information, the simultaneity metric, and DMC. To simplify the exposition, MI, Simultaneity, and DMC in this section denote DSMGA adopting mutual information, the simultaneity metric, and DMC, respectively, as its interaction-detection metric. First, we discuss the relations between thresholds and population sizes. Then, we analyze the sensitivity of the metrics. Finally, we discuss the number of function evaluations as the measure of resource usage.

We adopt the (m, k)-trap (Goldberg, 1987) as the test problem, where $k$ is the order of the traps (the problem difficulty) and $m$ is the total number of substructures. The $k$-bit trap in this paper is given by

$f^k_{trap}(u) = \begin{cases} 0.9 \left(1 - \frac{u}{k-1}\right), & \text{if } u < k; \\ 1.0, & \text{if } u = k, \end{cases}$ (11)

where $u$ is the number of 1s. For example, the 3-bit trap ($k = 3$) is given by $f^3_{trap}(0) = 0.9$, $f^3_{trap}(1) = 0.45$, $f^3_{trap}(2) = 0.0$, and $f^3_{trap}(3) = 1.0$. The reason for using the (m, k)-trap is to emphasize problem decomposition: if the problem is not properly decomposed, any hill-climbing method tends to give a wrong solution.
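For reference, Equation (11) extends to the concatenated (m, k)-trap in a few lines; a minimal sketch with an illustrative function name:

```python
def trap_fitness(bits, k):
    """(m, k)-trap fitness of Eq. (11): the string is split into m blocks
    of k bits, and each block contributes a k-bit trap on u, its number
    of 1s. Deceptive: every block rewards moving toward all-0s except at
    the isolated optimum of all-1s."""
    total = 0.0
    for start in range(0, len(bits), k):
        u = sum(bits[start:start + k])
        total += 1.0 if u == k else 0.9 * (1.0 - u / (k - 1))
    return total

# 3-bit trap values, matching the example in the text:
print([round(1.0 if u == 3 else 0.9 * (1 - u / 2), 2) for u in range(4)])
# [0.9, 0.45, 0.0, 1.0]
```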

Figure 6: Problem size $\ell = m \times k$, where $k = 3$ and $m = 60$, for DMC. $(\theta_{best}, n_{best}) = (0.038, 1150)$.

Usually, both the appropriate thresholds and the population sizes sufficient to solve a problem properly in polynomial time are unknown and difficult to estimate. Therefore, we sweep the thresholds within the range of each metric, namely $[0, 1]$ for MI, $[0, 0.25]$ for Simultaneity, and $[-\frac{1}{9}, 0.25]$ for DMC, and then search for the corresponding population size with the bisection method for each threshold. The relations between thresholds and population sizes, denoted as $(\theta, n)$, mean that for each threshold $\theta$, $n$ is the minimum population size required to solve the problem properly in polynomial time. The best threshold is the threshold with the smallest corresponding population size, and we call that population size the best population size; the pair is denoted $(\theta_{best}, n_{best})$. Figures 4 to 6 show examples of the relations between thresholds and population sizes for MI, Simultaneity, and DMC with $k = 3$ and $m = 60$; the x-axis shows the threshold values and the y-axis the population sizes. The best threshold based on MI is located at $(\theta_{best}, n_{best}) = (0.017, 1194)$, that based on Simultaneity at $(\theta_{best}, n_{best}) = (0.1257, 3553)$, and that based on DMC at $(\theta_{best}, n_{best}) = (0.038, 1150)$.

4.1 Sensitivity to the Threshold

This section discusses the sensitivity of MI, Simultaneity, and DMC. A metric is less sensitive here if, given a new problem with various $m$ and $k$, available thresholds based on the metric can be found more easily, where the available thresholds are the thresholds that solve the problem successfully with limited population sizes (discussed in detail below). We consider sensitivity as the evaluation indicator for a new metric: before a metric has been theoretically analyzed, we may guess thresholds or use self-adaptive methods to find available ones, and such methods incur errors. If the adopted metric is sensitive, those methods can hardly be used to solve problems because of the difficulty of landing within the range of available thresholds. Conversely, if a metric is less sensitive, it has better adaptability to the problem at hand. In other words, the larger the range of available thresholds, the lower the sensitivity. Therefore, we evaluate MI, Simultaneity, and DMC by observing their sensitivity.

To analyze sensitivity, a limit on population sizes is needed to define the available thresholds: population sizes are limited to be less than an availability rate times the best population size. An availability rate of 110% is in common use, but the experiments in this paper cannot show the results clearly at that rate, so this paper adopts 150% as the availability rate. In Figure 4, for example, the best population size of MI is 1194, so the search space of population sizes ranges from 1194 up to about 1776, and the available thresholds are those whose required population size falls within this range. In the case of Simultaneity, the population sizes range from 3553 up to about 5563; in the case of DMC, the available thresholds are likewise determined by the population sizes, which range upward from 1150.
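The threshold sweep just described can be organized as follows: for each threshold on a grid, bisection finds the minimal population size that still solves the problem reliably. A minimal sketch, where `solves(n)` is a hypothetical callback that runs DSMGA several times with population size n at the fixed threshold and reports whether all runs succeed:

```python
def minimal_population_size(solves, n_low=10, n_high=1000):
    """Bisection search for the smallest n with solves(n) True.

    solves: hypothetical callback; runs the GA a few times with
            population size n and returns True on reliable success."""
    while not solves(n_high):                  # first find a working upper bound
        n_low, n_high = n_high, n_high * 2
    while n_high - n_low > max(1, n_high // 20):   # stop at roughly 5% precision
        mid = (n_low + n_high) // 2
        if solves(mid):
            n_high = mid                       # mid suffices; shrink from above
        else:
            n_low = mid                        # mid fails; shrink from below
    return n_high

# Sweeping a threshold grid then yields the (theta, n) curves of Figures 4-6:
# curve = [(theta, minimal_population_size(lambda n: run_dsmga(theta, n)))
#          for theta in thresholds]            # run_dsmga is hypothetical
```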
To compare the sensitivity of MI, Simultaneity, and DMC more explicitly, we normalize the empirical results and combine them. The range of Simultaneity is normalized from $[0, 0.25]$ to $[0, 1]$ and the range of DMC from $[-\frac{1}{9}, 0.25]$ to $[0, 1]$; since MI already ranges from 0 to 1, its results need not be normalized. We then align the best thresholds of MI and Simultaneity with the best threshold of DMC by translating the threshold values, leaving the corresponding population sizes unchanged. Figure 7 shows the result of normalizing Figures 4, 5, and 6 and combining them together.
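The normalization and alignment just described amount to an affine map on the threshold axis. A small sketch, with illustrative names:

```python
def normalize_and_align(curve, lo, hi, theta_best, base_theta_best_scaled):
    """Map thresholds from [lo, hi] onto [0, 1], then translate so the best
    threshold coincides with the base metric's (already scaled) best
    threshold. Population sizes are left unchanged.

    curve: list of (theta, n) pairs measured for one metric."""
    scale = lambda t: (t - lo) / (hi - lo)
    shift = base_theta_best_scaled - scale(theta_best)
    return [(scale(theta) + shift, n) for theta, n in curve]

# e.g. Simultaneity lives on [0, 0.25] and DMC on [-1/9, 0.25]; MI is
# already on [0, 1] and only needs the translation step.
```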

Figure 7: Problem size $k = 3$ and $m = 60$, for MI, Simultaneity, and DMC. The range of available thresholds based on DMC is the largest.

Figure 8: Problem size $k = 4$ and $m = 60$, for MI, Simultaneity, and DMC. The range of available thresholds based on DMC is the largest. Note that no threshold based on Simultaneity that solves this problem was found in the experiments.

Figure 9: Problem size $k = 5$ and $m = 60$, for MI, Simultaneity, and DMC. The range of available thresholds based on DMC is the largest. Note that no threshold based on Simultaneity that solves this problem was found in the experiments.

Figure 10: Problem order $k = 3$, for MI, Simultaneity, and DMC. The range of available thresholds for DMC is the largest.

Figure 11: Problem order $k = 4$, for MI, Simultaneity, and DMC. The range of available thresholds for DMC is the largest. Note that no thresholds based on Simultaneity that solve the problems in either case, $m = 30$ or $m = 60$, were found.

We compare MI, Simultaneity, and DMC in Figures 7 to 9. Figure 7 shows the difference in the ranges of available thresholds for the three metrics: the range for DMC is much larger than those for the other metrics. When the problem difficulty increases, the range of available thresholds for MI shrinks more rapidly than that for DMC. Moreover, in Figures 8 and 9, no thresholds based on Simultaneity that solve these problems, with $k \geq 4$, were found. Simultaneity performs worse because, being designed on a greedy idea, it makes locating an appropriate threshold difficult; in other words, when $k \geq 4$, the range of thresholds that can solve the problem may be too small to be found. The results imply that DMC is less sensitive than both MI and Simultaneity on problems of various difficulty.

So far we have discussed the sensitivity of the metrics with various problem difficulty $k$ and constant $m$. However, both $k$ and $m$ affect the available thresholds simultaneously, and real-world problems are usually large, so finding the available thresholds without any information may be a waste of effort. To deal with such problems, solving simplified problems of similar difficulty is advisable: we can find the available thresholds for the probable difficulty with small $m$ first, and then scale back up to the original size. Hence, the sensitivity to the change of problem size, with various $m$ and constant difficulty $k$, also plays a role in evaluating metrics. Again, we normalize the empirical results and combine them through the same method as before, this time taking the results of DMC on problems with $m = 30$ as the base and translating the other conditions (DMC with $m = 60$, MI with $m = 30$, MI with $m = 60$, Simultaneity with $m = 30$, and Simultaneity with $m = 60$) to align with the base after normalization. Figures 10 to 12 show the differences between MI, Simultaneity, and DMC with various $m$. The ranges of available thresholds based on MI and Simultaneity decrease much more rapidly as $m$ increases. As mentioned before, in Figures 11 and 12, no thresholds based on Simultaneity that solve the problems with $k \geq 4$ were found in the experiments. Once again, DMC is shown to be less sensitive than MI and Simultaneity. The results imply that deriving an available threshold for DMC from empirical experience is simpler than for MI and Simultaneity.

Figure 12: Problem order $k = 5$, for MI, Simultaneity, and DMC. The range of available thresholds for DMC is the largest. Note that no thresholds based on Simultaneity that solve the problems in either case, $m = 30$ or $m = 60$, were found.

Table 1: The entries give $N_{fe}$ in the form (population size x convergence time) for DSMGA adopting MI, Simultaneity, and DMC at the best thresholds; smaller is better. When $k = 3$, MI works better because of less convergence time. When $k = 4$, the difference between MI and DMC decreases. When $k = 5$, DMC wins over MI due to the critical difference in required population size. Simultaneity performs worst in all experiments because of its huge population size at $k = 3$ and the failure to locate appropriate thresholds for $k \geq 4$.

                MI     Simultaneity    DMC
k = 3, m = 60   ...    ...             ...
k = 4, m = 60   ...    N/A             ...
k = 5, m = 60   ...    N/A             ...

Table 2: The entries give $N_{fe}$ in the form (population size x convergence time) for MI and DMC-m at the best thresholds; smaller is better. DMC-m is modified from DMC by increasing the population size at the best threshold to equal that of MI. When $k = 3$, MI works better because of less convergence time. When $k \geq 4$, DMC-m works better than MI. This table omits Simultaneity because of its 100% failure rate.

                MI     DMC-m
k = 3, m = 60   ...    ...
k = 4, m = 60   ...    ...
k = 5, m = 60   ...    ...

4.2 Function Evaluation Reduction

This section focuses on the number of function evaluations, denoted $N_{fe}$, of DSMGA adopting MI, Simultaneity, and DMC. We take the product of population size and convergence time (number of generations) as the indicator of $N_{fe}$. Table 1 shows both the population size and the convergence time of DSMGA adopting MI, Simultaneity, and DMC with various $k$ and constant $m$. Simultaneity performs worst in all experiments because of its huge population size at $k = 3$ and the failure to locate appropriate thresholds for $k \geq 4$. On easier problems, with order $k = 3$, DMC works worse than MI, although the best population size of DMC is smaller. The results show that as $k$ increases, DMC tends toward a lower $N_{fe}$. If we further increase the population sizes for DMC at the best thresholds to equal those for MI, the convergence time decreases rapidly; Table 2 shows the empirical results.

Figure 13: $N_{fe}$ of DSMGA adopting MI, DMC, and DMC-m at the best thresholds. The $N_{fe}$ of DSMGA with DMC and DMC-m becomes lower than that of DSMGA with MI.

Figure 13 shows the $N_{fe}$ of MI, DMC, and DMC-m (DMC with its population size increased to equal that of MI), ignoring Simultaneity because of its lack of success in the experiments. The empirical results imply that as the difficulty increases, the $N_{fe}$ of DSMGA with DMC becomes lower than that of DSMGA with MI or Simultaneity.

5 Conclusions

Linkage learning is essential for GAs to solve nearly decomposable problems successfully. EDAs use interaction-detection metrics to recognize BBs within decomposable problems; the common metrics are nonlinearity, simultaneity, and entropy-based metrics. Previous research has shown that EDAs with entropy-based metrics perform efficiently, but leaves open the question of whether entropy-based metrics are what GAs really need. That question motivated us to develop another metric.

This paper proposed a new interaction-detection metric, named the differential mutual complement (DMC). Designed from observations of the behavior of crossover operators, the new metric is closer to what GAs really need. The idea of DMC is to keep schemata from being disrupted and to protect potentially good schemata. Moreover, the proposed metric is physically meaningful for GAs, and the way it works is clear. Another advantage of the proposed metric is lower sensitivity to the threshold, which plays a role in the efficiency of DSMGA. Therefore, DMC can be more suitable for solving problems.

Independent pairs and dependent pairs form two distributions over the range of the proposed metric; finding appropriate thresholds that separate these distributions may therefore be a route to designing self-adaptive methods. Besides, owing to the meaningfulness of the proposed metric for GAs, whether DMC can be supported theoretically will be addressed in future work. DMC has shown some promising characteristics and is more meaningful for GAs than other existing metrics. Nevertheless, further investigation is still needed to check how close DMC is to what GAs really need. Imagine an optimal metric that indicates an interaction between two genes if and only if GAs need to transfer the information of these two genes together during recombination. Then we should be able to find a problem which fails GAs with any other metric; only GAs with this optimal metric would work on it. In addition, finding this metric might also give a hint about the limited success of uni-metric methods such as LLGA, by comparing the difference between this metric and fitness, the only metric used for linkage learning in uni-metric methods.

References

Aporntewan, C., & Chongstitvatana, P. (2003). Building-block identification by simultaneity matrix. Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2003).

Barron, A., Rissanen, J., & Yu, B. (1998). The MDL principle in coding and modeling. IEEE Transactions on Information Theory, vol. 44, no. 6.

Cover, T. M., & Thomas, J. A. (1991). Elements of information theory. New York: Wiley.

Etxeberria, R., & Larrañaga, P. (1999). Global optimization using Bayesian networks. Proceedings of the Second Symposium on Artificial Intelligence Adaptive Systems.

Goldberg, D. E. (1987). Simple genetic algorithms and the minimal, deceptive problem. Genetic Algorithms and Simulated Annealing.

Goldberg, D. E. (1989). Genetic algorithms in search, optimization and machine learning. Reading, MA: Addison-Wesley.

Goldberg, D. E. (2002). The design of innovation: Lessons from and for competent genetic algorithms. Norwell, MA: Kluwer Academic Publishers.

Goldberg, D. E., Korb, B., & Deb, K. (1989). Messy genetic algorithms: Motivation, analysis, and first results. Complex Systems, vol. 3.

Harik, G. R. (1996). Learning linkage. Foundations of Genetic Algorithms 4.

Harik, G. R. (1997). Learning gene linkage to efficiently solve problems of bounded difficulty using genetic algorithms. Doctoral dissertation, University of Michigan, Ann Arbor, MI.

Harik, G. R. (1999). Linkage learning via probabilistic modeling in the ECGA. IlliGAL Report No. 99010, University of Illinois at Urbana-Champaign, Urbana, IL.

Holland, J. H. (1975). Adaptation in natural and artificial systems. Ann Arbor, MI: University of Michigan Press.

Kullback, S., & Leibler, R. A. (1951). On information and sufficiency. Annals of Mathematical Statistics, vol. 22.

Lutz, R. (2002). Recovering high-level structure of software systems using a minimum description length principle. Proceedings of the 13th Irish International Conference on Artificial Intelligence and Cognitive Science (AICS).

McCord, K., & Eppinger, S. D. (1993). Managing the integration problem in concurrent engineering. Working Paper 3594, Cambridge, MA: MIT Sloan School of Management.

Munetomo, M., & Goldberg, D. E. (1999). Identifying linkage groups by nonlinearity/non-monotonicity detection. Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-1999).

Pelikan, M., & Goldberg, D. E. (2001). Escaping hierarchical traps with competent genetic algorithms. Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2001).

Pelikan, M., Goldberg, D. E., & Cantú-Paz, E. (1999). BOA: The Bayesian optimization algorithm. Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-1999).

Pelikan, M., & Mühlenbein, H. (1999). The bivariate marginal distribution algorithm. Advances in Soft Computing: Engineering Design and Manufacturing.

Pimmler, T., & Eppinger, S. D. (1994). Integration analysis of product decompositions. Proceedings of the ASME International Conference on Design Theory and Methodology, DE-68.

Rissanen, J. (1978). Modeling by shortest data description. Automatica, vol. 14.

Rissanen, J. (1998). Hypothesis selection and testing by the MDL principle. The Computer Journal, vol. 42.

Shannon, C. E. (1948). A mathematical theory of communication. The Bell System Technical Journal, vol. 27.

Tsuji, M., Munetomo, M., & Akama, K. (2004). Modeling dependencies of loci with string classification according to fitness differences. Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2004).

Yu, T.-L., Goldberg, D. E., Sastry, K., Lima, C. F., & Pelikan, M. (2009). Dependency structure matrix, genetic algorithms, and effective recombination. Evolutionary Computation, vol. 17, no. 4.


Shortening Picking Distance by using Rank-Order Clustering and Genetic Algorithm for Distribution Centers Shortening Picking Distance by using Rank-Order Clustering and Genetic Algorithm for Distribution Centers Rong-Chang Chen, Yi-Ru Liao, Ting-Yao Lin, Chia-Hsin Chuang, Department of Distribution Management,

More information

An Information Geometry Perspective on Estimation of Distribution Algorithms: Boundary Analysis

An Information Geometry Perspective on Estimation of Distribution Algorithms: Boundary Analysis An Information Geometry Perspective on Estimation of Distribution Algorithms: Boundary Analysis Luigi Malagò Department of Electronics and Information Politecnico di Milano Via Ponzio, 34/5 20133 Milan,

More information

Minimum message length estimation of mixtures of multivariate Gaussian and von Mises-Fisher distributions

Minimum message length estimation of mixtures of multivariate Gaussian and von Mises-Fisher distributions Minimum message length estimation of mixtures of multivariate Gaussian and von Mises-Fisher distributions Parthan Kasarapu & Lloyd Allison Monash University, Australia September 8, 25 Parthan Kasarapu

More information

Using Kernel PCA for Initialisation of Variational Bayesian Nonlinear Blind Source Separation Method

Using Kernel PCA for Initialisation of Variational Bayesian Nonlinear Blind Source Separation Method Using Kernel PCA for Initialisation of Variational Bayesian Nonlinear Blind Source Separation Method Antti Honkela 1, Stefan Harmeling 2, Leo Lundqvist 1, and Harri Valpola 1 1 Helsinki University of Technology,

More information

The Behaviour of the Akaike Information Criterion when Applied to Non-nested Sequences of Models

The Behaviour of the Akaike Information Criterion when Applied to Non-nested Sequences of Models The Behaviour of the Akaike Information Criterion when Applied to Non-nested Sequences of Models Centre for Molecular, Environmental, Genetic & Analytic (MEGA) Epidemiology School of Population Health

More information

CS242: Probabilistic Graphical Models Lecture 4B: Learning Tree-Structured and Directed Graphs

CS242: Probabilistic Graphical Models Lecture 4B: Learning Tree-Structured and Directed Graphs CS242: Probabilistic Graphical Models Lecture 4B: Learning Tree-Structured and Directed Graphs Professor Erik Sudderth Brown University Computer Science October 6, 2016 Some figures and materials courtesy

More information

Analysis of Epistasis Correlation on NK Landscapes. Landscapes with Nearest Neighbor Interactions

Analysis of Epistasis Correlation on NK Landscapes. Landscapes with Nearest Neighbor Interactions Analysis of Epistasis Correlation on NK Landscapes with Nearest Neighbor Interactions Missouri Estimation of Distribution Algorithms Laboratory (MEDAL University of Missouri, St. Louis, MO http://medal.cs.umsl.edu/

More information

Pengju

Pengju Introduction to AI Chapter04 Beyond Classical Search Pengju Ren@IAIR Outline Steepest Descent (Hill-climbing) Simulated Annealing Evolutionary Computation Non-deterministic Actions And-OR search Partial

More information

The Volume of Bitnets

The Volume of Bitnets The Volume of Bitnets Carlos C. Rodríguez The University at Albany, SUNY Department of Mathematics and Statistics http://omega.albany.edu:8008/bitnets Abstract. A bitnet is a dag of binary nodes representing

More information

Gaussian EDA and Truncation Selection: Setting Limits for Sustainable Progress

Gaussian EDA and Truncation Selection: Setting Limits for Sustainable Progress Gaussian EDA and Truncation Selection: Setting Limits for Sustainable Progress Petr Pošík Czech Technical University, Faculty of Electrical Engineering, Department of Cybernetics Technická, 66 7 Prague

More information

Computational Complexity and Genetic Algorithms

Computational Complexity and Genetic Algorithms Computational Complexity and Genetic Algorithms BART RYLANDER JAMES FOSTER School of Engineering Department of Computer Science University of Portland University of Idaho Portland, Or 97203 Moscow, Idaho

More information

CSC 4510 Machine Learning

CSC 4510 Machine Learning 10: Gene(c Algorithms CSC 4510 Machine Learning Dr. Mary Angela Papalaskari Department of CompuBng Sciences Villanova University Course website: www.csc.villanova.edu/~map/4510/ Slides of this presenta(on

More information

Final Exam, Machine Learning, Spring 2009

Final Exam, Machine Learning, Spring 2009 Name: Andrew ID: Final Exam, 10701 Machine Learning, Spring 2009 - The exam is open-book, open-notes, no electronics other than calculators. - The maximum possible score on this exam is 100. You have 3

More information

1. Computação Evolutiva

1. Computação Evolutiva 1. Computação Evolutiva Renato Tinós Departamento de Computação e Matemática Fac. de Filosofia, Ciência e Letras de Ribeirão Preto Programa de Pós-Graduação Em Computação Aplicada 1.6. Aspectos Teóricos*

More information

Lecture 1 : Data Compression and Entropy

Lecture 1 : Data Compression and Entropy CPS290: Algorithmic Foundations of Data Science January 8, 207 Lecture : Data Compression and Entropy Lecturer: Kamesh Munagala Scribe: Kamesh Munagala In this lecture, we will study a simple model for

More information

A Genetic Algorithm Approach for Doing Misuse Detection in Audit Trail Files

A Genetic Algorithm Approach for Doing Misuse Detection in Audit Trail Files A Genetic Algorithm Approach for Doing Misuse Detection in Audit Trail Files Pedro A. Diaz-Gomez and Dean F. Hougen Robotics, Evolution, Adaptation, and Learning Laboratory (REAL Lab) School of Computer

More information

LTI Systems, Additive Noise, and Order Estimation

LTI Systems, Additive Noise, and Order Estimation LTI Systems, Additive oise, and Order Estimation Soosan Beheshti, Munther A. Dahleh Laboratory for Information and Decision Systems Department of Electrical Engineering and Computer Science Massachusetts

More information

CS 630 Basic Probability and Information Theory. Tim Campbell

CS 630 Basic Probability and Information Theory. Tim Campbell CS 630 Basic Probability and Information Theory Tim Campbell 21 January 2003 Probability Theory Probability Theory is the study of how best to predict outcomes of events. An experiment (or trial or event)

More information

3. If a choice is broken down into two successive choices, the original H should be the weighted sum of the individual values of H.

3. If a choice is broken down into two successive choices, the original H should be the weighted sum of the individual values of H. Appendix A Information Theory A.1 Entropy Shannon (Shanon, 1948) developed the concept of entropy to measure the uncertainty of a discrete random variable. Suppose X is a discrete random variable that

More information

Local Search & Optimization

Local Search & Optimization Local Search & Optimization CE417: Introduction to Artificial Intelligence Sharif University of Technology Spring 2017 Soleymani Artificial Intelligence: A Modern Approach, 3 rd Edition, Chapter 4 Outline

More information

A New Approach to Estimating the Expected First Hitting Time of Evolutionary Algorithms

A New Approach to Estimating the Expected First Hitting Time of Evolutionary Algorithms A New Approach to Estimating the Expected First Hitting Time of Evolutionary Algorithms Yang Yu and Zhi-Hua Zhou National Laboratory for Novel Software Technology Nanjing University, Nanjing 20093, China

More information

The Decision List Machine

The Decision List Machine The Decision List Machine Marina Sokolova SITE, University of Ottawa Ottawa, Ont. Canada,K1N-6N5 sokolova@site.uottawa.ca Nathalie Japkowicz SITE, University of Ottawa Ottawa, Ont. Canada,K1N-6N5 nat@site.uottawa.ca

More information

Evolutionary Computation

Evolutionary Computation Evolutionary Computation - Computational procedures patterned after biological evolution. - Search procedure that probabilistically applies search operators to set of points in the search space. - Lamarck

More information

Distinguishing Causes from Effects using Nonlinear Acyclic Causal Models

Distinguishing Causes from Effects using Nonlinear Acyclic Causal Models JMLR Workshop and Conference Proceedings 6:17 164 NIPS 28 workshop on causality Distinguishing Causes from Effects using Nonlinear Acyclic Causal Models Kun Zhang Dept of Computer Science and HIIT University

More information

A Modification of Linfoot s Informational Correlation Coefficient

A Modification of Linfoot s Informational Correlation Coefficient Austrian Journal of Statistics April 07, Volume 46, 99 05. AJS http://www.ajs.or.at/ doi:0.773/ajs.v46i3-4.675 A Modification of Linfoot s Informational Correlation Coefficient Georgy Shevlyakov Peter

More information

Self-Organization by Optimizing Free-Energy

Self-Organization by Optimizing Free-Energy Self-Organization by Optimizing Free-Energy J.J. Verbeek, N. Vlassis, B.J.A. Kröse University of Amsterdam, Informatics Institute Kruislaan 403, 1098 SJ Amsterdam, The Netherlands Abstract. We present

More information

DEEP LEARNING CHAPTER 3 PROBABILITY & INFORMATION THEORY

DEEP LEARNING CHAPTER 3 PROBABILITY & INFORMATION THEORY DEEP LEARNING CHAPTER 3 PROBABILITY & INFORMATION THEORY OUTLINE 3.1 Why Probability? 3.2 Random Variables 3.3 Probability Distributions 3.4 Marginal Probability 3.5 Conditional Probability 3.6 The Chain

More information

What Makes a Problem Hard for a Genetic Algorithm? Some Anomalous Results and Their Explanation

What Makes a Problem Hard for a Genetic Algorithm? Some Anomalous Results and Their Explanation What Makes a Problem Hard for a Genetic Algorithm? Some Anomalous Results and Their Explanation Stephanie Forrest Dept. of Computer Science University of New Mexico Albuquerque, N.M. 87131-1386 Email:

More information

Structure learning in human causal induction

Structure learning in human causal induction Structure learning in human causal induction Joshua B. Tenenbaum & Thomas L. Griffiths Department of Psychology Stanford University, Stanford, CA 94305 jbt,gruffydd @psych.stanford.edu Abstract We use

More information

ECS 120 Lesson 24 The Class N P, N P-complete Problems

ECS 120 Lesson 24 The Class N P, N P-complete Problems ECS 120 Lesson 24 The Class N P, N P-complete Problems Oliver Kreylos Friday, May 25th, 2001 Last time, we defined the class P as the class of all problems that can be decided by deterministic Turing Machines

More information

Simulation of the Evolution of Information Content in Transcription Factor Binding Sites Using a Parallelized Genetic Algorithm

Simulation of the Evolution of Information Content in Transcription Factor Binding Sites Using a Parallelized Genetic Algorithm Simulation of the Evolution of Information Content in Transcription Factor Binding Sites Using a Parallelized Genetic Algorithm Joseph Cornish*, Robert Forder**, Ivan Erill*, Matthias K. Gobbert** *Department

More information

On the errors introduced by the naive Bayes independence assumption

On the errors introduced by the naive Bayes independence assumption On the errors introduced by the naive Bayes independence assumption Author Matthijs de Wachter 3671100 Utrecht University Master Thesis Artificial Intelligence Supervisor Dr. Silja Renooij Department of

More information

Lecture 9 Evolutionary Computation: Genetic algorithms

Lecture 9 Evolutionary Computation: Genetic algorithms Lecture 9 Evolutionary Computation: Genetic algorithms Introduction, or can evolution be intelligent? Simulation of natural evolution Genetic algorithms Case study: maintenance scheduling with genetic

More information

Probability and Information Theory. Sargur N. Srihari

Probability and Information Theory. Sargur N. Srihari Probability and Information Theory Sargur N. srihari@cedar.buffalo.edu 1 Topics in Probability and Information Theory Overview 1. Why Probability? 2. Random Variables 3. Probability Distributions 4. Marginal

More information

Adaptive Generalized Crowding for Genetic Algorithms

Adaptive Generalized Crowding for Genetic Algorithms Carnegie Mellon University From the SelectedWorks of Ole J Mengshoel Fall 24 Adaptive Generalized Crowding for Genetic Algorithms Ole J Mengshoel, Carnegie Mellon University Severinio Galan Antonio de

More information

Information Theory. Coding and Information Theory. Information Theory Textbooks. Entropy

Information Theory. Coding and Information Theory. Information Theory Textbooks. Entropy Coding and Information Theory Chris Williams, School of Informatics, University of Edinburgh Overview What is information theory? Entropy Coding Information Theory Shannon (1948): Information theory is

More information

Haploid & diploid recombination and their evolutionary impact

Haploid & diploid recombination and their evolutionary impact Haploid & diploid recombination and their evolutionary impact W. Garrett Mitchener College of Charleston Mathematics Department MitchenerG@cofc.edu http://mitchenerg.people.cofc.edu Introduction The basis

More information

MACHINE LEARNING INTRODUCTION: STRING CLASSIFICATION

MACHINE LEARNING INTRODUCTION: STRING CLASSIFICATION MACHINE LEARNING INTRODUCTION: STRING CLASSIFICATION THOMAS MAILUND Machine learning means different things to different people, and there is no general agreed upon core set of algorithms that must be

More information

6.867 Machine learning, lecture 23 (Jaakkola)

6.867 Machine learning, lecture 23 (Jaakkola) Lecture topics: Markov Random Fields Probabilistic inference Markov Random Fields We will briefly go over undirected graphical models or Markov Random Fields (MRFs) as they will be needed in the context

More information

Local Search & Optimization

Local Search & Optimization Local Search & Optimization CE417: Introduction to Artificial Intelligence Sharif University of Technology Spring 2018 Soleymani Artificial Intelligence: A Modern Approach, 3 rd Edition, Chapter 4 Some

More information

Variable Dependence Interaction and Multi-objective Optimisation

Variable Dependence Interaction and Multi-objective Optimisation Variable Dependence Interaction and Multi-objective Optimisation Ashutosh Tiwari and Rajkumar Roy Department of Enterprise Integration, School of Industrial and Manufacturing Science, Cranfield University,

More information

Introduction to Optimization

Introduction to Optimization Introduction to Optimization Blackbox Optimization Marc Toussaint U Stuttgart Blackbox Optimization The term is not really well defined I use it to express that only f(x) can be evaluated f(x) or 2 f(x)

More information

Info-Clustering: An Information-Theoretic Paradigm for Data Clustering. Information Theory Guest Lecture Fall 2018

Info-Clustering: An Information-Theoretic Paradigm for Data Clustering. Information Theory Guest Lecture Fall 2018 Info-Clustering: An Information-Theoretic Paradigm for Data Clustering Information Theory Guest Lecture Fall 208 Problem statement Input: Independently and identically drawn samples of the jointly distributed

More information

GENETIC ALGORITHM FOR CELL DESIGN UNDER SINGLE AND MULTIPLE PERIODS

GENETIC ALGORITHM FOR CELL DESIGN UNDER SINGLE AND MULTIPLE PERIODS GENETIC ALGORITHM FOR CELL DESIGN UNDER SINGLE AND MULTIPLE PERIODS A genetic algorithm is a random search technique for global optimisation in a complex search space. It was originally inspired by an

More information