A Markov Framework for the Simple Genetic Algorithm

Thomas E. Davis*, Jose C. Principe
Electrical Engineering Department, University of Florida, Gainesville, FL 32611
*WL/MNGS, Eglin AFB, FL 32542

Abstract

This paper develops a theoretical framework based on Markov chains for the simple genetic algorithm (operators of reproduction, crossover, and mutation). We prove the existence of a unique stationary distribution for the Markov chain when mutation probability is used as a control parameter. We also show that the stationary distribution has a limit when the control parameter approaches zero. Finally, we present a strong ergodicity bound to ensure that the nonstationary algorithm achieves the limiting distribution, and we prove that the limit of the stationary distribution has nonzero components corresponding to all solutions.

Running title: Markov model of genetic algorithm

Keywords: Markov model, genetic algorithm.

Send correspondence to:
Dr. Jose C. Principe
CSE-E444, Electrical Engineering Department
University of Florida
Gainesville, FL 32611
phone: 904-392-2662
Fax: 904-392-0044
email: principe@brain.ee.ufl.edu

1. Introduction

Simulated annealing and genetic algorithms are stochastic relaxation search techniques suitable for nonconvex optimization problems. Each produces a sequence of candidate solutions to the underlying optimization problem, and the purpose of both algorithms is to generate sequences which optimize the objective function. But the algorithms explore the search space in radically different ways because their next state transition mechanisms differ, which also creates differences in the asymptotic probability distribution.

There is also a big difference in the theoretical understanding of the two algorithms. For simulated annealing, a complete mathematical theory exists that quantifies the algorithm performance in terms of Markov chain models [Laarhoven and Aarts, 1987], [Geman and Geman, 1984], [Lundy and Mees, 1986], [Mitra and Romeo, 1985], [Riordan, 1958]. For the genetic algorithm, only tentative attempts have been made at a mathematical formulation. For instance, we know how to guarantee asymptotic convergence in simulated annealing by constraining the control parameter (annealing schedule bound). For the genetic algorithm, optimality is guaranteed by including elitism, but then the quantification of the convergence rate becomes problematic. On the other hand, for the simple genetic algorithm our results show that no such guarantee of global optimality exists, but the asymptotic convergence is faster than that of simulated annealing. Moreover, the nature of the genetic operators and their influence on algorithm behavior are only understood in general terms [Davis, 1987], [Goldberg, 1985, 1987], [Grefenstette, 1985, 1987].
The fundamental goal of this paper is to provide a theoretical framework for analyzing the simple genetic algorithm (crossover, mutation and reproduction operators) based upon the asymptotic probability distribution of the population sequences it produces. We assume familiarity with the genetic operators and present the framework in terms of Markov chains.

2. A Markov Chain Model of the Simple Genetic Algorithm

When the genetic operators of crossover, mutation and reproduction are applied to the present population to create the next generation, a stochastic dependency between successive populations is created. Therefore, the sequence of populations in the genetic algorithm can be viewed as a stochastic process with finite state space (a finite bit string solution space is assumed here for simplicity). Moreover, the conditional dependency of each population on its predecessors is completely described by its dependence upon the immediate predecessor population. These two observations allow us to model the simple genetic algorithm as a Markov chain [Davis, 1991].

Let a combinatorial optimization problem be characterized by a pair $(S, R)$, where $S = \{0,1\}^L$ and $R$ is a strictly positive real-valued reward function that one wants to maximize. Let the algorithm have a fixed population size $M$, and let a generation be represented as

$$m = (m(0), m(1), \ldots, m(2^L - 1)),$$

where $m(i)$ represents the number of occurrences of solution $i \in S$ in the population. Thus $m$ is a distribution of $M$ nondistinct objects over $N = \mathrm{card}(S) = 2^L$ bins. The set of all such distributions, $S' = \{m\}$, becomes the representation of the genetic algorithm search space. The cardinality of $S'$ is

$$N' = \binom{M + 2^L - 1}{M} = \binom{M + N - 1}{M}.$$

The genetic algorithm can be represented by the quadruple $(S', m_0, P_q, G)$, where $m_0$ is the initial population, $P_q$ is a state transition matrix, and $G = \{Q_k\}$ is a finite length sequence of parameter vectors $Q_k = (p_m(k), p_c(k))$. The parameters $p_m(k)$ and $p_c(k)$ are respectively the mutation and crossover probabilities. Here we impose that the mutation probability be monotonically nonincreasing, and the crossover probability is kept constant. With this model the simple genetic algorithm evolves as a sequence of states that form an inhomogeneous Markov chain. The state transition mechanism will now be analyzed for the three most common genetic operators.

One operator algorithm (reproduction)

Let us consider first the case of reproduction only, i.e. let us set $Q_k = (0, 0)$. In this case the conditional probability of selecting a solution $i \in S$ from a population described by the state vector $n \in S'$ is

$$P_1(i \mid n) = \frac{n(i) R(i)}{\sum_{j \in S} n(j) R(j)}.$$

Thus the conditional probability of the successor generation $m$, given that the present generation is $n$, is the multinomial distribution

$$P_1(m \mid n) = \binom{M}{m} \prod_{i \in S} P_1(i \mid n)^{m(i)}, \qquad m, n \in S', \tag{1}$$

where

$$\binom{M}{m} = \frac{M!}{\prod_{i \in S} m(i)!}.$$

The transition probability matrix of the Markov chain is composed of the array of conditional probabilities, $P = [P_1(m \mid n)]$. Therefore the one operator Markov chain is time homogeneous, and the set of states that represent uniform populations, $m_A \in S'_A$, are absorbing states of the chain, $P_1(m_A \mid m_A) = 1$ [Davis, 1991]. For each state $m_A \in S'_A$ the associated row of the transition matrix has a 1 on the principal diagonal and 0 elsewhere. It follows that there are $N = 2^L$ such states, and the stationary distribution is not unique. The existence of absorbing states precludes irreducibility.
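The one operator probabilities above can be checked numerically on a toy problem. The following is a minimal sketch, not the authors' implementation: the function names `select_probs` and `transition_prob`, the reward values, and the choice $L = 2$, $M = 3$ are illustrative assumptions. Populations are stored as occupancy tuples `n`, with `n[i]` the number of copies of solution `i`.

```python
from math import factorial, prod

def select_probs(n, R):
    """P_1(i|n) = n(i) R(i) / sum_j n(j) R(j), for every solution i."""
    total = sum(nj * Rj for nj, Rj in zip(n, R))
    return [ni * Ri / total for ni, Ri in zip(n, R)]

def transition_prob(m, n, R):
    """P_1(m|n): multinomial probability of population m following n (eq. 1)."""
    M = sum(n)
    if sum(m) != M:
        raise ValueError("populations must have the same size")
    coeff = factorial(M)
    for mi in m:
        coeff //= factorial(mi)          # multinomial coefficient M! / prod m(i)!
    p = select_probs(n, R)
    return coeff * prod(pi ** mi for pi, mi in zip(p, m))

# Hypothetical rewards for the 2**L = 4 solutions of an L = 2 problem.
R = [1.0, 2.0, 3.0, 4.0]
uniform = (3, 0, 0, 0)                   # uniform population, M = 3
mixed = (2, 1, 0, 0)

print(transition_prob(uniform, uniform, R))   # a uniform population maps to itself
print(transition_prob(uniform, mixed, R))     # chance of collapsing to it in one step
```

With reproduction only, a solution absent from the current population has selection probability zero, which is why every uniform population is absorbing.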
The expected number of transitions $E\{k_A\}$ to arrive at an absorbing state is finite, and an upper bound for it can be given in terms of $R_{\min}$ and $R_{\max}$, the extremes of $R$ [Davis, 1991]. Notice that this convergence does not require any assumption on the range of $R$. Even when the objective function exerts zero selective pressure (i.e. when $R(i) = R(j)$ for all $i, j \in S$), the population still converges to an absorbing state, which is known in the genetic algorithm literature as genetic drift [Goldberg, 1989].

Two operator algorithm (reproduction and mutation)

Let us now select a nondegenerate value for the mutation probability, i.e. $0 < p_m(k) < 1$. Mutation will play a role very similar to that of temperature in the simulated annealing algorithm. The probability $P_2(i \mid n)$ of selecting a solution $i \in S$ from the population $n \in S'$ for the two operator genetic algorithm can be computed from the one operator probability $P_1(j \mid n)$ times a factor that accounts for the probability of the mutation event required to transform $j$ into $i$, summed over all $j \in S$. This factor can be expressed as

$$p_m^{H(i,j)} (1 - p_m)^{L - H(i,j)},$$

where $H(i,j)$ is the Hamming distance of the pair $(i,j)$, i.e. the number of bits which must be altered by mutation to transform $j$ into $i$. Thus,

$$P_2(i \mid n) = \sum_{j \in S} p_m^{H(i,j)} (1 - p_m)^{L - H(i,j)} P_1(j \mid n) = \frac{1}{(1+\alpha)^L} \sum_{j \in S} \alpha^{H(i,j)} P_1(j \mid n), \qquad i \in S,\ n \in S', \tag{2}$$

where $\alpha = p_m / (1 - p_m)$, and we restrict $0 < p_m \le 1/2$, so that $0 < \alpha \le 1$. Substituting the expression for $P_1(j \mid n)$ into eq. 2, we obtain

$$P_2(i \mid n) = \frac{1}{(1+\alpha)^L} \, \frac{\sum_{j \in S} \alpha^{H(i,j)} n(j) R(j)}{\sum_{k \in S} n(k) R(k)}. \tag{3}$$

Likewise one can define the multinomial distribution for $P_2(m \mid n)$ as

$$P_2(m \mid n) = \binom{M}{m} \prod_{i \in S} P_2(i \mid n)^{m(i)},$$

and the transition probability matrix of the Markov chain for the two operator genetic algorithm

becomes $P = [P_2(m \mid n)]$. Since the elements of $P$ depend on $\alpha$, the two operator genetic algorithm is in general time inhomogeneous. $P_2(i \mid n)$ is strictly positive for all $i \in S$ and $n \in S'$. The following limit holds:

$$\lim_{\alpha \to 0^+} P_2(i \mid n) = P_1(i \mid n).$$

The rows of the state transition matrix corresponding to the one operator absorbing states have a simple form given by (from eq. 3), for $n_A \in S'_A$,

$$P_2(i \mid n_A) = \begin{cases} \dfrac{\alpha^{H(i, i_A)}}{(1+\alpha)^L}, & i \ne i_A \\[2ex] \dfrac{1}{(1+\alpha)^L}, & i = i_A \end{cases} \tag{4}$$

where $i_A$ is the solution represented in the uniform population $n_A$. This allows us to write for the population

$$P_2(m \mid n_A) = \binom{M}{m} \frac{\alpha^{\sum_{i \in S} m(i) H(i, i_A)}}{(1+\alpha)^{LM}}. \tag{5}$$

Moreover one can show [Davis, 1991] that, since the reward function is positive and $H(i,j)$ is a distance, for $0 < \alpha \le 1$

$$\alpha^L \sum_{j \in S} n(j) R(j) \;\le\; \sum_{j \in S} n(j) R(j)\, \alpha^{H(i,j)} \;\le\; \sum_{j \in S} n(j) R(j).$$

Therefore from eq. 3 and eq. 4

$$\binom{M}{m} \left( \frac{\alpha}{1+\alpha} \right)^{LM} \le P_2(m \mid n) \le \binom{M}{m} \frac{1}{(1+\alpha)^{LM}}, \tag{6}$$

which shows that the Markov chain is irreducible for the two operator genetic algorithm, and that it possesses a unique stationary distribution $q_\alpha$, independent of the initial state, given by

$$q_\alpha > 0, \qquad \sum_{m \in S'} q_\alpha(m) = 1, \qquad q_\alpha^T = q_\alpha^T P.$$

The important observation is that mutation makes the Markov description of the two operator genetic algorithm irreducible, and consequently causes it to have a unique stationary distribution. The implications are very important, since the asymptotic state occupancy probability is completely determined by the algorithm parameters and the objective function, without dependence on the initial state, and can therefore be used for optimization. Notice also that the zero mutation limit of the conditional probabilities of the two operator case equals the corresponding one operator probabilities.

The three operator genetic algorithm (reproduction, mutation and crossover)

The three operator simple genetic algorithm corresponds to the case $Q_k = (p_m(k), p_c(k))$, with both probabilities nonzero. An extension of the two operator case can be produced by defining a new function $I(i,j,k,s)$ over ordered quadruples, where $i, j, k \in S$ and $s \in \{0, 1, \ldots, L-1\}$ is a bit string location. The states $i, j$ represent the first and second parent strings and $k$ the descendant string. The bit string location $s$ is the location randomly selected by the crossover operator, and is assumed uniformly distributed. Thus $I$ takes values in $\{0, 1\}$, where 1 indicates that bit string $k$ is produced by crossing bit strings $i$ and $j$ at the site $s$. One can formulate the conditional probability of constructing, via reproduction and crossover only, a solution $k \in S$ given a current population described by $n$, as

$$P'_2(k \mid n) = \frac{p_c}{L} \sum_{i \in S} \sum_{j \in S} \sum_{s} P_1(i \mid n)\, P_1(j \mid n)\, I(i,j,k,s) + (1 - p_c)\, P_1(k \mid n),$$

where $P_1(i \mid n)$ is the one operator selection probability. Notice that the role of $P'_2(i \mid n)$ is analogous to the role of $P_1(i \mid n)$, and therefore a derivation parallel to the one used for the two operator algorithm yields

$$P_3(i \mid n) = \frac{1}{(1+\alpha)^L} \sum_{j \in S} \alpha^{H(i,j)} P'_2(j \mid n) \tag{7}$$

for the three operator conditional probabilities, and

$$P_3(m \mid n) = \binom{M}{m} \prod_{i \in S} P_3(i \mid n)^{m(i)} \tag{8}$$

for the transition matrix. From the analysis of these equations we conclude that the three operator Markov chain is in general time inhomogeneous, that in the limit

$$\lim_{\alpha \to 0^+} P_3(i \mid n) = P'_2(i \mid n),$$

and that

$$\binom{M}{m} \left( \frac{\alpha}{1+\alpha} \right)^{LM} \le P_3(m \mid n) \le \binom{M}{m} \frac{1}{(1+\alpha)^{LM}},$$

yielding, as for the two operator case,

$$P_3(m \mid n_A) = \binom{M}{m} \frac{\alpha^{\sum_{i \in S} m(i) H(i, i_A)}}{(1+\alpha)^{LM}}. \tag{9}$$

Therefore the Markov chain is irreducible, and completely determined by the objective function and the algorithm parameters.

In conclusion, the asymptotic behavior of the one operator genetic algorithm is dominated by the states which correspond to uniform populations (absorbing states), which are necessarily reached in a finite number of iterations. The asymptotic probability distribution depends upon the initial population $m_0$. However, a unique stationary distribution exists for the two and three operator algorithms with $\alpha > 0$, or equivalently their asymptotic probability distributions are independent of $m_0$. In the limit case $\alpha \to 0^+$, both of these algorithms degenerate into the absorbing states of the one operator case. A very important question is whether the unique stationary distributions approach limits as $\alpha \to 0^+$. But before answering this question, one needs to find an alternate expression for the stationary distribution.

3. Determinant form of the stationary distributions

Before addressing the zero mutation limit behavior of the stationary distribution, we investigate ways to express the stationary distribution $q_\alpha$ in a form that can lead to evaluation. Since the stationary distribution is a stochastic vector, it follows that it is a left eigenvector of the state transition matrix corresponding to the eigenvalue 1. The additional constraint that its elements are probabilities makes the solution unique. Therefore,

$$q_\alpha^T (P - I) = 0^T.$$

Using the Perron-Frobenius theory of stochastic matrices, one can conclude that the rank of the matrix $(P - I)$ is $N' - 1$, where $N' = \mathrm{card}(S')$ is the dimension of $P$. Therefore, we can substitute exactly one column (in any column location) of the matrix equation without sacrificing its validity. We propose to replace the column indexed by $n$ with the vector of ones, thus producing the system

$$q_\alpha^T (P - I)_n = e_n^T, \tag{10}$$

where $(P - I)_n$ is the new matrix and $e_n^T$ is the row vector containing 1 in column $n$ and 0 elsewhere.
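The column replacement system can be solved directly for a very small two operator chain. The sketch below is only an illustration under stated assumptions: the helper names (`populations`, `p2_select`, `transition_matrix`, `stationary`), the reward values, and the parameters $L = 1$, $M = 2$, $\alpha = 0.1$ are hypothetical, solutions are coded as integers $0, \ldots, 2^L - 1$, and plain Gaussian elimination stands in for whatever solver one would use in practice.

```python
from itertools import product
from math import factorial, prod

def populations(N, M):
    """All occupancy vectors with N bins and M objects (the state space S')."""
    return [m for m in product(range(M + 1), repeat=N) if sum(m) == M]

def hamming(i, j):
    return bin(i ^ j).count("1")

def p2_select(n, R, alpha, L):
    """P_2(i|n) of eq. 3 for every solution i."""
    N = len(n)
    total = sum(n[k] * R[k] for k in range(N))
    return [sum(alpha ** hamming(i, j) * n[j] * R[j] for j in range(N))
            / ((1 + alpha) ** L * total) for i in range(N)]

def multinomial(m):
    c = factorial(sum(m))
    for mi in m:
        c //= factorial(mi)
    return c

def transition_matrix(R, M, L, alpha):
    """Rows are indexed by the current state n, columns by the next state m."""
    states = populations(2 ** L, M)
    P = []
    for n in states:
        p = p2_select(n, R, alpha, L)
        P.append([multinomial(m) * prod(pi ** mi for pi, mi in zip(p, m))
                  for m in states])
    return states, P

def stationary(P, col=0):
    """Solve q^T (P - I)_col = e_col^T, where column `col` of P - I is all ones."""
    size = len(P)
    # Transposed system A q = b, so that q is an ordinary column of unknowns.
    A = [[1.0 if r == col else P[c][r] - (1.0 if r == c else 0.0)
          for c in range(size)] for r in range(size)]
    b = [1.0 if r == col else 0.0 for r in range(size)]
    for k in range(size):                       # elimination with partial pivoting
        piv = max(range(k, size), key=lambda r: abs(A[r][k]))
        A[k], A[piv] = A[piv], A[k]
        b[k], b[piv] = b[piv], b[k]
        for r in range(k + 1, size):
            f = A[r][k] / A[k][k]
            for c in range(k, size):
                A[r][c] -= f * A[k][c]
            b[r] -= f * b[k]
    q = [0.0] * size
    for k in reversed(range(size)):             # back substitution
        q[k] = (b[k] - sum(A[k][c] * q[c] for c in range(k + 1, size))) / A[k][k]
    return q

states, P = transition_matrix(R=[1.0, 2.0], M=2, L=1, alpha=0.1)
q = stationary(P)
for state, prob in zip(states, q):
    print(state, prob)
```

The replaced column enforces the normalization $\sum_m q_\alpha(m) = 1$, while the remaining columns enforce $q_\alpha^T P = q_\alpha^T$; the state space grows combinatorially with $M$ and $L$, so this brute-force construction is only practical for toy sizes.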
It follows that the resulting system is full rank, and therefore we can use Cramer's rule to write the components of the stationary distribution as

$$q_\alpha(m) = \frac{|(P - I)_n^m|}{|(P - I)_n|},$$

where $(P - I)_n^m$ is derived from $(P - I)_n$ by replacing the $m$-th row with the row vector $e_n^T$. $|(P - I)_n^m|$ is the cofactor of the element in row $m$ of column $n$, so the denominator can be written as

$$|(P - I)_n| = \sum_{m \in S'} |(P - I)_n^m|.$$

The numerator can be written as a difference of two determinants derived from $(P - I)$ by replacing its $m$-th row. Therefore $q_\alpha$ can also be written [Davis, 1991]

$$q_\alpha(m) = \frac{|P_m - I|}{\sum_{n \in S'} |P_n - I|}, \qquad m \in S', \tag{11}$$

where $P_m$ and $P_n$ are derived from $P$ by replacing the $m$-th and $n$-th rows, respectively, by the zero row vector. Thus computing the stationary distribution reduces to evaluating the characteristic polynomials of the matrices $P_m$, derived from the transition probability matrix of the Markov chain, at $\lambda = 1$. Using again the Perron-Frobenius theorem, we can conclude that for all $\alpha > 0$ the value of the determinant $|P_m - I|$ is different from zero, and that its algebraic sign is $(-1)^{N'}$ [Davis, 1991]. An immediate consequence is that both the numerator and denominator of $q_\alpha(m)$ are nonzero for $\alpha > 0$ and have identical algebraic signs, so the stationary distribution is strictly positive, i.e. $q_\alpha > 0$.

The problem is that in the $\alpha \to 0$ limit we get an indeterminate form for $q_\alpha$, since the rows of $P_m - I$ corresponding to the absorbing states are zero. In order to evaluate the $\alpha \to 0$ limit one needs to transform $P_m$ into a form that individualizes the rows for the absorbing states.

4. The zero mutation limit of the stationary distribution

Let us first study the behavior of $q_\alpha$ as a function of $\alpha$. From eq. 3 it follows that all elements of the state transition matrix are rational functions of $\alpha$ with denominator polynomial $(1+\alpha)^{LM}$. Thus for $\alpha > 0$

$$(1+\alpha)^{LMN'} \, |P_m - I| = \theta_m(\alpha),$$

where $\theta_m(\alpha)$ is a polynomial in $\alpha$, and so $q_\alpha$ is also a rational function of $\alpha$,

$$q_\alpha(m) = \frac{\theta_m(\alpha)}{\sum_{n \in S'} \theta_n(\alpha)}, \tag{12}$$

as is its first derivative [Davis, 1991]. The rows of $P - I$ corresponding to the absorbing states (row $n_A$) have a simple form. The nondiagonal elements can be calculated from equations 5 and 9, and the principal diagonal elements reduce to evaluating these equations for $m = n_A$ and subtracting 1, yielding

$$P(n_A \mid n_A) - 1 = \frac{1}{(1+\alpha)^{LM}} - 1 = -\frac{LM\alpha + O(\alpha^2)}{(1+\alpha)^{LM}},$$

where $O(\cdot)$ denotes terms of second order in the argument. Therefore,

$$[P - I](m \mid n_A) = \begin{cases} -\dfrac{LM\alpha + O(\alpha^2)}{(1+\alpha)^{LM}}, & m = n_A \\[2ex] \dbinom{M}{m} \dfrac{\alpha^{\sum_{i \in S} m(i) H(i, i_A)}}{(1+\alpha)^{LM}}, & m \in S' - \{n_A\} \end{cases} \tag{13}$$

Let us further single out, among the states that are not absorbing, the set $S'(n_A)$ of states that differ from the absorbing state $n_A$ by a single bit mutation event. The calculation for these elements of row $n_A$ shows that their value is simply $M\alpha / (1+\alpha)^{LM}$ [Davis, 1991]; therefore we can write

$$[P - I](m \mid n_A) = \begin{cases} -\dfrac{LM\alpha + O(\alpha^2)}{(1+\alpha)^{LM}}, & m = n_A \\[2ex] \dfrac{M\alpha}{(1+\alpha)^{LM}}, & m \in S'(n_A) \\[2ex] \dfrac{O(\alpha^2)}{(1+\alpha)^{LM}}, & m \in S' - S'(n_A) - \{n_A\} \end{cases} \tag{14}$$
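The orders of magnitude in the absorbing state rows can be verified numerically from the closed form for a uniform population. This is a small sketch under illustrative assumptions: the function name `p2_from_uniform` and the values $L = 2$, $M = 3$, $\alpha = 10^{-3}$ are hypothetical, and strings are coded as integers so that string 0 is $i_A$.

```python
from math import factorial

def hamming(i, j):
    """Number of differing bits between the integer-coded strings i and j."""
    return bin(i ^ j).count("1")

def p2_from_uniform(m, i_A, alpha, L):
    """P_2(m | n_A) of eq. 5, where n_A holds M copies of the string i_A."""
    M = sum(m)
    coeff = factorial(M)
    for count in m:
        coeff //= factorial(count)       # multinomial coefficient
    exponent = sum(count * hamming(i, i_A) for i, count in enumerate(m))
    return coeff * alpha ** exponent / (1 + alpha) ** (L * M)

L, M, alpha = 2, 3, 1e-3
diagonal = p2_from_uniform((3, 0, 0, 0), 0, alpha, L) - 1.0   # m = n_A, minus 1
adjacent = p2_from_uniform((2, 1, 0, 0), 0, alpha, L)         # one copy one bit away
distant = p2_from_uniform((2, 0, 0, 1), 0, alpha, L)          # one copy two bits away

print("diagonal:", diagonal, "first-order term:", -L * M * alpha)
print("adjacent:", adjacent, "M*alpha/(1+alpha)^(LM):", M * alpha / (1 + alpha) ** (L * M))
print("distant: ", distant, "(second order in alpha)")
```

There are $L$ adjacent states, each contributing $M\alpha$ to first order, which balances the $-LM\alpha$ on the diagonal so that the row of $P - I$ sums to zero.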

Notice that this result also applies to $P_m - I$ if row $m$ does not correspond to an absorbing state. When it does ($m = m_A$), row $m_A$ contains $-1$ on its principal diagonal and zeros elsewhere. Equation 14 still applies to the other absorbing state rows, different from the one considered ($n_A \ne m_A$). Exactly $N - 1$ such rows exist in $P_{m_A} - I$. Therefore the lowest order term with nonzero coefficient which can exist in $|P_{m_A} - I|$ is of order $\alpha^{N-1}$. Similarly, the lowest order term with nonzero coefficient in $|P_m - I|$ when $m$ is not absorbing is of order $\alpha^N$. Therefore we can express the limiting ($\alpha \to 0$) value of $q_\alpha$ in terms of the nonzero coefficients by substitution in eq. 11 [Davis, 1991]. The end result is a functional form for the components of the stationary distribution (eq. 12) as a function of $\alpha$:

$$q_\alpha(m) = \begin{cases} \dfrac{|P'_{m_A} - I'| + O(\alpha)}{\sum_{n_A \in S'_A} |P'_{n_A} - I'| + O(\alpha)}, & m = m_A \in S'_A \\[3ex] \dfrac{O(\alpha)}{\sum_{n_A \in S'_A} |P'_{n_A} - I'| + O(\alpha)}, & m \in S' - S'_A \end{cases} \tag{15}$$

The zero mutation probability limit exists as long as the denominator of eq. 15 is nonzero. The essential step is to demonstrate the existence of a primitive stochastic matrix $Q'$ which satisfies both

$$\lim_{\alpha \to 0^+} P'_{m_A} = Q'_{m_A} \quad \text{and} \quad |Q'_{m_A} - I'| = \lim_{\alpha \to 0^+} |P'_{m_A} - I'|. \tag{16}$$

We consider here the two operator genetic algorithm, but the results for the three operator case can be obtained in the same way. Let $Q'$ be generated from the limit of $P'_{m_A}$ by replacing row $m_A$ with the row whose elements are

$$Q'(m \mid m_A) = \frac{1}{N' - N + 1} > 0, \qquad m \in S' - S'_A + \{m_A\}. \tag{17}$$

Thus the sum of row $m_A$ is 1. Since all remaining rows are identical to those of $\lim_{\alpha \to 0^+} P'_{m_A}$, $Q'$ is a stochastic matrix that obeys the condition of eq. 16. We now prove that it is primitive, by showing that every state $m \in S' - S'_A + \{m_A\}$ is accessible in some number of steps from every other state belonging to the same set. Eq. 17 already shows that all states in $S' - S'_A + \{m_A\}$ are accessible in one transition from $m_A$. We have only to demonstrate that $m_A$ can be reached in a finite number of steps from any $n \in S' - S'_A$.
Let $i_A \in S$ be the bit string represented in $m_A$, and let $i_1 \in S$ be selected such that $n(i_1) > 0$ and

$H(i_1, i_A) \le H(i, i_A)$ for all $i$ represented in $n \in S' - S'_A$. There are two cases that one must consider: $i_1 = i_A$ or $i_1 \ne i_A$. If $i_1 = i_A$,

$$Q'(m_A \mid n) = \lim_{\alpha \to 0^+} P'_{m_A}(m_A \mid n) = \lim_{\alpha \to 0^+} P_2(i_A \mid n)^M = P_1(i_A \mid n)^M > 0,$$

and consequently $m_A$ is accessible from $n$ in 1 transition. For the second case, let us consider a point $i_2$ such that

$$H(i_2, i_A) = H(i_1, i_A) - 1.$$

If $n_1 \in S'_A$ is the one operator absorbing state defined by the condition $n_1(i_1) = M$, while $n_2 \in S'(n_1)$ is the adjacent non absorbing state defined by $n_2(i_1) = M - 1$, $n_2(i_2) = 1$, then from the construction of $Q'$,

$$Q'(n_2 \mid n) = \lim_{\alpha \to 0^+} P_2(n_2 \mid n) + \frac{1}{L} \lim_{\alpha \to 0^+} P_2(n_1 \mid n) \ge \frac{1}{L} P_1(n_1 \mid n) = \frac{1}{L} P_1(i_1 \mid n)^M > 0.$$

Thus $n_2$ is accessible from $n$ in one transition. If $i_2 = i_A$ then by case 1 $m_A$ is accessible in one additional transition. Otherwise, the case 2 argument is repeated, and the procedure necessarily terminates with at most $H(i_1, i_A) + 1$ applications of this rule, which proves that the state $m_A$ is accessible in some finite number of transitions. Thus $Q'$ is primitive. We can then conclude that the determinant $\lim_{\alpha \to 0^+} |P'_{m_A} - I'|$ is different from zero, and that its algebraic sign is $(-1)^{N' - N + 1}$. The components of the limiting stationary distribution are

$$\lim_{\alpha \to 0^+} q_\alpha(m) = \begin{cases} \dfrac{\lim_{\alpha \to 0^+} |P'_{m_A} - I'|}{\sum_{n_A \in S'_A} \lim_{\alpha \to 0^+} |P'_{n_A} - I'|}, & m = m_A \in S'_A \\[3ex] 0, & m \in S' - S'_A \end{cases} \tag{18}$$

Notice that the limiting stationary distribution for zero mutation probability is strictly positive for the components corresponding to absorbing states. A consequence is that there is always a nonzero probability associated with each absorbing state in the limiting distribution, including those for suboptimal solutions. Our conjecture, backed by some simulations [Davis, 1991], is that this probability can be made as small as desired just by increasing the population size (which controls the number of non absorbing states). Thus, unlike simulated annealing, the simple genetic algorithm fails to provide convergence to global optimality with probability one.

5. An ergodicity bound for the mutation probability

It was mentioned that mutation in the genetic algorithm has a role very similar to temperature in simulated annealing. It is also known that the temperature schedule is extremely important for the robust convergence of simulated annealing, because it guarantees that the algorithm achieves the limiting distribution (strong ergodicity). In this section we propose a schedule bound for the mutation probability in the genetic algorithm that also guarantees strong ergodicity.

First we show that the schedule

$$p_m(k) = \frac{1}{2} k^{-1/(ML)}$$

ensures weak ergodicity of the corresponding nonstationary simple genetic algorithm. We start by substituting the lower bound of eq. 6 for the two operator genetic algorithm into the coefficient of ergodicity $\tau_1$, and show that the series $\sum_k [1 - \tau_1(P_k)]$ diverges (the lower bound of the three operator case yields similar results). Writing the coefficient of ergodicity as

$$\tau_1(P) = 1 - \min_{n_1, n_2} \sum_{m \in S'} \min\big(P(m \mid n_1), P(m \mid n_2)\big),$$

the lower bound gives

$$1 - \tau_1(P) = \min_{n_1, n_2} \sum_{m \in S'} \min\big(P(m \mid n_1), P(m \mid n_2)\big) \ge \sum_{m \in S'} \binom{M}{m} \left( \frac{\alpha}{1+\alpha} \right)^{LM} = \left( \frac{2\alpha}{1+\alpha} \right)^{LM}. \tag{19}$$

The chain is weakly ergodic if the sequence of parameter values $\{\alpha(k)\}$ satisfies

$$\sum_{k=1}^{\infty} \left( \frac{2\alpha(k)}{1+\alpha(k)} \right)^{LM} = \infty.$$

Comparing this with the divergent series $\sum_k 1/k$, one concludes that

$$\frac{\alpha(k)}{1+\alpha(k)} = p_m(k) = \frac{1}{2} k^{-1/(ML)} \tag{20}$$

provides a guarantee of weak ergodicity.

For the sequence of control parameters $\alpha(k)$ there exists a sequence of vectors $\{q_k\}$ (where $q_k = q_\alpha$ for $\alpha = \alpha(k)$), due to the existence of a stationary distribution for the time homogeneous genetic algorithm. Further, we also showed that the stationary distribution vector is continuous in $\alpha$ and that its first derivative exists and is continuous. Therefore, if the control parameter is monotonically decreasing, then by the mean value theorem one can write,

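$$q_{k+1}(m) - q_k(m) = \left. \frac{dq_\alpha(m)}{d\alpha} \right|_{\alpha = a_m(k)} \big(\alpha(k+1) - \alpha(k)\big),$$

where $\alpha(k+1) < a_m(k) < \alpha(k)$. Consequently

$$\sum_{k=1}^{\infty} |q_{k+1}(m) - q_k(m)| = \sum_{k=1}^{\infty} \left| \frac{dq_\alpha(m)}{d\alpha} \right|_{\alpha = a_m(k)} \big(\alpha(k) - \alpha(k+1)\big). \tag{21}$$

From the properties of the state transition matrix, it is possible to construct a function $g_\alpha(m)$ which is continuous in the variable $\alpha$ on the closed interval $0 \le \alpha \le 1$ [Davis, 1991] as

$$g_\alpha(m) = \begin{cases} \dfrac{dq_\alpha(m)}{d\alpha}, & 0 < \alpha \le 1 \\[2ex] \lim_{\alpha \to 0^+} \dfrac{dq_\alpha(m)}{d\alpha}, & \alpha = 0 \end{cases}$$

Then it follows that $g_\alpha(m)$ is bounded on the closed interval by $B_m = \sup_\alpha |g_\alpha(m)|$. Substituting this result in eq. 21, the sum telescopes:

$$\sum_{k=1}^{\infty} |q_{k+1}(m) - q_k(m)| \le B_m \sum_{k=1}^{\infty} \big(\alpha(k) - \alpha(k+1)\big) \le B_m\, \alpha(1) \le B_m.$$

The series of vector differences required for strong ergodicity can then be written as

$$\sum_{k=1}^{\infty} \| q_{k+1} - q_k \| = \sum_{k=1}^{\infty} \sum_{m \in S'} |q_{k+1}(m) - q_k(m)| \le \sum_{m \in S'} B_m \le N' B < \infty, \tag{22}$$

where $B = \max_m B_m$. This proves that the previously asserted mutation probability sequence bound guarantees strong ergodicity for the two and three operator genetic algorithms.

It is instructive to compare this bound with the simulated annealing bound of $K / \log(k)$. Defining the ratio of the two bounds by

$$\rho(k) = \frac{\frac{1}{2} k^{-1/(ML)}}{K / \log(k)} = \frac{\log(k)}{2K\, k^{1/(ML)}},$$

it follows that $\rho(k)$ converges to zero for $k \to \infty$, which proves that the asymptotic convergence rate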

of the genetic algorithm is superior to that of simulated annealing.

6. Conclusions

This paper presents a mathematical formulation of the simple genetic algorithm (reproduction, mutation and crossover) as a Markov chain. It was shown that the reproduction only algorithm does not possess a unique stationary distribution. In fact the Markov chain has $2^L$ absorbing states, one associated with each possible uniform population. The limiting behavior of the algorithm is therefore controlled by the initial population, and the resulting algorithm cannot guarantee convergence to global optimality. However, when the mutation operator is added, it was demonstrated that a unique stationary distribution exists. Mutation in our formulation is a control parameter that is analogous to temperature in the simulated annealing literature. Mutation and reproduction make the genetic algorithm state behavior powerful enough to visit the whole search space, with an asymptotic behavior independent of the initial population but dependent on the objective function and algorithm parameters. The two operator algorithm can therefore be utilized in optimization. It is interesting that, from this point of view, the addition of crossover does not seem to bring any additional advantage.

Moreover, we also demonstrated that a limiting distribution exists in the zero mutation limit. We used the Perron-Frobenius theorem to formulate an existence argument for the limiting stationary distribution of the time homogeneous two and three operator algorithms. We demonstrated that the limiting stationary distribution is strictly positive on the absorbing states, which implies that there is always a finite probability associated with each absorbing state. This means that the limiting distribution is not guaranteed to concentrate on the optimal solutions.
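The mutation probability schedule of eq. 20 and its ratio to the logarithmic annealing bound are easy to tabulate. The sketch below is illustrative only: the function names, the constant $K = 1$, and the problem size $M = 3$, $L = 2$ are assumptions chosen for the example, and the annealing bound is taken as $K / \log(k)$ for $k \ge 2$.

```python
from math import log

def mutation_schedule(k, M, L):
    """p_m(k) = (1/2) * k**(-1/(M*L)), the schedule of eq. 20, for k >= 1."""
    return 0.5 * k ** (-1.0 / (M * L))

def annealing_schedule(k, K=1.0):
    """Logarithmic annealing bound K / log(k), defined for k >= 2."""
    return K / log(k)

def ratio(k, M, L, K=1.0):
    """rho(k): mutation schedule divided by the annealing bound."""
    return mutation_schedule(k, M, L) / annealing_schedule(k, K)

M, L = 3, 2   # hypothetical population size and string length
for k in (10, 10**3, 10**6, 10**12, 10**24):
    print(k, mutation_schedule(k, M, L), ratio(k, M, L))
```

The schedule starts at $p_m(1) = 1/2$ and is monotonically nonincreasing, as the framework requires. Because the exponent $1/(ML)$ shrinks with population size and string length, the decay of $\rho(k)$ only becomes visible at very large $k$ for anything but toy problems.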
Simulation results [Davis, 1991] show that the limiting distribution entropy decreases with the population size, which suggests that for large populations, the probability associated with nonoptimal states can be made as small as required. However, we were not able to derive a closed form solution for the stationary distribution, so the above observation is simply a conjecture at this point.

It is important to compare the asymptotic speed of convergence of the genetic algorithm and simulated annealing. We proposed a bound that guarantees strong ergodicity for the convergence of the genetic algorithm. As illustrated, the convergence of the two operator genetic algorithm is superior to that of simulated annealing. This is an important point that can give an edge to the genetic algorithm, because, as is well known, the convergence of simulated annealing is too slow for many applications. Once again, for our established bound, crossover does not influence the asymptotic behavior of the two operator algorithm. It is an open question whether the speed of the algorithm in finite time improves with crossover.

Our present research effort is directed towards establishing a closed form solution of the stationary distribution. The essential task, which has already been started [Davis, 1991], is to evaluate the determinants in eq. 11, and the corresponding limiting case, eq. 14. It involves expanding the denominator polynomials of eq. 11 as a multivariate Taylor series around the point $\alpha = 1$. This procedure can also be extended with minor modifications to the limiting case (eq. 14). There are

significant identities among the coefficients of the Taylor series that can be linked to the algebra of symmetric and alternating polynomials.

7. Acknowledgments

Dr. Davis was partially supported by the US Air Force Armament Laboratory.

8. References

Laarhoven P., Aarts E. (1987), Simulated Annealing, D. Reidel Publishing Co., Dordrecht, Holland.
Geman S., Geman D. (1984), Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images, IEEE Trans. Patt. Anal. Mach. Intel., vol. PAMI-6, 6:721-741.
Lundy M., Mees A. (1986), Convergence of an annealing algorithm, Math. Prog. 34, 111-124.
Mitra D., Romeo F., Sangiovanni-Vincentelli A. (1985), Convergence and finite time behavior of simulated annealing, Proc. 24th Conf. on Decision and Control, Ft. Lauderdale, 761-767.
Riordan J. (1958), An introduction to combinatorial analysis, Wiley, New York.
Davis L. (1987), Genetic algorithms and simulated annealing, Morgan Kaufmann, Los Altos.
Goldberg D. (1985), Optimal population size for binary coded genetic algorithms, TCGA Report 85001, Dept. Eng. Mechanics, U. of Alabama, Tuscaloosa.
Goldberg D. (1987), Genetic Algorithms in search, optimization and machine learning, Addison-Wesley, Reading.
Grefenstette J. (1985), Proc. Int. Conf. Genetic Algorithms and Applications, Lawrence Erlbaum, Hillsdale.
Grefenstette J. (1987), Proc. Second Int. Conf. Genetic Algorithms and Applications, Lawrence Erlbaum, Hillsdale.
Davis T. (1991), Towards an extrapolation of the simulated annealing convergence theory onto the genetic algorithm, Ph.D. Dissertation, University of Florida, Gainesville.