Using Matrix Analysis to Approach the Simulated Annealing Algorithm

Dennis I. Merino, Edgar N. Reyes
Southeastern Louisiana University, Hammond, LA 70402

Carl Steidley
Texas A&M University-Corpus Christi, Corpus Christi, TX 78412

1 An Overview of the Simulated Annealing Algorithm

In [3, 4], the annealing of a solid is simulated and described using the simulated annealing algorithm, a method rooted in combinatorial optimization. Combinatorial optimization comprises a set of problems that are central to the disciplines of computer science and engineering. Annealing is the physical process of heating a solid followed by a controlled cooling process. In particular, the earlier papers analyzed a simulated annealing of a dodecahedron, and they pointed out that simulated annealing could be extended to other solids as well as to plane figures such as polygons. Since the simulated annealing algorithm is stochastic in nature and has direct connections to Boltzmann machines (a type of neural network), in this paper we generalize the results of the earlier papers and use matrix analysis techniques to simulate annealing modeled on a class of Boltzmann machines which includes, as a special case, the simulated annealing of the dodecahedron.

For clarity, we briefly review some aspects of Boltzmann machines and the simulated annealing algorithm. Let (U, C) be a network consisting of a set of units U = {u_1, u_2, ..., u_n} and a set of connections C consisting of unordered pairs {u_i, u_j}. A connection {u_i, u_j} ∈ C is said to join u_i to u_j. Intrinsic to Boltzmann machines are the notions of a connection strength s and a configuration k of the network (U, C); respectively, they are mappings s : C → R and k : U → R into the set of real numbers. A Boltzmann machine is a network (U, C) with a given connection strength s. An objective of a Boltzmann machine is to attain an overall measure of desirability; see [1, Chapter 8]. Specifically, we seek an optimal configuration k_opt, in the space S of all configurations k, that minimizes the consensus function C : S → R, where

    C(k) = Σ_{{u,v} ∈ C} s({u, v}) k(u) k(v).    (1)
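To make the consensus (1) concrete, the short GAP fragment below (in the style of the session in Appendix B) evaluates C(k) on a three-unit network in which every pair of units is connected. The network size and the strength values in the table are invented purely for this sketch.

gap> s := [ [ 1, -2, 0 ], [ -2, 1, -2 ], [ 0, -2, 1 ] ];;  # hypothetical symmetric strength table
gap> consensus := function(k)
>      local tot, u, v;
>      tot := 0;
>      for u in [1..3] do
>        for v in [u..3] do         # one term per unordered pair {u, v}
>          tot := tot + s[u][v]*k[u]*k[v];
>        od;
>      od;
>      return tot;
>    end;;
gap> consensus([1, 1, 1]);
-1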

To optimize (1), we use the simulated annealing algorithm. There are several ways to implement this algorithm, and each depends on a neighborhood structure and a transition mechanism. The latter term suggests a link with Markov chains. A neighborhood structure is a mapping N : S → P(S), where P(S) is the family of all subsets of S. A configuration l ∈ N(k) is called a neighbor of k. To optimize the consensus (1), we need a mechanism which allows a configuration to change. Configurations may change from one trial to the next; in the earlier papers, the configurations, which simulate the temperatures on the faces of the dodecahedron, change every minute.

Given a configuration k, we stochastically generate a neighbor l ∈ N(k), with the neighborhood structure defined at the outset, and then determine whether l will replace k. Specifically, let X(m) be the configuration on the m-th trial and let

    P_{k,l}(m) = P(X(m) = l | X(m-1) = k)

be the probability of accepting configuration l on the m-th trial given that the configuration of the (m-1)-th trial is k. Under certain conditions, such as those discussed in [1], the sequence of configurations generated by the simulated annealing algorithm asymptotically converges to an optimal configuration. The controlled cooling process is represented by a sequence {c_m : m = 0, 1, 2, 3, ...} of real numbers. Following [3], simulated annealing is described by the algorithm below.

Begin Simulated Annealing Algorithm
    Initialize: k, an initial configuration;
                m = 0, a counter;
    Do: generate l in N(k);
        if C(l) ≤ C(k), then k := l
        else if P_{k,l}(m) > Random(0, 1), then k := l;
        m := m + 1;
    Until: Calculate Stop Criterion(c_m);
end;

2 Extending Simulated Annealing

In [3], the set of units U_d = {u_i : i = 1, 2, 3, ..., 12} represented the faces of a dodecahedron, and the set of connections was C_d = {{u_i, u_j} : 1 ≤ i, j ≤ 12}. A configuration k of (U_d, C_d) played the role of temperature. If at a particular minute the temperature on face u_i is k(u_i), then one minute later the temperatures on the faces are prescribed by N_d(k), where

    N_d(k)(u_i) = (1/10) Σ_j k(u_j),    (2)

the sum being taken over all units u_j except u_i and the face opposite u_i. (Since a dodecahedron is a regular solid, we have the notion of opposite faces.) Note that the neighborhood structure N_d in [3] has special properties: for one, N_d(k) ∈ P(S) is a singleton for each configuration k, so we may regard N_d(k) itself as a configuration. More importantly, since we can identify S with R^12, N_d : R^12 → R^12 is actually a linear operator. Also, the transition probability in [3], given by

    P_{k,l}(m) = 1 if N_d(k) = l, and 0 otherwise,    (3)

shows that a configuration k in one trial (or minute) changes to the configuration N_d(k) in the next trial (or minute); one can say that the transition mechanism is deterministic.
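The update (2) is straightforward to compute. The following GAP sketch assumes, only for illustration, that the faces are labeled so that u_i is opposite u_{((i+5) mod 12)+1}; [3] fixes its own labeling, and any pairing of the twelve faces into six opposite pairs works the same way.

gap> opposite := i -> ((i + 5) mod 12) + 1;;   # assumed labeling: u_i opposite u_{i+6}
gap> Nd := function(k)                         # one deterministic update, as in (2)
>      local tot;
>      tot := Sum(k);
>      return List([1..12], i -> (tot - k[i] - k[opposite(i)])/10);
>    end;;
gap> Nd([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]);
[ 7, 34/5, 33/5, 32/5, 31/5, 6, 7, 34/5, 33/5, 32/5, 31/5, 6 ]

Iterating Nd drives every entry toward the average of the initial temperatures, in accordance with (5) below.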

In [3], it was shown that when the connection strength s_d is defined by

    s_d({u, v}) = 11/12 if u = v, and -1/6 if u ≠ v,    (4)

and k = (k_1, k_2, ..., k_12), then after several trials (or minutes) the m-th configuration N_d^m(k) is approximately

    ( (1/12) Σ_{i=1}^{12} k_i, ..., (1/12) Σ_{i=1}^{12} k_i ),

an optimal configuration minimizing (1). Note that, with a suitable basis, we can represent the linear operator N_d : R^12 → R^12 by the 12-by-12 matrix, also denoted N_d, whose (i, j) entry is 0 when u_j is either u_i or the face opposite u_i, and 1/10 otherwise.

Let k = [k_1, k_2, ..., k_12]^T be the column vector representing the temperatures of the faces of the dodecahedron on the first minute. Then, on the second minute, the corresponding temperatures of the faces are the entries of the vector N_d k. On the third minute, the temperatures are given by the vector N_d(N_d k) = N_d^2 k. In general, on the (m+1)-th minute, the temperatures are given by N_d^m k. A direct calculation reveals that

    lim_{m→∞} N_d^m = the 12-by-12 matrix each of whose entries is 1/12.    (5)

Hence, on each of the faces, the temperature approaches the average of the initial temperatures of all twelve faces.
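The limit (5) can be checked in exact rational arithmetic. The sketch below builds the matrix N_d under the same assumed opposite-face labeling as in the previous sketch and verifies that a modest power is entrywise within 10^-20 of 1/12 (the eigenvalues of this N_d other than 1 are 0 and -1/5, so the convergence is fast).

gap> opp := i -> ((i + 5) mod 12) + 1;;        # same assumed labeling as above
gap> Ndmat := List([1..12], i -> List([1..12], function(j)
>      if j = i or j = opp(i) then return 0; else return 1/10; fi;
>    end));;
gap> M := Ndmat^40;;                           # exact rational matrix power
gap> dev := Maximum(List(M, row ->
>      Maximum(List(row, x -> Maximum(x - 1/12, 1/12 - x)))));;
gap> dev < 1/10^20;
true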

In this paper, we extend the results outlined in [3]. Let U = {u_1, u_2, ..., u_n} and let C = {{u_i, u_j} : 1 ≤ i, j ≤ n} be a network representing a Boltzmann machine, where n ≥ 2. For the connection strength s, we consider

    s({u, v}) = (n-1)/n if u = v, and -2/n if u ≠ v.    (6)

A configuration k_opt minimizes the consensus (1) with connection strength (6) if and only if it minimizes

    (1/n) Σ_{i=1}^{n} (k(u_i) - µ(k))^2    (7)

over all configurations k, where µ(k) = (1/n) Σ_{i=1}^{n} k(u_i); see [3] for details regarding the case n = 12.
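Indeed, with the strength (6), the consensus (1) of a configuration k equals Σ_{i=1}^{n} (k(u_i) - µ(k))^2, which is n times (7). This identity can be sanity-checked in GAP with a random integer configuration; the configuration below is illustrative only, and the check is exact because GAP works over the rationals.

gap> n := 12;;
gap> s := function(u, v)                       # the connection strength (6)
>      if u = v then return (n-1)/n; else return -2/n; fi;
>    end;;
gap> consensus := function(k)                  # (1), one term per unordered pair
>      local tot, u, v;
>      tot := 0;
>      for u in [1..n] do for v in [u..n] do
>        tot := tot + s(u, v)*k[u]*k[v];
>      od; od;
>      return tot;
>    end;;
gap> k := List([1..n], i -> Random([1..100]));;
gap> mu := Sum(k)/n;;
gap> consensus(k) = Sum(List(k, x -> (x - mu)^2));   # consensus = n times (7)
true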

Next, we consider a neighborhood structure N whose values are not singletons but are infinite subsets of S. In particular, this implies that N is not a linear operator, as it was in the case of the simulated annealing of the dodecahedron. Moreover, the transition probabilities P_{k,l}(m) are stochastic, not deterministic as in (3).

Let N be the n-by-n matrix whose entries are all 1s. When necessary, we emphasize the size of N by a subscript; for instance, N_12 is a 12-by-12 matrix. We say that an n-by-n matrix Q is doubly stochastic if the entries of Q are nonnegative and the sum of the entries in each row and in each column is 1. In the case of the simulated annealing of the dodecahedron, note that N_d = (1/10)(N_12 - 2Q), where Q is the doubly stochastic matrix

    Q = (1/2)(I + P),    (8)

with I the 12-by-12 identity matrix and P the permutation matrix that interchanges each face of the dodecahedron with its opposite face. In extending the results outlined in [3], we define the neighborhood structure according to

    N(k) = { (1/(n-c)) (N_n - cQ) k : Q is a doubly stochastic matrix and 0 ≤ c < n }.    (9)

For the part of the program in section 1 describing the simulated annealing algorithm, we need to define a mechanism for generating a neighbor l ∈ N(k) of k. Let {E_r : r = 1, 2, 3, ..., n!} be the set of all n-by-n permutation matrices. By a theorem due to Birkhoff [2, Theorem 8.7.2], each doubly stochastic matrix Q can be expressed in the form

    Q = Σ_{r=1}^{n!} α_r E_r,  where  Σ_{r=1}^{n!} α_r = 1 and each α_r ≥ 0.    (10)

This implies that, given 0 ≤ c < n, we can stochastically generate a doubly stochastic matrix Q by stochastically choosing an n!-tuple (α_r) and then constructing the matrix in (10); a sketch of this step appears below.
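Since forming all n! permutation matrices is impractical for large n, the sketch below (like Appendix B) works over the permutation matrices of a small symmetric group, here S_4 with n = 4; the integer weight scheme for producing the α_r is an arbitrary choice made for this sketch.

gap> n := 4;;
gap> pmats := List(Elements(SymmetricGroup(n)), p -> PermutationMat(p, n));;
gap> w := List(pmats, x -> Random([0..10]));;   # random nonnegative weights
gap> while Sum(w) = 0 do w := List(pmats, x -> Random([0..10])); od;
gap> alpha := w/Sum(w);;                        # alpha_r >= 0 with Sum = 1
gap> Q := Sum([1..Length(pmats)], r -> alpha[r]*pmats[r]);;
gap> Set(List(Q, Sum));                         # every row of Q sums to 1
[ 1 ]
gap> Set(List(TransposedMat(Q), Sum));          # and every column as well
[ 1 ]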

Furthermore, to implement the simulated annealing algorithm we have to define a transition probability. We take

    P_{k,l}(m) = α_1,    (11)

where l ∈ N(k) is given by l = (1/(n-c)) (N_n - c Σ_{r=1}^{n!} α_r E_r) k.

Suppose k is an initial configuration of a Boltzmann machine (U, C). Realizing k as a column vector, the configuration in the second trial is (1/(n-c_1))(N_n - c_1 Q_1) k, and the configuration in the third trial is

    (1/(n-c_2))(N_n - c_2 Q_2) (1/(n-c_1))(N_n - c_1 Q_1) k,

for some 0 ≤ c_1, c_2 < n and doubly stochastic matrices Q_1, Q_2. In this paper, provided each c_i satisfies 0 ≤ c_i < n/2 - ε for some ε > 0, we shall show that after several trials the m-th configuration is approximately (1/n) N_n (k); that is,

    lim_{m→∞} Π_{i=1}^{m} (1/(n-c_i)) (N_n - c_i Q_i) (k) = (1/n) N_n (k).    (12)

As a special case, when n = 12, c_i = 2, and each Q_i is the matrix in (8), we recover the simulated annealing of the dodecahedron in [3]; that is,

    Π_{i=1}^{m} (1/(n-c_i)) (N_n - c_i Q_i) (k) = N_d^m (k).

In Appendix A we verify (12), and in Appendix B we provide an example using the software package GAP to simulate the annealing of a dodecahedron [5]. GAP (an acronym for Groups, Algorithms, and Programming) is a software package suitable for use by undergraduates in advanced classes such as abstract algebra, linear algebra, and number theory.

3 Summary

This paper generalizes the simulated annealing process of the dodecahedron in [3, 4]. We have chosen U = {u_1, u_2, ..., u_n} as the set of units and C = {{u_i, u_j} : 1 ≤ i, j ≤ n} as the set of connections of a Boltzmann machine. We assume that the connection strength of this neural network (U, C) is the one given by (6). Moreover, we have chosen the neighborhood structure N : S → P(S) defined in (9). Given an initial configuration k, we generate a neighbor l ∈ N(k) by stochastically choosing an n!-tuple (α_r) of nonnegative scalars satisfying Σ_{r=1}^{n!} α_r = 1, and then setting l = (1/(n-c))(N_n - cQ) k, where Q is the matrix on the right-hand side of (10). The transition probability P_{k,l} we chose is the one in (11). The choice of P_{k,l}(m) is not crucial at this juncture, since (12) holds for any other transition probability; we fixed one for the sake of completeness.

To simulate a controlled cooling process, we choose a sequence {c_m : m = 1, 2, 3, ...} of real numbers satisfying lim_{m→∞} c_m = 0. The choice of this limit, namely 0, is motivated by the simulated annealing of the dodecahedron. We quote from [3] some features of the simulated annealing of the dodecahedron:

    Initially, for each i ∈ {1, ..., 12} the face u_i is labeled with the number k(u_i), and on the second minute the number on face u_i becomes the average of the first minute's numbers except those that were on u_i and on the face opposite u_i; i.e., on the second minute the number on face u_i is (1/10) Σ_j k(u_j), where the sum is taken over all faces u_j except the one on face u_i and the one on the face opposite u_i.

We see that when c = 2, the temperature on a face in the next trial or minute is given by the average of 10 temperatures of the current trial or minute. If we prescribe the temperature on a face in the next trial to be the average of the temperatures of the five neighboring faces (a face of a dodecahedron has five surrounding faces) in the current trial, then the value of c is 7. This suggests that a slow cooling process is simulated when the averaging of the temperatures begins with a few faces and is then gradually extended to more faces. Thus, we assume that the sequence {c_m : m = 1, 2, 3, ...} in the algorithm of section 1 is decreasing and converges to 0.

In the general case, we shall show that the sequence of configurations

    k,  (1/(n-c_1))(N_n - c_1 Q_1) k,  (1/(n-c_2))(N_n - c_2 Q_2)(1/(n-c_1))(N_n - c_1 Q_1) k,  ...

converges to (1/n) N_n k, provided 0 ≤ c_i < n/2 - ε for some ε > 0. Thus, if the initial configuration is k, then after several trials the limiting configuration has a constant value at each unit, the value being the average of {k(u_i) : i = 1, 2, 3, ..., n}. For the simulated annealing, this implies that the temperatures distribute evenly over all the units. Note that we do not discuss the rate of convergence, which may depend on the choice of the transition probability.

4 Appendix A

In this section we present a short proof supporting our claim (12).

Theorem. Let n ≥ 2 be an integer. Suppose 0 ≤ c_i < n/2 - ε and Q_i is a doubly stochastic matrix for all i, where ε is some positive real number. Then

    lim_{m→∞} Π_{i=1}^{m} (1/(n-c_i)) (N - c_i Q_i) = (1/n) N.

Proof. Notice that N^2 = nN and, in general, N^l = n^(l-1) N for each positive integer l. A direct computation reveals that NQ = QN = N for any n-by-n doubly stochastic matrix Q. The product of two doubly stochastic matrices is again doubly stochastic; therefore, the entries of Π_{i=1}^{m} Q_i are bounded (they lie between 0 and 1). Rewriting the product Π_{i=1}^{m} (1/(n-c_i)) (N - c_i Q_i), we obtain

    Π_{i=1}^{m} (1/(n-c_i)) (N - c_i Q_i) = (1 / Π_{i=1}^{m} (n-c_i)) [(N - c_1 Q_1)(N - c_2 Q_2) ··· (N - c_m Q_m)]

    = (1 / Π_{i=1}^{m} (n-c_i)) [ N^m - (Σ_i c_i) N^(m-1) + (Σ_{i_1<i_2} c_{i_1} c_{i_2}) N^(m-2) - (Σ_{i_1<i_2<i_3} c_{i_1} c_{i_2} c_{i_3}) N^(m-3) + ··· + (-1)^m c_1 c_2 ··· c_m Q_1 Q_2 ··· Q_m ],

where the relations NQ_i = Q_i N = N collapse every mixed term. Using N^l = n^(l-1) N, the coefficient of N above sums to (1/n)[ Π_{i=1}^{m} (n-c_i) - (-1)^m c_1 ··· c_m ], and hence

    Π_{i=1}^{m} (1/(n-c_i)) (N - c_i Q_i) = (1/n)(1 - (-1)^m ρ_m) N + (-1)^m ρ_m Q_1 Q_2 ··· Q_m,

where ρ_m = Π_{i=1}^{m} c_i/(n-c_i). Since 0 ≤ c_i < n/2 - ε, each factor satisfies c_i/(n-c_i) ≤ (n/2-ε)/(n/2+ε) < 1, so ρ_m → 0 as m → ∞. Because the entries of Q_1 Q_2 ··· Q_m lie between 0 and 1, the second term converges to the zero matrix, and the product converges to (1/n) N, as claimed.
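The theorem lends itself to a numerical check in exact arithmetic. In the GAP sketch below, the Q_i are averages of three random permutation matrices and the c_i are drawn from {0, ..., 5}, so that c_i < 6 = n/2; by the bound in the proof, ρ_30 ≤ (5/7)^30 < 10^-4, which guarantees that the final comparison prints true.

gap> n := 12;;
gap> N := List([1..n], i -> List([1..n], j -> 1));;     # the all-ones matrix N_12
gap> randQ := function()                                # a random doubly stochastic Q
>      return Sum(List([1..3], i -> PermutationMat(Random(SymmetricGroup(n)), n)))/3;
>    end;;
gap> M := IdentityMat(n);;
gap> for i in [1..30] do
>      c := Random([0..5]);                             # 0 <= c_i < n/2
>      M := M*(N - c*randQ())/(n - c);
>    od;
gap> dev := Maximum(List(M, row ->
>      Maximum(List(row, x -> Maximum(x - 1/n, 1/n - x)))));;
gap> dev < 1/10^4;                                      # rho_30 <= (5/7)^30 < 10^-4
true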

5 Appendix B

In this appendix we simulate the annealing of a dodecahedron, using the network (U_d, C_d) of section 2, whose units are the twelve faces and whose connections are the unordered pairs of faces. In carrying out the calculations involved in the simulated annealing algorithm, we use GAP, a software system for computational discrete algebra [5]. Given a configuration k of the network (U_d, C_d), the set N(k) of neighbors of k is the set defined in (9).

Let us consider the permutation matrices {E_r : r = 1, ..., 6} generated by the permutations (4, 8, 12) and (8, 12). This is a small subset of the set of all 12-by-12 permutation matrices; the underlying permutations actually form a dihedral group of order 6. To stochastically generate a neighbor l ∈ N(k) (see the algorithm at the end of section 1 and (10)), we randomly select a doubly stochastic 12-by-12 matrix of the form

    Q = Σ_{r=1}^{6} α_r E_r,  where  Σ_{r=1}^{6} α_r = 1  and each α_r is a fraction i/j with integers 0 ≤ i ≤ j ≤ 3.

Moreover, if l = (1/(n-c))(N_n - cQ) k, then the transition probability is chosen according to (11); that is, P_{k,l} = α_1.

If the initial configuration is k = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12] and the slowly decreasing cooling sequence is given by c_i = 1/i for i = 1, ..., 20, then the final output of the algorithm described at the end of section 1 is approximately the configuration

    [327/50, 327/50, 327/50, 327/50, 327/50, 327/50, 327/50, 327/50, 327/50, 327/50, 327/50, 327/50].

Note that the average of the numbers 1, ..., 12 is 6.5, which is approximately 327/50 = 6.54. This supports our claim, as stated in the theorem of Appendix A, that the sequence of configurations generated by the simulated annealing converges to an optimal configuration minimizing (7).

The following are the commands we keyed in when using GAP.

gap> g := Group((4,8,12), (8,12));;
gap> gl := Elements(g);;
gap> Size(gl);
6
gap> L := [ ];;
gap> for i in [0..3] do
>      for j in [1..3] do
>        if i <= j then Append(L, [i/j]); fi;
>      od;
>    od;
gap> S := Set(L);;  T := Tuples(S, 6);;
gap> c := [ ];;
gap> for i in [1..Length(T)] do
>      z := List([1..6], j -> T[i][j]);
>      if Sum(z) = 1 then Append(c, [z]); fi;
>    od;
gap> cons := function(x)                 # the consensus, in the form 12 times (7)
>      local ave;
>      ave := Sum(List([1..12], j -> x[j]))/12;
>      return Sum(List([1..12], j -> (x[j] - ave)^2));
>    end;;
gap> r := [ ];;  q := [ ];;  l := [ ];;
gap> k := [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12];;
gap> for i in [1..20] do
>      j := 1/i;                         # the cooling value c_i = 1/i

>      coef := Random(c);                # a random tuple (alpha_r)
>      for m in [1..12] do
>        for p in [1..6] do
>          r[p] := coef[p]*k[m^(gl[p]^(-1))];
>        od;
>        l[m] := Sum(r);  r := [ ];
>        q[m] := (1/(12 - j))*(Sum(k) - j*l[m]);
>      od;
>      if cons(q) <= cons(k) then
>        k := q;
>      elif coef[1] > AbsInt(Random([1..12]))/12 then
>        k := q;
>      fi;
>    od;
gap> final := [ ];;
gap> for i in [1..12] do
>      final[i] := Int(100*k[i])/100;
>    od;
gap> final;
[ 327/50, 327/50, 327/50, 327/50, 327/50, 327/50, 327/50, 327/50, 327/50,
  327/50, 327/50, 327/50 ]

References

[1] Aarts, E. and Korst, J., Simulated Annealing and Boltzmann Machines, Wiley-Interscience Series in Discrete Mathematics and Optimization, John Wiley and Sons, 1989.

[2] Horn, R.A. and Johnson, C.R., Matrix Analysis, Cambridge University Press, 1990.

[3] Reyes, E.N. and Steidley, C., A GAP Approach to the Simulated Annealing Algorithm, Computers in Education Journal of the American Society for Engineering Education, Vol. 7, No. 3 (1997), 43-47.

[4] Reyes, E.N. and Steidley, C., A Theoretical Discussion of the Mathematics Underlying "A GAP Approach to the Simulated Annealing Algorithm", to appear in the Computers in Education Journal of the American Society for Engineering Education.

[5] Schönert, M. et al., GAP - Groups, Algorithms, and Programming, Version 3 Release 4, Lehrstuhl D für Mathematik, RWTH Aachen, Germany, 1994.