Using Matrix Analysis to Approach the Simulated Annealing Algorithm

Dennis I. Merino, Edgar N. Reyes
Southeastern Louisiana University, Hammond, LA 70402

Carl Steidley
Texas A&M University-Corpus Christi, Corpus Christi, TX 78412

1 An Overview of the Simulated Annealing Algorithm

In [3, 4], the annealing of a solid is simulated and described using the simulated annealing algorithm, a method rooted in combinatorial optimization. Combinatorial optimization comprises a set of problems that are central to the disciplines of computer science and engineering. Annealing is the physical process of heating a solid followed by a controlled cooling process. In particular, the earlier papers analyzed a simulated annealing of a dodecahedron, and they pointed out that simulated annealing could be extended to other solids as well as to plane figures such as polygons. Since the simulated annealing algorithm is stochastic in nature and has direct connections to Boltzmann machines (a type of neural network), in this paper we generalize the results of the earlier papers and use matrix analysis techniques to simulate annealing modeled on a class of Boltzmann machines which includes, as a special case, the simulated annealing of the dodecahedron.

For clarity, we briefly review some aspects of Boltzmann machines and the simulated annealing algorithm. Let (U, C) be a network consisting of a set of units U = {u_1, u_2, ..., u_n} and a set of connections C consisting of unordered pairs {u_i, u_j}. A connection {u_i, u_j} ∈ C is said to join u_i to u_j. Intrinsic to Boltzmann machines are the notions of a connection strength s and a configuration k of the network (U, C); respectively, they are mappings s : C → R and k : U → R into the set of real numbers. A Boltzmann machine is a network (U, C) with a given connection strength s. An objective of a Boltzmann machine is to attain an overall measure of desirability; see [1, Chapter 8]. Specifically, we seek an optimal configuration k_opt, in the space S of all configurations k, that minimizes the consensus function C : S → R, where

    C(k) = Σ_{{u,v} ∈ C} s({u, v}) k(u) k(v).    (1)
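To make the consensus (1) concrete, the short GAP fragment below (in the style of the session in Appendix B) evaluates C(k) on a three-unit network in which every pair of units is connected. The network size and the strength values in the table are invented purely for this sketch.

gap> s := [ [ 1, -2, 0 ], [ -2, 1, -2 ], [ 0, -2, 1 ] ];;  # hypothetical symmetric strength table
gap> consensus := function(k)
>      local tot, u, v;
>      tot := 0;
>      for u in [1..3] do
>        for v in [u..3] do         # one term per unordered pair {u, v}
>          tot := tot + s[u][v]*k[u]*k[v];
>        od;
>      od;
>      return tot;
>    end;;
gap> consensus([1, 1, 1]);
-1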

To optimize (1), we use the simulated annealing algorithm. There are several ways to implement this algorithm, and each depends on a neighborhood structure and a transition mechanism. The latter term suggests a link with Markov chains. A neighborhood structure is a mapping N : S → P(S), where P(S) is the family of all subsets of S. A configuration l ∈ N(k) is called a neighbor of k. To optimize the consensus (1), we need a mechanism which allows a configuration to change. Configurations may change from one trial to the next; in the earlier papers, the configurations, which simulate the temperatures on the faces of the dodecahedron, change every minute.

Given a configuration k, we stochastically generate a neighbor l ∈ N(k), with the neighborhood structure defined at the outset, and then determine whether l will replace k. Specifically, let X(m) be the configuration on the m-th trial and let

    P_{k,l}(m) = P(X(m) = l | X(m-1) = k)

be the probability of accepting configuration l on the m-th trial given that the configuration of the (m-1)-th trial is k. Under certain conditions, such as those discussed in [1], the sequence of configurations generated by the simulated annealing algorithm asymptotically converges to an optimal configuration. The controlled cooling process is represented by a sequence {c_m : m = 0, 1, 2, 3, ...} of real numbers. Following [3], simulated annealing is described by the algorithm below.

Begin Simulated Annealing Algorithm
    Initialize: k, an initial configuration;
                m = 0, a counter;
    Do: generate l in N(k);
        if C(l) ≤ C(k), then k := l
        else if P_{k,l}(m) > Random(0, 1), then k := l;
        m := m + 1;
    Until: Calculate Stop Criterion(c_m);
end;

2 Extending Simulated Annealing

In [3], the set of units U_d = {u_i : i = 1, 2, 3, ..., 12} represented the faces of a dodecahedron, and the set of connections was C_d = {{u_i, u_j} : 1 ≤ i, j ≤ 12}. A configuration k of (U_d, C_d) played the role of temperature. If at a particular minute the temperature on face u_i is k(u_i), then one minute later the temperatures on the faces are prescribed by N_d(k), where

    N_d(k)(u_i) = (1/10) Σ_j k(u_j),    (2)

the sum being taken over all units u_j except u_i and the face opposite u_i. (Since a dodecahedron is a regular solid, we have the notion of opposite faces.) Note that the neighborhood structure N_d in [3] has special properties: for one, N_d(k) ∈ P(S) is a singleton for each configuration k, so we may regard N_d(k) itself as a configuration. More importantly, since we can identify S with R^12, N_d : R^12 → R^12 is actually a linear operator. Also, the transition probability in [3], given by

    P_{k,l}(m) = 1 if N_d(k) = l, and 0 otherwise,    (3)

shows that a configuration k in one trial (or minute) changes to the configuration N_d(k) in the next trial (or minute); one can say that the transition mechanism is deterministic.
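The update (2) is straightforward to compute. The following GAP sketch assumes, only for illustration, that the faces are labeled so that u_i is opposite u_{((i+5) mod 12)+1}; [3] fixes its own labeling, and any pairing of the twelve faces into six opposite pairs works the same way.

gap> opposite := i -> ((i + 5) mod 12) + 1;;   # assumed labeling: u_i opposite u_{i+6}
gap> Nd := function(k)                         # one deterministic update, as in (2)
>      local tot;
>      tot := Sum(k);
>      return List([1..12], i -> (tot - k[i] - k[opposite(i)])/10);
>    end;;
gap> Nd([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]);
[ 7, 34/5, 33/5, 32/5, 31/5, 6, 7, 34/5, 33/5, 32/5, 31/5, 6 ]

Iterating Nd drives every entry toward the average of the initial temperatures, in accordance with (5) below.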

In [3], it was shown that when the connection strength s_d is defined by

    s_d({u, v}) = 11/12 if u = v, and -1/6 if u ≠ v,    (4)

and k = (k_1, k_2, ..., k_12), then after several trials (or minutes) the m-th configuration N_d^m(k) is approximately

    ( (1/12) Σ_{i=1}^{12} k_i, ..., (1/12) Σ_{i=1}^{12} k_i ),

an optimal configuration minimizing (1). Note that, with a suitable basis, we can represent the linear operator N_d : R^12 → R^12 by the 12-by-12 matrix, also denoted N_d, whose (i, j) entry is 0 when u_j is either u_i or the face opposite u_i, and 1/10 otherwise.

Let k = [k_1, k_2, ..., k_12]^T be the column vector representing the temperatures of the faces of the dodecahedron on the first minute. Then, on the second minute, the corresponding temperatures of the faces are the entries of the vector N_d k. On the third minute, the temperatures are given by the vector N_d(N_d k) = N_d^2 k. In general, on the (m+1)-th minute, the temperatures are given by N_d^m k. A direct calculation reveals that

    lim_{m→∞} N_d^m = the 12-by-12 matrix each of whose entries is 1/12.    (5)

Hence, on each of the faces, the temperature approaches the average of the initial temperatures of all twelve faces.
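The limit (5) can be checked in exact rational arithmetic. The sketch below builds the matrix N_d under the same assumed opposite-face labeling as in the previous sketch and verifies that a modest power is entrywise within 10^-20 of 1/12 (the eigenvalues of this N_d other than 1 are 0 and -1/5, so the convergence is fast).

gap> opp := i -> ((i + 5) mod 12) + 1;;        # same assumed labeling as above
gap> Ndmat := List([1..12], i -> List([1..12], function(j)
>      if j = i or j = opp(i) then return 0; else return 1/10; fi;
>    end));;
gap> M := Ndmat^40;;                           # exact rational matrix power
gap> dev := Maximum(List(M, row ->
>      Maximum(List(row, x -> Maximum(x - 1/12, 1/12 - x)))));;
gap> dev < 1/10^20;
true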

In this paper, we extend the results outlined in [3]. Let U = {u_1, u_2, ..., u_n} and let C = {{u_i, u_j} : 1 ≤ i, j ≤ n} be a network representing a Boltzmann machine, where n ≥ 2. For the connection strength s, we consider

    s({u, v}) = (n-1)/n if u = v, and -2/n if u ≠ v.    (6)

A configuration k_opt minimizes the consensus (1) with connection strength (6) if and only if it minimizes

    (1/n) Σ_{i=1}^{n} (k(u_i) - µ(k))^2    (7)

over all configurations k, where µ(k) = (1/n) Σ_{i=1}^{n} k(u_i); see [3] for details regarding the case n = 12.
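Indeed, with the strength (6), the consensus (1) of a configuration k equals Σ_{i=1}^{n} (k(u_i) - µ(k))^2, which is n times (7). This identity can be sanity-checked in GAP with a random integer configuration; the configuration below is illustrative only, and the check is exact because GAP works over the rationals.

gap> n := 12;;
gap> s := function(u, v)                       # the connection strength (6)
>      if u = v then return (n-1)/n; else return -2/n; fi;
>    end;;
gap> consensus := function(k)                  # (1), one term per unordered pair
>      local tot, u, v;
>      tot := 0;
>      for u in [1..n] do for v in [u..n] do
>        tot := tot + s(u, v)*k[u]*k[v];
>      od; od;
>      return tot;
>    end;;
gap> k := List([1..n], i -> Random([1..100]));;
gap> mu := Sum(k)/n;;
gap> consensus(k) = Sum(List(k, x -> (x - mu)^2));   # consensus = n times (7)
true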

Next, we consider a neighborhood structure N whose values are not singletons but are infinite subsets of S. In particular, this implies that N is not a linear operator, as it was in the case of the simulated annealing of the dodecahedron. Moreover, the transition probabilities P_{k,l}(m) are stochastic, not deterministic as in (3).

Let N be the n-by-n matrix whose entries are all 1s. When necessary, we emphasize the size of N by a subscript; for instance, N_12 is a 12-by-12 matrix. We say that an n-by-n matrix Q is doubly stochastic if the entries of Q are nonnegative and the sum of the entries in each row and in each column is 1. In the case of the simulated annealing of the dodecahedron, note that N_d = (1/10)(N_12 - 2Q), where Q is the doubly stochastic matrix

    Q = (1/2)(I + P),    (8)

with I the 12-by-12 identity matrix and P the permutation matrix that interchanges each face of the dodecahedron with its opposite face. In extending the results outlined in [3], we define the neighborhood structure according to

    N(k) = { (1/(n-c)) (N_n - cQ) k : Q is a doubly stochastic matrix and 0 ≤ c < n }.    (9)

For the part of the program in section 1 describing the simulated annealing algorithm, we need to define a mechanism for generating a neighbor l ∈ N(k) of k. Let {E_r : r = 1, 2, 3, ..., n!} be the set of all n-by-n permutation matrices. By a theorem due to Birkhoff [2, Theorem 8.7.2], each doubly stochastic matrix Q can be expressed in the form

    Q = Σ_{r=1}^{n!} α_r E_r,  where  Σ_{r=1}^{n!} α_r = 1 and each α_r ≥ 0.    (10)

This implies that, given 0 ≤ c < n, we can stochastically generate a doubly stochastic matrix Q by stochastically choosing an n!-tuple (α_r) and then constructing the matrix in (10); a sketch of this step appears below.
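Since forming all n! permutation matrices is impractical for large n, the sketch below (like Appendix B) works over the permutation matrices of a small symmetric group, here S_4 with n = 4; the integer weight scheme for producing the α_r is an arbitrary choice made for this sketch.

gap> n := 4;;
gap> pmats := List(Elements(SymmetricGroup(n)), p -> PermutationMat(p, n));;
gap> w := List(pmats, x -> Random([0..10]));;   # random nonnegative weights
gap> while Sum(w) = 0 do w := List(pmats, x -> Random([0..10])); od;
gap> alpha := w/Sum(w);;                        # alpha_r >= 0 with Sum = 1
gap> Q := Sum([1..Length(pmats)], r -> alpha[r]*pmats[r]);;
gap> Set(List(Q, Sum));                         # every row of Q sums to 1
[ 1 ]
gap> Set(List(TransposedMat(Q), Sum));          # and every column as well
[ 1 ]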

Furthermore, to implement the simulated annealing algorithm we have to define a transition probability. We take

    P_{k,l}(m) = α_1,    (11)

where l ∈ N(k) is given by l = (1/(n-c)) (N_n - c Σ_{r=1}^{n!} α_r E_r) k.

Suppose k is an initial configuration of a Boltzmann machine (U, C). Realizing k as a column vector, the configuration in the second trial is (1/(n-c_1))(N_n - c_1 Q_1) k, and the configuration in the third trial is

    (1/(n-c_2))(N_n - c_2 Q_2) (1/(n-c_1))(N_n - c_1 Q_1) k,

for some 0 ≤ c_1, c_2 < n and doubly stochastic matrices Q_1, Q_2. In this paper, provided each c_i satisfies 0 ≤ c_i < n/2 - ε for some ε > 0, we shall show that after several trials the m-th configuration is approximately (1/n) N_n (k); that is,

    lim_{m→∞} Π_{i=1}^{m} (1/(n-c_i)) (N_n - c_i Q_i) (k) = (1/n) N_n (k).    (12)

As a special case, when n = 12, c_i = 2, and each Q_i is the matrix in (8), we recover the simulated annealing of the dodecahedron in [3]; that is,

    Π_{i=1}^{m} (1/(n-c_i)) (N_n - c_i Q_i) (k) = N_d^m (k).

In Appendix A we verify (12), and in Appendix B we provide an example using the software package GAP to simulate the annealing of a dodecahedron [5]. GAP (an acronym for Groups, Algorithms, and Programming) is a software package suitable for use by undergraduates in advanced classes such as abstract algebra, linear algebra, and number theory.

3 Summary

This paper generalizes the simulated annealing process of the dodecahedron in [3, 4]. We have chosen U = {u_1, u_2, ..., u_n} as the set of units and C = {{u_i, u_j} : 1 ≤ i, j ≤ n} as the set of connections of a Boltzmann machine. We assume that the connection strength of this neural network (U, C) is the one given by (6). Moreover, we have chosen the neighborhood structure N : S → P(S) defined in (9). Given an initial configuration k, we generate a neighbor l ∈ N(k) by stochastically choosing an n!-tuple (α_r) of nonnegative scalars satisfying Σ_{r=1}^{n!} α_r = 1, and then setting l = (1/(n-c))(N_n - cQ) k, where Q is the matrix on the right-hand side of (10). The transition probability P_{k,l} we chose is the one in (11). The choice of P_{k,l}(m) is not crucial at this juncture, since (12) holds for any other transition probability; we fixed one for the sake of completeness.

To simulate a controlled cooling process, we choose a sequence {c_m : m = 1, 2, 3, ...} of real numbers satisfying lim_{m→∞} c_m = 0. The choice of this limit, namely 0, is motivated by the simulated annealing of the dodecahedron. We quote from [3] some features of the simulated annealing of the dodecahedron:

    Initially, for each i ∈ {1, ..., 12} the face u_i is labeled with the number k(u_i), and on the second minute the number on face u_i becomes the average of the first minute's numbers except those that were on u_i and on the face opposite u_i; i.e., on the second minute the number on face u_i is (1/10) Σ_j k(u_j), where the sum is taken over all faces u_j except the one on face u_i and the one on the face opposite u_i.

We see that when c = 2, the temperature on a face in the next trial or minute is given by the average of 10 temperatures of the current trial or minute. If we prescribe the temperature on a face in the next trial to be the average of the temperatures of the five neighboring faces (a face of a dodecahedron has five surrounding faces) in the current trial, then the value of c is 7. This suggests that a slow cooling process is simulated when the averaging of the temperatures begins with a few faces and is then gradually extended to more faces. Thus, we assume that the sequence {c_m : m = 1, 2, 3, ...} in the algorithm of section 1 is decreasing and converges to 0.

In the general case, we shall show that the sequence of configurations

    k,  (1/(n-c_1))(N_n - c_1 Q_1) k,  (1/(n-c_2))(N_n - c_2 Q_2)(1/(n-c_1))(N_n - c_1 Q_1) k,  ...

converges to (1/n) N_n k, provided 0 ≤ c_i < n/2 - ε for some ε > 0. Thus, if the initial configuration is k, then after several trials the limiting configuration has a constant value at each unit, the value being the average of {k(u_i) : i = 1, 2, 3, ..., n}. For the simulated annealing, this implies that the temperatures distribute evenly over all the units. Note that we do not discuss the rate of convergence, which may depend on the choice of the transition probability.

4 Appendix A

In this section we present a short proof supporting our claim (12).

Theorem. Let n ≥ 2 be an integer. Suppose 0 ≤ c_i < n/2 - ε and Q_i is a doubly stochastic matrix for all i, where ε is some positive real number. Then

    lim_{m→∞} Π_{i=1}^{m} (1/(n-c_i)) (N - c_i Q_i) = (1/n) N.

Proof. Notice that N^2 = nN and, in general, N^l = n^(l-1) N for each positive integer l. A direct computation reveals that NQ = QN = N for any n-by-n doubly stochastic matrix Q. The product of two doubly stochastic matrices is again doubly stochastic; therefore, the entries of Π_{i=1}^{m} Q_i are bounded (they lie between 0 and 1). Rewriting the product Π_{i=1}^{m} (1/(n-c_i)) (N - c_i Q_i), we obtain

    Π_{i=1}^{m} (1/(n-c_i)) (N - c_i Q_i) = (1 / Π_{i=1}^{m} (n-c_i)) [(N - c_1 Q_1)(N - c_2 Q_2) ··· (N - c_m Q_m)]

    = (1 / Π_{i=1}^{m} (n-c_i)) [ N^m - (Σ_i c_i) N^(m-1) + (Σ_{i_1<i_2} c_{i_1} c_{i_2}) N^(m-2) - (Σ_{i_1<i_2<i_3} c_{i_1} c_{i_2} c_{i_3}) N^(m-3) + ··· + (-1)^m c_1 c_2 ··· c_m Q_1 Q_2 ··· Q_m ],

where the relations NQ_i = Q_i N = N collapse every mixed term. Using N^l = n^(l-1) N, the coefficient of N above sums to (1/n)[ Π_{i=1}^{m} (n-c_i) - (-1)^m c_1 ··· c_m ], and hence

    Π_{i=1}^{m} (1/(n-c_i)) (N - c_i Q_i) = (1/n)(1 - (-1)^m ρ_m) N + (-1)^m ρ_m Q_1 Q_2 ··· Q_m,

where ρ_m = Π_{i=1}^{m} c_i/(n-c_i). Since 0 ≤ c_i < n/2 - ε, each factor satisfies c_i/(n-c_i) ≤ (n/2-ε)/(n/2+ε) < 1, so ρ_m → 0 as m → ∞. Because the entries of Q_1 Q_2 ··· Q_m lie between 0 and 1, the second term converges to the zero matrix, and the product converges to (1/n) N, as claimed.
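The theorem lends itself to a numerical check in exact arithmetic. In the GAP sketch below, the Q_i are averages of three random permutation matrices and the c_i are drawn from {0, ..., 5}, so that c_i < 6 = n/2; by the bound in the proof, ρ_30 ≤ (5/7)^30 < 10^-4, which guarantees that the final comparison prints true.

gap> n := 12;;
gap> N := List([1..n], i -> List([1..n], j -> 1));;     # the all-ones matrix N_12
gap> randQ := function()                                # a random doubly stochastic Q
>      return Sum(List([1..3], i -> PermutationMat(Random(SymmetricGroup(n)), n)))/3;
>    end;;
gap> M := IdentityMat(n);;
gap> for i in [1..30] do
>      c := Random([0..5]);                             # 0 <= c_i < n/2
>      M := M*(N - c*randQ())/(n - c);
>    od;
gap> dev := Maximum(List(M, row ->
>      Maximum(List(row, x -> Maximum(x - 1/n, 1/n - x)))));;
gap> dev < 1/10^4;                                      # rho_30 <= (5/7)^30 < 10^-4
true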

5 Appendix B

In this appendix we simulate the annealing of a dodecahedron, using the network (U_d, C_d) of section 2, whose units are the twelve faces and whose connections are the unordered pairs of faces. In carrying out the calculations involved in the simulated annealing algorithm, we use GAP, a software system for computational discrete algebra [5]. Given a configuration k of the network (U_d, C_d), the set N(k) of neighbors of k is the set defined in (9).

Let us consider the permutation matrices {E_r : r = 1, ..., 6} generated by the permutations (4, 8, 12) and (8, 12). This is a small subset of the set of all 12-by-12 permutation matrices; the underlying permutations actually form a dihedral group of order 6. To stochastically generate a neighbor l ∈ N(k) (see the algorithm at the end of section 1 and (10)), we randomly select a doubly stochastic 12-by-12 matrix of the form

    Q = Σ_{r=1}^{6} α_r E_r,  where  Σ_{r=1}^{6} α_r = 1  and each α_r is a fraction i/j with integers 0 ≤ i ≤ j ≤ 3.

Moreover, if l = (1/(n-c))(N_n - cQ) k, then the transition probability is chosen according to (11); that is, P_{k,l} = α_1.

If the initial configuration is k = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12] and the slowly decreasing cooling sequence is given by c_i = 1/i for i = 1, ..., 20, then the final output of the algorithm described at the end of section 1 is approximately the configuration

    [327/50, 327/50, 327/50, 327/50, 327/50, 327/50, 327/50, 327/50, 327/50, 327/50, 327/50, 327/50].

Note that the average of the numbers 1, ..., 12 is 6.5, which is approximately 327/50 = 6.54. This supports our claim, as stated in the theorem of Appendix A, that the sequence of configurations generated by the simulated annealing converges to an optimal configuration minimizing (7).

The following are the commands we keyed in when using GAP.

gap> g := Group((4,8,12), (8,12));;
gap> gl := Elements(g);;
gap> Size(gl);
6
gap> L := [ ];;
gap> for i in [0..3] do
>      for j in [1..3] do
>        if i <= j then Append(L, [i/j]); fi;
>      od;
>    od;
gap> S := Set(L);;  T := Tuples(S, 6);;
gap> c := [ ];;
gap> for i in [1..Length(T)] do
>      z := List([1..6], j -> T[i][j]);
>      if Sum(z) = 1 then Append(c, [z]); fi;
>    od;
gap> cons := function(x)                 # the consensus, in the form 12 times (7)
>      local ave;
>      ave := Sum(List([1..12], j -> x[j]))/12;
>      return Sum(List([1..12], j -> (x[j] - ave)^2));
>    end;;
gap> r := [ ];;  q := [ ];;  l := [ ];;
gap> k := [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12];;
gap> for i in [1..20] do
>      j := 1/i;                         # the cooling value c_i = 1/i

>      coef := Random(c);                # a random tuple (alpha_r)
>      for m in [1..12] do
>        for p in [1..6] do
>          r[p] := coef[p]*k[m^(gl[p]^(-1))];
>        od;
>        l[m] := Sum(r);  r := [ ];
>        q[m] := (1/(12 - j))*(Sum(k) - j*l[m]);
>      od;
>      if cons(q) <= cons(k) then
>        k := q;
>      elif coef[1] > AbsInt(Random([1..12]))/12 then
>        k := q;
>      fi;
>    od;
gap> final := [ ];;
gap> for i in [1..12] do
>      final[i] := Int(100*k[i])/100;
>    od;
gap> final;
[ 327/50, 327/50, 327/50, 327/50, 327/50, 327/50, 327/50, 327/50, 327/50,
  327/50, 327/50, 327/50 ]

References

[1] Aarts, E. and Korst, J., Simulated Annealing and Boltzmann Machines, Wiley-Interscience Series in Discrete Mathematics and Optimization, John Wiley and Sons, 1989.

[2] Horn, R.A. and Johnson, C.R., Matrix Analysis, Cambridge University Press, 1990.

[3] Reyes, E.N. and Steidley, C., A GAP Approach to the Simulated Annealing Algorithm, Computers in Education Journal of the American Society for Engineering Education, Vol. 7, No. 3 (1997), 43-47.

[4] Reyes, E.N. and Steidley, C., A Theoretical Discussion of the Mathematics Underlying "A GAP Approach to the Simulated Annealing Algorithm", to appear in the Computers in Education Journal of the American Society for Engineering Education.

[5] Schönert, M. et al., GAP - Groups, Algorithms, and Programming, Version 3 Release 4, Lehrstuhl D für Mathematik, RWTH Aachen, Germany, 1994.