DUAL-MODE DYNAMICS NEURAL NETWORKS FOR COMBINATORIAL OPTIMIZATION. Jun Park. A Dissertation Presented to the FACULTY OF THE GRADUATE SCHOOL
DUAL-MODE DYNAMICS NEURAL NETWORKS FOR COMBINATORIAL OPTIMIZATION by Jun Park. A Dissertation Presented to the FACULTY OF THE GRADUATE SCHOOL, UNIVERSITY OF SOUTHERN CALIFORNIA, In Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (Electrical Engineering - Systems), August 1994. Copyright 1994 Jun Park
Acknowledgments

First, I express my appreciation and respect to my advisor, Professor Sukhan Lee. Through stimulating and productive discussions in detail, he has given me many essential ideas and guided me in the right direction to complete this dissertation successfully. Also, his inexhaustible passion and dedication to research gives me a notion of what kind of researcher I ought to be. It has been my great pleasure and privilege to have him as my advisor. I also would like to thank Professor Bart Kosko and Professor Behrokh Khoshnevis, my dissertation committee, for their constructive comments and valuable suggestions. Also, I express my thanks to Professor Keith Jenkins and Professor Ken Goldberg for serving on my qualifying committee. It has also been my pleasure to have such sincere and cooperative colleagues: Yeong Woo Choi, Chunsik Yi, Shunich Shimoji, Judy Chen, Carlos Luck, Soo Kwang Ro, Andrew H. Fagg, and all the previous group members. I thank them all for their cooperation and encouragement. Special thanks should be given to the Electronics and Telecommunications Research Institute for the financial support for my study at the University of Southern California. Finally, I would like to express my gratitude to all my family members. Their constant support and encouragement helped me very much to overcome various difficulties during the period of my study. I wish to express my appreciation and love to my wife, Miran, and to my daughter and son, Dahyun and Jihyun. Especially, I hope this dissertation gives pleasure to my mother.
Contents

Acknowledgments
List of Figures
List of Tables
Abstract

1 Introduction
1.1 Combinatorial Optimization Problem and Neural Network
1.2 Related Works
1.3 Approach of the Thesis
1.4 Organization of the Thesis

2 Dual-Mode Dynamics Neural Networks
2.1 Network Configuration Space and Equilibrium Manifold
2.2 Network Structure
2.3 Dual-Mode Dynamics
2.3.1 Discrete Model Dual-Mode Dynamics
2.3.2 Continuous Model Dual-Mode Dynamics
2.3.3 Symmetry Preserving Recurrent Backpropagation
2.4 Binary Value Solution vs. Continuous State Variable
2.5 Asymmetric Weight vs. Symmetric Weight

3 Problem Solving with Dual-Mode Dynamics Neural Networks
3.1 General Design Procedure
3.2 N-Queen Problem
3.2.1 D2NN for the N-Queen Problem
3.2.2 Simulation Results and Discussions
3.3 Knapsack Packing Problem
3.3.1 D2NN for the Knapsack Packing Problem
3.3.2 Simulation Results and Discussions
3.4 Traveling Salesman Problem
3.4.1 D2NN for the Traveling Salesman Problem
3.4.2 Simulation Results and Discussions

4 Conclusion
4.1 Summary of the Research Contributions
4.2 Suggestions for Future Research

Appendix
A Convergence Property of Knapsack Packing D2NN

Bibliography
List of Figures

1.1 The relation between the external objective function, the network energy function, the state dynamics, and the weight dynamics a) in conventional approaches, and b) in Dual-Mode Dynamics Neural Networks.
2.1 The network configuration space and the equilibrium manifold.
2.2 The structure of the Dual-Mode Dynamics Neural Network.
The schematic view of the dual-mode dynamics based on the equilibrium manifold in the network configuration space.
The hyperquadrant-to-vertex mapping.
The state evolution in D2NN for a two-variable problem with three inequality constraints.
Typical behaviors of the state dynamics for a 100-neuron network: a) with asymmetric weights and b) with symmetric weights.
The computational flow in the continuous model of D2NN.
The computational cost to find the solution for the N-Queen problem. The solid line indicates the average number of cumulative state dynamics iterations and the dashed line the average number of weight dynamics iterations, over 10 trials for each problem size.
Examples of solutions found by D2NN for the 20- and 40-queen problems.
For the N-Queen problem, the initial weight can be obtained from a (2N-1) x (2N-1) board. For 5 queens, e.g., the weight for the (1,1) unit is indicated by the solid box, and for the (4,3) unit by the dashed box, on the 9 x 9 board shown above. Note that there are 8(N-1) inhibitory weights out of the (2N-1) x (2N-1) board cells. The effects of the inhibitory and excitatory weights are balanced by the values given in the text.
D2NN structure for the knapsack packing problem.
3.6 Two representation schemes for the traveling salesman problem: a) the city-order representation, b) the city-city representation.
The optimal tour for the given problem with 10 cities.
Four semi-optimal tours obtained by D2NN for the given traveling salesman problem with 10 cities.
Tours found by D2NN for the traveling salesman problems with 20 cities.
List of Tables

3.1 The computational cost to find solutions for each size of the N-Queen problem.
Comparison of the random initial weight assignment and the heuristic assignment for the 8- and 16-queen problems. The learning rate (eta) is set to 0.01 for all cases.
Comparison of the D2NN with other neural network approaches for the N-Queen problem. The performance is compared in terms of the success rate of finding the solution along the number of queens.
The computational cost of D2NN to find optimal solutions for the knapsack packing problem.
Comparison of D2NN with the greedy algorithm and Hellstrom & Kanal's approach in terms of the rate of finding the optimal solution.
Comparison of performance and computation time (sec) for the different approaches on n = m = 30 problems. The simulation has been performed on a Sun 4/50 workstation, and the data with the asterisk (*) are from Ohlsson et al. [31].
The computational cost to find a solution for the traveling salesman problem with 10 cities. The results are obtained with a time limit of 2000 weight dynamics iterations for the target cost 2.75 and of 1000 weight dynamics iterations for the target cost 2.0.
The computational cost to find a solution for two traveling salesman problem instances with 20 cities.
Abstract

This thesis presents a new approach to solving combinatorial optimization problems, based on a novel dynamic neural network featuring a dual mode of network dynamics: the state dynamics and the weight dynamics. The network is referred to here as the Dual-Mode Dynamics Neural Network (D2NN). The combinatorial optimization problem usually has a huge number of elements in its configuration space, so that we cannot explore them exhaustively. Recently, neural network approaches have been studied for the solution of combinatorial optimization problems. The computational characteristic of neural networks (the distributed and collective computation over a massively parallel architecture, emulating nonlinear dynamics) has raised high expectations for overcoming the curse of combinatorial search complexity in optimization. Several effective approaches have been applied to various combinatorial optimization problems and have shown promising preliminary results. There are, however, two major difficulties in neural network approaches to optimization problems. First, the objective function for a given problem must have a form that can be mapped onto the network; secondly, due to the local minima problem, the quality of the solution is quite sensitive to various factors, such as the initial state and the parameters in the objective function. The proposed scheme overcomes these difficulties 1) by maintaining the objective function separately from the network energy function, rather than mapping it onto the network, and 2) by introducing a weight dynamics utilizing the objective function to avoid the local minima problem. The state dynamics defines state trajectories in a direction to minimize the network energy specified by the current weights and states, whereas the weight dynamics generates weight trajectories in a direction to minimize a preassigned external objective function at the current state. D2NN is operated in such a way that the two modes of network dynamics alternately govern the network until an equilibrium is reached. The D2NN has been applied to the N-Queen problem, the knapsack problem, and the traveling salesman problem, and shows superior performance.
Chapter 1

Introduction

1.1 Combinatorial Optimization Problem and Neural Network

A central problem in the engineering field is the optimization problem. Its concern is to find the best configuration, or set of parameters, to achieve some goal. If the variables in the optimization problem are discrete rather than continuous, we call it a combinatorial optimization problem. In a combinatorial optimization problem, the number of elements in the configuration space is factorially large; therefore, we cannot explore them exhaustively. For example, in the traveling salesman problem with 30 cities, the number of feasible tours is approximately $29!/2 \approx 4.4 \times 10^{30}$. Different heuristics have been devised for different problems to find a good solution, rather than the globally optimal solution. Recently, artificial neural networks have been applied to solve combinatorial optimization problems. In their pioneering work, Hopfield and Tank [18] showed the feasibility of solving combinatorial optimization problems with neural networks.
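The factorial growth mentioned above is easy to verify directly. A minimal Python check (the formula $(N-1)!/2$ for distinct undirected tours is standard combinatorics, not taken from the text):

```python
from math import factorial

def num_tours(n_cities):
    # distinct closed tours: fix the starting city, ignore the direction of travel
    return factorial(n_cities - 1) // 2

print(num_tours(30))  # roughly 4.4e30, far beyond exhaustive enumeration
```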
For the Traveling Salesman Problem (TSP), one of the classical combinatorial optimization problems, they mapped a properly defined objective function onto the Hopfield network with symmetric weights and no self-loops, and showed that the solution can be computed collectively as the network dynamics evolves. The underlying idea in Hopfield and Tank's work has been adopted from the associative memory model [16, 17, 22, 23]. Given a new pattern, the associative memory responds by producing the stored pattern which most closely resembles the given one. The mechanism of this process can be explained as follows. First, each of the stored patterns is placed at one of the local minima of the network energy function. Secondly, as the network dynamics evolves, the state of the network descends along the energy function surface from the initial state corresponding to the given input pattern. Finally, the state converges to a local minimum of the network energy near the initial state. Note that the network energy is minimized while retrieving the stored pattern. This characteristic of energy minimization in the associative memory is adopted by Hopfield and Tank. For a given optimization problem, a properly defined objective function is mapped onto the network in such a way that the objective function is minimized as the network dynamics evolves. A state at equilibrium is expected to represent a solution of good quality, although not necessarily the globally optimal solution. The computational characteristic of neural networks (the distributed and collective computation over a massively parallel architecture, emulating nonlinear dynamics) has raised high expectations for overcoming the curse of combinatorial search complexity in optimization problems. Thus, much effort has been devoted to obtaining a neural network solution for a wide variety of combinatorial
optimization problems, including the traveling salesman problem [6, 44, 33], the Hamiltonian cycle problem [30, 38], the knapsack problem [15, 31], the N-Queen problem [3, 39, 40, 41], the scheduling problem [10, 13, 49], the graph partitioning problem [32, 36, 43], etc.

1.2 Related Works

Most neural network approaches to combinatorial optimization, in general, adopt the following steps: 1) a formulation of an objective function representing both the cost to be minimized and the constraints to be satisfied, and 2) an assignment of proper weights to the network in such a way that the resulting network dynamics makes the network states converge to the minimum of the network energy function (representing a solution), as shown in Figure 1.1 a). When we formulate an objective function for a given problem, we should make the objective function have a certain form which can be mapped onto the network. In case we want to map the problem onto the Hopfield network, the objective function should have the same form as the network energy function(1) of the Hopfield network:

E = -\frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} w_{ij} x_i x_j - \sum_{i=1}^{N} w_{i0} x_i,   (1.1)

where x_i, i = 1, ..., N is the output of the ith unit, w_{ij} is the weight between the ith and jth units, and w_{i0} is the bias term of the ith unit. For many

(1) We refer to the network energy function as the function of the states and the weights as in (1.1), and to the external objective function (or simply the objective function) as the unconstrained function (the sum of the penalty functions associated with the constraints and the optimization measure) to be minimized for the given optimization problem.
optimization problems, however, it is difficult or even impossible to build an objective function as in (1.1). For example, an inequality constraint, a common component of optimization problems, is hard to express in the form given by (1.1). Even when we can find an objective function in a proper form, we still have to select a proper set of parameters. The objective function usually has a set of coefficients which determines the relative weighting among components of the optimization problem. These coefficients determine the network energy landscape, and thus the performance of the network is highly dependent on their selection. These coefficients are usually selected by trial and error. As the problem size grows, however, it becomes very hard to find a suitable set of coefficients. There is another crucial problem, the so-called local minima problem. The objective function formulated in the form of (1.1) generally has many local minima. While these local minima are exploited in the associative memory model, they may cause a critical problem in solving optimization problems, since a final state stuck in a local minimum often represents a solution of poor quality, or even an invalid solution. Also, the performance of the network is quite dependent on the selection of the initial state. If we start with an initial state placed in the basin of the global or near-global minimum of the network energy function, the final state will represent a solution of good quality. Otherwise, a poor solution is likely to be obtained [47]. To overcome this local minima problem, several effective approaches have been reported. Earlier, simulated annealing [21, 45] was devised for solving combinatorial optimization problems. The key idea comes from an analogy with
statistical thermodynamics. When a liquid freezes and crystallizes under very slow cooling, the material structure achieves the minimum state of the thermodynamic energy. This is because there are always chances for the material structure to escape from local minima with the help of thermal noise. Similarly, the state or configuration in simulated annealing is rearranged not only to decrease the objective function, but also to move in a direction that increases the objective function with some probability. As the temperature goes down, the probability of increasing the objective function gradually reduces, and the final system state is expected to reach the global minimum or a semi-minimum of the objective function. In mean field annealing [44, 43, 32], simulated annealing is combined with the mean field network, which is equivalent to the Hopfield network [4]. At a high artificial temperature, the surface of the Hopfield network energy is smoothed out and the basins of local minima tend to disappear. The state of the network is then likely to approach the vicinity of the global minimum. As the temperature goes down, the energy function recovers its original shape and the state is expected to converge to the global or near-global minimum. Tabu learning [6, 12] is another approach for solving non-convex optimization problems. In tabu learning, an auxiliary energy function is added to the Hopfield network energy. This auxiliary energy function is continuously increased in a neighborhood of the current state, thus penalizing states that have already been visited. If the state is stuck at a local minimum, the auxiliary function around that minimum begins to increase and pushes the state out of that local minimum toward the space not yet visited. Besides the above approaches, many ideas have been proposed to improve
the performance of neural optimization networks. To handle inequality constraints, Tagliarini and Page [39] used slack variables to transform integer inequality constraints into integer equality constraints, and Abe [1, 2] proposed slack variables with a special activation function to handle non-integer inequality constraints. Metha and Fulop [30] derived conditions on the coefficients of the objective function to produce valid solutions in solving the Hamiltonian cycle problem. Sun and Fu [38] proposed an algorithmic method which used the coordinate Newton method to speed up the computation time to reach a valid solution. Xu and Tsai [48] proposed the city-city representation scheme for the traveling salesman problem and combined it with the OPT2 algorithm to solve the subtour problem. Different mapping schemes have also been proposed for different problems [11, 19]. Although these approaches are shown to be effective for improving the quality of solutions, several problems still remain. Due to the constraints on formulating the objective function, it is hard to apply neural network approaches to some classes of problems. For example, optimization problems with arbitrary inequality constraints or with high-order optimization measures are not easily mapped onto the network. Also, the quality of the solution is very sensitive to various factors. The annealing process usually takes a long computation time, and we need a well-devised annealing schedule to get a good solution. When we add an auxiliary function to the network energy function to avoid the local minima problem, as in tabu learning, we must carefully select a set of parameters to control the auxiliary function. Otherwise, the final solution may be of poor quality, even though it is good for the modified network energy function. Besides, the coefficients in the objective function and the initial
state have a crucial influence on the quality of the final solution. Unfortunately, there are no guidelines for determining suitable values for these factors.

1.3 Approach of the Thesis

This thesis presents a new approach to the solution of combinatorial optimization problems based on Hopfield-type recurrent neural networks, focusing on the aforementioned local minima problem. In the proposed approach, we design the network dynamics to be governed not only by the state dynamics but also by the weight dynamics. Thus, the network is named "the Dual-Mode Dynamics Neural Network (D2NN)" [25, 27, 24, 26, 28]. In the Dual-Mode Dynamics Neural Network, the external objective function for a given optimization problem is not mapped onto the network but maintained separately from the network energy function. The weight dynamics is introduced to avoid the local minima problem, and is guided by the external objective function. In other words, there exist two kinds of energy functions in D2NN: the external objective function, which is specific to the given optimization problem, and the network energy, which is a function of the network states and weights. Also, there are two types of dynamics: the state dynamics, which is governed by the network energy function, and the weight dynamics, which is governed by the external objective function, as shown in Figure 1.1b). The state dynamics is the same as the Hopfield network dynamics. With symmetric weights, the state dynamics is guaranteed to converge to an equilibrium [14, 16, 20]. The weight dynamics is set in such a way as to drive the network states in a direction to minimize the external objective function whenever
Figure 1.1: The relation between the external objective function, the network energy function, the state dynamics, and the weight dynamics a) in conventional approaches, and b) in Dual-Mode Dynamics Neural Networks.
the state dynamics reaches an equilibrium. The repetition of the state dynamics and the weight dynamics leads to a solution, since the weight dynamics provides a means of escaping from a local minimum of the network energy function by changing the network energy profile, and pushes the equilibrium state of the state dynamics toward the minimum of the external objective function.

1.4 Organization of the Thesis

The thesis is organized as follows: Chapter 1 reviews neural optimization networks and describes the problem statement as well as the approach of the thesis. Chapter 2 describes the Dual-Mode Dynamics Neural Network in detail. First, the fundamental idea is clarified through a discussion in the framework of the network configuration space and the equilibrium manifold. Then, the discrete and continuous models of the Dual-Mode Dynamics Neural Network are described. Also, the issues of binary solutions vs. continuous state variables and asymmetric vs. symmetric weights are discussed. Chapter 3 presents problem solving with the Dual-Mode Dynamics Neural Network. After describing the general procedure for designing a Dual-Mode Dynamics Neural Network for a given problem, the details of the Dual-Mode Dynamics Neural Networks for the N-Queen problem, the knapsack packing problem, and the traveling salesman problem are explained. Simulation results on these problems are presented and discussed as well. Finally, Chapter 4 summarizes the thesis contributions and proposes future research issues.
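Before turning to the D2NN itself, the conventional mapping reviewed in Section 1.2 rests entirely on the network energy of eq. (1.1), which is straightforward to evaluate. A minimal sketch (the array names and the small two-unit example are illustrative assumptions):

```python
import numpy as np

def network_energy(x, W, w0):
    # E = -1/2 * sum_ij w_ij x_i x_j - sum_i w_i0 x_i, as in eq. (1.1)
    return -0.5 * x @ W @ x - w0 @ x

# two mutually excitatory units: symmetric weights, zero diagonal (no self-loops)
W = np.array([[0.0, 1.0],
              [1.0, 0.0]])
w0 = np.zeros(2)
print(network_energy(np.array([1.0, 1.0]), W, w0))  # -1.0, lower than the all-zero state
```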
Chapter 2

Dual-Mode Dynamics Neural Networks

2.1 Network Configuration Space and Equilibrium Manifold

Let us consider a recurrent network represented by the following dynamics:

\dot{u}_i = -u_i + \sum_{j=1}^{n} w_{ij} x_j + \theta_i,   (2.1)

x_i = f(u_i) = \frac{1}{1 + e^{-u_i}}  for i = 1, ..., n,   (2.2)

where n is the number of neurons in the network, and u_i and x_i represent the state and output of the ith neuron, respectively. With a fixed set of symmetric weights, it has been proven that the network dynamics (2.1) is guaranteed to converge to an equilibrium state [7, 20, 17]. The network dynamics (2.1) can
also be interpreted in terms of the network energy E, in such a way that Equation (2.1) describes the evolution of the network state along the surface of the network energy function E, in a direction to minimize the energy and reach the bottom of the basin containing the initial state. Equation (2.1) indicates that the network energy function E is a function of the states as well as the weights of the network. Figure 2.1a illustrates the network energy function E defined over the Cartesian product of the weight and state spaces, {w x}, called the network configuration space. Note that the state dynamics of a network depends on the selection of a particular set of network weights. In general, network dynamics can be characterized, at a network configuration, in terms of both weight dynamics and state dynamics, based on the network energy function defined over the network configuration space. To apply the general network dynamics, consisting of weight and state dynamics, to optimization problems, we consider only the so-called equilibrium manifold of a network. The equilibrium manifold of a network is defined in the network configuration space as the set of points representing the steady states of the network corresponding to given weights, and is represented by the trace of the valley bottoms of the network energy function, as shown schematically in Figure 2.1b. Figure 2.1b also shows why the performance of most conventional neural network approaches to optimization is sensitive to the selection of network weights and initial states. For instance, in Figure 2.1b, let x_opt represent the optimal solution to be obtained for a given problem. Obviously, to obtain x_opt at the network equilibrium, it is necessary that we select w_B as the network weight. But assigning w_B as the network weight is not sufficient for obtaining
Figure 2.1: The network configuration space and the equilibrium manifold.
the solution, since, for example, if the initial state is given at P, the network will settle down at T instead of S. For conventional approaches to be successful at optimization, the proper assignment of network weights and initial states seems essential. Unfortunately, a systematic way of assigning proper weights and initial states to a network is yet to be established for optimization. The dual-mode dynamics neural network (D2NN) is proposed to solve the problem of weight and initial state assignment associated with conventional approaches to optimization by combining the state dynamics with the weight dynamics. In D2NN, the network can start with arbitrarily chosen weights and initial state. That is, although the network may start at P or Q in Figure 2.1b, which would dictate that the network reach T or R, respectively, at the equilibrium of the state dynamics, D2NN allows the network to evolve toward S (representing x_opt) along the equilibrium manifold by automatically modifying the network weights and initial states through the weight dynamics and state dynamics.

2.2 Network Structure

The structure of the Dual-Mode Dynamics Neural Network (D2NN) is shown in Figure 2.2. D2NN is composed of two layers: the base layer and the supervisory layer. The base layer consists of a set of base units with symmetric connections among them. A base unit is either a visible unit or a hidden unit. For a given problem, the configuration space (or the solution space) is mapped onto the set of visible units, and the hidden units help the visible units to produce the desired solution. The supervisory layer consists of a set of supervisory units without intra-layer connections. One supervisory unit is assigned to each constraint in
the given problem. For the objective measure to be optimized, we set a target value to achieve, and treat it as one of the constraints. The connection between a supervisory unit and the visible units is determined by the corresponding constraint, and does not change during computation. The external objective function is formulated with the supervisory units so that it attains its minimum when all the constraints are satisfied. Depending on the external objective function, the connections between the base layer and the supervisory layer can be of higher order, and we can select the desired form of the activation function at the supervisory unit. The base layer is the same as the Hopfield network, and thus the state dynamics which governs the base layer is guaranteed to converge to an equilibrium with symmetric weights. At the equilibrium of the state dynamics, each supervisory unit examines the visible units connected to it to see whether the corresponding constraint is satisfied or not. If it is not satisfied, the weight dynamics changes the weights in the base layer in a direction that reduces the external objective function, while maintaining the symmetry of the weights. The state dynamics and the weight dynamics govern the network alternately and lead to a solution, since the weight dynamics changes the network energy profile and thus pushes the equilibrium state of the state dynamics toward the minimum of the external cost function.
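The alternation just described can be summarized as a simple loop. The callables below are placeholders for the concrete dynamics defined in Section 2.3, so this is only a structural sketch:

```python
def run_d2nn(x, W, objective, run_state_dynamics, update_weights, step_limit=100):
    """Alternate the two modes until the objective is met or time runs out."""
    for _ in range(step_limit):
        x = run_state_dynamics(x, W)   # state dynamics: settle to an equilibrium
        if objective(x) == 0:          # all constraints satisfied: a solution
            return x, W
        W = update_weights(W, x)       # weight dynamics: reshape the energy profile
    return x, W                        # time limit expired

# toy stand-ins: the 'equilibrium' just increments x until the target 3 is met
x, W = run_d2nn(0, None, lambda x: max(0, 3 - x), lambda x, W: x + 1, lambda W, x: W)
print(x)  # 3
```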
Figure 2.2: The structure of the Dual-Mode Dynamics Neural Network.
2.3 Dual-Mode Dynamics

2.3.1 Discrete Model Dual-Mode Dynamics

In the discrete model dual-mode dynamics neural network, we use a discrete Hopfield network as the base layer. Therefore, the state dynamics which governs the base layer is the discrete Hopfield network dynamics. Let x_i, i = 1, ..., N be the output of base unit i, where N is the number of base units. Also, let w_{ij} be the weight between x_i and x_j, and w_{i0} the bias term of base unit i. The state dynamics of the base unit is:

x_i = \Phi(I^x_i) = \Phi\Big( \sum_j w_{ij} x_j + w_{i0} \Big)  for i = 1, ..., N,   (2.3)

where I^x_i is the input of the ith base unit and \Phi(\cdot) is the binary activation function of the base unit, i.e.,

\Phi(x) = \begin{cases} 1 & \text{if } x \ge 0, \\ 0 & \text{otherwise.} \end{cases}

With no self-loops and symmetric weights, randomly or heuristically assigned initially, the state dynamics is guaranteed to converge to an equilibrium under asynchronous operation. At equilibrium, the output of the supervisory unit is:

s_k = \phi_k(x)  for k = 1, ..., K,   (2.4)

where K is the number of supervisory units and \phi_k(\cdot) represents the relationship between the kth supervisory unit s_k and the base units x in the corresponding constraint. The external objective function is defined on the supervisory layer
by:

C = \sum_{k=1}^{K} C_k(s_k) = \sum_{k=1}^{K} C_k(\phi_k(x)),   (2.5)

where C_k(\cdot) is the component of the objective function associated with each constraint. At the equilibrium state of the state dynamics, the weight dynamics is invoked and updates the weights in a direction that decreases the objective function. First, we define \delta s_k and \delta x_i by:

\delta s_k \triangleq \frac{dC_k}{ds_k}  for k = 1, ..., K,   (2.6)

\delta x_i \triangleq \sum_{k=1}^{K} \frac{dC_k}{ds_k} \frac{ds_k}{dx_i} \frac{dx_i}{dI^x_i}  for i = 1, ..., n,   (2.7)

where n is the number of visible units. Note that \delta x_i equals 0 for i = n+1, ..., N, since the hidden units are not directly connected to the supervisory units. For the binary threshold activation function \Phi(\cdot), it is intractable to compute dx_i/dI^x_i as it is. We treat \Phi(\cdot) as a linear activation function and set \Phi'(\cdot) to 1 in (2.7). This is a reasonable approximation since both are nondecreasing functions. That is, for a desired \Delta x_i, \Delta I^x_i should have the same sign as \Delta x_i in both functions(1). Therefore,

\delta x_i = \sum_{k=1}^{K} \delta s_k \frac{ds_k}{dx_i}.   (2.8)

In case the supervisory unit s_k is a linear combination of the base unit outputs,

(1) This approximation is more effective than the approximation with the sigmoid function, since it gives rise to a larger |\Delta w_{ij}| when the input to a neuron is deeply saturated. The same idea has been applied to error backpropagation learning to speed up the learning process [46].
27 i.e., s k = X i w sx ki x i ; (2.9) then, x i = X k s k w sx ki : (2.10) Using (2.6) through (2.8), the equations for the weight dynamics are obtained as: 4w ij ij x x ij x? ( x i x j + x j x i); (2.11) 4w i0 i0? x i (2.12) where (> 0) is the learning rate. The overall operation of the discrete model of D2NN is as follows: Step 1. Initialization 1. Assign randomly (or heuristically) the weight in the base layer. 2. Select the initial state randomly. Step 2. Dual-Mode Dynamics 1. Run the base layer network asynchronously by (2.3) until an equilibrium is reached. 2. If the equilibrium state represents a solution, then go to Step 3. 18
28 3. Update the weight by (2.6) through (2.12). 4. If the time limit is not expired, then go to Step 2. Step 3. Stop Continuous Model Dual-Mode Dynamics In this subsection, we develop the continuous model dual-mode dynamics which claries the theoretical aspects of dual-mode dynamics neural networks. We rst derive the weight dynamics equation for the continuous model, and then, discuss its geometrical interpretation in the framework of the network conguration space and the equilibrium manifold. In the continuous model, the base layer is the continuous Hopeld network. Therefore, the state dynamics is same as the general dynamics for recurrent networks in (2.1), but with a xed symmetric weights, i.e., w ij = w ji. At equilibrium of the state dynamics, all _u i become zero, and thus, the equilibrium manifold equation is: u = W x + (2.13) where u; x; are the vectors of the state, output and bias, respectively, and W is the weight matrix. Let u be the state variation due to suciently small variations of weights and biases, W and. Then, u + u = (W + W )(x + x) + + : (2.14) 19
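Before continuing the derivation, the discrete-model procedure of Steps 1-3 above can be sketched in code. This is a minimal illustrative sketch, not the dissertation's implementation: it assumes a hypothetical toy objective C(x) = (sum_i x_i - 2)^2, which is zero when exactly two of four units are on, treats every base unit as visible, and uses the sigma'() = 1 approximation of (2.8).

```python
import numpy as np

rng = np.random.default_rng(0)
N, eta = 4, 0.1                  # base units (all visible here), learning rate

# Hypothetical toy objective: C(x) = (sum(x) - 2)^2, zero when exactly
# two units are on; dC/dx_i = 2 * (sum(x) - 2) plays the role of delta x_i.
def grad_C(x):
    return np.full(N, 2.0 * (x.sum() - 2.0))

def relax(x, W, b):
    """State dynamics (2.3): asynchronous updates until an equilibrium."""
    while True:
        changed = False
        for i in rng.permutation(N):
            new = 1.0 if W[i] @ x + b[i] >= 0.0 else 0.0
            if new != x[i]:
                x[i], changed = new, True
        if not changed:
            return x

# Step 1: random symmetric weights with no self-loops, random initial state.
W = rng.normal(size=(N, N))
W = (W + W.T) / 2.0
np.fill_diagonal(W, 0.0)
b = rng.normal(size=N)
x = rng.integers(0, 2, size=N).astype(float)

# Step 2: alternate the state dynamics and the weight dynamics until a
# solution is found or the time limit expires.
for _ in range(50):
    x = relax(x, W, b)
    if x.sum() == 2:             # the equilibrium encodes a solution
        break
    dx = grad_C(x)               # delta x_i of (2.8), with sigma'() := 1
    W -= eta * (np.outer(dx, x) + np.outer(x, dx))   # weight dynamics (2.11)
    np.fill_diagonal(W, 0.0)     # keep the no-self-loop condition
    b -= eta * dx                # bias dynamics (2.12)

x = relax(x, W, b)               # final equilibrium for the final weights
```

The inner relax loop is guaranteed to terminate because the weights are kept symmetric with zero diagonal, so each asynchronous flip decreases the network energy; whether the outer loop actually finds a solution within the time limit depends on the problem and the learning rate.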
From (2.13) and (2.14), disregarding the O(\delta^2) terms,

    (I - W G)\delta u = \delta W x + \delta\theta,   (2.15)

where G = diag{dx_i/du_i} = diag{x_i(1 - x_i)}, and \delta x_i \approx (dx_i/du_i) \delta u_i, i.e., \delta x \approx G \delta u.

When the current equilibrium state does not represent the desired solution, we can obtain the desired state variation \delta u in the direction that minimizes the objective function, based on the gradient of the objective function with respect to the state. Then, with (2.15), we can get the weight variation \delta W which achieves the desired state variation \delta u at the next equilibrium. However, we have to keep the symmetry of W to guarantee the convergence of the state dynamics. Let us rearrange the right side of (2.15) in terms of \delta w_v, the vectorized form of the upper triangular elements of \delta W together with \delta\theta, as follows:

    \delta\gamma = K \delta w_v,   (2.16)

where

    \delta\gamma = (I - W G) \delta u,   (2.17)

    K = [K_1  K_2  ...  K_i  ...  K_n  I_n],   (2.18)

    K_i = [ x_i  x_{i+1}  x_{i+2}  ...  x_n
             0     x_i      0     ...   0
             0      0      x_i    ...   0
            ...
             0      0       0     ...  x_i ]
    (an n x (n - i + 1) matrix whose rows shown are rows i through n; rows 1 through i - 1 are zero),   (2.19)

    \delta w_v = [\delta w_{v_1}^T  ...  \delta w_{v_i}^T  ...  \delta w_{v_n}^T  \delta\theta^T]^T,   (2.20)

    \delta w_{v_i} = [\delta w_{ii}, \delta w_{i,i+1}, ..., \delta w_{in}]^T,   (2.21)

and I_n is the n x n identity matrix.

Equation (2.16) is under-determined: there are (n(n+1)/2 + n) variables and only n constraints, so there are infinitely many solutions. However, we assumed a small weight variation in deriving (2.16). We therefore choose the pseudo-inverse solution, which gives the minimum norm ||\delta w_v||. Let K^+ be the pseudo-inverse of K. Then,

    \delta w_v = K^+ \delta\gamma   (2.22)
               = K^T (K K^T)^{-1} \delta\gamma.   (2.23)

Note that K K^T is always invertible, because the rank of K is always n due to the last block, I_n. This pseudo-inverse solution of \delta w_v requires the matrix inversion of (K K^T). But, from (2.18) and (2.19),

    K K^T = diag{S - x_i^2} + x x^T,   (2.24)

where S = \sum_i x_i^2 + 1, and all the diagonal elements of diag{S - x_i^2} are positive.
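The closed form (2.24), and the inverse that follows from it, can be verified numerically by assembling K column by column: the column for the weight pair (i, j) with i < j carries x_j in row i and x_i in row j, the column for w_ii carries a single x_i in row i, and the identity block accounts for the biases. A small sketch with arbitrary outputs x (assumed values, not from a trained network):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
x = rng.uniform(0.1, 0.9, size=n)     # stand-in equilibrium outputs

# Assemble K = [K_1 ... K_n  I_n], one column per upper-triangular weight.
cols = []
for i in range(n):
    for j in range(i, n):
        c = np.zeros(n)
        if i == j:
            c[i] = x[i]               # dw_ii enters row i once
        else:
            c[i], c[j] = x[j], x[i]   # dw_ij (= dw_ji) enters rows i and j
        cols.append(c)
K = np.hstack([np.column_stack(cols), np.eye(n)])

# Closed form (2.24): K K^T = diag{S - x_i^2} + x x^T with S = sum(x^2) + 1.
S = x @ x + 1.0
closed = np.diag(S - x**2) + np.outer(x, x)
print(np.allclose(K @ K.T, closed))               # True

# Its inverse via the rank-one inversion identity, as in (2.25):
A = S - x**2
D = 1.0 + np.sum(x**2 / A)
inv = np.diag(1.0 / A) - np.outer(x / A, x / A) / D
print(np.allclose(inv, np.linalg.inv(K @ K.T)))   # True
```

Because of this structure, inverting K K^T costs only O(n^2) arithmetic rather than a general O(n^3) inversion.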
Thus, by applying the general rank-one inversion identity:

    [A + a b^T]^{-1} = A^{-1} - (A^{-1} a b^T A^{-1}) / (1 + b^T A^{-1} a),

we can compute (K K^T)^{-1} easily as:

    (K K^T)^{-1}_{ij} = \delta_{ij} / (S - x_i^2) - (1/D) x_i x_j / ((S - x_i^2)(S - x_j^2)),   (2.25)

where D = 1 + \sum_i x_i^2 / (S - x_i^2).

Let L \triangleq (K K^T)^{-1} \delta\gamma = [L_1, L_2, L_3, ..., L_n]^T. Then, from (2.18), (2.19) and (2.23), the block of \delta w_v associated with unit i is

    [\delta w_{ii}, \delta w_{i,i+1}, \delta w_{i,i+2}, ..., \delta w_{in}]^T = K_i^T L,
    i.e., \delta w_{ii} = x_i L_i and \delta w_{i,i+j} = x_{i+j} L_i + x_i L_{i+j},   (2.26)

and the bias block is

    \delta\theta = L.   (2.27)

From (2.26) and (2.27), the final symmetry preserving weight update rule is obtained as:

    \delta w_{ij} = x_i L_j + x_j L_i   (i != j),
    \delta w_{ii} = x_i L_i,
    \delta\theta_i = L_i.   (2.28)

Figure 2.3 illustrates the dual-mode dynamics based on the equilibrium manifold in the network configuration space. In Figure 2.3, the w_1-axis and w_2-axis represent the weight space, and the u-axis represents the state space; the curved surface represents the equilibrium manifold. Let A(w_A, u_A) be the current network configuration at equilibrium and \delta u_d be the desired state variation computed from the objective function. The question is then how to obtain the weight variation \delta W which achieves \delta u_d at the next equilibrium. We first approximate the equilibrium manifold around A by a tangential hyperplane (Eq. (2.15)). Since the dimension of the weight space is much higher than that of the state space, there are infinitely many solutions (represented by L_sol in Figure 2.3) that achieve \delta u_d on this linearized equilibrium manifold. Among them we choose the pseudo-inverse solution, S(w_B, u_A + \delta u_d), which gives rise to the minimum weight variation, i.e., min ||\delta w|| (Eq. (2.23)). Then, with the new weight w_B and the current state u_A, the state dynamics starts again and moves the network configuration from S(w_B, u_A) to B(w_B, u_B), which belongs to the real equilibrium manifold, at the next equilibrium. Note that, with a sufficiently small ||\delta u_d||, the next state u_B is always obtained in the same direction as \delta u_d.

Figure 2.3: The schematic view of the dual-mode dynamics based on the equilibrium manifold in the network configuration space.

State Variation of Hidden Units

For a given optimization problem, one neuron, called a visible unit, is
assigned to one variable in the solution space. There may also be extra neurons that do not correspond to variables, the so-called hidden units. Since a hidden unit is not assigned a variable, its output is not included in the objective function, and the desired variation for a hidden unit is not automatically given by the gradient of the objective function. Hence, there is freedom in selecting the hidden unit variation, which can be utilized to achieve certain desirable performance.

The simplest way to fix the hidden unit variations is to set them all to zero. Separating the visible units and the hidden units in (2.17),

    \delta\gamma = ( [I_v  0; 0  I_h] - [W_vv  W_vh; W_hv  W_hh] [G_v  0; 0  G_h] ) [\delta u_v; \delta u_h]   (2.29)
                 = A_v \delta u_v + A_h \delta u_h,   (2.30)

where

    A_v = [I_v - W_vv G_v; -W_hv G_v],   A_h = [-W_vh G_h; I_h - W_hh G_h],   (2.31)

the subscripts v and h stand for the visible units and the hidden units, respectively, and \delta u_v and \delta u_h represent the variations of the visible units and hidden units, respectively. With \delta u_h = 0,

    \delta w_v = K^+ \delta\gamma |_{\delta u_h = 0}   (2.32)
               = K^T (K K^T)^{-1} A_v \delta u_v.   (2.33)

In this case, we actually minimize the norm of the state variation. That is, the squared norm of the state variation is:

    ||\delta u||^2 = ||\delta u_v||^2 + ||\delta u_h||^2,   (2.34)

and \delta u_v is given by the gradient of the objective function. Therefore, ||\delta u||^2 is minimized with \delta u_h = 0.

Since the weight variation is a function of the hidden unit variations, we can instead select the hidden unit variation that minimizes the weight variation. Let us define V as:

    V = (1/2) ||\delta w_v||^2.   (2.35)

Then,

    V = (1/2) [K^T (K K^T)^{-1} \delta\gamma]^T [K^T (K K^T)^{-1} \delta\gamma]   (2.36)
      = (1/2) \delta\gamma^T (K K^T)^{-1} \delta\gamma.   (2.37)

Note that V is a quadratic function of \delta u_h as well as of \delta\gamma. So, by setting the gradient of V with respect to \delta u_h to zero, i.e.,

    A_h^T (K K^T)^{-1} (A_v \delta u_v + A_h \delta u_h) = 0,   (2.38)

we can select \delta u_h to minimize V as:

    \delta u_h = -[A_h^T (K K^T)^{-1} A_h]^{-1} A_h^T (K K^T)^{-1} A_v \delta u_v.   (2.39)
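A quick numerical sanity check of (2.39): writing V = (1/2) \delta\gamma^T (K K^T)^{-1} \delta\gamma as in (2.37), the hidden-unit variation it prescribes should never yield a larger V than the simpler choice \delta u_h = 0. The sketch below uses random stand-ins of toy sizes for K, A_v, A_h and \delta u_v (assumed values, not from an actual network):

```python
import numpy as np

rng = np.random.default_rng(2)
nv, nh = 3, 2                         # toy visible / hidden unit counts
n = nv + nh

# Random stand-ins for the quantities available at equilibrium:
K = rng.normal(size=(n, n * (n + 1) // 2 + n))
M = K @ K.T                           # plays the role of K K^T (invertible)
Av = rng.normal(size=(n, nv))         # stands in for A_v of (2.31)
Ah = rng.normal(size=(n, nh))         # stands in for A_h of (2.31)
duv = rng.normal(size=nv)             # desired visible-unit variation
Minv = np.linalg.inv(M)

def V(duh):
    """V = (1/2) dgamma^T (K K^T)^{-1} dgamma, eq. (2.37)."""
    g = Av @ duv + Ah @ duh
    return 0.5 * g @ Minv @ g

# Eq. (2.39): the hidden-unit variation that minimizes V.
duh_star = -np.linalg.solve(Ah.T @ Minv @ Ah, Ah.T @ Minv @ (Av @ duv))

print(V(duh_star) <= V(np.zeros(nh)))   # True: (2.39) never does worse
```

Since V is a positive-definite quadratic in \delta u_h, the stationary point of (2.38) is its unique minimizer, so the inequality holds for any choice of the random stand-ins.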
With this hidden unit variation, the resultant weight variation is

    \delta w_v = K^T (K K^T)^{-1} [I - A_h (A_h^T (K K^T)^{-1} A_h)^{-1} A_h^T (K K^T)^{-1}] A_v \delta u_v.   (2.40)

Through both the state variation and the weight variation, the network changes its configuration along the equilibrium manifold in the network configuration space. So, we can also select the hidden unit variation that minimizes the variation of the network configuration as a whole. Let us modify V in (2.35) to include the state variation:

    V = (1/2) ||\delta w_v||^2 + (1/2) ||\delta u||^2.   (2.41)

Then, the resultant weight variation in (2.40) is slightly modified as:

    \delta w_v = K^T (K K^T)^{-1} [I - A_h (I_h + A_h^T (K K^T)^{-1} A_h)^{-1} A_h^T (K K^T)^{-1}] A_v \delta u_v.   (2.42)

The introduction of hidden units can be helpful since it increases the number of weights by O(n^2), while increasing the number of state variations to be achieved only by O(n). Since the pseudo-inverse solution in (2.23) gives the minimum weight variation for a given state variation, we will use (2.33) for the simulations in the later chapters, as it does not require the matrix inversion of (A_h^T (K K^T)^{-1} A_h) or (I_h + A_h^T (K K^T)^{-1} A_h) appearing in (2.40) or (2.42).

2.3.3 Symmetry Preserving Recurrent Backpropagation

Pineda [34, 35] and Almeida [5] independently pointed out that backpropagation can be extended to arbitrary networks, and developed the backpropagation algorithm for recurrent neural networks, called recurrent backpropagation. The goal of recurrent backpropagation is conceptually the same as that of the weight dynamics in D2NN: with recurrent backpropagation, we change the weights to minimize an error function, while with the weight dynamics, we update the weights in the direction that minimizes the objective function, which plays the role of the error function in recurrent backpropagation. So, recurrent backpropagation can also be used as the weight dynamics in D2NN. Since recurrent backpropagation is derived on the assumption that the network always converges to a stable state, we have to guarantee the stability of the state dynamics at all times. With general asymmetric weights, however, it is hard to maintain the stability of the state dynamics, as will be discussed in Section 2.5. Therefore, by imposing the condition of symmetric weights, we modify the original recurrent backpropagation to derive the symmetry preserving recurrent backpropagation as follows.

For convenience, let us rewrite the network dynamics in (2.1) and (2.2):

    \dot{u}_i = -u_i + \sum_{j=1}^{n} w_{ij} x_j + \theta_i,   (2.43)

    x_i = f(u_i) = 1 / (1 + e^{-u_i}),  for i = 1, ..., n.   (2.44)

With symmetric weights, the network always converges to a fixed point, and the equilibrium manifold equation is:

    u_i = \sum_j w_{ij} x_j + \theta_i,  for i = 1, ..., n.   (2.45)

The goal is to adjust the weights so that the next equilibrium state is formed along the direction that decreases the objective function. This is accomplished by computing the gradient of the objective function with respect to the weights, and updating the weights in the direction anti-parallel to that gradient, that is,

    \Delta w_{rs} = -\eta \sum_i e_i (\partial u_i / \partial w_{rs}),   (2.46)

where

    e_i \triangleq (\partial C / \partial x_i) f'(u_i),   (2.47)

and C is the objective function. To obtain \partial u_i / \partial w_{rs}, let us differentiate the equilibrium manifold equation (2.45) with respect to w_rs (= w_sr):

    \partial u_i / \partial w_{rs} = \delta_{ir} x_s + \delta_{is} x_r + \sum_j w_{ij} f'(u_j) (\partial u_j / \partial w_{rs}),   (2.48)

where \delta_{ij} is the Kronecker delta, i.e., \delta_{ij} = 1 if i = j, and 0 otherwise. Collecting terms, this can be written as

    \sum_j L_{ij} (\partial u_j / \partial w_{rs}) = \delta_{ir} x_s + \delta_{is} x_r,   (2.49)

where

    L_{ij} = \delta_{ij} - w_{ij} f'(u_j).   (2.50)

Inverting the linear equations (2.49),

    \partial u_k / \partial w_{rs} = (L^{-1})_{kr} x_s + (L^{-1})_{ks} x_r.   (2.51)
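The sensitivity formula (2.51) can be checked against a finite-difference derivative: perturb w_rs and w_sr together (to preserve symmetry), relax the network again, and compare. A minimal sketch under the assumption of small random weights, so that plain Euler relaxation of (2.43) converges:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
W = rng.normal(scale=0.3, size=(n, n))
W = (W + W.T) / 2.0                       # symmetric weights
theta = rng.normal(scale=0.1, size=n)
f = lambda u: 1.0 / (1.0 + np.exp(-u))    # logistic activation (2.44)

def fixed_point(W):
    """Relax (2.43) by Euler steps to the equilibrium u = W f(u) + theta."""
    u = np.zeros(n)
    for _ in range(5000):
        u += 0.1 * (-u + W @ f(u) + theta)
    return u

u = fixed_point(W)
x = f(u)
fp = x * (1.0 - x)                        # f'(u) for the logistic function
L = np.eye(n) - W * fp[None, :]           # L_ij = delta_ij - w_ij f'(u_j), (2.50)

r, s, eps = 0, 2, 1e-6
Wp = W.copy()
Wp[r, s] += eps
Wp[s, r] += eps                           # w_rs = w_sr, perturbed jointly
numeric = (fixed_point(Wp) - u) / eps     # finite-difference du/dw_rs

Linv = np.linalg.inv(L)
analytic = Linv[:, r] * x[s] + Linv[:, s] * x[r]   # eq. (2.51)
print(np.allclose(numeric, analytic, atol=1e-4))   # True
```

The weight scale 0.3 is an assumption chosen so that the relaxation map is a contraction; with larger weights a simple forward-Euler relaxation may fail to converge and the check would be meaningless.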
By substituting (2.51) into (2.46), we obtain

    \Delta w_{rs} = -\eta (y_r x_s + y_s x_r),   (2.52)

where

    y_r = \sum_k e_k (L^{-1})_{kr},   (2.53)
    y_s = \sum_k e_k (L^{-1})_{ks}.   (2.54)

Equation (2.52) specifies the symmetry preserving weight update rule, and requires a matrix inversion to obtain y_r and y_s in (2.53) and (2.54). However, we can undo the inversion in (2.53) and (2.54), and obtain linear equations for the y_k:

    \sum_k L_{ki} y_k = e_i,   (2.55)

or, using (2.50),

    y_i = \sum_k w_{ki} f'(u_i) y_k + e_i.   (2.56)

This equation has the same form as the original equilibrium manifold equation (2.45), and can be solved in the same way, by the evolution of an auxiliary network with a state dynamics analogous to (2.43):

    \dot{y}_i = -y_i + \sum_{k=1}^{n} w_{ki} f'(u_i) y_k + e_i.   (2.57)

The auxiliary network has the same topology as the original network, with the connection w_{ij} from the jth unit to the ith unit replaced by w_{ji} f'(u_i), a simple linear activation function f(y) = y, and a bias term e_i. Almeida [5] showed that the convergence of the original network is a sufficient condition for the convergence of the auxiliary network. Note that the convergence of the original network is guaranteed by the symmetry of the weights. The whole computational flow is thus:

1. Relax the original network with (2.43).
2. Calculate e_i in (2.47) from the objective function C.
3. Relax the auxiliary network with (2.57) to find y_i.
4. Update the weights using (2.52).

2.4 Binary Value Solution vs. Continuous State Variable

In combinatorial optimization, the solution space consists of binary values, while the state variable is continuous during the computation of the neural optimization network. When we apply neural network approaches to combinatorial optimization, we are in effect trying to obtain binary value solutions through a continuous state space. In the discrete model of D2NN, the output of each neuron in the base layer is automatically binary because the binary threshold function is used as the activation function. In the continuous model of D2NN, on the other hand, we need some scheme to obtain binary value solutions from continuous state variables.

Hopfield and Tank used a sigmoid function with a steep slope as the activation function of the neurons, so that the output of each neuron was expected to settle near 1 or 0 at equilibrium. However, with a fixed steep slope in the activation function, the state dynamics easily loses its momentum, and the state tends to stay at a local minimum or on a plateau of the network energy function. Therefore, the final state usually represents a solution of poor quality, as reported by Wilson and Pawley [47]. An annealing process helps to avoid these phenomena to some degree: while the state dynamics runs actively at the initial high temperature, it effectively becomes discrete as the temperature goes toward zero, so the final state will be binary. The variation of temperature must be carefully scheduled, since it has a crucial influence on the quality of the final solution. In practice, a good annealing schedule is problem-dependent, and is usually obtained by trial and error, which takes a rather long a priori processing time.

In the continuous model of D2NN, we use the hyperquadrant-to-vertex mapping to obtain the binary value solution from the continuous state variable. The hyperquadrant-to-vertex mapping is formally defined as follows:

Hyperquadrant-to-vertex mapping: A point in the n dimensional unit hypercube, x = [x_1 x_2 ... x_n], x_i in [0, 1] for i = 1, ..., n, is mapped to x^B = [x_1^B x_2^B ... x_n^B], where

    x_i^B = 1 if x_i >= 0.5, and 0 otherwise, for i = 1, ..., n.

An example of the hyperquadrant-to-vertex mapping is illustrated in Figure 2.4 for the two dimensional case.

Figure 2.4: The hyperquadrant-to-vertex mapping.

With this hyperquadrant-to-vertex mapping, each vertex of a unit hypercube represents all the points in the hyperquadrant
which contains that vertex.

We apply the hyperquadrant-to-vertex mapping when we check whether a state represents a solution or not. In other words, the vertex obtained from the current output of the neurons by the hyperquadrant-to-vertex mapping is tested to see whether it satisfies all the constraints of the given problem. The goal of the computation is thus slightly modified: we search for a hyperquadrant containing the vertex that represents the solution, rather than for that vertex itself. This saves considerable computation time, since the state need only evolve to some point in the hyperquadrant corresponding to the solution, rather than to the vertex itself.

The hyperquadrant-to-vertex mapping is also used in the computation of the desired state variation. If we used the gradient of the objective function at the current output of the neurons, it could happen that the desired state variation is zero even though the current output, or the corresponding vertex, does not represent a solution; there would then be no way to guide the weight dynamics. Therefore, when we calculate the state variation, we use the gradient of the objective function at the vertex corresponding to the current output of the neurons, rather than at the current output itself. Then, if the state evolves into a hyperquadrant whose vertex does not represent a solution of the given problem, the state will be pushed out to another hyperquadrant along the direction opposite to the gradient of the objective function at that vertex. Meanwhile, there is no repulsive impetus in a hyperquadrant whose vertex represents a solution. So, the state moves along the equilibrium manifold, governed by the gradients at the vertices, and the weight dynamics stops when the state evolves into the hyperquadrant that contains the vertex representing the binary value solution.

Figure 2.5: The state evolution in D2NN for a two variable problem with three inequality constraints.

Figure 2.5 illustrates the state evolution in D2NN for a simple two dimensional problem. Based on the three inequality constraints, represented as the shaded area, the objective function is formed in such a way that it becomes zero at the vertex V_4 (representing a solution) and generates a repulsive impetus at the other vertices, V_1, V_2, and V_3, illustrated by the shaded big arrows. From an initial state A, the state is guided by the repulsive impetus at V_1 and reaches the upper-left quadrant. Then the repulsive impetuses at V_1 and V_2 guide the state alternately, and the state eventually evolves into the lower-right quadrant, which corresponds to the solution V_4. Note that the initial state A has momentum due to the hyperquadrant-to-vertex mapping even though it satisfies the constraints.

2.5 Asymmetric Weight vs. Symmetric Weight

With asymmetric weights, the state dynamics either converges to an isolated fixed point, or generates oscillations or chaotic behavior. Typical state dynamics for 100 neuron networks are shown in Figure 2.6. With asymmetric weights, the state dynamics settles into oscillation after quite a long period of chaotic behavior, while with symmetric weights, a fixed point is reached after a short transient period. If one could utilize such general behaviors as chaos or limit cycles, neural computation would be enormously powerful. Since we use only an isolated fixed point as the output of the system, however, we need to guarantee the convergence of the state dynamics. On the stability of asymmetric recurrent neural networks, several sufficient
The Bias-Variance dilemma of the Monte Carlo method Zlochin Mark 1 and Yoram Baram 1 Technion - Israel Institute of Technology, Technion City, Haifa 32000, Israel fzmark,baramg@cs.technion.ac.il Abstract.
More informationsquashing functions allow to deal with decision-like tasks. Attracted by Backprop's interpolation capabilities, mainly because of its possibility of g
SUCCESSES AND FAILURES OF BACKPROPAGATION: A THEORETICAL INVESTIGATION P. Frasconi, M. Gori, and A. Tesi Dipartimento di Sistemi e Informatica, Universita di Firenze Via di Santa Marta 3-50139 Firenze
More informationManifold Regularization
9.520: Statistical Learning Theory and Applications arch 3rd, 200 anifold Regularization Lecturer: Lorenzo Rosasco Scribe: Hooyoung Chung Introduction In this lecture we introduce a class of learning algorithms,
More informationLinearly-solvable Markov decision problems
Advances in Neural Information Processing Systems 2 Linearly-solvable Markov decision problems Emanuel Todorov Department of Cognitive Science University of California San Diego todorov@cogsci.ucsd.edu
More informationLecture 35 Minimization and maximization of functions. Powell s method in multidimensions Conjugate gradient method. Annealing methods.
Lecture 35 Minimization and maximization of functions Powell s method in multidimensions Conjugate gradient method. Annealing methods. We know how to minimize functions in one dimension. If we start at
More informationBounded Approximation Algorithms
Bounded Approximation Algorithms Sometimes we can handle NP problems with polynomial time algorithms which are guaranteed to return a solution within some specific bound of the optimal solution within
More informationJohn P.F.Sum and Peter K.S.Tam. Hong Kong Polytechnic University, Hung Hom, Kowloon.
Note on the Maxnet Dynamics John P.F.Sum and Peter K.S.Tam Department of Electronic Engineering, Hong Kong Polytechnic University, Hung Hom, Kowloon. April 7, 996 Abstract A simple method is presented
More informationScheduling Adaptively Parallel Jobs. Bin Song. Submitted to the Department of Electrical Engineering and Computer Science. Master of Science.
Scheduling Adaptively Parallel Jobs by Bin Song A. B. (Computer Science and Mathematics), Dartmouth College (996) Submitted to the Department of Electrical Engineering and Computer Science in partial fulllment
More informationEcient Higher-order Neural Networks. for Classication and Function Approximation. Joydeep Ghosh and Yoan Shin. The University of Texas at Austin
Ecient Higher-order Neural Networks for Classication and Function Approximation Joydeep Ghosh and Yoan Shin Department of Electrical and Computer Engineering The University of Texas at Austin Austin, TX
More informationAnt Colony Optimization: an introduction. Daniel Chivilikhin
Ant Colony Optimization: an introduction Daniel Chivilikhin 03.04.2013 Outline 1. Biological inspiration of ACO 2. Solving NP-hard combinatorial problems 3. The ACO metaheuristic 4. ACO for the Traveling
More informationCOMP9444 Neural Networks and Deep Learning 11. Boltzmann Machines. COMP9444 c Alan Blair, 2017
COMP9444 Neural Networks and Deep Learning 11. Boltzmann Machines COMP9444 17s2 Boltzmann Machines 1 Outline Content Addressable Memory Hopfield Network Generative Models Boltzmann Machine Restricted Boltzmann
More informationAverage Reward Parameters
Simulation-Based Optimization of Markov Reward Processes: Implementation Issues Peter Marbach 2 John N. Tsitsiklis 3 Abstract We consider discrete time, nite state space Markov reward processes which depend
More informationChapter 0 Introduction Suppose this was the abstract of a journal paper rather than the introduction to a dissertation. Then it would probably end wit
Chapter 0 Introduction Suppose this was the abstract of a journal paper rather than the introduction to a dissertation. Then it would probably end with some cryptic AMS subject classications and a few
More informationNeural Networks. Hopfield Nets and Auto Associators Fall 2017
Neural Networks Hopfield Nets and Auto Associators Fall 2017 1 Story so far Neural networks for computation All feedforward structures But what about.. 2 Loopy network Θ z = ቊ +1 if z > 0 1 if z 0 y i
More informationAn Adaptive Bayesian Network for Low-Level Image Processing
An Adaptive Bayesian Network for Low-Level Image Processing S P Luttrell Defence Research Agency, Malvern, Worcs, WR14 3PS, UK. I. INTRODUCTION Probability calculus, based on the axioms of inference, Cox
More informationUsing a Hopfield Network: A Nuts and Bolts Approach
Using a Hopfield Network: A Nuts and Bolts Approach November 4, 2013 Gershon Wolfe, Ph.D. Hopfield Model as Applied to Classification Hopfield network Training the network Updating nodes Sequencing of
More information1. Introduction Let the least value of an objective function F (x), x2r n, be required, where F (x) can be calculated for any vector of variables x2r
DAMTP 2002/NA08 Least Frobenius norm updating of quadratic models that satisfy interpolation conditions 1 M.J.D. Powell Abstract: Quadratic models of objective functions are highly useful in many optimization
More informationA Generalized Homogeneous and Self-Dual Algorithm. for Linear Programming. February 1994 (revised December 1994)
A Generalized Homogeneous and Self-Dual Algorithm for Linear Programming Xiaojie Xu Yinyu Ye y February 994 (revised December 994) Abstract: A generalized homogeneous and self-dual (HSD) infeasible-interior-point
More informationNon-Convex Optimization. CS6787 Lecture 7 Fall 2017
Non-Convex Optimization CS6787 Lecture 7 Fall 2017 First some words about grading I sent out a bunch of grades on the course management system Everyone should have all their grades in Not including paper
More informationLinear Regression and Its Applications
Linear Regression and Its Applications Predrag Radivojac October 13, 2014 Given a data set D = {(x i, y i )} n the objective is to learn the relationship between features and the target. We usually start
More information4.1 Eigenvalues, Eigenvectors, and The Characteristic Polynomial
Linear Algebra (part 4): Eigenvalues, Diagonalization, and the Jordan Form (by Evan Dummit, 27, v ) Contents 4 Eigenvalues, Diagonalization, and the Jordan Canonical Form 4 Eigenvalues, Eigenvectors, and
More informationNONLINEAR CLASSIFICATION AND REGRESSION. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition
NONLINEAR CLASSIFICATION AND REGRESSION Nonlinear Classification and Regression: Outline 2 Multi-Layer Perceptrons The Back-Propagation Learning Algorithm Generalized Linear Models Radial Basis Function
More informationonly nite eigenvalues. This is an extension of earlier results from [2]. Then we concentrate on the Riccati equation appearing in H 2 and linear quadr
The discrete algebraic Riccati equation and linear matrix inequality nton. Stoorvogel y Department of Mathematics and Computing Science Eindhoven Univ. of Technology P.O. ox 53, 56 M Eindhoven The Netherlands
More informationMODELLING OF FLEXIBLE MECHANICAL SYSTEMS THROUGH APPROXIMATED EIGENFUNCTIONS L. Menini A. Tornambe L. Zaccarian Dip. Informatica, Sistemi e Produzione
MODELLING OF FLEXIBLE MECHANICAL SYSTEMS THROUGH APPROXIMATED EIGENFUNCTIONS L. Menini A. Tornambe L. Zaccarian Dip. Informatica, Sistemi e Produzione, Univ. di Roma Tor Vergata, via di Tor Vergata 11,
More informationCS 6501: Deep Learning for Computer Graphics. Basics of Neural Networks. Connelly Barnes
CS 6501: Deep Learning for Computer Graphics Basics of Neural Networks Connelly Barnes Overview Simple neural networks Perceptron Feedforward neural networks Multilayer perceptron and properties Autoencoders
More informationSimulated Annealing for Constrained Global Optimization
Monte Carlo Methods for Computation and Optimization Final Presentation Simulated Annealing for Constrained Global Optimization H. Edwin Romeijn & Robert L.Smith (1994) Presented by Ariel Schwartz Objective
More informationground state degeneracy ground state energy
Searching Ground States in Ising Spin Glass Systems Steven Homer Computer Science Department Boston University Boston, MA 02215 Marcus Peinado German National Research Center for Information Technology
More informationMath 1270 Honors ODE I Fall, 2008 Class notes # 14. x 0 = F (x; y) y 0 = G (x; y) u 0 = au + bv = cu + dv
Math 1270 Honors ODE I Fall, 2008 Class notes # 1 We have learned how to study nonlinear systems x 0 = F (x; y) y 0 = G (x; y) (1) by linearizing around equilibrium points. If (x 0 ; y 0 ) is an equilibrium
More informationESTIMATING STATISTICAL CHARACTERISTICS UNDER INTERVAL UNCERTAINTY AND CONSTRAINTS: MEAN, VARIANCE, COVARIANCE, AND CORRELATION ALI JALAL-KAMALI
ESTIMATING STATISTICAL CHARACTERISTICS UNDER INTERVAL UNCERTAINTY AND CONSTRAINTS: MEAN, VARIANCE, COVARIANCE, AND CORRELATION ALI JALAL-KAMALI Department of Computer Science APPROVED: Vladik Kreinovich,
More informationAI Programming CS F-20 Neural Networks
AI Programming CS662-2008F-20 Neural Networks David Galles Department of Computer Science University of San Francisco 20-0: Symbolic AI Most of this class has been focused on Symbolic AI Focus or symbols
More informationOn-line Bin-Stretching. Yossi Azar y Oded Regev z. Abstract. We are given a sequence of items that can be packed into m unit size bins.
On-line Bin-Stretching Yossi Azar y Oded Regev z Abstract We are given a sequence of items that can be packed into m unit size bins. In the classical bin packing problem we x the size of the bins and try
More information`First Come, First Served' can be unstable! Thomas I. Seidman. Department of Mathematics and Statistics. University of Maryland Baltimore County
revision2: 9/4/'93 `First Come, First Served' can be unstable! Thomas I. Seidman Department of Mathematics and Statistics University of Maryland Baltimore County Baltimore, MD 21228, USA e-mail: hseidman@math.umbc.edui
More informationFeatured Articles Advanced Research into AI Ising Computer
156 Hitachi Review Vol. 65 (2016), No. 6 Featured Articles Advanced Research into AI Ising Computer Masanao Yamaoka, Ph.D. Chihiro Yoshimura Masato Hayashi Takuya Okuyama Hidetaka Aoki Hiroyuki Mizuno,
More information1 What a Neural Network Computes
Neural Networks 1 What a Neural Network Computes To begin with, we will discuss fully connected feed-forward neural networks, also known as multilayer perceptrons. A feedforward neural network consists
More informationNeural Networks Lecture 6: Associative Memory II
Neural Networks Lecture 6: Associative Memory II H.A Talebi Farzaneh Abdollahi Department of Electrical Engineering Amirkabir University of Technology Winter 2011. A. Talebi, Farzaneh Abdollahi Neural
More informationAugust Progress Report
PATH PREDICTION FOR AN EARTH-BASED DEMONSTRATION BALLOON FLIGHT DANIEL BEYLKIN Mentor: Jerrold Marsden Co-Mentors: Claire Newman and Philip Du Toit August Progress Report. Progress.. Discrete Mechanics
More informationFundamentals of Metaheuristics
Fundamentals of Metaheuristics Part I - Basic concepts and Single-State Methods A seminar for Neural Networks Simone Scardapane Academic year 2012-2013 ABOUT THIS SEMINAR The seminar is divided in three
More informationNotes on Dantzig-Wolfe decomposition and column generation
Notes on Dantzig-Wolfe decomposition and column generation Mette Gamst November 11, 2010 1 Introduction This note introduces an exact solution method for mathematical programming problems. The method is
More informationGlobal Analysis of Piecewise Linear Systems Using Impact Maps and Surface Lyapunov Functions
IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL 48, NO 12, DECEMBER 2003 2089 Global Analysis of Piecewise Linear Systems Using Impact Maps and Surface Lyapunov Functions Jorge M Gonçalves, Alexandre Megretski,
More informationNew Integer Programming Formulations of the Generalized Travelling Salesman Problem
American Journal of Applied Sciences 4 (11): 932-937, 2007 ISSN 1546-9239 2007 Science Publications New Integer Programming Formulations of the Generalized Travelling Salesman Problem Petrica C. Pop Department
More informationComparison of Simulation Algorithms for the Hopfield Neural Network: An Application of Economic Dispatch
Turk J Elec Engin, VOL.8, NO.1 2000, c TÜBİTAK Comparison of Simulation Algorithms for the Hopfield Neural Network: An Application of Economic Dispatch Tankut Yalçınöz and Halis Altun Department of Electrical
More informationPHASE RETRIEVAL OF SPARSE SIGNALS FROM MAGNITUDE INFORMATION. A Thesis MELTEM APAYDIN
PHASE RETRIEVAL OF SPARSE SIGNALS FROM MAGNITUDE INFORMATION A Thesis by MELTEM APAYDIN Submitted to the Office of Graduate and Professional Studies of Texas A&M University in partial fulfillment of the
More informationMVE165/MMG630, Applied Optimization Lecture 6 Integer linear programming: models and applications; complexity. Ann-Brith Strömberg
MVE165/MMG630, Integer linear programming: models and applications; complexity Ann-Brith Strömberg 2011 04 01 Modelling with integer variables (Ch. 13.1) Variables Linear programming (LP) uses continuous
More informationTraining Multi-Layer Neural Networks. - the Back-Propagation Method. (c) Marcin Sydow
Plan training single neuron with continuous activation function training 1-layer of continuous neurons training multi-layer network - back-propagation method single neuron with continuous activation function
More information21. Set cover and TSP
CS/ECE/ISyE 524 Introduction to Optimization Spring 2017 18 21. Set cover and TSP ˆ Set covering ˆ Cutting problems and column generation ˆ Traveling salesman problem Laurent Lessard (www.laurentlessard.com)
More informationMathematics Research Report No. MRR 003{96, HIGH RESOLUTION POTENTIAL FLOW METHODS IN OIL EXPLORATION Stephen Roberts 1 and Stephan Matthai 2 3rd Febr
HIGH RESOLUTION POTENTIAL FLOW METHODS IN OIL EXPLORATION Stephen Roberts and Stephan Matthai Mathematics Research Report No. MRR 003{96, Mathematics Research Report No. MRR 003{96, HIGH RESOLUTION POTENTIAL
More informationGaussian Processes for Regression. Carl Edward Rasmussen. Department of Computer Science. Toronto, ONT, M5S 1A4, Canada.
In Advances in Neural Information Processing Systems 8 eds. D. S. Touretzky, M. C. Mozer, M. E. Hasselmo, MIT Press, 1996. Gaussian Processes for Regression Christopher K. I. Williams Neural Computing
More informationAn average case analysis of a dierential attack. on a class of SP-networks. Distributed Systems Technology Centre, and
An average case analysis of a dierential attack on a class of SP-networks Luke O'Connor Distributed Systems Technology Centre, and Information Security Research Center, QUT Brisbane, Australia Abstract
More informationData Mining Part 5. Prediction
Data Mining Part 5. Prediction 5.5. Spring 2010 Instructor: Dr. Masoud Yaghini Outline How the Brain Works Artificial Neural Networks Simple Computing Elements Feed-Forward Networks Perceptrons (Single-layer,
More information