DUAL-MODE DYNAMICS NEURAL NETWORKS FOR COMBINATORIAL OPTIMIZATION. Jun Park. A Dissertation Presented to the FACULTY OF THE GRADUATE SCHOOL
DUAL-MODE DYNAMICS NEURAL NETWORKS FOR COMBINATORIAL OPTIMIZATION by Jun Park. A Dissertation Presented to the FACULTY OF THE GRADUATE SCHOOL, UNIVERSITY OF SOUTHERN CALIFORNIA, In Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (Electrical Engineering - Systems), August 1994. Copyright 1994 Jun Park
Acknowledgments

First, I express my appreciation and respect to my advisor, Professor Sukhan Lee. Through stimulating and productive discussions in detail, he has given me many essential ideas and guided me in the right direction to complete this dissertation successfully. Also, his inexhaustible passion and dedication to research gives me a notion of what kind of researcher I ought to be. It has been my great pleasure and privilege to have him as my advisor. I also would like to thank Professor Bart Kosko and Professor Behrokh Khoshnevis, my dissertation committee, for their constructive comments and valuable suggestions. Also, I express my thanks to Professor Keith Jenkins and Professor Ken Goldberg for serving on my qualifying committee. It has also been my pleasure to have such sincere and cooperative colleagues: Yeong Woo Choi, Chunsik Yi, Shunich Shimoji, Judy Chen, Carlos Luck, Soo Kwang Ro, Andrew H. Fagg, and all the previous group members. I thank them all for their cooperation and encouragement. Special thanks should be given to the Electronics and Telecommunications Research Institute for the financial support for my study at the University of Southern California. Finally, I would like to express my gratitude to all my family members. Their constant support and encouragement helped me very much to overcome various difficulties during the period of my study. I wish to express my appreciation and love to my wife, Miran, and to my daughter and son, Dahyun and Jihyun. Especially, I hope this dissertation gives pleasure to my mother.
Contents

Acknowledgments
List of Figures
List of Tables
Abstract

1 Introduction
1.1 Combinatorial Optimization Problem and Neural Network
1.2 Related Works
1.3 Approach of the Thesis
1.4 Organization of the Thesis

2 Dual-Mode Dynamics Neural Networks
2.1 Network Configuration Space and Equilibrium Manifold
2.2 Network Structure
2.3 Dual-Mode Dynamics
2.3.1 Discrete Model Dual-Mode Dynamics
2.3.2 Continuous Model Dual-Mode Dynamics
2.3.3 Symmetry Preserving Recurrent Backpropagation
2.4 Binary Value Solution vs. Continuous State Variable
2.5 Asymmetric Weight vs. Symmetric Weight

3 Problem Solving with Dual-Mode Dynamics Neural Networks
3.1 General Design Procedure
3.2 N-Queen Problem
3.2.1 D2NN for the N-Queen Problem
3.2.2 Simulation Results and Discussions
3.3 Knapsack Packing Problem
3.3.1 D2NN for the Knapsack Packing Problem
3.3.2 Simulation Results and Discussions
3.4 Traveling Salesman Problem
3.4.1 D2NN for the Traveling Salesman Problem
3.4.2 Simulation Results and Discussions

4 Conclusion
4.1 Summary of the Research Contributions
4.2 Suggestions for Future Research

Appendix
A Convergence Property of Knapsack Packing D2NN

Bibliography
List of Figures

1.1 The relation between the external objective function, the network energy function, the state dynamics, and the weight dynamics a) in conventional approaches, and b) in Dual-Mode Dynamics Neural Networks.
2.1 The network configuration space and the equilibrium manifold.
2.2 The structure of the Dual-Mode Dynamics Neural Network.
The schematic view of the dual-mode dynamics based on the equilibrium manifold in the network configuration space.
The hyperquadrant-to-vertex mapping.
The state evolution in D2NN for a two-variable problem with three inequality constraints.
Typical behaviors of the state dynamics for a 100-neuron network: a) with asymmetric weights and b) with symmetric weights.
The computational flow in the continuous model of D2NN.
The computational cost to find the solution for the N-Queen problem. The solid line indicates the average number of cumulative state dynamics iterations and the dashed line the average number of weight dynamics iterations, over 10 trials for each problem size.
Examples of solutions found by D2NN for the 20- and 40-queen problems.
For the N-Queen problem, the initial weight can be obtained from a (2N-1) x (2N-1) board. For 5 queens, e.g., the weight for the (1,1) unit is indicated by the solid box, and for the (4,3) unit by the dashed box, on the 9 x 9 board shown above. Note that there are 8(N-1) inhibitory weights out of the (2N-1) x (2N-1) board cells. The effects of the inhibitory and excitatory weights are balanced by the values given in the text.
D2NN structure for the knapsack packing problem.
3.6 Two representation schemes for the traveling salesman problem: a) the city-order representation, b) the city-city representation.
The optimal tour for the given problem with 10 cities.
Four semi-optimal tours obtained by D2NN for the given traveling salesman problem with 10 cities.
Tours found by D2NN for the traveling salesman problems with 20 cities.
List of Tables

3.1 The computational cost to find solutions for each size of the N-Queen problem.
Comparison of the random initial weight assignment and the heuristic assignment for the 8- and 16-queen problems. The learning rate (eta) is set to 0.01 for all cases.
Comparison of the D2NN with other neural network approaches for the N-Queen problem. The performance is compared in terms of the success rate of finding the solution along the number of queens.
The computational cost of D2NN to find optimal solutions for the knapsack packing problem.
Comparison of D2NN with the greedy algorithm and Hellstrom & Kanal's approach in terms of the rate of finding the optimal solution.
Comparison of performance and computation time (sec) for the different approaches on n = m = 30 problems. The simulation has been performed on a Sun 4/50 workstation, and the data with the asterisk (*) are from Ohlsson et al. [31].
The computational cost to find a solution for the traveling salesman problem with 10 cities. The results are obtained with a time limit of 2000 weight dynamics iterations for the target cost 2.75 and of 1000 weight dynamics iterations for the target cost 2.0.
The computational cost to find a solution for two traveling salesman problem instances with 20 cities.
Abstract

This thesis presents a new approach to solving combinatorial optimization problems, based on a novel dynamic neural network featuring a dual mode of network dynamics: the state dynamics and the weight dynamics. The network is referred to here as the Dual-Mode Dynamics Neural Network (D2NN). The combinatorial optimization problem usually has a huge number of elements in its configuration space, so that we cannot explore them exhaustively. Recently, neural network approaches have been studied for the solution of combinatorial optimization problems. The computational characteristic of neural networks (the distributed and collective computation over a massively parallel architecture, emulating nonlinear dynamics) has raised high expectations for overcoming the curse of combinatorial search complexity in optimization. Several effective approaches have been applied to various combinatorial optimization problems and have shown promising preliminary results. There are, however, two major difficulties in neural network approaches to optimization problems. First, the objective function for a given problem must have a form that can be mapped onto the network; secondly, due to the local minima problem, the quality of the solution is quite sensitive to various factors, such as the initial state and the parameters in the objective function. The proposed scheme overcomes these difficulties 1) by maintaining the objective function separately from the network energy function, rather than mapping it onto the network, and 2) by introducing a weight dynamics utilizing the objective function to avoid the local minima problem. The state dynamics defines state trajectories in a direction to minimize the network energy specified by the current weights and states, whereas the weight dynamics generates weight trajectories in a direction to minimize a preassigned external objective function at the current state. D2NN is operated in such a way that the two modes of network dynamics alternately govern the network until an equilibrium is reached. The D2NN has been applied to the N-Queen problem, the knapsack problem, and the traveling salesman problem, and shows superior performance.
Chapter 1

Introduction

1.1 Combinatorial Optimization Problem and Neural Network

A central problem in the engineering field is the optimization problem. Its concern is to find the best configuration, or set of parameters, to achieve some goal. If the variables in the optimization problem are discrete rather than continuous, we call it a combinatorial optimization problem. In a combinatorial optimization problem, the number of elements in the configuration space is factorially large; therefore, we cannot explore them exhaustively. For example, in the traveling salesman problem with 30 cities, the number of feasible tours is approximately $29!/2 \approx 4.4 \times 10^{30}$. Different heuristics have been devised for different problems to find a good solution, rather than the globally optimal solution. Recently, artificial neural networks have been applied to solve combinatorial optimization problems. In their pioneering work, Hopfield and Tank [18] showed the feasibility of solving combinatorial optimization problems with neural networks.
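The factorial growth mentioned above is easy to verify directly. A minimal Python check (the formula $(N-1)!/2$ for distinct undirected tours is standard combinatorics, not taken from the text):

```python
from math import factorial

def num_tours(n_cities):
    # distinct closed tours: fix the starting city, ignore the direction of travel
    return factorial(n_cities - 1) // 2

print(num_tours(30))  # roughly 4.4e30, far beyond exhaustive enumeration
```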
For the Traveling Salesman Problem (TSP), one of the classical combinatorial optimization problems, they mapped a properly defined objective function onto the Hopfield network with symmetric weights and no self-loops, and showed that the solution can be computed collectively as the network dynamics evolves. The underlying idea in Hopfield and Tank's work has been adopted from the associative memory model [16, 17, 22, 23]. Given a new pattern, the associative memory responds by producing the stored pattern which most closely resembles the given one. The mechanism of this process can be explained as follows. First, each of the stored patterns is placed at one of the local minima of the network energy function. Secondly, as the network dynamics evolves, the state of the network descends along the energy function surface from the initial state corresponding to the given input pattern. Finally, the state converges to a local minimum of the network energy near the initial state. Note that the network energy is minimized while retrieving the stored pattern. This characteristic of energy minimization in the associative memory is adopted by Hopfield and Tank. For a given optimization problem, a properly defined objective function is mapped onto the network in such a way that the objective function is minimized as the network dynamics evolves. A state at equilibrium is expected to represent a solution of good quality, although not necessarily the globally optimal solution. The computational characteristic of neural networks (the distributed and collective computation over a massively parallel architecture, emulating nonlinear dynamics) has raised high expectations for overcoming the curse of combinatorial search complexity in optimization problems. Thus, much effort has been devoted to obtaining a neural network solution for a wide variety of combinatorial
optimization problems, including the traveling salesman problem [6, 44, 33], the Hamiltonian cycle problem [30, 38], the knapsack problem [15, 31], the N-Queen problem [3, 39, 40, 41], the scheduling problem [10, 13, 49], the graph partitioning problem [32, 36, 43], etc.

1.2 Related Works

Most neural network approaches to combinatorial optimization, in general, adopt the following steps: 1) a formulation of an objective function representing both the cost to be minimized and the constraints to be satisfied, and 2) an assignment of proper weights to the network in such a way that the resulting network dynamics makes the network states converge to the minimum of the network energy function (representing a solution), as shown in Figure 1.1 a). When we formulate an objective function for a given problem, we should make the objective function have a certain form which can be mapped onto the network. In case we want to map the problem onto the Hopfield network, the objective function should have the same form as the network energy function(1) of the Hopfield network:

E = -\frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} w_{ij} x_i x_j - \sum_{i=1}^{N} w_{i0} x_i,   (1.1)

where x_i, i = 1, ..., N is the output of the ith unit, w_{ij} is the weight between the ith and jth units, and w_{i0} is the bias term of the ith unit. For many

(1) We refer to the network energy function as the function of the states and the weights as in (1.1), and to the external objective function (or simply the objective function) as the unconstrained function (the sum of the penalty functions associated with the constraints and the optimization measure) to be minimized for the given optimization problem.
optimization problems, however, it is difficult or even impossible to build an objective function as in (1.1). For example, an inequality constraint, a common component of optimization problems, is hard to express in the form given by (1.1). Even when we can find an objective function in a proper form, we still have to select a proper set of parameters. The objective function usually has a set of coefficients which determines the relative weighting among components of the optimization problem. These coefficients determine the network energy landscape, and thus the performance of the network is highly dependent on their selection. These coefficients are usually selected by trial and error. As the problem size grows, however, it becomes very hard to find a suitable set of coefficients. There is another crucial problem, the so-called local minima problem. The objective function formulated in the form of (1.1) generally has many local minima. While these local minima are exploited in the associative memory model, they may cause a critical problem in solving optimization problems, since a final state stuck in a local minimum often represents a solution of poor quality, or even an invalid solution. Also, the performance of the network is quite dependent on the selection of the initial state. If we start with an initial state placed in the basin of the global or near-global minimum of the network energy function, the final state will represent a solution of good quality. Otherwise, a poor solution is likely to be obtained [47]. To overcome this local minima problem, several effective approaches have been reported. Earlier, simulated annealing [21, 45] was devised for solving combinatorial optimization problems. The key idea comes from an analogy with
statistical thermodynamics. When a liquid freezes and crystallizes under very slow cooling, the material structure achieves the minimum state of the thermodynamic energy. This is because there are always chances for the material structure to escape from local minima with the help of thermal noise. Similarly, the state or configuration in simulated annealing is rearranged not only to decrease the objective function, but also to move in a direction that increases the objective function with some probability. As the temperature goes down, the probability of increasing the objective function gradually reduces, and the final system state is expected to reach the global minimum or a semi-minimum of the objective function. In mean field annealing [44, 43, 32], simulated annealing is combined with the mean field network, which is equivalent to the Hopfield network [4]. At a high artificial temperature, the surface of the Hopfield network energy is smoothed out and the basins of local minima tend to disappear. The state of the network is then likely to approach the vicinity of the global minimum. As the temperature goes down, the energy function recovers its original shape and the state is expected to converge to the global or near-global minimum. Tabu learning [6, 12] is another approach for solving non-convex optimization problems. In tabu learning, an auxiliary energy function is added to the Hopfield network energy. This auxiliary energy function is continuously increased in a neighborhood of the current state, thus penalizing states that have already been visited. If the state is stuck at a local minimum, the auxiliary function around that minimum begins to increase and pushes the state out of that local minimum toward the space not yet visited. Besides the above approaches, many ideas have been proposed to improve
the performance of neural optimization networks. To handle inequality constraints, Tagliarini and Page [39] used slack variables to transform integer inequality constraints into integer equality constraints, and Abe [1, 2] proposed slack variables with a special activation function to handle non-integer inequality constraints. Metha and Fulop [30] derived conditions on the coefficients of the objective function to produce valid solutions in solving the Hamiltonian cycle problem. Sun and Fu [38] proposed an algorithmic method which used the coordinate Newton method to speed up the computation time to reach a valid solution. Xu and Tsai [48] proposed the city-city representation scheme for the traveling salesman problem and combined it with the OPT2 algorithm to solve the subtour problem. Different mapping schemes have also been proposed for different problems [11, 19]. Although these approaches are shown to be effective for improving the quality of solutions, several problems still remain. Due to the constraints on formulating the objective function, it is hard to apply neural network approaches to some classes of problems. For example, optimization problems with arbitrary inequality constraints or with high-order optimization measures are not easily mapped onto the network. Also, the quality of the solution is very sensitive to various factors. The annealing process usually takes a long computation time, and we need a well-devised annealing schedule to get a good solution. When we add an auxiliary function to the network energy function to avoid the local minima problem, as in tabu learning, we must carefully select a set of parameters to control the auxiliary function. Otherwise, the final solution may be of poor quality, even though it is good for the modified network energy function. Besides, the coefficients in the objective function and the initial
state have a crucial influence on the quality of the final solution. Unfortunately, there are no guidelines for determining suitable values for these factors.

1.3 Approach of the Thesis

This thesis presents a new approach to the solution of combinatorial optimization problems based on Hopfield-type recurrent neural networks, focusing on the aforementioned local minima problem. In the proposed approach, we design the network dynamics to be governed not only by the state dynamics but also by the weight dynamics. Thus, the network is named "the Dual-Mode Dynamics Neural Network (D2NN)" [25, 27, 24, 26, 28]. In the Dual-Mode Dynamics Neural Network, the external objective function for a given optimization problem is not mapped onto the network but maintained separately from the network energy function. The weight dynamics is introduced to avoid the local minima problem, and is guided by the external objective function. In other words, there exist two kinds of energy functions in D2NN: the external objective function, which is specific to the given optimization problem, and the network energy, which is a function of the network states and weights. Also, there are two types of dynamics: the state dynamics, which is governed by the network energy function, and the weight dynamics, which is governed by the external objective function, as shown in Figure 1.1b). The state dynamics is the same as the Hopfield network dynamics. With symmetric weights, the state dynamics is guaranteed to converge to an equilibrium [14, 16, 20]. The weight dynamics is set in such a way as to drive the network states in a direction to minimize the external objective function whenever
Figure 1.1: The relation between the external objective function, the network energy function, the state dynamics, and the weight dynamics a) in conventional approaches, and b) in Dual-Mode Dynamics Neural Networks.
the state dynamics reaches an equilibrium. The repetition of the state dynamics and the weight dynamics leads to a solution, since the weight dynamics provides a means of escaping from a local minimum of the network energy function by changing the network energy profile, and pushes the equilibrium state of the state dynamics toward the minimum of the external objective function.

1.4 Organization of the Thesis

The thesis is organized as follows: Chapter 1 reviews neural optimization networks and describes the problem statement as well as the approach of the thesis. Chapter 2 describes the Dual-Mode Dynamics Neural Network in detail. First, the fundamental idea is clarified through a discussion in the framework of the network configuration space and the equilibrium manifold. Then, the discrete and continuous models of the Dual-Mode Dynamics Neural Network are described. Also, the issues of binary solutions vs. continuous state variables and asymmetric vs. symmetric weights are discussed. Chapter 3 presents problem solving with the Dual-Mode Dynamics Neural Network. After describing the general procedure for designing a Dual-Mode Dynamics Neural Network for a given problem, the details of the Dual-Mode Dynamics Neural Networks for the N-Queen problem, the knapsack packing problem, and the traveling salesman problem are explained. Simulation results on these problems are presented and discussed as well. Finally, Chapter 4 summarizes the thesis contributions and proposes future research issues.
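Before turning to the D2NN itself, the conventional mapping reviewed in Section 1.2 rests entirely on the network energy of eq. (1.1), which is straightforward to evaluate. A minimal sketch (the array names and the small two-unit example are illustrative assumptions):

```python
import numpy as np

def network_energy(x, W, w0):
    # E = -1/2 * sum_ij w_ij x_i x_j - sum_i w_i0 x_i, as in eq. (1.1)
    return -0.5 * x @ W @ x - w0 @ x

# two mutually excitatory units: symmetric weights, zero diagonal (no self-loops)
W = np.array([[0.0, 1.0],
              [1.0, 0.0]])
w0 = np.zeros(2)
print(network_energy(np.array([1.0, 1.0]), W, w0))  # -1.0, lower than the all-zero state
```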
Chapter 2

Dual-Mode Dynamics Neural Networks

2.1 Network Configuration Space and Equilibrium Manifold

Let us consider a recurrent network represented by the following dynamics:

\dot{u}_i = -u_i + \sum_{j=1}^{n} w_{ij} x_j + \theta_i,   (2.1)

x_i = f(u_i) = \frac{1}{1 + e^{-u_i}}  for i = 1, ..., n,   (2.2)

where n is the number of neurons in the network, and u_i and x_i represent the state and output of the ith neuron, respectively. With a fixed set of symmetric weights, it has been proven that the network dynamics (2.1) is guaranteed to converge to an equilibrium state [7, 20, 17]. The network dynamics (2.1) can
also be interpreted in terms of the network energy E, in such a way that Equation (2.1) describes the evolution of the network state along the surface of the network energy function E, in a direction to minimize the energy and reach the bottom of the basin containing the initial state. Equation (2.1) indicates that the network energy function E is a function of the states as well as the weights of the network. Figure 2.1a illustrates the network energy function E defined over the Cartesian product of the weight and state spaces, {w x}, called the network configuration space. Note that the state dynamics of a network depends on the selection of a particular set of network weights. In general, network dynamics can be characterized, at a network configuration, in terms of both weight dynamics and state dynamics, based on the network energy function defined over the network configuration space. To apply the general network dynamics, consisting of weight and state dynamics, to optimization problems, we consider only the so-called equilibrium manifold of a network. The equilibrium manifold of a network is defined in the network configuration space as the set of points representing the steady states of the network corresponding to given weights, and is represented by the trace of the valley bottoms of the network energy function, as shown schematically in Figure 2.1b. Figure 2.1b also shows why the performance of most conventional neural network approaches to optimization is sensitive to the selection of network weights and initial states. For instance, in Figure 2.1b, let x_opt represent the optimal solution to be obtained for a given problem. Obviously, to obtain x_opt at the network equilibrium, it is necessary that we select w_B as the network weight. But assigning w_B as the network weight is not sufficient for obtaining
Figure 2.1: The network configuration space and the equilibrium manifold.
the solution, since, for example, if the initial state is given at P, the network will settle down at T instead of S. For conventional approaches to be successful at optimization, the proper assignment of network weights and initial states seems essential. Unfortunately, a systematic way of assigning proper weights and initial states to a network is yet to be established for optimization. The dual-mode dynamics neural network (D2NN) is proposed to solve the problem of weight and initial state assignment associated with conventional approaches to optimization by combining the state dynamics with the weight dynamics. In D2NN, the network can start with arbitrarily chosen weights and initial state. That is, although the network may start at P or Q in Figure 2.1b, which would dictate that the network reach T or R, respectively, at the equilibrium of the state dynamics, D2NN allows the network to evolve toward S (representing x_opt) along the equilibrium manifold by automatically modifying the network weights and initial states through the weight dynamics and state dynamics.

2.2 Network Structure

The structure of the Dual-Mode Dynamics Neural Network (D2NN) is shown in Figure 2.2. D2NN is composed of two layers: the base layer and the supervisory layer. The base layer consists of a set of base units with symmetric connections among them. A base unit is either a visible unit or a hidden unit. For a given problem, the configuration space (or the solution space) is mapped onto the set of visible units, and the hidden units help the visible units to produce the desired solution. The supervisory layer consists of a set of supervisory units without intra-layer connections. One supervisory unit is assigned to each constraint in
the given problem. For the objective measure to be optimized, we set a target value to achieve, and treat it as one of the constraints. The connection between a supervisory unit and the visible units is determined by the corresponding constraint, and does not change during computation. The external objective function is formulated with the supervisory units so that it attains its minimum when all the constraints are satisfied. Depending on the external objective function, the connections between the base layer and the supervisory layer can be of higher order, and we can select the desired form of the activation function at the supervisory unit. The base layer is the same as the Hopfield network, and thus the state dynamics which governs the base layer is guaranteed to converge to an equilibrium with symmetric weights. At the equilibrium of the state dynamics, each supervisory unit examines the visible units connected to it to see whether the corresponding constraint is satisfied or not. If it is not satisfied, the weight dynamics changes the weights in the base layer in a direction that reduces the external objective function, while maintaining the symmetry of the weights. The state dynamics and the weight dynamics govern the network alternately and lead to a solution, since the weight dynamics changes the network energy profile and thus pushes the equilibrium state of the state dynamics toward the minimum of the external cost function.
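The alternation just described can be summarized as a simple loop. The callables below are placeholders for the concrete dynamics defined in Section 2.3, so this is only a structural sketch:

```python
def run_d2nn(x, W, objective, run_state_dynamics, update_weights, step_limit=100):
    """Alternate the two modes until the objective is met or time runs out."""
    for _ in range(step_limit):
        x = run_state_dynamics(x, W)   # state dynamics: settle to an equilibrium
        if objective(x) == 0:          # all constraints satisfied: a solution
            return x, W
        W = update_weights(W, x)       # weight dynamics: reshape the energy profile
    return x, W                        # time limit expired

# toy stand-ins: the 'equilibrium' just increments x until the target 3 is met
x, W = run_d2nn(0, None, lambda x: max(0, 3 - x), lambda x, W: x + 1, lambda W, x: W)
print(x)  # 3
```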
Figure 2.2: The structure of the Dual-Mode Dynamics Neural Network.
2.3 Dual-Mode Dynamics

2.3.1 Discrete Model Dual-Mode Dynamics

In the discrete model dual-mode dynamics neural network, we use a discrete Hopfield network as the base layer. Therefore, the state dynamics which governs the base layer is the discrete Hopfield network dynamics. Let x_i, i = 1, ..., N be the output of base unit i, where N is the number of base units. Also, let w_{ij} be the weight between x_i and x_j, and w_{i0} the bias term of base unit i. The state dynamics of the base unit is:

x_i = \Phi(I^x_i) = \Phi\Big( \sum_j w_{ij} x_j + w_{i0} \Big)  for i = 1, ..., N,   (2.3)

where I^x_i is the input of the ith base unit and \Phi(\cdot) is the binary activation function of the base unit, i.e.,

\Phi(x) = \begin{cases} 1 & \text{if } x \ge 0, \\ 0 & \text{otherwise.} \end{cases}

With no self-loops and symmetric weights, randomly or heuristically assigned initially, the state dynamics is guaranteed to converge to an equilibrium under asynchronous operation. At equilibrium, the output of the supervisory unit is:

s_k = \phi_k(x)  for k = 1, ..., K,   (2.4)

where K is the number of supervisory units and \phi_k(\cdot) represents the relationship between the kth supervisory unit s_k and the base units x in the corresponding constraint. The external objective function is defined on the supervisory layer
by:

C = \sum_{k=1}^{K} C_k(s_k) = \sum_{k=1}^{K} C_k(\phi_k(x)),   (2.5)

where C_k(\cdot) is the component of the objective function associated with each constraint. At the equilibrium state of the state dynamics, the weight dynamics is invoked and updates the weights in a direction that decreases the objective function. First, we define \delta s_k and \delta x_i by:

\delta s_k \triangleq \frac{dC_k}{ds_k}  for k = 1, ..., K,   (2.6)

\delta x_i \triangleq \sum_{k=1}^{K} \frac{dC_k}{ds_k} \frac{ds_k}{dx_i} \frac{dx_i}{dI^x_i}  for i = 1, ..., n,   (2.7)

where n is the number of visible units. Note that \delta x_i equals 0 for i = n+1, ..., N, since the hidden units are not directly connected to the supervisory units. For the binary threshold activation function \Phi(\cdot), it is intractable to compute dx_i/dI^x_i as it is. We treat \Phi(\cdot) as a linear activation function and set \Phi'(\cdot) to 1 in (2.7). This is a reasonable approximation since both are nondecreasing functions. That is, for a desired \Delta x_i, \Delta I^x_i should have the same sign as \Delta x_i in both functions(1). Therefore,

\delta x_i = \sum_{k=1}^{K} \delta s_k \frac{ds_k}{dx_i}.   (2.8)

In case the supervisory unit s_k is a linear combination of the base unit outputs,

(1) This approximation is more effective than the approximation with the sigmoid function, since it gives rise to a larger |\Delta w_{ij}| when the input to a neuron is deeply saturated. The same idea has been applied to error backpropagation learning to speed up the learning process [46].
27 i.e., s k = X i w sx ki x i ; (2.9) then, x i = X k s k w sx ki : (2.10) Using (2.6) through (2.8), the equations for the weight dynamics are obtained as: 4w ij ij x x ij x? ( x i x j + x j x i); (2.11) 4w i0 i0? x i (2.12) where (> 0) is the learning rate. The overall operation of the discrete model of D2NN is as follows: Step 1. Initialization 1. Assign randomly (or heuristically) the weight in the base layer. 2. Select the initial state randomly. Step 2. Dual-Mode Dynamics 1. Run the base layer network asynchronously by (2.3) until an equilibrium is reached. 2. If the equilibrium state represents a solution, then go to Step 3. 18
28 3. Update the weight by (2.6) through (2.12). 4. If the time limit is not expired, then go to Step 2. Step 3. Stop Continuous Model Dual-Mode Dynamics In this subsection, we develop the continuous model dual-mode dynamics which claries the theoretical aspects of dual-mode dynamics neural networks. We rst derive the weight dynamics equation for the continuous model, and then, discuss its geometrical interpretation in the framework of the network conguration space and the equilibrium manifold. In the continuous model, the base layer is the continuous Hopeld network. Therefore, the state dynamics is same as the general dynamics for recurrent networks in (2.1), but with a xed symmetric weights, i.e., w ij = w ji. At equilibrium of the state dynamics, all _u i become zero, and thus, the equilibrium manifold equation is: u = W x + (2.13) where u; x; are the vectors of the state, output and bias, respectively, and W is the weight matrix. Let u be the state variation due to suciently small variations of weights and biases, W and. Then, u + u = (W + W )(x + x) + + : (2.14) 19
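Before continuing the derivation, the discrete-model procedure of Steps 1-3 above can be sketched in code. This is a minimal illustrative sketch, not the dissertation's implementation: it assumes a hypothetical toy objective C(x) = (sum_i x_i - 2)^2, which is zero when exactly two of four units are on, treats every base unit as visible, and uses the sigma'() = 1 approximation of (2.8).

```python
import numpy as np

rng = np.random.default_rng(0)
N, eta = 4, 0.1                  # base units (all visible here), learning rate

# Hypothetical toy objective: C(x) = (sum(x) - 2)^2, zero when exactly
# two units are on; dC/dx_i = 2 * (sum(x) - 2) plays the role of delta x_i.
def grad_C(x):
    return np.full(N, 2.0 * (x.sum() - 2.0))

def relax(x, W, b):
    """State dynamics (2.3): asynchronous updates until an equilibrium."""
    while True:
        changed = False
        for i in rng.permutation(N):
            new = 1.0 if W[i] @ x + b[i] >= 0.0 else 0.0
            if new != x[i]:
                x[i], changed = new, True
        if not changed:
            return x

# Step 1: random symmetric weights with no self-loops, random initial state.
W = rng.normal(size=(N, N))
W = (W + W.T) / 2.0
np.fill_diagonal(W, 0.0)
b = rng.normal(size=N)
x = rng.integers(0, 2, size=N).astype(float)

# Step 2: alternate the state dynamics and the weight dynamics until a
# solution is found or the time limit expires.
for _ in range(50):
    x = relax(x, W, b)
    if x.sum() == 2:             # the equilibrium encodes a solution
        break
    dx = grad_C(x)               # delta x_i of (2.8), with sigma'() := 1
    W -= eta * (np.outer(dx, x) + np.outer(x, dx))   # weight dynamics (2.11)
    np.fill_diagonal(W, 0.0)     # keep the no-self-loop condition
    b -= eta * dx                # bias dynamics (2.12)

x = relax(x, W, b)               # final equilibrium for the final weights
```

The inner relax loop is guaranteed to terminate because the weights are kept symmetric with zero diagonal, so each asynchronous flip decreases the network energy; whether the outer loop actually finds a solution within the time limit depends on the problem and the learning rate.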
From (2.13) and (2.14), disregarding the O(\delta^2) terms,

    (I - W G)\delta u = \delta W x + \delta\theta,   (2.15)

where G = diag{dx_i/du_i} = diag{x_i(1 - x_i)}, and \delta x_i \approx (dx_i/du_i) \delta u_i, i.e., \delta x \approx G \delta u.

When the current equilibrium state does not represent the desired solution, we can obtain the desired state variation \delta u in the direction that minimizes the objective function, based on the gradient of the objective function with respect to the state. Then, with (2.15), we can get the weight variation \delta W which achieves the desired state variation \delta u at the next equilibrium. However, we have to keep the symmetry of W to guarantee the convergence of the state dynamics. Let us rearrange the right side of (2.15) in terms of \delta w_v, the vectorized form of the upper triangular elements of \delta W together with \delta\theta, as follows:

    \delta\gamma = K \delta w_v,   (2.16)

where

    \delta\gamma = (I - W G) \delta u,   (2.17)

    K = [K_1  K_2  ...  K_i  ...  K_n  I_n],   (2.18)

    K_i = [ x_i  x_{i+1}  x_{i+2}  ...  x_n
             0     x_i      0     ...   0
             0      0      x_i    ...   0
            ...
             0      0       0     ...  x_i ]
    (an n x (n - i + 1) matrix whose rows shown are rows i through n; rows 1 through i - 1 are zero),   (2.19)

    \delta w_v = [\delta w_{v_1}^T  ...  \delta w_{v_i}^T  ...  \delta w_{v_n}^T  \delta\theta^T]^T,   (2.20)

    \delta w_{v_i} = [\delta w_{ii}, \delta w_{i,i+1}, ..., \delta w_{in}]^T,   (2.21)

and I_n is the n x n identity matrix.

Equation (2.16) is under-determined: there are (n(n+1)/2 + n) variables and only n constraints, so there are infinitely many solutions. However, we assumed a small weight variation in deriving (2.16). We therefore choose the pseudo-inverse solution, which gives the minimum norm ||\delta w_v||. Let K^+ be the pseudo-inverse of K. Then,

    \delta w_v = K^+ \delta\gamma   (2.22)
               = K^T (K K^T)^{-1} \delta\gamma.   (2.23)

Note that K K^T is always invertible, because the rank of K is always n due to the last block, I_n. This pseudo-inverse solution of \delta w_v requires the matrix inversion of (K K^T). But, from (2.18) and (2.19),

    K K^T = diag{S - x_i^2} + x x^T,   (2.24)

where S = \sum_i x_i^2 + 1, and all the diagonal elements of diag{S - x_i^2} are positive.
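The closed form (2.24), and the inverse that follows from it, can be verified numerically by assembling K column by column: the column for the weight pair (i, j) with i < j carries x_j in row i and x_i in row j, the column for w_ii carries a single x_i in row i, and the identity block accounts for the biases. A small sketch with arbitrary outputs x (assumed values, not from a trained network):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
x = rng.uniform(0.1, 0.9, size=n)     # stand-in equilibrium outputs

# Assemble K = [K_1 ... K_n  I_n], one column per upper-triangular weight.
cols = []
for i in range(n):
    for j in range(i, n):
        c = np.zeros(n)
        if i == j:
            c[i] = x[i]               # dw_ii enters row i once
        else:
            c[i], c[j] = x[j], x[i]   # dw_ij (= dw_ji) enters rows i and j
        cols.append(c)
K = np.hstack([np.column_stack(cols), np.eye(n)])

# Closed form (2.24): K K^T = diag{S - x_i^2} + x x^T with S = sum(x^2) + 1.
S = x @ x + 1.0
closed = np.diag(S - x**2) + np.outer(x, x)
print(np.allclose(K @ K.T, closed))               # True

# Its inverse via the rank-one inversion identity, as in (2.25):
A = S - x**2
D = 1.0 + np.sum(x**2 / A)
inv = np.diag(1.0 / A) - np.outer(x / A, x / A) / D
print(np.allclose(inv, np.linalg.inv(K @ K.T)))   # True
```

Because of this structure, inverting K K^T costs only O(n^2) arithmetic rather than a general O(n^3) inversion.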
Thus, by applying the general rank-one inversion identity:

    [A + a b^T]^{-1} = A^{-1} - (A^{-1} a b^T A^{-1}) / (1 + b^T A^{-1} a),

we can compute (K K^T)^{-1} easily as:

    (K K^T)^{-1}_{ij} = \delta_{ij} / (S - x_i^2) - (1/D) x_i x_j / ((S - x_i^2)(S - x_j^2)),   (2.25)

where D = 1 + \sum_i x_i^2 / (S - x_i^2).

Let L \triangleq (K K^T)^{-1} \delta\gamma = [L_1, L_2, L_3, ..., L_n]^T. Then, from (2.18), (2.19) and (2.23), the block of \delta w_v associated with unit i is

    [\delta w_{ii}, \delta w_{i,i+1}, \delta w_{i,i+2}, ..., \delta w_{in}]^T = K_i^T L,
    i.e., \delta w_{ii} = x_i L_i and \delta w_{i,i+j} = x_{i+j} L_i + x_i L_{i+j},   (2.26)

and the bias block is

    \delta\theta = L.   (2.27)

From (2.26) and (2.27), the final symmetry preserving weight update rule is obtained as:

    \delta w_{ij} = x_i L_j + x_j L_i   (i != j),
    \delta w_{ii} = x_i L_i,
    \delta\theta_i = L_i.   (2.28)

Figure 2.3 illustrates the dual-mode dynamics based on the equilibrium manifold in the network configuration space. In Figure 2.3, the w_1-axis and w_2-axis represent the weight space, and the u-axis represents the state space; the curved surface represents the equilibrium manifold. Let A(w_A, u_A) be the current network configuration at equilibrium and \delta u_d be the desired state variation computed from the objective function. The question is then how to obtain the weight variation \delta W which achieves \delta u_d at the next equilibrium. We first approximate the equilibrium manifold around A by a tangential hyperplane (Eq. (2.15)). Since the dimension of the weight space is much higher than that of the state space, there are infinitely many solutions (represented by L_sol in Figure 2.3) that achieve \delta u_d on this linearized equilibrium manifold. Among them we choose the pseudo-inverse solution, S(w_B, u_A + \delta u_d), which gives rise to the minimum weight variation, i.e., min ||\delta w|| (Eq. (2.23)). Then, with the new weight w_B and the current state u_A, the state dynamics starts again and moves the network configuration from S(w_B, u_A) to B(w_B, u_B), which belongs to the real equilibrium manifold, at the next equilibrium. Note that, with a sufficiently small ||\delta u_d||, the next state u_B is always obtained in the same direction as \delta u_d.

Figure 2.3: The schematic view of the dual-mode dynamics based on the equilibrium manifold in the network configuration space.

State Variation of Hidden Units

For a given optimization problem, one neuron, called a visible unit, is
assigned to one variable in the solution space. There may also be extra neurons that do not correspond to variables, the so-called hidden units. Since a hidden unit is not assigned a variable, its output is not included in the objective function, and the desired variation for a hidden unit is not automatically given by the gradient of the objective function. Hence, there is freedom in selecting the hidden unit variation, which can be utilized to achieve certain desirable performance.

The simplest way to fix the hidden unit variations is to set them all to zero. Separating the visible units and the hidden units in (2.17),

    \delta\gamma = ( [I_v  0; 0  I_h] - [W_vv  W_vh; W_hv  W_hh] [G_v  0; 0  G_h] ) [\delta u_v; \delta u_h]   (2.29)
                 = A_v \delta u_v + A_h \delta u_h,   (2.30)

where

    A_v = [I_v - W_vv G_v; -W_hv G_v],   A_h = [-W_vh G_h; I_h - W_hh G_h],   (2.31)

the subscripts v and h stand for the visible units and the hidden units, respectively, and \delta u_v and \delta u_h represent the variations of the visible units and hidden units, respectively. With \delta u_h = 0,

    \delta w_v = K^+ \delta\gamma |_{\delta u_h = 0}   (2.32)
               = K^T (K K^T)^{-1} A_v \delta u_v.   (2.33)

In this case, we actually minimize the norm of the state variation. That is, the squared norm of the state variation is:

    ||\delta u||^2 = ||\delta u_v||^2 + ||\delta u_h||^2,   (2.34)

and \delta u_v is given by the gradient of the objective function. Therefore, ||\delta u||^2 is minimized with \delta u_h = 0.

Since the weight variation is a function of the hidden unit variations, we can instead select the hidden unit variation that minimizes the weight variation. Let us define V as:

    V = (1/2) ||\delta w_v||^2.   (2.35)

Then,

    V = (1/2) [K^T (K K^T)^{-1} \delta\gamma]^T [K^T (K K^T)^{-1} \delta\gamma]   (2.36)
      = (1/2) \delta\gamma^T (K K^T)^{-1} \delta\gamma.   (2.37)

Note that V is a quadratic function of \delta u_h as well as of \delta\gamma. So, by setting the gradient of V with respect to \delta u_h to zero, i.e.,

    A_h^T (K K^T)^{-1} (A_v \delta u_v + A_h \delta u_h) = 0,   (2.38)

we can select \delta u_h to minimize V as:

    \delta u_h = -[A_h^T (K K^T)^{-1} A_h]^{-1} A_h^T (K K^T)^{-1} A_v \delta u_v.   (2.39)
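A quick numerical sanity check of (2.39): writing V = (1/2) \delta\gamma^T (K K^T)^{-1} \delta\gamma as in (2.37), the hidden-unit variation it prescribes should never yield a larger V than the simpler choice \delta u_h = 0. The sketch below uses random stand-ins of toy sizes for K, A_v, A_h and \delta u_v (assumed values, not from an actual network):

```python
import numpy as np

rng = np.random.default_rng(2)
nv, nh = 3, 2                         # toy visible / hidden unit counts
n = nv + nh

# Random stand-ins for the quantities available at equilibrium:
K = rng.normal(size=(n, n * (n + 1) // 2 + n))
M = K @ K.T                           # plays the role of K K^T (invertible)
Av = rng.normal(size=(n, nv))         # stands in for A_v of (2.31)
Ah = rng.normal(size=(n, nh))         # stands in for A_h of (2.31)
duv = rng.normal(size=nv)             # desired visible-unit variation
Minv = np.linalg.inv(M)

def V(duh):
    """V = (1/2) dgamma^T (K K^T)^{-1} dgamma, eq. (2.37)."""
    g = Av @ duv + Ah @ duh
    return 0.5 * g @ Minv @ g

# Eq. (2.39): the hidden-unit variation that minimizes V.
duh_star = -np.linalg.solve(Ah.T @ Minv @ Ah, Ah.T @ Minv @ (Av @ duv))

print(V(duh_star) <= V(np.zeros(nh)))   # True: (2.39) never does worse
```

Since V is a positive-definite quadratic in \delta u_h, the stationary point of (2.38) is its unique minimizer, so the inequality holds for any choice of the random stand-ins.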
With this hidden unit variation, the resultant weight variation is

    \delta w_v = K^T (K K^T)^{-1} [I - A_h (A_h^T (K K^T)^{-1} A_h)^{-1} A_h^T (K K^T)^{-1}] A_v \delta u_v.   (2.40)

Through both the state variation and the weight variation, the network changes its configuration along the equilibrium manifold in the network configuration space. So, we can also select the hidden unit variation that minimizes the variation of the network configuration as a whole. Let us modify V in (2.35) to include the state variation:

    V = (1/2) ||\delta w_v||^2 + (1/2) ||\delta u||^2.   (2.41)

Then, the resultant weight variation in (2.40) is slightly modified as:

    \delta w_v = K^T (K K^T)^{-1} [I - A_h (I_h + A_h^T (K K^T)^{-1} A_h)^{-1} A_h^T (K K^T)^{-1}] A_v \delta u_v.   (2.42)

The introduction of hidden units can be helpful since it increases the number of weights by O(n^2), while increasing the number of state variations to be achieved only by O(n). Since the pseudo-inverse solution in (2.23) gives the minimum weight variation for a given state variation, we will use (2.33) for the simulations in the later chapters, as it does not require the matrix inversion of (A_h^T (K K^T)^{-1} A_h) or (I_h + A_h^T (K K^T)^{-1} A_h) appearing in (2.40) or (2.42).

2.3.3 Symmetry Preserving Recurrent Backpropagation

Pineda [34, 35] and Almeida [5] independently pointed out that backpropagation can be extended to arbitrary networks, and developed the backpropagation algorithm for recurrent neural networks, called recurrent backpropagation. The goal of recurrent backpropagation is conceptually the same as that of the weight dynamics in D2NN: with recurrent backpropagation, we change the weights to minimize an error function, while with the weight dynamics, we update the weights in the direction that minimizes the objective function, which plays the role of the error function in recurrent backpropagation. So, recurrent backpropagation can also be used as the weight dynamics in D2NN. Since recurrent backpropagation is derived on the assumption that the network always converges to a stable state, we have to guarantee the stability of the state dynamics at all times. With general asymmetric weights, however, it is hard to maintain the stability of the state dynamics, as will be discussed in Section 2.5. Therefore, by imposing the condition of symmetric weights, we modify the original recurrent backpropagation to derive the symmetry preserving recurrent backpropagation as follows.

For convenience, let us rewrite the network dynamics in (2.1) and (2.2):

    \dot{u}_i = -u_i + \sum_{j=1}^{n} w_{ij} x_j + \theta_i,   (2.43)

    x_i = f(u_i) = 1 / (1 + e^{-u_i}),  for i = 1, ..., n.   (2.44)

With symmetric weights, the network always converges to a fixed point, and the equilibrium manifold equation is:

    u_i = \sum_j w_{ij} x_j + \theta_i,  for i = 1, ..., n.   (2.45)

The goal is to adjust the weights so that the next equilibrium state is formed along the direction that decreases the objective function. This is accomplished by computing the gradient of the objective function with respect to the weights, and updating the weights in the direction anti-parallel to that gradient, that is,

    \Delta w_{rs} = -\eta \sum_i e_i (\partial u_i / \partial w_{rs}),   (2.46)

where

    e_i \triangleq (\partial C / \partial x_i) f'(u_i),   (2.47)

and C is the objective function. To obtain \partial u_i / \partial w_{rs}, let us differentiate the equilibrium manifold equation (2.45) with respect to w_rs (= w_sr):

    \partial u_i / \partial w_{rs} = \delta_{ir} x_s + \delta_{is} x_r + \sum_j w_{ij} f'(u_j) (\partial u_j / \partial w_{rs}),   (2.48)

where \delta_{ij} is the Kronecker delta, i.e., \delta_{ij} = 1 if i = j, and 0 otherwise. Collecting terms, this can be written as

    \sum_j L_{ij} (\partial u_j / \partial w_{rs}) = \delta_{ir} x_s + \delta_{is} x_r,   (2.49)

where

    L_{ij} = \delta_{ij} - w_{ij} f'(u_j).   (2.50)

Inverting the linear equations (2.49),

    \partial u_k / \partial w_{rs} = (L^{-1})_{kr} x_s + (L^{-1})_{ks} x_r.   (2.51)
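The sensitivity formula (2.51) can be checked against a finite-difference derivative: perturb w_rs and w_sr together (to preserve symmetry), relax the network again, and compare. A minimal sketch under the assumption of small random weights, so that plain Euler relaxation of (2.43) converges:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
W = rng.normal(scale=0.3, size=(n, n))
W = (W + W.T) / 2.0                       # symmetric weights
theta = rng.normal(scale=0.1, size=n)
f = lambda u: 1.0 / (1.0 + np.exp(-u))    # logistic activation (2.44)

def fixed_point(W):
    """Relax (2.43) by Euler steps to the equilibrium u = W f(u) + theta."""
    u = np.zeros(n)
    for _ in range(5000):
        u += 0.1 * (-u + W @ f(u) + theta)
    return u

u = fixed_point(W)
x = f(u)
fp = x * (1.0 - x)                        # f'(u) for the logistic function
L = np.eye(n) - W * fp[None, :]           # L_ij = delta_ij - w_ij f'(u_j), (2.50)

r, s, eps = 0, 2, 1e-6
Wp = W.copy()
Wp[r, s] += eps
Wp[s, r] += eps                           # w_rs = w_sr, perturbed jointly
numeric = (fixed_point(Wp) - u) / eps     # finite-difference du/dw_rs

Linv = np.linalg.inv(L)
analytic = Linv[:, r] * x[s] + Linv[:, s] * x[r]   # eq. (2.51)
print(np.allclose(numeric, analytic, atol=1e-4))   # True
```

The weight scale 0.3 is an assumption chosen so that the relaxation map is a contraction; with larger weights a simple forward-Euler relaxation may fail to converge and the check would be meaningless.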
By substituting (2.51) into (2.46), we obtain

    \Delta w_{rs} = -\eta (y_r x_s + y_s x_r),   (2.52)

where

    y_r = \sum_k e_k (L^{-1})_{kr},   (2.53)
    y_s = \sum_k e_k (L^{-1})_{ks}.   (2.54)

Equation (2.52) specifies the symmetry preserving weight update rule, and requires a matrix inversion to obtain y_r and y_s in (2.53) and (2.54). However, we can undo the inversion in (2.53) and (2.54), and obtain linear equations for the y_k:

    \sum_k L_{ki} y_k = e_i,   (2.55)

or, using (2.50),

    y_i = \sum_k w_{ki} f'(u_i) y_k + e_i.   (2.56)

This equation has the same form as the original equilibrium manifold equation (2.45), and can be solved in the same way, by the evolution of an auxiliary network with a state dynamics analogous to (2.43):

    \dot{y}_i = -y_i + \sum_{k=1}^{n} w_{ki} f'(u_i) y_k + e_i.   (2.57)

The auxiliary network has the same topology as the original network, with the connection w_{ij} from the jth unit to the ith unit replaced by w_{ji} f'(u_i), a simple linear activation function f(y) = y, and a bias term e_i. Almeida [5] showed that the convergence of the original network is a sufficient condition for the convergence of the auxiliary network. Note that the convergence of the original network is guaranteed by the symmetry of the weights. The whole computational flow is thus:

1. Relax the original network with (2.43).
2. Calculate e_i in (2.47) from the objective function C.
3. Relax the auxiliary network with (2.57) to find y_i.
4. Update the weights using (2.52).

2.4 Binary Value Solution vs. Continuous State Variable

In combinatorial optimization, the solution space consists of binary values, while the state variable is continuous during the computation of the neural optimization network. When we apply neural network approaches to combinatorial optimization, we are in effect trying to obtain binary value solutions through a continuous state space. In the discrete model of D2NN, the output of each neuron in the base layer is automatically binary because the binary threshold function is used as the activation function. In the continuous model of D2NN, on the other hand, we need some scheme to obtain binary value solutions from continuous state variables.

Hopfield and Tank used a sigmoid function with a steep slope as the activation function of the neurons, so that the output of each neuron was expected to settle near 1 or 0 at equilibrium. However, with a fixed steep slope in the activation function, the state dynamics easily loses its momentum, and the state tends to stay at a local minimum or on a plateau of the network energy function. Therefore, the final state usually represents a solution of poor quality, as reported by Wilson and Pawley [47]. An annealing process helps to avoid these phenomena to some degree: while the state dynamics runs actively at the initial high temperature, it effectively becomes discrete as the temperature goes toward zero, so the final state will be binary. The variation of temperature must be carefully scheduled, since it has a crucial influence on the quality of the final solution. In practice, a good annealing schedule is problem-dependent, and is usually obtained by trial and error, which takes a rather long a priori processing time.

In the continuous model of D2NN, we use the hyperquadrant-to-vertex mapping to obtain the binary value solution from the continuous state variable. The hyperquadrant-to-vertex mapping is formally defined as follows:

Hyperquadrant-to-vertex mapping: A point in the n dimensional unit hypercube, x = [x_1 x_2 ... x_n], x_i in [0, 1] for i = 1, ..., n, is mapped to x^B = [x_1^B x_2^B ... x_n^B], where

    x_i^B = 1 if x_i >= 0.5, and 0 otherwise, for i = 1, ..., n.

An example of the hyperquadrant-to-vertex mapping is illustrated in Figure 2.4 for the two dimensional case.

Figure 2.4: The hyperquadrant-to-vertex mapping.

With this hyperquadrant-to-vertex mapping, each vertex of a unit hypercube represents all the points in the hyperquadrant
which contains that vertex.

We apply the hyperquadrant-to-vertex mapping when we check whether a state represents a solution or not. In other words, the vertex obtained from the current output of the neurons by the hyperquadrant-to-vertex mapping is tested to see whether it satisfies all the constraints of the given problem. The goal of the computation is thus slightly modified: we search for a hyperquadrant containing the vertex that represents the solution, rather than for that vertex itself. This saves considerable computation time, since the state need only evolve to some point in the hyperquadrant corresponding to the solution, rather than to the vertex itself.

The hyperquadrant-to-vertex mapping is also used in the computation of the desired state variation. If we used the gradient of the objective function at the current output of the neurons, it could happen that the desired state variation is zero even though the current output, or the corresponding vertex, does not represent a solution; there would then be no way to guide the weight dynamics. Therefore, when we calculate the state variation, we use the gradient of the objective function at the vertex corresponding to the current output of the neurons, rather than at the current output itself. Then, if the state evolves into a hyperquadrant whose vertex does not represent a solution of the given problem, the state will be pushed out to another hyperquadrant along the direction opposite to the gradient of the objective function at that vertex. Meanwhile, there is no repulsive impetus in a hyperquadrant whose vertex represents a solution. So, the state moves along the equilibrium manifold, governed by the gradients at the vertices, and the weight dynamics stops when the state evolves into the hyperquadrant that contains the vertex representing the binary value solution.

Figure 2.5: The state evolution in D2NN for a two variable problem with three inequality constraints.

Figure 2.5 illustrates the state evolution in D2NN for a simple two dimensional problem. Based on the three inequality constraints, represented as the shaded area, the objective function is formed in such a way that it becomes zero at the vertex V_4 (representing a solution) and generates a repulsive impetus at the other vertices, V_1, V_2, and V_3, illustrated by the shaded big arrows. From an initial state A, the state is guided by the repulsive impetus at V_1 and reaches the upper-left quadrant. Then the repulsive impetuses at V_1 and V_2 guide the state alternately, and the state eventually evolves into the lower-right quadrant, which corresponds to the solution V_4. Note that the initial state A has momentum due to the hyperquadrant-to-vertex mapping even though it satisfies the constraints.

2.5 Asymmetric Weight vs. Symmetric Weight

With asymmetric weights, the state dynamics either converges to an isolated fixed point, or generates oscillations or chaotic behavior. Typical state dynamics for 100 neuron networks are shown in Figure 2.6. With asymmetric weights, the state dynamics settles into oscillation after quite a long period of chaotic behavior, while with symmetric weights, a fixed point is reached after a short transient period. If one could utilize such general behaviors as chaos or limit cycles, neural computation would be enormously powerful. Since we use only an isolated fixed point as the output of the system, however, we need to guarantee the convergence of the state dynamics. On the stability of asymmetric recurrent neural networks, several sufficient
The Bias-Variance dilemma of the Monte Carlo method Zlochin Mark 1 and Yoram Baram 1 Technion - Israel Institute of Technology, Technion City, Haifa 32000, Israel fzmark,baramg@cs.technion.ac.il Abstract.
More informationsquashing functions allow to deal with decision-like tasks. Attracted by Backprop's interpolation capabilities, mainly because of its possibility of g
SUCCESSES AND FAILURES OF BACKPROPAGATION: A THEORETICAL INVESTIGATION P. Frasconi, M. Gori, and A. Tesi Dipartimento di Sistemi e Informatica, Universita di Firenze Via di Santa Marta 3-50139 Firenze
More informationManifold Regularization
9.520: Statistical Learning Theory and Applications arch 3rd, 200 anifold Regularization Lecturer: Lorenzo Rosasco Scribe: Hooyoung Chung Introduction In this lecture we introduce a class of learning algorithms,
More informationLinearly-solvable Markov decision problems
Advances in Neural Information Processing Systems 2 Linearly-solvable Markov decision problems Emanuel Todorov Department of Cognitive Science University of California San Diego todorov@cogsci.ucsd.edu
More informationLecture 35 Minimization and maximization of functions. Powell s method in multidimensions Conjugate gradient method. Annealing methods.
Lecture 35 Minimization and maximization of functions Powell s method in multidimensions Conjugate gradient method. Annealing methods. We know how to minimize functions in one dimension. If we start at
More informationBounded Approximation Algorithms
Bounded Approximation Algorithms Sometimes we can handle NP problems with polynomial time algorithms which are guaranteed to return a solution within some specific bound of the optimal solution within
More informationJohn P.F.Sum and Peter K.S.Tam. Hong Kong Polytechnic University, Hung Hom, Kowloon.
Note on the Maxnet Dynamics John P.F.Sum and Peter K.S.Tam Department of Electronic Engineering, Hong Kong Polytechnic University, Hung Hom, Kowloon. April 7, 996 Abstract A simple method is presented
More informationScheduling Adaptively Parallel Jobs. Bin Song. Submitted to the Department of Electrical Engineering and Computer Science. Master of Science.
Scheduling Adaptively Parallel Jobs by Bin Song A. B. (Computer Science and Mathematics), Dartmouth College (996) Submitted to the Department of Electrical Engineering and Computer Science in partial fulllment
More informationEcient Higher-order Neural Networks. for Classication and Function Approximation. Joydeep Ghosh and Yoan Shin. The University of Texas at Austin
Ecient Higher-order Neural Networks for Classication and Function Approximation Joydeep Ghosh and Yoan Shin Department of Electrical and Computer Engineering The University of Texas at Austin Austin, TX
More informationAnt Colony Optimization: an introduction. Daniel Chivilikhin
Ant Colony Optimization: an introduction Daniel Chivilikhin 03.04.2013 Outline 1. Biological inspiration of ACO 2. Solving NP-hard combinatorial problems 3. The ACO metaheuristic 4. ACO for the Traveling
More informationCOMP9444 Neural Networks and Deep Learning 11. Boltzmann Machines. COMP9444 c Alan Blair, 2017
COMP9444 Neural Networks and Deep Learning 11. Boltzmann Machines COMP9444 17s2 Boltzmann Machines 1 Outline Content Addressable Memory Hopfield Network Generative Models Boltzmann Machine Restricted Boltzmann
More informationAverage Reward Parameters
Simulation-Based Optimization of Markov Reward Processes: Implementation Issues Peter Marbach 2 John N. Tsitsiklis 3 Abstract We consider discrete time, nite state space Markov reward processes which depend
More informationChapter 0 Introduction Suppose this was the abstract of a journal paper rather than the introduction to a dissertation. Then it would probably end wit
Chapter 0 Introduction Suppose this was the abstract of a journal paper rather than the introduction to a dissertation. Then it would probably end with some cryptic AMS subject classications and a few
More informationNeural Networks. Hopfield Nets and Auto Associators Fall 2017
Neural Networks Hopfield Nets and Auto Associators Fall 2017 1 Story so far Neural networks for computation All feedforward structures But what about.. 2 Loopy network Θ z = ቊ +1 if z > 0 1 if z 0 y i
More informationAn Adaptive Bayesian Network for Low-Level Image Processing
An Adaptive Bayesian Network for Low-Level Image Processing S P Luttrell Defence Research Agency, Malvern, Worcs, WR14 3PS, UK. I. INTRODUCTION Probability calculus, based on the axioms of inference, Cox
More informationUsing a Hopfield Network: A Nuts and Bolts Approach
Using a Hopfield Network: A Nuts and Bolts Approach November 4, 2013 Gershon Wolfe, Ph.D. Hopfield Model as Applied to Classification Hopfield network Training the network Updating nodes Sequencing of
More information1. Introduction Let the least value of an objective function F (x), x2r n, be required, where F (x) can be calculated for any vector of variables x2r
DAMTP 2002/NA08 Least Frobenius norm updating of quadratic models that satisfy interpolation conditions 1 M.J.D. Powell Abstract: Quadratic models of objective functions are highly useful in many optimization
More informationA Generalized Homogeneous and Self-Dual Algorithm. for Linear Programming. February 1994 (revised December 1994)
A Generalized Homogeneous and Self-Dual Algorithm for Linear Programming Xiaojie Xu Yinyu Ye y February 994 (revised December 994) Abstract: A generalized homogeneous and self-dual (HSD) infeasible-interior-point
More informationNon-Convex Optimization. CS6787 Lecture 7 Fall 2017
Non-Convex Optimization CS6787 Lecture 7 Fall 2017 First some words about grading I sent out a bunch of grades on the course management system Everyone should have all their grades in Not including paper
More informationLinear Regression and Its Applications
Linear Regression and Its Applications Predrag Radivojac October 13, 2014 Given a data set D = {(x i, y i )} n the objective is to learn the relationship between features and the target. We usually start
More information4.1 Eigenvalues, Eigenvectors, and The Characteristic Polynomial
Linear Algebra (part 4): Eigenvalues, Diagonalization, and the Jordan Form (by Evan Dummit, 27, v ) Contents 4 Eigenvalues, Diagonalization, and the Jordan Canonical Form 4 Eigenvalues, Eigenvectors, and
More informationNONLINEAR CLASSIFICATION AND REGRESSION. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition
NONLINEAR CLASSIFICATION AND REGRESSION Nonlinear Classification and Regression: Outline 2 Multi-Layer Perceptrons The Back-Propagation Learning Algorithm Generalized Linear Models Radial Basis Function
More informationonly nite eigenvalues. This is an extension of earlier results from [2]. Then we concentrate on the Riccati equation appearing in H 2 and linear quadr
The discrete algebraic Riccati equation and linear matrix inequality nton. Stoorvogel y Department of Mathematics and Computing Science Eindhoven Univ. of Technology P.O. ox 53, 56 M Eindhoven The Netherlands
More informationMODELLING OF FLEXIBLE MECHANICAL SYSTEMS THROUGH APPROXIMATED EIGENFUNCTIONS L. Menini A. Tornambe L. Zaccarian Dip. Informatica, Sistemi e Produzione
MODELLING OF FLEXIBLE MECHANICAL SYSTEMS THROUGH APPROXIMATED EIGENFUNCTIONS L. Menini A. Tornambe L. Zaccarian Dip. Informatica, Sistemi e Produzione, Univ. di Roma Tor Vergata, via di Tor Vergata 11,
More informationCS 6501: Deep Learning for Computer Graphics. Basics of Neural Networks. Connelly Barnes
CS 6501: Deep Learning for Computer Graphics Basics of Neural Networks Connelly Barnes Overview Simple neural networks Perceptron Feedforward neural networks Multilayer perceptron and properties Autoencoders
More informationSimulated Annealing for Constrained Global Optimization
Monte Carlo Methods for Computation and Optimization Final Presentation Simulated Annealing for Constrained Global Optimization H. Edwin Romeijn & Robert L.Smith (1994) Presented by Ariel Schwartz Objective
More informationground state degeneracy ground state energy
Searching Ground States in Ising Spin Glass Systems Steven Homer Computer Science Department Boston University Boston, MA 02215 Marcus Peinado German National Research Center for Information Technology
More informationMath 1270 Honors ODE I Fall, 2008 Class notes # 14. x 0 = F (x; y) y 0 = G (x; y) u 0 = au + bv = cu + dv
Math 1270 Honors ODE I Fall, 2008 Class notes # 1 We have learned how to study nonlinear systems x 0 = F (x; y) y 0 = G (x; y) (1) by linearizing around equilibrium points. If (x 0 ; y 0 ) is an equilibrium
More informationESTIMATING STATISTICAL CHARACTERISTICS UNDER INTERVAL UNCERTAINTY AND CONSTRAINTS: MEAN, VARIANCE, COVARIANCE, AND CORRELATION ALI JALAL-KAMALI
ESTIMATING STATISTICAL CHARACTERISTICS UNDER INTERVAL UNCERTAINTY AND CONSTRAINTS: MEAN, VARIANCE, COVARIANCE, AND CORRELATION ALI JALAL-KAMALI Department of Computer Science APPROVED: Vladik Kreinovich,
More informationAI Programming CS F-20 Neural Networks
AI Programming CS662-2008F-20 Neural Networks David Galles Department of Computer Science University of San Francisco 20-0: Symbolic AI Most of this class has been focused on Symbolic AI Focus or symbols
More informationOn-line Bin-Stretching. Yossi Azar y Oded Regev z. Abstract. We are given a sequence of items that can be packed into m unit size bins.
On-line Bin-Stretching Yossi Azar y Oded Regev z Abstract We are given a sequence of items that can be packed into m unit size bins. In the classical bin packing problem we x the size of the bins and try
More information`First Come, First Served' can be unstable! Thomas I. Seidman. Department of Mathematics and Statistics. University of Maryland Baltimore County
revision2: 9/4/'93 `First Come, First Served' can be unstable! Thomas I. Seidman Department of Mathematics and Statistics University of Maryland Baltimore County Baltimore, MD 21228, USA e-mail: hseidman@math.umbc.edui
More informationFeatured Articles Advanced Research into AI Ising Computer
156 Hitachi Review Vol. 65 (2016), No. 6 Featured Articles Advanced Research into AI Ising Computer Masanao Yamaoka, Ph.D. Chihiro Yoshimura Masato Hayashi Takuya Okuyama Hidetaka Aoki Hiroyuki Mizuno,
More information1 What a Neural Network Computes
Neural Networks 1 What a Neural Network Computes To begin with, we will discuss fully connected feed-forward neural networks, also known as multilayer perceptrons. A feedforward neural network consists
More informationNeural Networks Lecture 6: Associative Memory II
Neural Networks Lecture 6: Associative Memory II H.A Talebi Farzaneh Abdollahi Department of Electrical Engineering Amirkabir University of Technology Winter 2011. A. Talebi, Farzaneh Abdollahi Neural
More informationAugust Progress Report
PATH PREDICTION FOR AN EARTH-BASED DEMONSTRATION BALLOON FLIGHT DANIEL BEYLKIN Mentor: Jerrold Marsden Co-Mentors: Claire Newman and Philip Du Toit August Progress Report. Progress.. Discrete Mechanics
More informationFundamentals of Metaheuristics
Fundamentals of Metaheuristics Part I - Basic concepts and Single-State Methods A seminar for Neural Networks Simone Scardapane Academic year 2012-2013 ABOUT THIS SEMINAR The seminar is divided in three
More informationNotes on Dantzig-Wolfe decomposition and column generation
Notes on Dantzig-Wolfe decomposition and column generation Mette Gamst November 11, 2010 1 Introduction This note introduces an exact solution method for mathematical programming problems. The method is
More informationGlobal Analysis of Piecewise Linear Systems Using Impact Maps and Surface Lyapunov Functions
IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL 48, NO 12, DECEMBER 2003 2089 Global Analysis of Piecewise Linear Systems Using Impact Maps and Surface Lyapunov Functions Jorge M Gonçalves, Alexandre Megretski,
More informationNew Integer Programming Formulations of the Generalized Travelling Salesman Problem
American Journal of Applied Sciences 4 (11): 932-937, 2007 ISSN 1546-9239 2007 Science Publications New Integer Programming Formulations of the Generalized Travelling Salesman Problem Petrica C. Pop Department
More informationComparison of Simulation Algorithms for the Hopfield Neural Network: An Application of Economic Dispatch
Turk J Elec Engin, VOL.8, NO.1 2000, c TÜBİTAK Comparison of Simulation Algorithms for the Hopfield Neural Network: An Application of Economic Dispatch Tankut Yalçınöz and Halis Altun Department of Electrical
More informationPHASE RETRIEVAL OF SPARSE SIGNALS FROM MAGNITUDE INFORMATION. A Thesis MELTEM APAYDIN
PHASE RETRIEVAL OF SPARSE SIGNALS FROM MAGNITUDE INFORMATION A Thesis by MELTEM APAYDIN Submitted to the Office of Graduate and Professional Studies of Texas A&M University in partial fulfillment of the
More informationMVE165/MMG630, Applied Optimization Lecture 6 Integer linear programming: models and applications; complexity. Ann-Brith Strömberg
MVE165/MMG630, Integer linear programming: models and applications; complexity Ann-Brith Strömberg 2011 04 01 Modelling with integer variables (Ch. 13.1) Variables Linear programming (LP) uses continuous
More informationTraining Multi-Layer Neural Networks. - the Back-Propagation Method. (c) Marcin Sydow
Plan training single neuron with continuous activation function training 1-layer of continuous neurons training multi-layer network - back-propagation method single neuron with continuous activation function
More information21. Set cover and TSP
CS/ECE/ISyE 524 Introduction to Optimization Spring 2017 18 21. Set cover and TSP ˆ Set covering ˆ Cutting problems and column generation ˆ Traveling salesman problem Laurent Lessard (www.laurentlessard.com)
More informationMathematics Research Report No. MRR 003{96, HIGH RESOLUTION POTENTIAL FLOW METHODS IN OIL EXPLORATION Stephen Roberts 1 and Stephan Matthai 2 3rd Febr
HIGH RESOLUTION POTENTIAL FLOW METHODS IN OIL EXPLORATION Stephen Roberts and Stephan Matthai Mathematics Research Report No. MRR 003{96, Mathematics Research Report No. MRR 003{96, HIGH RESOLUTION POTENTIAL
More informationGaussian Processes for Regression. Carl Edward Rasmussen. Department of Computer Science. Toronto, ONT, M5S 1A4, Canada.
In Advances in Neural Information Processing Systems 8 eds. D. S. Touretzky, M. C. Mozer, M. E. Hasselmo, MIT Press, 1996. Gaussian Processes for Regression Christopher K. I. Williams Neural Computing
More informationAn average case analysis of a dierential attack. on a class of SP-networks. Distributed Systems Technology Centre, and
An average case analysis of a dierential attack on a class of SP-networks Luke O'Connor Distributed Systems Technology Centre, and Information Security Research Center, QUT Brisbane, Australia Abstract
More informationData Mining Part 5. Prediction
Data Mining Part 5. Prediction 5.5. Spring 2010 Instructor: Dr. Masoud Yaghini Outline How the Brain Works Artificial Neural Networks Simple Computing Elements Feed-Forward Networks Perceptrons (Single-layer,
More information