Microcanonical Mean Field Annealing: A New Algorithm for Increasing the Convergence Speed of Mean Field Annealing.

Hyuk Jae Lee and Ahmed Louri
Department of Electrical and Computer Engineering, University of Arizona, Tucson, AZ 85721

Abstract

In this paper, we consider the convergence speed of mean field annealing (MFA). We combine MFA with the microcanonical simulation (MCS) method and propose a new algorithm called microcanonical mean field annealing (MCMFA). In the proposed algorithm, the cooling speed is controlled by the current temperature, so that the computation in MFA can be reduced without degrading performance. In addition, the solution quality of MCMFA is not affected by the initial temperature. The properties of MCMFA are analyzed with a simple example and simulated with Hopfield neural networks (HNN). In order to compare MCMFA with MFA, both algorithms are applied to graph bipartitioning problems. Simulation results show that MCMFA produces better solutions than MFA.

1 Introduction

Finding an optimal solution is the main goal in solving engineering problems. However, such problems often have a large number of degrees of freedom, so that an optimal solution cannot be obtained in a reasonable amount of time [1]. Significant progress was made in this area with the invention of two new algorithms: simulated annealing (SA) and microcanonical simulation (MCS) [2,3]. Both algorithms use the analogy between an optimization problem and a magnetic material in statistical physics. SA has received more attention and successfully solves optimization problems using this interpretation of magnetic materials [1].

Another recent achievement was made by Hopfield neural networks (HNN) [4]. In an HNN, a neuron is modeled as an independent variable of the optimization problem, just as an atom is in SA, and the role of a neuron is similar to that of an atom. Therefore the search method of SA can be applied to the HNN for updating the states of neurons. However, the probabilistic search rule of SA is difficult to implement in an HNN and often requires excessive computation. One promising suggestion is mean field annealing (MFA), which is more amenable to implementation by an HNN and requires less computation than SA with nearly equal solution quality [5].

MFA starts at a high temperature and slowly reduces the temperature. However, excessive computation is required by slow cooling from a high initial temperature. Previous research shows the existence of a critical temperature, T_c, near which the neurons begin to move toward their final states. To speed up MFA, it is desirable to increase the cooling speed above T_c and decrease it near T_c. This adaptive cooling requires knowledge of T_c. Although efforts have been devoted to finding T_c for specific examples, there is no general technique to estimate T_c by analyzing the structure of an HNN. However, during the simulation of MFA it is easy to determine T_c by observing the energy function.

MCS is an algorithm that computes the energy function of the HNN. We apply this property to control the cooling speed of MFA and propose a new algorithm called microcanonical mean field annealing (MCMFA). MCMFA adapts the cooling speed based on the current temperature, so that the computation time is reduced without degradation of performance. Therefore MCMFA requires less computation than MFA. The adaptive cooling schedule also solves the problem of choosing the initial temperature. These properties are examined by analysis and simulation of MCMFA. MCMFA is applied to graph bipartitioning problems and the results are compared with MFA.

2 Microcanonical Mean Field Annealing

MCS uses an imaginary particle called a demon for simulating magnetic material [3]. The demon has a deterministic energy, E_d, and flips the states of neurons. Statistical physics shows that the demon's energy follows a Boltzmann distribution,

    Prob[E_d] ∝ e^{-E_d / T},   (1)

where T represents the temperature of the demon [3]. The deterministic update rule of MCS can then be written as a probabilistic rule as follows:

    Prob[s_i = +1 | s_i = -1] = Prob[E_d > H(s_i = +1) - H(s_i = -1)]
    Prob[s_i = -1 | s_i = +1] = Prob[E_d > H(s_i = -1) - H(s_i = +1)],   (2)

where s_i represents the state of the i-th neuron and H(s) represents the energy function of the HNN. The update rule of Eq. (2) is similar to that of SA. However, in MCS the total energy is conserved. A decrease of the HNN energy causes an increase of the demon energy as follows:

    β<E_d> ← β<E_d> - Δ<H>,   (3)

where β is the number of demons and <·> represents the average value. In the equilibrium state, the average demon energy is the same as the temperature,

    <E_d> = T,   (4)

so Eq. (3) becomes

    T ← T - Δ<H> / β.   (5)

We combine the cooling schedule of MFA with Eq. (5) to propose a new algorithm called microcanonical mean field annealing (MCMFA).

Microcanonical Mean Field Annealing Algorithm:

1. Initialization: T = T_initial.
2. while (T > T_minimum)
   (a) Select neuron i for update.
   (b) Update the i-th neuron as follows:

       <s_i> = 1 / (1 + e^{-Δ<H>/T}).   (6)

   (c) Adjust the temperature as follows:

       T ← T - α - Δ<H> / β.   (7)

3. All states are equilibrated to +1 or -1 according to their signs.
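To make the loop structure above concrete, the following is a minimal Python sketch of MCMFA. It is an illustration, not the authors' implementation: the weight matrix, the starting states, the numerical floor on the temperature, and the tanh form of the mean-field update (the ±1-spin counterpart of the logistic in Eq. (6)) are assumptions made here so that the example is self-contained and runnable.

    import numpy as np

    def mcmfa(W, T_init, T_min=1e-3, alpha=1e-3, beta=1.0, rng=None):
        """Sketch of microcanonical mean field annealing (MCMFA).

        W      : symmetric Hopfield weight matrix (zero diagonal)
        T_init : initial temperature T_initial
        alpha  : base cooling step, as in plain MFA (Eq. 14)
        beta   : demon parameter weighting the Delta<H> term (Eq. 7)
        """
        rng = np.random.default_rng() if rng is None else rng
        n = W.shape[0]
        s = rng.uniform(-0.1, 0.1, size=n)      # mean-field states <s_i>, near zero
        T = T_init

        def energy(v):
            # mean-field Hopfield energy <H> = -1/2 * sum_ij w_ij <s_i><s_j>
            return -0.5 * v @ W @ v

        H_prev = energy(s)
        while T > T_min:
            i = rng.integers(n)                  # (a) select a neuron
            # (b) mean-field update (tanh form, assumed here for +-1 spins)
            dH = 2.0 * W[i] @ s                  # energy gap between s_i = -1 and s_i = +1
            s[i] = np.tanh(dH / (2.0 * T))
            # (c) adaptive cooling of Eq. (7): T <- T - alpha - Delta<H> / beta
            H_now = energy(s)
            T = max(T - alpha - (H_now - H_prev) / beta, 1e-12)  # small floor keeps T positive
            H_prev = H_now
        return np.sign(s), T                     # step 3: clamp states to +-1 by sign

With a very large β the Δ<H>/β term is negligible and the routine reduces to plain MFA with linear cooling; with a small β an energy drop feeds back into the temperature, which is the behavior analyzed in the next section.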

The update rule of Eq. (6) is the same as that of MFA. The difference is the cooling schedule of Eq. (7). In MFA, α determines the cooling speed and the quality of the solution. MCMFA adds a Δ<H> term to the cooling schedule of MFA, and β determines the influence of this term on the cooling speed. If β is small, Δ<H> plays an important role in the cooling speed. As the value of β increases, the effect of the Δ<H> term decreases. If β is infinite, MCMFA behaves as MFA.

3 Analysis and Simulation of MCMFA

In this section, we analyze the dynamics of MCMFA. We consider the simple case where w_ij = w > 0 for all i, j. Although this is not the general case, we can obtain a qualitative description of the behavior of the general system from this specific example. In this case, the mean values of the neuron states are all the same:

    <s_i> = <s> for all i.   (8)

In the equilibrium state, the temperature of the neural network should be identical to the temperature of the demon. Therefore the mean value of the final state satisfies Eq. (9), where N is the number of neurons, and the average energy function is given by Eq. (10). Since our purpose is a qualitative understanding of the final state, a specific external input does not have an important effect on our analysis. For convenience, all external inputs are assumed to be 0. Then

    <H> = -(1/2) N^2 w <s>^2.   (11)

From the energy conservation property we obtain Eq. (12), which can be solved graphically as shown in Fig. 1. The straight line corresponds to the right-hand side of Eq. (12) and the curve represents the left-hand side. The initial state is reflected in the straight line, and the final state is the intersection of the two graphs. From Fig. 1 we can expect the behavior of MCMFA to be as follows. First, if the intersection of the straight line and the x-axis is greater than T_c, the final energy of the HNN is 0, and the final temperature of the demon is the intersection of the line and the x-axis. Second, if this intersection is less than T_c, the final energy of the HNN is less than 0. If the slope of the straight line, β, is small, the final temperature is near T_c. As β grows larger, the final temperature becomes lower. If β is infinite, the behavior of MCMFA is the same as that of MFA.
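As a rough numerical companion to this analysis, the sketch below (again Python, and again only an illustration) simulates the microcanonical temperature dynamics of Eq. (5) on a network with uniform couplings w_ij = w > 0. The network size, the value of w, the number of sweeps, and the set of β values are arbitrary choices made here; for this uniform model the mean-field critical temperature is roughly N·w, so the printed temperatures can be compared against that value.

    import numpy as np

    def microcanonical_temperature(n=100, w=0.02, T_init=0.1, beta=1.0,
                                   sweeps=200, seed=0):
        """Temperature dynamics of Eq. (5): T <- T - Delta<H> / beta (no alpha term).

        Uniform couplings w_ij = w > 0, as in the analysis above. All numerical
        values are illustrative assumptions, not the paper's settings.
        """
        rng = np.random.default_rng(seed)
        W = w * (np.ones((n, n)) - np.eye(n))   # uniform weights, zero diagonal
        s = rng.uniform(-0.05, 0.05, size=n)    # small random mean-field states
        T = T_init
        energy = lambda v: -0.5 * v @ W @ v
        H_prev = energy(s)
        for _ in range(sweeps * n):             # asynchronous single-neuron updates
            i = rng.integers(n)
            dH = 2.0 * W[i] @ s                 # gap between s_i = -1 and s_i = +1
            s[i] = np.tanh(dH / (2.0 * T))      # mean-field update (tanh form)
            H_now = energy(s)
            T -= (H_now - H_prev) / beta        # energy conservation feedback, Eq. (5)
            H_prev = H_now
        return T

    for beta in (1.0, 10.0, 100.0):
        T_final = microcanonical_temperature(beta=beta)
        print(f"beta = {beta:6.1f} -> temperature settles near {T_final:.2f}")
    # Expected qualitative behaviour: with small beta the temperature is pulled up
    # toward the critical temperature (about N*w = 2 here); as beta grows the
    # feedback weakens and the temperature settles at a lower value.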

MCMFA is simulated with an HNN consisting of 100 neurons and 100 edges. The initial energy of the HNN is chosen to be 0 and β is chosen to be 1. Five temperatures are chosen as initial temperatures, and the change of temperature is shown in Fig. 2. If the initial temperature is less than T_c, the temperature converges near T_c. If the initial temperature is 2, which is greater than T_c, the temperature remains unchanged. This result corresponds to the graphical solution of Fig. 1. The effect of β is also simulated, and the result is shown in Fig. 3. The initial energy of the HNN is chosen to be 0 and the demon temperature is chosen to be 0.1. If β is relatively small, the final temperature converges to the critical temperature. As the value of β increases, the convergent temperature decreases. This simulation result also agrees with what is expected from the previous analysis.

4 Application

In this section, we apply MCMFA to graph bipartitioning problems. We investigate the quality of solution and the convergence speed. We also apply MFA to graph bipartitioning problems and compare the results with MCMFA. The energy function we use for graph bipartitioning problems is given in Eq. (13), where c_ij = 1 if nodes i and j of the graph are connected, and c_ij = 0 otherwise. Our graph for the simulation has 100 nodes and 100 edges. These 100 edges are generated randomly. We tried this problem 100 times with different interconnections, and the results are averaged over the 100 trials. The details of the cooling schedules are as follows.

MFA: The cooling process starts with initial temperature 2 and stops at 0. The initial temperature is chosen to be well above the critical temperature. A neuron is selected and its state is updated by Eq. (6). After the state of the neuron is updated, the temperature is decreased by a small amount α:

    T ← T - α.   (14)

In general, the convergence speed and solution quality depend on α. We try various values of α to compare the convergence speed and solution quality.

MCMFA: The initial temperature and stop condition are the same as those of MFA, but Eq. (7) is used for the cooling schedule instead of Eq. (14). As for β, we start with β_initial = 0.1 and increase it exponentially as cooling proceeds (Eq. (15)).
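For concreteness, here is a small Python sketch of how such an experiment could be set up, reusing the mcmfa() routine from the earlier sketch. The random-graph construction, the cut-plus-imbalance energy mapping (a common choice standing in for the paper's Eq. (13), whose exact form is not reproduced here), the fixed value of β instead of the exponential schedule of Eq. (15), and all parameter values are assumptions made for illustration only.

    import numpy as np

    def bipartition_weights(n_nodes=100, n_edges=100, balance=0.5, seed=1):
        """Random graph plus Hopfield-style couplings for graph bipartitioning.

        Assumed mapping (a stand-in, not necessarily the paper's Eq. (13)):
        minimize the cut size plus a quadratic imbalance penalty, which gives
        couplings w_ij = c_ij - balance with zero diagonal.
        """
        rng = np.random.default_rng(seed)
        edges = set()
        while len(edges) < n_edges:
            i, j = sorted(rng.integers(n_nodes, size=2))
            if i != j:
                edges.add((int(i), int(j)))
        C = np.zeros((n_nodes, n_nodes)))
        C = np.zeros((n_nodes, n_nodes))
        for i, j in edges:
            C[i, j] = C[j, i] = 1.0
        W = C - balance
        np.fill_diagonal(W, 0.0)
        return W, C

    def cut_size(C, s):
        """Number of graph edges crossing the partition given by the signs of s."""
        return 0.25 * np.sum(C * (1.0 - np.outer(np.sign(s), np.sign(s))))

    W, C = bipartition_weights()
    states, _ = mcmfa(W, T_init=2.0, alpha=1e-3, beta=10.0)   # mcmfa() from the Section 2 sketch
    print("cut size:", cut_size(C, states), "| partition sizes:",
          int(np.sum(states > 0)), int(np.sum(states < 0)))

Running both the α-only schedule of Eq. (14) and the adaptive schedule of Eq. (7) on the same random graphs, and counting updates until a target energy is reached, is the kind of experiment summarized in Figs. 4 and 5.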

In order to compare MCMFA with MFA, we examine the quality of the solution versus the convergence speed. We observe the final value of the energy function for the quality of the solution and the number of iterations for the convergence speed. Fig. 4 shows that MCMFA produces better solutions than MFA. To reduce the energy of the HNN to -42.0, MCMFA needs about 1400 iterations while MFA requires more than 2000 iterations. To reach -43.0, MCMFA requires about 3000 iterations while MFA requires 5000 iterations.

In order to investigate the effect of T_initial, we try the same problem with T_initial = 0.4. Fig. 5 shows the final value of the energy function versus the number of iterations. With a small T_initial, MFA converges quickly to a poor solution; thus the performance of MFA is better when the number of iterations is less than 2000. However, MFA cannot make any improvement by increasing the amount of computation. As for MCMFA, T_initial does not have a large effect on the final solution: the quality versus iteration curve is similar to Fig. 4, and MCMFA produces a solution with T_initial = 0.4 as good as with T_initial = 2.0. This simulation result shows that MCMFA does not have to start at a temperature above T_c. Therefore, the problem of choosing T_initial disappears in MCMFA.

5 Conclusions

In order to reduce the amount of computation, a new algorithm called MCMFA is proposed. We use the properties of MCS and combine them with MFA so that the cooling speed can be scheduled adaptively during the annealing process. In MCMFA, the cooling speed is increased at temperatures where computation contributes little to the final solution. Thus MCMFA with its adaptive cooling schedule produces a better solution than MFA with an equal amount of computation. In addition, MCMFA solves the problem of choosing the initial temperature: if the initial temperature is lower than the critical temperature, MCMFA can raise the temperature to the critical temperature, so the initial temperature does not affect the quality of the final solution. These advantages are justified by analysis and simulation experiments. We analyze the properties of MCMFA with a simple HNN whose interconnections all have the same weight, and we also simulate the behavior of MCMFA with an HNN. MFA and MCMFA are applied to graph bipartitioning problems, and the speed and quality of the solutions are compared. Simulation results show that MCMFA requires less computation than MFA. If the initial temperature is below the critical temperature, MFA converges to a local minimum and produces a poor solution, but the solution quality of MCMFA is not affected by the initial temperature.

References

[1] E. Aarts and J. Korst, Simulated Annealing and Boltzmann Machines, John Wiley and Sons, Chichester, Great Britain, 1989.
[2] S. Kirkpatrick, C. Gelatt, and M. Vecchi, "Optimization by simulated annealing," Science, vol. 220, no. 4598, pp. 671-680, May 13, 1983.
[3] M. Creutz, "Microcanonical Monte Carlo simulation," Physical Review Letters, vol. 50, pp. 1411-1414, May 1983.
[4] J. J. Hopfield, "Neurons with graded response have collective computational properties like those of two-state neurons," Proc. Nat. Acad. Sci. USA, vol. 81, pp. 3088-3092, 1984.
[5] D. E. Van den Bout and T. K. Miller, III, "Graph partitioning using annealed neural networks," IEEE Trans. Neural Networks, vol. 1, no. 2, pp. 192-203, Jun. 1990.
[6] J. Hertz, A. Krogh, and R. G. Palmer, Introduction to the Theory of Neural Computation, Redwood City, CA: Addison-Wesley, 1991.
[7] G. E. Hinton and T. J. Sejnowski, "Learning and relearning in Boltzmann machines," in Parallel Distributed Processing, Volume 1, Cambridge, MA: MIT Press, 1986.

Figure 1: Graphical solution of the final state.
Figure 3: The change of temperature with various values of beta.
Figure 5: The performance comparison (initial temperature = 0.4).