Self-organising Systems 2: Simulated Annealing and Boltzmann Machines

Aims: to obtain a mathematical framework for stochastic machines; to study simulated annealing; to study the Boltzmann machine.

Reference: Parts of a chapter of Haykin, S., Neural Networks: A Comprehensive Foundation, Prentice-Hall, 1999.

Keywords: temperature, annealing schedule, Metropolis algorithm, combinatorial optimization, energy function, move set, mean-field annealing, quadratic assignment problem, cost function, critical temperature, Boltzmann machine.

Plan: statistical mechanics; Metropolis algorithm; annealing schedule; travelling salesperson problem; energy function; move sets for TSP; mean-field annealing; critical temperature; Boltzmann machine structure; Boltzmann machine learning; energy minimization; Boltzmann machine applications.

Introduction

Because (industrial strength) neural networks may have thousands of degrees of freedom (e.g. weights), it is possible to get inspiration from the theory of statistical mechanics. Statistical mechanics deals with the macroscopic equilibrium properties of large systems of elements that are subject to the microscopic laws of mechanics. The Boltzmann machine seems to be the first multilayer learning machine inspired by statistical mechanics. It learns the statistical distribution of a data set, and can use this for tasks like pattern completion. [There is also the Cauchy machine - same idea, different distribution function.] Simulated annealing is an optimization technique that uses a thermodynamic metaphor.

Simulated Annealing/Boltzmann Machine Notes, Bill Wilson, 2004

A Short Detour into Statistical Mechanics

Consider a system with many degrees of freedom that can be in many possible states. Let p_i denote the probability of state i. Then Σ_i p_i = 1. Let E_i denote the energy of the system in state i. Statistical mechanics says that when a system is in thermal equilibrium with its environment, state i occurs with probability

    p_i = (1/Z) exp(-E_i / (k_B T))    (2.3)

where T is the temperature (Kelvin scale), k_B is Boltzmann's constant, and Z is a constant.
As Σ_i p_i = 1, we can derive

    Z = Σ_i exp(-E_i / (k_B T))    (2.4)

The probability distribution of (2.3) is called the Gibbs distribution, and the factor exp(-E_i / (k_B T)) is called the Boltzmann factor. Note from (2.3) that lower energy or higher temperature means higher probability. As T is reduced, the probability is concentrated in a smaller subset of low-energy states.

Pseudotemperature, Free Energy and Entropy

We can mimic this setup in a neural net context using a concept of pseudotemperature T. As T has no scale (like Kelvin), we don't need an analog of k_B, so we can write

    p_i = (1/Z) exp(-E_i / T)  and  Z = Σ_i exp(-E_i / T)    (2.5/2.6)

Notice that if T = Z = 1, then E_i = -log_e p_i, so -log p_i measures something like energy. The Helmholtz free energy F of a system is defined as

    F = -T log_e Z    (2.7)

The average energy of the system is defined by

    <E> = Σ_i p_i E_i    (2.8)
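The Gibbs distribution and partition function above are easy to explore numerically. A minimal sketch (the three-state energy values here are made up for illustration, not from the notes):

```python
import math

def gibbs(energies, T, kB=1.0):
    """Return Gibbs probabilities p_i = exp(-E_i/(kB*T)) / Z for the given state energies."""
    factors = [math.exp(-E / (kB * T)) for E in energies]  # Boltzmann factors
    Z = sum(factors)                                       # partition function
    return [f / Z for f in factors]

# Illustrative three-state system: lower energy => higher probability
energies = [0.0, 1.0, 2.0]
for T in (10.0, 1.0, 0.1):
    print(T, [round(p, 4) for p in gibbs(energies, T)])
# As T is reduced, the probability mass concentrates on the low-energy state.
```

Running this shows the distribution flattening at high T and collapsing onto the minimum-energy state as T shrinks, exactly the behaviour the annealing schedule will exploit below.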

Entropy

Using (2.5) to (2.8), we derive

    <E> - F = -T Σ_i p_i log_e p_i    (2.9) [1]

The RHS (apart from the factor T) is called the entropy H of the system:

    H = -Σ_i p_i log_e p_i    (2.10)

so

    <E> - F = TH,  i.e.  F = <E> - TH    (2.11)

When two systems A and A' come into contact, the total entropy of the two systems tends to increase: ΔH + ΔH' ≥ 0. In view of (2.11), this means that the free energy of the system, F, tends to decrease, reaching a minimum when the two systems reach thermal equilibrium: the minimum of the free energy of a stochastic system with respect to the variables of the system is achieved at thermal equilibrium, at which point the system is governed by the Gibbs distribution.

Simulated Annealing

Simulated annealing is an optimization technique. In Hopfield nets, local minima are used in a positive way, but in optimization problems, local minima get in the way: one must have a way to escape from them. The two ideas of simulated annealing [2] are as follows:

1. When optimizing a very large and complex system (i.e., a system with many degrees of freedom), instead of always going downhill, try to go downhill most of the time.
2. Initially, the probability of going uphill should be relatively high ("high temperature"), but as time (iterations) goes on, this probability should decrease (the temperature decreases according to an annealing schedule).

The term "annealing" comes from the technique of hardening a metal (i.e. finding a state of its crystalline lattice that is highly packed) by hammering it while initially very hot and then at a succession of decreasing temperatures.

[1] To see this, turn (2.5) inside out to obtain E_i = -T(log_e p_i + log_e Z). Thus
    <E> = Σ_i p_i E_i = -T Σ_i p_i (log_e p_i + log_e Z)
        = -T Σ_i p_i log_e p_i - T log_e Z (Σ_i p_i)
        = -T Σ_i p_i log_e p_i - T log_e Z
        = -T Σ_i p_i log_e p_i + F,
    so <E> - F = -T Σ_i p_i log_e p_i, as claimed.

[2] Kirkpatrick, S., Gelatt, C.D., and Vecchi, M.P. (1983) Optimization by simulated annealing, Science 220, 671-680.

Metropolis Algorithm

The algorithm for simulated annealing is a variant (with time-dependent temperature) of the Metropolis [3] algorithm.
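The identity relating average energy, free energy, and entropy can be checked numerically for any assignment of state energies. A small sketch (the four-state system is arbitrary, chosen only for illustration):

```python
import math
import random

random.seed(0)
T = 1.5
energies = [random.uniform(0.0, 3.0) for _ in range(4)]  # arbitrary illustrative energies

Z = sum(math.exp(-E / T) for E in energies)          # partition function
p = [math.exp(-E / T) / Z for E in energies]         # Gibbs probabilities
F = -T * math.log(Z)                                 # Helmholtz free energy
avg_E = sum(pi * Ei for pi, Ei in zip(p, energies))  # average energy <E>
H = -sum(pi * math.log(pi) for pi in p)              # entropy

# The derivation in the footnote gives <E> - F = T*H exactly:
assert abs((avg_E - F) - T * H) < 1e-9
```

The assertion holds for any energies and any T > 0, since it is an algebraic identity of the Gibbs distribution rather than a numerical approximation.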
In each step of this algorithm, a unit of the system is subjected to a small random displacement (or transition, or flip), and the resulting change ΔE in the energy of the system is computed. If ΔE ≤ 0, the displacement is accepted. If ΔE > 0, the algorithm proceeds in a probabilistic manner: the probability that the displacement will be accepted is p·exp(-ΔE/T), where p is a constant and T is the temperature.

Design of the Annealing Schedule

Initial value of the temperature: T_0 is chosen high enough to ensure that virtually all proposed transitions are accepted by the simulated annealing algorithm.

Decrement: Usually, a geometric progression of temperatures is used: T_k = α·T_{k-1}, where α is a constant slightly less than 1 (e.g. 0.8-0.99). At each temperature, enough transitions are attempted that either there are 10 accepted transitions per unit on the average, or the number of attempts exceeds 100 times the number of units.

Final value of the temperature: The system is "frozen" and annealing stops if the desired number of acceptances is not achieved at three successive temperatures.

If T is large, exp(-ΔE/T) approaches 1. Thus p is the probability that a transition to a higher energy state will be accepted when the temperature is infinite. The use of the expression exp(-ΔE/T) ensures that at thermal equilibrium the Boltzmann distribution of states prevails [4]. This in turn ensures that, at high temperatures, all states have equal probability of occurring, while as T → 0, only the states with minimum energy have a non-zero probability of occurrence.

[3] Metropolis, N., Rosenbluth, A., Rosenbluth, M., Teller, A., and Teller, E. (1953) Equation of state calculations by fast computing machines, Journal of Chemical Physics 21, 1087-1092.

[4] Boltzmann, L. (1872) Weitere Studien über das Wärmegleichgewicht unter Gasmolekülen, Sitzungsberichte der Mathematisch-Naturwissenschaftlichen Classe der Kaiserlichen Akademie der Wissenschaften 66, 275-370.
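The Metropolis acceptance rule and the geometric cooling schedule described above can be sketched as a single loop. This is a minimal illustration, not the class's tsp.m: the toy quadratic energy function, the step generator, and all the parameter values are assumptions chosen for demonstration.

```python
import math
import random

def simulated_annealing(energy, neighbour, x0, T0=10.0, alpha=0.9,
                        accept_const=1.0, steps_per_T=100, T_min=1e-3):
    """Metropolis-style annealing with geometric cooling T_k = alpha * T_{k-1}."""
    random.seed(1)
    x, T = x0, T0
    best, best_E = x0, energy(x0)
    while T > T_min:
        for _ in range(steps_per_T):
            x_new = neighbour(x)
            dE = energy(x_new) - energy(x)
            # Downhill moves (dE <= 0) are always accepted; uphill moves are
            # accepted with probability accept_const * exp(-dE/T).
            if dE <= 0 or random.random() < accept_const * math.exp(-dE / T):
                x = x_new
                if energy(x) < best_E:
                    best, best_E = x, energy(x)
        T *= alpha  # annealing schedule: geometric decrement
    return best, best_E

# Toy problem: minimise a 1-D quadratic using random-walk moves
f = lambda x: (x - 3.0) ** 2
step = lambda x: x + random.uniform(-0.5, 0.5)
x_best, E_best = simulated_annealing(f, step, x0=0.0)
```

At high T the walk is nearly free (almost all uphill moves accepted); as T shrinks, the loop degenerates into greedy descent, which is exactly the two-phase behaviour motivated above.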

Simulated Annealing for Combinatorial Optimization

Simulated annealing is well-suited to solving combinatorial optimization problems. Solutions (or states corresponding to possible solutions) are the states of the system, and the energy function is a function giving the cost of a solution. Kirkpatrick et al. applied their methods to the Travelling Salesperson Problem, finding near-optimal solutions for problems with several thousand sites, using α = 0.9.

In order to apply simulated annealing to such a problem, it is necessary to have a set of neurons and an energy function. The figure illustrates the neural layout and its interpretation.

[Figure: a cities × sequence array of neurons. Neural configuration for a tour of 5 cities C1-C5, in the order C2 C4 C1 C3 C5 C2.]

Energy Function for TSP

Constraints on the neural activations (for a solution) include that there should be:
- exactly one neuron "on" (activation 1) in each row (i.e. each city is visited exactly once);
- exactly one neuron on in each column (the salesperson only visits one city at a time!).

Let us write v_ij for the activation level of the neuron in row i and column j. One constraint can be expressed as saying that we want to minimize

    e_j = (v_1j + v_2j + v_3j + v_4j + v_5j - 1)^2 = (Σ_i v_ij - 1)^2

That's for column j. Taking into account all columns, we want to minimize

    E_1 = Σ_j e_j = Σ_j (Σ_i v_ij - 1)^2

Similarly, taking into account all rows, we want to minimize

    E_2 = Σ_i (Σ_j v_ij - 1)^2

Objective Function

An optimization problem, of course, comes with an objective function to be minimized. In our case, suppose that d_ij is the distance from city i to city j. Suppose that cities 1 and 2 are adjacent on the salesperson's tour, and that city 1 is the m-th city visited. Then city 2 must be either the (m-1)-st or the (m+1)-st city on the tour, and the contribution to the total distance travelled will be

    d_12 (v_1,m v_2,m-1 + v_1,m v_2,m+1)

Remember that in a solution, if v_2,m-1 = 1, then v_2,m+1 = 0, and vice versa.
Generalizing this, for the total distance travelled we get the objective function

    E_3 = 0.5 Σ_i Σ_k Σ_{j≠i} d_ij v_i,k (v_j,k-1 + v_j,k+1)

To minimize E_1, E_2, and E_3 simultaneously, we minimize

    E = k_1 E_1 + k_2 E_2 + k_3 E_3

where the k_i are positive constants.

Move Set for Simulated Annealing

- Inversion: a section of the tour, say 6,7,8,9, is replaced by its reversal 9,8,7,6.
- Translation: remove a section (say 6-8) and replace it between two other consecutive cities (say 4 and 5).
- Switching: select two non-consecutive cities and switch them in the tour.

Matlab code for simulated annealing is available in tsp.m, available at the "Other Matlab code" link under Software Availability on the class home page.

Reference for parts of this material: Neuro-fuzzy and Soft Computing by J-S.R. Jang, C.-T. Sun, and E. Mizutani, Prentice-Hall, 1997, page 184.
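The three tour moves described above act on a tour represented as a list of city indices. A sketch (0-based positions; the example tour is illustrative):

```python
def inversion(tour, i, j):
    """Reverse the section tour[i..j], e.g. ...,6,7,8,9,... -> ...,9,8,7,6,..."""
    return tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]

def translation(tour, i, j, k):
    """Remove the section tour[i..j] and reinsert it at position k of the remainder."""
    section, rest = tour[i:j + 1], tour[:i] + tour[j + 1:]
    return rest[:k] + section + rest[k:]

def switching(tour, i, j):
    """Swap the cities at positions i and j."""
    t = list(tour)
    t[i], t[j] = t[j], t[i]
    return t

tour = [0, 1, 2, 3, 4, 5, 6, 7]
print(inversion(tour, 2, 5))  # [0, 1, 5, 4, 3, 2, 6, 7]
```

All three moves return a permutation of the same cities, so any of them is a legal proposal for the Metropolis step; the tour length is then the ΔE that decides acceptance.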

Mean-Field Annealing

Simulated annealing can be slow, and the annealing schedule can be part of the problem. In some cases it is found that most of the "crystallization" of the system takes place around a particular temperature, termed the critical temperature. Fang, Wilson and Li [5] tackled the quadratic assignment problem exploiting this fact.

The Quadratic Assignment Problem

Consider the optimal location of m plants at n possible sites, where n ≥ m, in the following situation:
- the amount of goods to be transported between each pair of plants is given;
- there is a cost associated with moving goods between each pair of sites;
- the goal is to allocate plants to sites so as to minimise total cost.

[Figure: example instance showing goods flows between plants and transport costs per unit between sites.]

[5] Luyuan Fang, William H. Wilson, and Tao Li (1990) Mean field annealing neural nets for quadratic assignment, INNC-90-PARIS Proceedings, vol. 1.

Terminology:
- x_ik = 1 if plant k is located at site i (only one plant per site); 0 ≤ x_ik ≤ 1.
- c_ij is the cost of transporting 1 unit of goods from site i to site j; c_ij ≥ 0.
- d_kl is the amount of goods to be transported from plant k to plant l; d_kl ≥ 0.

Cost Function:

    f(x) = Σ_{i=1..n} Σ_{j≠i} Σ_{k=1..m} Σ_{l≠k} c_ij d_kl x_ik x_jl

Minimising this function is an NP-hard problem.

Neural architecture for this problem

We choose a two-dimensional array of neurons, of dimension m x n. x_ij represents the state of the (i,j)-th neuron. In a solution, if neuron x_ij = 1, plant j is assigned to site i. There are two constraints on the x_ij in a solution:
- each plant must be located at exactly one site: Σ_{i=1..n} x_ij = 1 for each plant j;
- at most one plant per site: Σ_{j=1..m} x_ij ≤ 1 for every site i.

Energy Function:

    E = (A/2) Σ_{i=1..n} Σ_{j≠i} Σ_{k=1..m} Σ_{l≠k} c_ij d_kl x_ik x_jl
      + (B/2) Σ_{i=1..n} Σ_{k=1..m} Σ_{l≠k} x_ik x_il
      + (C/2) Σ_{k=1..m} (1 - Σ_{i=1..n} x_ik)^2

where A, B, and C are constants. The energy function is minimised exactly when the constraints are satisfied and the cost function is minimised.
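The quadratic assignment cost function f(x) translates directly into code. A sketch with a made-up 3-site, 2-plant instance (all the numbers below are assumptions for illustration, not data from the notes):

```python
def qap_cost(c, d, x):
    """f(x) = sum over i != j and k != l of c[i][j] * d[k][l] * x[i][k] * x[j][l],
    where x[i][k] = 1 iff plant k is located at site i (0-based indices)."""
    n, m = len(c), len(d)
    return sum(c[i][j] * d[k][l] * x[i][k] * x[j][l]
               for i in range(n) for j in range(n) if j != i
               for k in range(m) for l in range(m) if l != k)

# Illustrative instance: 3 sites, 2 plants
c = [[0, 2, 5], [2, 0, 3], [5, 3, 0]]  # transport cost per unit between sites
d = [[0, 4], [1, 0]]                   # goods flow between plants

# Plant 0 at site 0, plant 1 at site 2:
x = [[1, 0], [0, 0], [0, 1]]
print(qap_cost(c, d, x))  # c[0][2]*d[0][1] + c[2][0]*d[1][0] = 5*4 + 5*1 = 25
```

Enumerating the alternative placement with plant 1 at site 1 instead gives cost 2*4 + 2*1 = 10, illustrating why the choice of sites matters: the same flows are multiplied by different inter-site costs.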

Critical Temperature

Phenomenon: when the temperature T is very high, the network reaches an equilibrium point where all the neurons have similar activation values near 0.5; as T is decreased, this point is also lowered; at a certain temperature T_c (the critical temperature), this point drops down to δ, which depends on the parameters A, B, C and t. This is the lowest equilibrium point at which all neurons have similar activation values.

Behaviour below T_c: as the temperature drops below T_c, the neuron activations diverge rapidly towards 0 and 1; when the temperature becomes very low, the network settles into a stable state which represents a feasible solution to the problem. The neuron activation values do not diverge until the critical temperature is reached; near the critical temperature, they rapidly diverge towards the two extreme points 0 and 1; below T_c, neuron activations again remain relatively stable.

Critical Temperature Estimate

Let m_c and m_d be the mean values of c_aj and d_bl. The lowest equilibrium point δ is estimated as

    δ ≈ C / (A(n-1)(m-1) m_c m_d + B(m-1) + nC)

The expected critical temperature is estimated as

    T_c = 0.5 t (1-δ) [A(m-1)(n-1) m_c m_d δ + B(m-1)δ + C(nδ-1)]

Annealing Schedule for Mean-Field Annealing:
- At T = t(1-δ)[A(m-1)(n-1) c_max d_max δ + B(m-1)δ + C(nδ-1)], simulate until equilibrium.
- Around T_c (between T_max and T_min), the temperature changes according to ΔT = K|T - T_c|/(T_max - T_min) + ε, where T is the present temperature and ε and K are constants. At each temperature, iterate s steps (where s is large enough to guarantee reaching equilibrium at a temperature above the actual critical temperature).
- At a low temperature near 0, simulate until equilibrium.

Simulation Results & Summary

A total of four groups of data, each containing 10 sets: three symmetric groups (including 5 x 5 and 15 x 15 problem sizes) and one asymmetric group. The results are close to the optimal results.
    problem    optimal    mean field    mean field/optimal
       1        319.8       35.90            1.014
       2          4.5        1.38            1.059
       3        191.5       149.0            1.034
       4       134.38       13.58            1.0
       5       184.50       185.4            1.015
       6         15.5        15.5            1.000
       7          1.         15.             1.04
       8       1359.4      1415.50           1.041
       9        131.0       134.9            1.009
      10         18.4       189.1            1.03

Results for 5 by 5 Data: Mean Field Network. Based on randomly generated 5 x 5 quadratic assignment problems.

[Figure: Critical temperature plot for the 5 by 5 mean field network.]