Self-organising Systems 2: Simulated Annealing and Boltzmann Machines

Aims: to obtain a mathematical framework for stochastic machines; to study simulated annealing; to study the Boltzmann machine.

Reference: Parts of a chapter of Haykin, S., Neural Networks: A Comprehensive Foundation, Prentice-Hall, 1999.

Keywords: temperature, annealing schedule, Metropolis algorithm, combinatorial optimization, energy function, move set, mean-field annealing, quadratic assignment problem, cost function, critical temperature, Boltzmann machine.

Plan: statistical mechanics; Metropolis algorithm; annealing schedule; travelling salesperson problem; energy function; move sets for TSP; mean-field annealing; critical temperature; Boltzmann machine structure; Boltzmann machine learning; energy minimization; Boltzmann machine applications.

Introduction

Because (industrial strength) neural networks may have thousands of degrees of freedom (e.g. weights), it is possible to get inspiration from the theory of statistical mechanics. Statistical mechanics deals with the macroscopic equilibrium properties of large systems of elements that are subject to the microscopic laws of mechanics. The Boltzmann machine seems to be the first multilayer learning machine inspired by statistical mechanics. It learns the statistical distribution of a data set, and can use this for tasks like pattern completion. [There is also the Cauchy machine - same idea, different distribution function.] Simulated annealing is an optimization technique that uses a thermodynamic metaphor.

Simulated Annealing/Boltzmann Machine Notes, Bill Wilson, 2004

A Short Detour into Statistical Mechanics

Consider a system with many degrees of freedom that can be in many possible states. Let p_i denote the probability of state i. Then Σ_i p_i = 1. Let E_i denote the energy of the system in state i. Statistical mechanics says that when a system is in thermal equilibrium with its environment, state i occurs with probability

    p_i = (1/Z) exp(-E_i / (k_B T))    (2.3)

where T is the temperature (Kelvin scale), k_B is Boltzmann's constant, and Z is a constant.
As Σ_i p_i = 1, we can derive

    Z = Σ_i exp(-E_i / (k_B T))    (2.4)

The probability distribution of (2.3) is called the Gibbs distribution, and the factor exp(-E_i / (k_B T)) is called the Boltzmann factor. Note from (2.3) that lower energy or higher temperature means higher probability. As T is reduced, the probability is concentrated in a smaller subset of low-energy states.

Pseudotemperature, Free Energy and Entropy

We can mimic this setup in a neural net context using a concept of pseudotemperature T. As T has no scale (like Kelvin), we don't need an analog of k_B, so we can write

    p_i = (1/Z) exp(-E_i / T)  and  Z = Σ_i exp(-E_i / T)    (2.5/2.6)

Notice that if T = Z = 1, then E_i = -log_e p_i, so -log p_i measures something like energy. The Helmholtz free energy F of a system is defined as

    F = -T log_e Z    (2.7)

The average energy of the system is defined by

    <E> = Σ_i p_i E_i    (2.8)
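The Gibbs distribution and partition function above are easy to explore numerically. A minimal sketch (the three-state energy values here are made up for illustration, not from the notes):

```python
import math

def gibbs(energies, T, kB=1.0):
    """Return Gibbs probabilities p_i = exp(-E_i/(kB*T)) / Z for the given state energies."""
    factors = [math.exp(-E / (kB * T)) for E in energies]  # Boltzmann factors
    Z = sum(factors)                                       # partition function
    return [f / Z for f in factors]

# Illustrative three-state system: lower energy => higher probability
energies = [0.0, 1.0, 2.0]
for T in (10.0, 1.0, 0.1):
    print(T, [round(p, 4) for p in gibbs(energies, T)])
# As T is reduced, the probability mass concentrates on the low-energy state.
```

Running this shows the distribution flattening at high T and collapsing onto the minimum-energy state as T shrinks, exactly the behaviour the annealing schedule will exploit below.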

Entropy

Using (2.5) to (2.8), we derive

    <E> - F = -T Σ_i p_i log_e p_i    (2.9) [1]

The RHS (apart from the factor T) is called the entropy H of the system:

    H = -Σ_i p_i log_e p_i    (2.10)

so

    <E> - F = TH,  i.e.  F = <E> - TH    (2.11)

When two systems A and A' come into contact, the total entropy of the two systems tends to increase: ΔH + ΔH' ≥ 0. In view of (2.11), this means that the free energy of the system, F, tends to decrease, reaching a minimum when the two systems reach thermal equilibrium: the minimum of the free energy of a stochastic system with respect to the variables of the system is achieved at thermal equilibrium, at which point the system is governed by the Gibbs distribution.

Simulated Annealing

Simulated annealing is an optimization technique. In Hopfield nets, local minima are used in a positive way, but in optimization problems, local minima get in the way: one must have a way to escape from them. The two ideas of simulated annealing [2] are as follows:

1. When optimizing a very large and complex system (i.e., a system with many degrees of freedom), instead of always going downhill, try to go downhill most of the time.
2. Initially, the probability of going uphill should be relatively high ("high temperature"), but as time (iterations) goes on, this probability should decrease (the temperature decreases according to an annealing schedule).

The term "annealing" comes from the technique of hardening a metal (i.e. finding a state of its crystalline lattice that is highly packed) by hammering it while initially very hot and then at a succession of decreasing temperatures.

[1] To see this, turn (2.5) inside out to obtain E_i = -T(log_e p_i + log_e Z). Thus
    <E> = Σ_i p_i E_i = -T Σ_i p_i (log_e p_i + log_e Z)
        = -T Σ_i p_i log_e p_i - T log_e Z (Σ_i p_i)
        = -T Σ_i p_i log_e p_i - T log_e Z
        = -T Σ_i p_i log_e p_i + F,
    so <E> - F = -T Σ_i p_i log_e p_i, as claimed.

[2] Kirkpatrick, S., Gelatt, C.D., and Vecchi, M.P. (1983) Optimization by simulated annealing, Science 220, 671-680.

Metropolis Algorithm

The algorithm for simulated annealing is a variant (with time-dependent temperature) of the Metropolis [3] algorithm.
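The identity relating average energy, free energy, and entropy can be checked numerically for any assignment of state energies. A small sketch (the four-state system is arbitrary, chosen only for illustration):

```python
import math
import random

random.seed(0)
T = 1.5
energies = [random.uniform(0.0, 3.0) for _ in range(4)]  # arbitrary illustrative energies

Z = sum(math.exp(-E / T) for E in energies)          # partition function
p = [math.exp(-E / T) / Z for E in energies]         # Gibbs probabilities
F = -T * math.log(Z)                                 # Helmholtz free energy
avg_E = sum(pi * Ei for pi, Ei in zip(p, energies))  # average energy <E>
H = -sum(pi * math.log(pi) for pi in p)              # entropy

# The derivation in the footnote gives <E> - F = T*H exactly:
assert abs((avg_E - F) - T * H) < 1e-9
```

The assertion holds for any energies and any T > 0, since it is an algebraic identity of the Gibbs distribution rather than a numerical approximation.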
In each step of this algorithm, a unit of the system is subjected to a small random displacement (or transition, or flip), and the resulting change ΔE in the energy of the system is computed. If ΔE ≤ 0, the displacement is accepted. If ΔE > 0, the algorithm proceeds in a probabilistic manner: the probability that the displacement will be accepted is p·exp(-ΔE/T), where p is a constant and T is the temperature.

Design of the Annealing Schedule

Initial value of the temperature: T_0 is chosen high enough to ensure that virtually all proposed transitions are accepted by the simulated annealing algorithm.

Decrement: Usually, a geometric progression of temperatures is used: T_k = α·T_{k-1}, where α is a constant slightly less than 1 (e.g. 0.8-0.99). At each temperature, enough transitions are attempted that either there are 10 accepted transitions per unit on the average, or the number of attempts exceeds 100 times the number of units.

Final value of the temperature: The system is "frozen" and annealing stops if the desired number of acceptances is not achieved at three successive temperatures.

If T is large, exp(-ΔE/T) approaches 1. Thus p is the probability that a transition to a higher energy state will be accepted when the temperature is infinite. The use of the expression exp(-ΔE/T) ensures that at thermal equilibrium the Boltzmann distribution of states prevails [4]. This in turn ensures that, at high temperatures, all states have equal probability of occurring, while as T → 0, only the states with minimum energy have a non-zero probability of occurrence.

[3] Metropolis, N., Rosenbluth, A., Rosenbluth, M., Teller, A., and Teller, E. (1953) Equation of state calculations by fast computing machines, Journal of Chemical Physics 21, 1087-1092.

[4] Boltzmann, L. (1872) Weitere Studien über das Wärmegleichgewicht unter Gasmolekülen, Sitzungsberichte der Mathematisch-Naturwissenschaftlichen Classe der Kaiserlichen Akademie der Wissenschaften 66, 275-370.
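The Metropolis acceptance rule and the geometric cooling schedule described above can be sketched as a single loop. This is a minimal illustration, not the class's tsp.m: the toy quadratic energy function, the step generator, and all the parameter values are assumptions chosen for demonstration.

```python
import math
import random

def simulated_annealing(energy, neighbour, x0, T0=10.0, alpha=0.9,
                        accept_const=1.0, steps_per_T=100, T_min=1e-3):
    """Metropolis-style annealing with geometric cooling T_k = alpha * T_{k-1}."""
    random.seed(1)
    x, T = x0, T0
    best, best_E = x0, energy(x0)
    while T > T_min:
        for _ in range(steps_per_T):
            x_new = neighbour(x)
            dE = energy(x_new) - energy(x)
            # Downhill moves (dE <= 0) are always accepted; uphill moves are
            # accepted with probability accept_const * exp(-dE/T).
            if dE <= 0 or random.random() < accept_const * math.exp(-dE / T):
                x = x_new
                if energy(x) < best_E:
                    best, best_E = x, energy(x)
        T *= alpha  # annealing schedule: geometric decrement
    return best, best_E

# Toy problem: minimise a 1-D quadratic using random-walk moves
f = lambda x: (x - 3.0) ** 2
step = lambda x: x + random.uniform(-0.5, 0.5)
x_best, E_best = simulated_annealing(f, step, x0=0.0)
```

At high T the walk is nearly free (almost all uphill moves accepted); as T shrinks, the loop degenerates into greedy descent, which is exactly the two-phase behaviour motivated above.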

Simulated Annealing for Combinatorial Optimization

Simulated annealing is well-suited to solving combinatorial optimization problems. Solutions (or states corresponding to possible solutions) are the states of the system, and the energy function is a function giving the cost of a solution. Kirkpatrick et al. applied their methods to the Travelling Salesperson Problem, finding near-optimal solutions for problems with several thousand sites, using α = 0.9.

In order to apply simulated annealing to such a problem, it is necessary to have a set of neurons and an energy function. The figure illustrates the neural layout and its interpretation.

[Figure: a cities × sequence array of neurons. Neural configuration for a tour of 5 cities C1-C5, in the order C2 C4 C1 C3 C5 C2.]

Energy Function for TSP

Constraints on the neural activations (for a solution) include that there should be:
- exactly one neuron "on" (activation 1) in each row (i.e. each city is visited exactly once);
- exactly one neuron on in each column (the salesperson only visits one city at a time!).

Let us write v_ij for the activation level of the neuron in row i and column j. One constraint can be expressed as saying that we want to minimize

    e_j = (v_1j + v_2j + v_3j + v_4j + v_5j - 1)^2 = (Σ_i v_ij - 1)^2

That's for column j. Taking into account all columns, we want to minimize

    E_1 = Σ_j e_j = Σ_j (Σ_i v_ij - 1)^2

Similarly, taking into account all rows, we want to minimize

    E_2 = Σ_i (Σ_j v_ij - 1)^2

Objective Function

An optimization problem, of course, comes with an objective function to be minimized. In our case, suppose that d_ij is the distance from city i to city j. Suppose that cities 1 and 2 are adjacent on the salesperson's tour, and that city 1 is the m-th city visited. Then city 2 must be either the (m-1)-st or the (m+1)-st city on the tour, and the contribution to the total distance travelled will be

    d_12 (v_1,m v_2,m-1 + v_1,m v_2,m+1)

Remember that in a solution, if v_2,m-1 = 1, then v_2,m+1 = 0, and vice versa.
Generalizing this, for the total distance travelled we get the objective function

    E_3 = 0.5 Σ_i Σ_k Σ_{j≠i} d_ij v_i,k (v_j,k-1 + v_j,k+1)

To minimize E_1, E_2, and E_3 simultaneously, we minimize

    E = k_1 E_1 + k_2 E_2 + k_3 E_3

where the k_i are positive constants.

Move Set for Simulated Annealing

- Inversion: a section of the tour, say 6,7,8,9, is replaced by its reversal 9,8,7,6.
- Translation: remove a section (say 6-8) and replace it between two other consecutive cities (say 4 and 5).
- Switching: select two non-consecutive cities and switch them in the tour.

Matlab code for simulated annealing is available in tsp.m, available at the "Other Matlab code" link under Software Availability on the class home page.

Reference for parts of this material: Neuro-fuzzy and Soft Computing by J-S.R. Jang, C.-T. Sun, and E. Mizutani, Prentice-Hall, 1997, page 184.
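The three tour moves described above act on a tour represented as a list of city indices. A sketch (0-based positions; the example tour is illustrative):

```python
def inversion(tour, i, j):
    """Reverse the section tour[i..j], e.g. ...,6,7,8,9,... -> ...,9,8,7,6,..."""
    return tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]

def translation(tour, i, j, k):
    """Remove the section tour[i..j] and reinsert it at position k of the remainder."""
    section, rest = tour[i:j + 1], tour[:i] + tour[j + 1:]
    return rest[:k] + section + rest[k:]

def switching(tour, i, j):
    """Swap the cities at positions i and j."""
    t = list(tour)
    t[i], t[j] = t[j], t[i]
    return t

tour = [0, 1, 2, 3, 4, 5, 6, 7]
print(inversion(tour, 2, 5))  # [0, 1, 5, 4, 3, 2, 6, 7]
```

All three moves return a permutation of the same cities, so any of them is a legal proposal for the Metropolis step; the tour length is then the ΔE that decides acceptance.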

Mean-Field Annealing

Simulated annealing can be slow, and the annealing schedule can be part of the problem. In some cases it is found that most of the "crystallization" of the system takes place around a particular temperature, termed the critical temperature. Fang, Wilson and Li [5] tackled the quadratic assignment problem exploiting this fact.

The Quadratic Assignment Problem

Consider the optimal location of m plants at n possible sites, where n ≥ m, in the following situation:
- the amount of goods to be transported between each pair of plants is given;
- there is a cost associated with moving goods between each pair of sites;
- the goal is to allocate plants to sites so as to minimise total cost.

[Figure: example instance showing goods flows between plants and transport costs per unit between sites.]

[5] Luyuan Fang, William H. Wilson, and Tao Li (1990) Mean field annealing neural nets for quadratic assignment, INNC-90-PARIS Proceedings, vol. 1.

Terminology:
- x_ik = 1 if plant k is located at site i (only one plant per site); 0 ≤ x_ik ≤ 1.
- c_ij is the cost of transporting 1 unit of goods from site i to site j; c_ij ≥ 0.
- d_kl is the amount of goods to be transported from plant k to plant l; d_kl ≥ 0.

Cost Function:

    f(x) = Σ_{i=1..n} Σ_{j≠i} Σ_{k=1..m} Σ_{l≠k} c_ij d_kl x_ik x_jl

Minimising this function is an NP-hard problem.

Neural architecture for this problem

We choose a two-dimensional array of neurons, of dimension m x n. x_ij represents the state of the (i,j)-th neuron. In a solution, if neuron x_ij = 1, plant j is assigned to site i. There are two constraints on the x_ij in a solution:
- each plant must be located at exactly one site: Σ_{i=1..n} x_ij = 1 for each plant j;
- at most one plant per site: Σ_{j=1..m} x_ij ≤ 1 for every site i.

Energy Function:

    E = (A/2) Σ_{i=1..n} Σ_{j≠i} Σ_{k=1..m} Σ_{l≠k} c_ij d_kl x_ik x_jl
      + (B/2) Σ_{i=1..n} Σ_{k=1..m} Σ_{l≠k} x_ik x_il
      + (C/2) Σ_{k=1..m} (1 - Σ_{i=1..n} x_ik)^2

where A, B, and C are constants. The energy function is minimised exactly when the constraints are satisfied and the cost function is minimised.
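The quadratic assignment cost function f(x) translates directly into code. A sketch with a made-up 3-site, 2-plant instance (all the numbers below are assumptions for illustration, not data from the notes):

```python
def qap_cost(c, d, x):
    """f(x) = sum over i != j and k != l of c[i][j] * d[k][l] * x[i][k] * x[j][l],
    where x[i][k] = 1 iff plant k is located at site i (0-based indices)."""
    n, m = len(c), len(d)
    return sum(c[i][j] * d[k][l] * x[i][k] * x[j][l]
               for i in range(n) for j in range(n) if j != i
               for k in range(m) for l in range(m) if l != k)

# Illustrative instance: 3 sites, 2 plants
c = [[0, 2, 5], [2, 0, 3], [5, 3, 0]]  # transport cost per unit between sites
d = [[0, 4], [1, 0]]                   # goods flow between plants

# Plant 0 at site 0, plant 1 at site 2:
x = [[1, 0], [0, 0], [0, 1]]
print(qap_cost(c, d, x))  # c[0][2]*d[0][1] + c[2][0]*d[1][0] = 5*4 + 5*1 = 25
```

Enumerating the alternative placement with plant 1 at site 1 instead gives cost 2*4 + 2*1 = 10, illustrating why the choice of sites matters: the same flows are multiplied by different inter-site costs.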

Critical Temperature

Phenomenon: when the temperature T is very high, the network reaches an equilibrium point where all the neurons have similar activation values near 0.5; as T is decreased, this point is also lowered; at a certain temperature T_c (the critical temperature), this point drops down to δ, which depends on the parameters A, B, C and t. This is the lowest equilibrium point at which all neurons have similar activation values.

Behaviour below T_c: as the temperature drops below T_c, the neuron activations diverge rapidly towards 0 and 1; when the temperature becomes very low, the network settles into a stable state which represents a feasible solution to the problem. The neuron activation values do not diverge until the critical temperature is reached; near the critical temperature, they rapidly diverge towards the two extreme points 0 and 1; below T_c, neuron activations again remain relatively stable.

Critical Temperature Estimate

Let m_c and m_d be the mean values of c_aj and d_bl. The lowest equilibrium point δ is estimated as

    δ ≈ C / (A(n-1)(m-1) m_c m_d + B(m-1) + nC)

The expected critical temperature is estimated as

    T_c = 0.5 t (1-δ) [A(m-1)(n-1) m_c m_d δ + B(m-1)δ + C(nδ-1)]

Annealing Schedule for Mean-Field Annealing:
- At T = t(1-δ)[A(m-1)(n-1) c_max d_max δ + B(m-1)δ + C(nδ-1)], simulate until equilibrium.
- Around T_c (between T_max and T_min), the temperature changes according to ΔT = K|T - T_c|/(T_max - T_min) + ε, where T is the present temperature and ε and K are constants. At each temperature, iterate s steps (where s is large enough to guarantee reaching equilibrium at a temperature above the actual critical temperature).
- At a low temperature near 0, simulate until equilibrium.

Simulation Results & Summary

A total of four groups of data, each containing 10 sets: three symmetric groups (including 5 x 5 and 15 x 15 problem sizes) and one asymmetric group. The results are close to the optimal results.
    problem    optimal    mean field    mean field/optimal
       1        319.8       35.90            1.014
       2          4.5        1.38            1.059
       3        191.5       149.0            1.034
       4       134.38       13.58            1.0
       5       184.50       185.4            1.015
       6         15.5        15.5            1.000
       7          1.         15.             1.04
       8       1359.4      1415.50           1.041
       9        131.0       134.9            1.009
      10         18.4       189.1            1.03

Results for 5 by 5 Data: Mean Field Network. Based on randomly generated 5 x 5 quadratic assignment problems.

[Figure: Critical temperature plot for the 5 by 5 mean field network.]