Scale-invariant Feature Extraction of Neural Network and Renormalization Group Flow


KEK-TH-2029
arXiv: v1 [hep-th] 22 Jan 2018

Scale-invariant Feature Extraction of Neural Network and Renormalization Group Flow

Satoshi Iso a,b, Shotaro Shiba a and Sumito Yokoo a,b

a Theory Center, High Energy Accelerator Research Organization (KEK),
b Graduate University for Advanced Studies (SOKENDAI),
Tsukuba, Ibaraki, Japan

Abstract

Theoretical understanding of how a deep neural network (DNN) extracts features from input images is still unclear, but it is widely believed that the extraction is performed hierarchically through a process of coarse-graining. This reminds us of the basic concept of the renormalization group (RG) in statistical physics. In order to explore possible relations between DNN and RG, we use the restricted Boltzmann machine (RBM) applied to the Ising model and construct a flow of model parameters (in particular, temperature) generated by the RBM. We show that the unsupervised RBM trained by spin configurations at various temperatures from T = 0 to T = 6 generates a flow along which the temperature approaches the critical value T_c = 2.27. This behavior is opposite to the typical RG flow of the Ising model. By analyzing various properties of the weight matrices of the trained RBM, we discuss why it flows towards T_c and how the RBM learns to extract features of spin configurations.

1 Introduction

Machine learning has attracted interdisciplinary interest as the core method of artificial intelligence, particularly of big data science, and is now widely used to discriminate subtle images by extracting specific features hidden in complicated input data. A deep neural network (DNN), which is motivated by human brains, is one of the well-known algorithms [1]. Despite its enormous successes, it is still unclear why DNN works so well and how it can efficiently extract specific features. In discriminating images, we first provide samples of input images with assigned labels, such as a cat or a dog, and then train the neural network (NN) so as to correctly predict the labels of new, previously unseen, input images: this is supervised learning, and its ability of prediction depends on how many relevant features the NN can extract. On the other hand, in unsupervised learning algorithms, a NN is trained without assigning labels to the data, but trained so as to generate output images that are as close to the input ones as possible. If the NN is successfully trained to reconstruct the input data, it must have acquired specific features of the input data. With this in mind, unsupervised learning is often adopted for pre-training of supervised NNs.

How can a DNN efficiently extract features? Specific features characteristic of input data usually have hierarchical structures. An image of a cat can still be identified as an animal in a very low-resolution image, but one may not be able to distinguish it from a dog. Thus it is plausible that the depth of a neural network reflects such a hierarchy of features. Namely, the DNN learns low-level (microscopic) characteristics in the upper stream of the network and gradually extracts higher-level (macroscopic) characteristics as the input data flow downstream. In other words, the initial data get coarse-grained towards the output. This viewpoint is reminiscent of the renormalization group (RG) in statistical physics and quantum field theories, and various thoughts and studies based on this analogy have been given [2-9]. In particular, in a seminal paper [4], Mehta and Schwab proposed an explicit mapping between the RG and the restricted Boltzmann machine (RBM) [1, 10-14].

RG is the most important concept and technology for understanding critical phenomena in statistical physics, and it also plays an essential role in constructively defining quantum field theories on the lattice. It is based on the idea (proved by Kenneth Wilson [15]) that the long-distance macroscopic behavior of a many-body system is universally described by relevant operators (relevant information) around a fixed point, and is not affected by microscopic details in the continuum limit. Through the reduction of degrees of freedom in RG, the relevant information is emphasized while other, irrelevant information is discarded. In particular, suppose that the statistical model is described by a set of parameters {\lambda_\alpha}, and that these parameters are mapped to a different set {\tilde{\lambda}_\alpha} by an RG transformation (footnote 1). Repeating such RG transformations, we can draw a flow diagram in the parameter space of the statistical model,

  \{\lambda_\alpha\} \to \{\tilde{\lambda}_\alpha\} \to \{\tilde{\tilde{\lambda}}_\alpha\} \to \cdots.   (1)

Footnote 1: In order to describe the RG transformation exactly, infinitely many parameters need to be introduced. But it can usually be well approximated by a finite number of parameters.

These RG flows control the behavior of the statistical model near the critical point, where a second-order phase transition occurs.

The simplest version of the RBM is a NN consisting of two layers, a visible layer with variables {v_i = ±1} and a hidden layer with variables {h_a = ±1}, that are coupled to each other through the Hamiltonian

  \Phi(\{v_i\},\{h_a\}) = -\Big( \sum_{i,a} W_{ia} v_i h_a + \sum_i b^{(v)}_i v_i + \sum_a b^{(h)}_a h_a \Big).   (2)

A probability distribution of a configuration {v_i, h_a} is given by

  p(\{v_i\},\{h_a\}) = \frac{1}{Z} e^{-\Phi(\{v_i\},\{h_a\})}   (3)

where we defined the partition function by Z = \sum_{\{v_i,h_a\}} e^{-\Phi(\{v_i\},\{h_a\})}. No intra-layer couplings are introduced in the RBM. Now suppose that the RBM has already been trained and the parameters of the Hamiltonian (2), namely {W_{ia}, b^{(v)}_i, b^{(h)}_a}, are already fixed through the process of training. The probability distribution p({v_i},{h_a}) also provides the following conditional probabilities for {h_a} (or {v_i}) with the other variables kept fixed:

  p(\{h_a\}|\{v_i\}) = \frac{p(\{h_a\},\{v_i\})}{\sum_{\{h_a\}} p(\{h_a\},\{v_i\})}   (4)

  p(\{v_i\}|\{h_a\}) = \frac{p(\{h_a\},\{v_i\})}{\sum_{\{v_i\}} p(\{h_a\},\{v_i\})}.   (5)

These conditional probabilities generate a flow of distributions, and consequently a flow of the parameters {\lambda_\alpha} of the corresponding statistical model. Suppose that we have a set of N (≫ 1) initial configurations {v_i = \sigma^A_i} (A = 1, ..., N), which are generated by a statistical model with parameters \lambda_\alpha, such as the Ising model at temperature T. In the large-N limit, the distribution function

  q_0(\{v_i\}) = \frac{1}{N} \sum_{A=1}^{N} \prod_i \delta(v_i - \sigma^A_i)   (6)

faithfully characterizes the statistical model with parameters \lambda_\alpha. Multiplying q_0({v_i}) by the conditional probabilities (4) and (5) iteratively, we can generate a flow of probability distributions as

  q_0(\{v_i\}) \to r_1(\{h_a\}) = \sum_{\{v_i\}} p(\{h_a\}|\{v_i\}) \, q_0(\{v_i\})   (7)

  r_1(\{h_a\}) \to q_1(\{v_i\}) = \sum_{\{h_a\}} p(\{v_i\}|\{h_a\}) \, r_1(\{h_a\})   (8)

and so on for q_n({v_i}) → r_{n+1}({h_a}) and r_{n+1}({h_a}) → q_{n+1}({v_i}). Let us focus on Eq. (7). If the probability distribution r_1({h_a}) is well approximated by the Boltzmann distribution of

the same statistical model with different parameters \tilde{\lambda}_\alpha, we can say that the RBM generates a transformation (footnote 2) from {\lambda_\alpha} to {\tilde{\lambda}_\alpha}. If more than two layers are stacked iteratively, we can obtain a flow of parameters as in Eq. (1). Another way to obtain a flow is to look at the transformations q_0({v_i}) → q_1({v_i}) → q_2({v_i}) → ⋯ and to translate this flow of probability distributions into a flow of parameters {\lambda_\alpha}. In the present paper, we consider the latter flow to discuss the relation with the RG.

Footnote 2: The situation is similar to that of footnote 1: infinitely many parameters are necessary to represent the probability distribution p({h_a}) in terms of the statistical model.

Mehta and Schwab [4] pointed out the similarity between the RG transformations of Eq. (1) and the above flows of parameters in the unsupervised RBM. But in order to show that the transformation of the parameters {\lambda_\alpha} in the RBM indeed generates the conventional RG transformation, it is necessary to show that the weight matrix W_{ia} and the biases b^{(v)}_i, b^{(h)}_a of the RBM are appropriately chosen so as to generate the correct RG transformation that performs a coarse-graining of the input configurations. In Ref. [4], a multi-layer RBM is employed as an unsupervised-learning NN, and the weights and the biases are chosen by minimizing the KL divergence (relative entropy) between the input probability distribution and the distribution reconstructed by integrating (marginalizing) over the hidden variables. The authors suggested the similarity by looking at the local spin structures in the hidden variables, but they did not show explicitly that the weights determined by the unsupervised learning actually generate a flow of RG transformations. The arguments of [4], and misconceptions in the literature, are criticized in Ref. [6].

In a wider context, the criticism is related to the following question: what determines whether a specific feature of the input data is relevant or not? In RG transformations of statistical models, long-wavelength (macroscopic) modes are highly respected, while short-wavelength modes are discarded as noise. In this way, RG transformations can extract the universal behavior of the model at long wavelengths. But, of course, this is so because we are interested in the macroscopic behavior of the system: if we were instead interested in short-wavelength physics, we would need to extract the opposite features of the model. Thus, we may say that the extraction of relevant features requires a pre-existing basis for judgment, and supervised learning is necessary to give such a basis to the machine. However, this does not mean that unsupervised learning has nothing to do with the RG. Even in unsupervised learning, a NN automatically notices and extracts some kind of features of the input data, and the flow generated by the trained NN reflects such features.

In the present paper, we investigate the relationship between the RBM and the RG by further studying the flows of distributions, Eqs. (7) and (8), that the unsupervised RBM generates. Notice that, in defining the flows (7) and (8), we need to specify how we have trained the RBM, because the training determines the properties of the weights and biases, and accordingly the behavior of the flow. In this paper we mostly use the following three different ways of training. One type of RBM (which we call type V) is trained by configurations at various temperatures from low to high. The two other types (type H and type L) are trained by configurations

only at high (or only at low) temperatures. We then translate these flows of probability distributions, defined by Eqs. (7) and (8), into flows of the temperature of the Ising model,

  T \to \tilde{T} \to \tilde{\tilde{T}} \to \cdots.   (9)

In order to measure the temperature, we prepare another NN, trained by supervised learning. The results of our numerical simulations lead to a surprising conclusion. In the type V RBM, which has adequately learned the features of configurations at various temperatures, we find that the temperature approaches the critical point, T → T_c, along the RBM flow. This behavior is opposite to the conventional RG flow of the Ising model.

The paper is organized as follows. In section 2, we explain the basic settings and the methods of our investigation. We prepare sample images of the spin configurations of the Ising model, and train RBMs with these configurations without assigning labels of temperature. We then construct flows of parameters (i.e., temperature) generated by the trained RBM (footnote 3). In section 3, we show various results of the numerical simulations, including the RBM flows of parameters. In section 4, we analyze properties of the weight matrices W_{ia} using the method of singular value decomposition. The final section is devoted to a summary and discussions. Our main results on the RBM flow, and our conjectures about the feature extraction of the unsupervised RBM, are given in Sec. 3.2.

Footnote 3: The two-dimensional Ising model is the simplest statistical model exhibiting a second-order phase transition, and there are many previous studies of the Ising model using machine learning. See e.g. [16-21].

2 Methods

We explain the various methods used in the numerical simulations to investigate relations between the unsupervised RBM and the RG of the Ising model. Though most methods in this section are standard and well known, we explain them in some detail to make the paper self-contained. In Sec. 2.3, we explain the central method of generating the RBM flows. Basic material on the RBM is given in Sec. 2.2. The other two sections, Secs. 2.1 and 2.4, can be skipped unless one is interested in how we generate the initial spin configurations and how we measure the temperature of a set of configurations.

2.1 Monte Carlo simulations of the Ising model

We first construct samples of configurations of the two-dimensional Ising model by using Monte Carlo simulations. The spin variables σ_{x,y} = ±1 are defined on a two-dimensional lattice of size L × L. The index (x, y) labels each lattice site and takes the values x, y = 0, 1, ..., L−1.

The Ising model Hamiltonian is given by

  H = -J \sum_{x,y=0}^{L-1} \sigma_{x,y} \left( \sigma_{x+1,y} + \sigma_{x-1,y} + \sigma_{x,y+1} + \sigma_{x,y-1} \right).   (10)

It describes a ferromagnetic model for J > 0 and an antiferromagnetic model for J < 0. We impose periodic boundary conditions on the spin variables,

  \sigma_{L,y} := \sigma_{0,y}, \quad \sigma_{-1,y} := \sigma_{L-1,y}, \quad \sigma_{x,L} := \sigma_{x,0}, \quad \sigma_{x,-1} := \sigma_{x,L-1}.   (11)

Spin configurations at temperature T are generated by the method of Metropolis Monte Carlo (MMC) simulation. In this method, we first generate a random configuration {σ_{x,y}}. We then choose one of the spins σ_{x,y} and flip it with probability

  p_{x,y} = \begin{cases} 1 & (\text{when } dE_{x,y} < 0) \\ e^{-dE_{x,y}/k_B T} & (\text{when } dE_{x,y} > 0) \end{cases}   (12)

where dE_{x,y} is the change of the energy of the system,

  dE_{x,y} = 2 J \sigma_{x,y} \left( \sigma_{x+1,y} + \sigma_{x-1,y} + \sigma_{x,y+1} + \sigma_{x,y-1} \right).   (13)

The flipping probability (12) satisfies the detailed balance condition P_{s \to s'} \rho_s = P_{s' \to s} \rho_{s'}, where \rho_s \propto e^{-E_s/k_B T} is the canonical distribution of the spin configuration s = {σ_{x,y}} at temperature T. Thus, after many iterations of flipping all the spins, the configuration approaches the equilibrium distribution at T. Since all physical quantities depend only on the combination J/k_B T, we can set the Boltzmann constant k_B and the interaction parameter J equal to 1 without loss of generality. In the following analysis, we set the lattice size to L² = 100 and repeat the MMC update procedure 100 L² = 10000 times to construct spin configurations. In our simulations, we generated spin configurations at the temperatures T = 0, 0.25, 0.5, ..., 6 (footnote 4). Some typical spin configurations are shown in Fig. 1.

Footnote 4: For T = 0, we practically set T = 10^{-6} in the numerical calculations.

Figure 1: Examples of spin configurations at temperatures T = 0, 2, 3, 6.
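As a concrete illustration of this sampling procedure, a minimal Python sketch of the Metropolis update for the L × L Ising model with periodic boundary conditions could look as follows. The function name, the random-number handling, and the spacing between stored samples are our own choices, not taken from the paper; the acceptance rule follows Eqs. (12)-(13) with J = k_B = 1.

```python
import numpy as np

def metropolis_configurations(L=10, T=2.0, n_samples=1000, n_flips_per_sample=None, seed=0):
    """Generate Ising spin configurations at temperature T by Metropolis Monte Carlo.

    A spin is flipped with probability 1 if dE < 0 and exp(-dE/T) otherwise,
    as in Eqs. (12)-(13) with J = k_B = 1.
    """
    rng = np.random.default_rng(seed)
    if n_flips_per_sample is None:
        n_flips_per_sample = 100 * L * L          # 100 L^2 updates, as in the text
    spins = rng.choice([-1, 1], size=(L, L))      # random initial configuration
    samples = []
    for _ in range(n_samples):
        for _ in range(n_flips_per_sample):
            x, y = rng.integers(0, L, size=2)
            # sum of the four nearest neighbours with periodic boundary conditions (11)
            nn = (spins[(x + 1) % L, y] + spins[(x - 1) % L, y]
                  + spins[x, (y + 1) % L] + spins[x, (y - 1) % L])
            dE = 2.0 * spins[x, y] * nn           # energy change of a flip, Eq. (13)
            if dE < 0 or rng.random() < np.exp(-dE / T):
                spins[x, y] *= -1
        samples.append(spins.copy().reshape(-1))  # store the flattened 10 x 10 configuration
    return np.array(samples)

# Example: 1000 configurations at T = 2 on a 10 x 10 lattice
# (for T = 0 one would pass a tiny positive value such as 1e-6, as the authors do)
configs_T2 = metropolis_configurations(L=10, T=2.0, n_samples=1000)
```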

2.2 Unsupervised learning of the RBM

Our main motivation in the present paper is to study whether the RBM is related to the RG in statistical physics. In this section, we review the basic algorithm of the RBM [1, 10-14], which is trained by the configurations constructed by the MMC method of Sec. 2.1. As explained in the Introduction, the RBM consists of two layers, as shown in the left panel of Fig. 2.

Figure 2: (a) Two-layer neural network of the RBM with a visible layer {v_i} and a hidden layer {h_a}. These two layers are coupled, but there are no intra-layer couplings. (b) The RBM generates reconstructed configurations from {v_i} to {ṽ_i} through the hidden configuration {h_a}.

The initial configurations {σ_{x,y}} of the Ising model, generated at various temperatures, are input into the visible layer {v_i}. The number of neurons in the visible layer is fixed at N_v = L² = 100 (i = 1, ..., N_v) to represent the spin configurations of the Ising model. The hidden layer, on the other hand, can have an arbitrary number of neurons, N_h. In the present paper, we consider 7 different sizes: N_h = 16, 36, 64, 81, 100, 225 and 400. The N_h spin variables in the hidden layer are thus given by {h_a} with a = 1, ..., N_h.

The RBM is a generative model of probability distributions based on Eq. (3). We first explain how we train the RBM by optimizing the weights W_{ia} and the biases b^{(v)}_i, b^{(h)}_a. Our goal is to represent the given probability distribution q_0({v_i}) of Eq. (6), as faithfully as possible, in terms of the model probability distribution defined by

  p(\{v_i\}) = \frac{1}{Z} \sum_{\{h_a\}} e^{-\Phi(\{v_i\},\{h_a\})}.   (14)

The partition function Z = \sum_{\{v_i, h_a\}} e^{-\Phi(\{v_i\},\{h_a\})} is difficult to evaluate, but summations over only one set of spin variables (e.g. over {v_i}) are easy to perform because of the absence of intra-layer couplings. This also allows the conditional probabilities (4) and (5) to be

rewritten as products of probability distributions of the individual spin variables:

  p(\{h_a\}|\{v_i\}) = \prod_a p(h_a|\{v_i\}), \qquad p(h_a|\{v_i\}) = \frac{1}{1 + \exp\big[-2 h_a \big(\sum_i W_{ia} v_i + b^{(h)}_a\big)\big]}   (15)

  p(\{v_i\}|\{h_a\}) = \prod_i p(v_i|\{h_a\}), \qquad p(v_i|\{h_a\}) = \frac{1}{1 + \exp\big[-2 v_i \big(\sum_a W_{ia} h_a + b^{(v)}_i\big)\big]}.   (16)

The expectation values of the spin variables in the hidden (or visible) layer, in the background of the spin configuration of the other layer, are then calculated as

  \langle h_a \rangle_{\{v_i\}} = \tanh\Big( \sum_i W_{ia} v_i + b^{(h)}_a \Big)   (17)

  \langle v_i \rangle_{\{h_a\}} = \tanh\Big( \sum_a W_{ia} h_a + b^{(v)}_i \Big).   (18)

The task is now to train the RBM so as to minimize the distance between the two probability distributions q({v_i}) and p({v_i}) by appropriately choosing the weights and the biases. This distance is the Kullback-Leibler (KL) divergence, or relative entropy, given by

  \mathrm{KL}(q\|p) = \sum_{\{v_i\}} q(\{v_i\}) \log\frac{q(\{v_i\})}{p(\{v_i\})} = \text{const.} - \sum_{\{v_i\}} q(\{v_i\}) \log p(\{v_i\}).   (19)

If the two probability distributions are equal, the KL divergence vanishes; otherwise it is positive. The derivatives of KL(q‖p) with respect to the weights W_{ia} and the biases b^{(v)}_i, b^{(h)}_a are given by

  \frac{\partial \mathrm{KL}(q\|p)}{\partial W_{ia}} = -\big( \langle v_i h_a \rangle_{\rm data} - \langle v_i h_a \rangle_{\rm model} \big), \quad
  \frac{\partial \mathrm{KL}(q\|p)}{\partial b^{(v)}_i} = -\big( \langle v_i \rangle_{\rm data} - \langle v_i \rangle_{\rm model} \big), \quad
  \frac{\partial \mathrm{KL}(q\|p)}{\partial b^{(h)}_a} = -\big( \langle h_a \rangle_{\rm data} - \langle h_a \rangle_{\rm model} \big),   (20)

where the averages are defined by

  \langle A(\{v_i\}) \rangle_{\rm data} = \sum_{\{v_i\}} q(\{v_i\}) A(\{v_i\})   (21)

  \langle A(\{v_i\},\{h_a\}) \rangle_{\rm model} = \sum_{\{v_i\},\{h_a\}} p(\{v_i\},\{h_a\}) A(\{v_i\},\{h_a\}),   (22)

and h_a in ⟨⋯⟩_data is replaced by ⟨h_a⟩_{\{v_i\}} of Eq. (17). In training the RBM, we change the weights and biases so that the KL divergence is reduced. Using the method of back

propagation [22], we update the values of the weights and biases as

  W_{ia} \to W^{\rm new}_{ia} = W_{ia} + \delta W_{ia}, \qquad b^{(v)}_i \to b^{(v),\rm new}_i = b^{(v)}_i + \delta b^{(v)}_i, \qquad b^{(h)}_a \to b^{(h),\rm new}_a = b^{(h)}_a + \delta b^{(h)}_a   (23)

where

  \delta W_{ia} = \epsilon \big( \langle v_i h_a \rangle_{\rm data} - \langle v_i h_a \rangle_{\rm model} \big), \quad
  \delta b^{(v)}_i = \epsilon \big( \langle v_i \rangle_{\rm data} - \langle v_i \rangle_{\rm model} \big), \quad
  \delta b^{(h)}_a = \epsilon \big( \langle h_a \rangle_{\rm data} - \langle h_a \rangle_{\rm model} \big).   (24)

Here ε denotes the learning rate, which we set to 0.1. The first terms ⟨⋯⟩_data are easy to calculate, but the second terms ⟨⋯⟩_model are difficult to evaluate, since they require knowledge of the full partition function Z. To avoid this difficulty, one can use the method of Gibbs sampling to approximately evaluate the expectation values ⟨⋯⟩_model. In practice we employ an even simpler method, called contrastive divergence (CD) [23-25]. The idea is very simple, and reminiscent of the mean-field approximation in statistical physics. Given the input data of visible spin configurations {v^{A(0)}_i = σ^A_i}, the expectation value of the hidden spin variable h_a can easily be calculated from Eq. (17). We write this expectation value as

  h^{A(1)}_a := \langle h_a \rangle_{\{v^{A(0)}_i\}} = \tanh\Big( \sum_i W_{ia} v^{A(0)}_i + b^{(h)}_a \Big).   (25)

Then, in this background of hidden spin configurations, the expectation value of v_i can again be easily calculated by using Eq. (18). We write it as

  v^{A(1)}_i := \langle v_i \rangle_{\{h^{A(1)}_a\}} = \tanh\Big( \sum_a W_{ia} h^{A(1)}_a + b^{(v)}_i \Big).   (26)

We then obtain h^{A(2)}_a = \langle h_a \rangle_{\{v^{A(1)}_i\}}, and so on. We can iterate this procedure many times and replace the second terms in Eq. (20) by the expectation values generated in this way. In the numerical simulations of the present paper, we adopt the simplest version of CD, called CD_1, which gives the following approximate formulas:

  \langle v_i \rangle_{\rm data} = \frac{1}{N}\sum_A \sigma^A_i, \qquad \langle h_a \rangle_{\rm data} = \frac{1}{N}\sum_A h^{A(1)}_a, \qquad \langle v_i h_a \rangle_{\rm data} = \frac{1}{N}\sum_A \sigma^A_i h^{A(1)}_a,   (27)

and

  \langle v_i \rangle_{\rm model} = \frac{1}{N}\sum_A v^{A(1)}_i, \qquad \langle h_a \rangle_{\rm model} = \frac{1}{N}\sum_A h^{A(2)}_a, \qquad \langle v_i h_a \rangle_{\rm model} = \frac{1}{N}\sum_A v^{A(1)}_i h^{A(2)}_a.   (28)

Here σ^A denotes each spin configuration {σ_{x,y}} generated by the method of Sec. 2.1. As input data to train the RBM, we generated 1000 spin configurations for each of the 25 different temperatures T = 0, 0.25, ..., 6; the index A then runs from 1 to N = 25000. In some cases, as we will see in Sec. 3.2, we use only a restricted set of configurations at high or at low temperatures; the index then runs over A = 1, ..., N = 1000 × (number of temperatures). We repeat the update procedure (23) many times (5000 epochs) to obtain adjusted values of the weights and biases. In this way we train the RBM using the set of configurations {v^{A(0)}_i = σ^A_i}, (A = 1, ..., N).

2.3 Generation of RBM flows

As discussed in the Introduction, once the RBM is trained and the weights and biases are fixed, the RBM generates a sequence of probability distributions (8). We then translate this sequence into a flow of parameters (i.e., temperature). In generating the sequence, the initial set of configurations should be prepared separately, in addition to the configurations that are used to train the RBM (footnote 5).

Footnote 5: Thus we generate spin configurations in addition to the configurations used for training the RBM.

We can also generate a flow of parameters in a slightly different way. For a specific configuration v_i = v^{(0)}_i, we can define a sequence of configurations following Eqs. (25) and (26) as

  \{v^{(0)}_i\} \to \{h^{(1)}_a\} \to \{v^{(1)}_i\} \to \{h^{(2)}_a\} \to \{v^{(2)}_i\} \to \cdots.   (29)

The right panel of Fig. 2 shows the generation of new configurations from {v_i} to {ṽ_i} through {h_a}. Since each value of v^{(n)}_i and h^{(n)}_a (for n > 0) is defined by an expectation value as in Eqs. (25) and (26), it does not take an integer value ±1 but a fractional value between ±1. In order to get a flow of spin configurations, we replace these fractional values by ±1 with probability (1 ± ⟨v^{(n)}_i⟩)/2 or (1 ± ⟨h^{(n)}_a⟩)/2. It turns out that this replacement is usually a good approximation, since the expectation values are likely to take values close to ±1 owing to the property of the trained weights, |W_{ia}| ≳ 1. In this way, we obtain a flow of spin configurations

  \{v^{(0)}_i\} \to \{v^{(1)}_i\} \to \{v^{(2)}_i\} \to \cdots \to \{v^{(n)}_i\} \to \cdots   (30)

starting from the initial configuration {v^{(0)}_i}. The flow of configurations is transformed into a flow of temperature distributions by using the method explained in Sec. 2.4.
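To make the training and flow-generation procedure concrete, here is a minimal NumPy sketch of the CD_1 update (23)-(28) and of the reconstruction flow (29)-(30). The class layout, the initialization scale, and the function names are our own assumptions; only the tanh expectation values and the CD_1 averages follow the equations above.

```python
import numpy as np

class IsingRBM:
    """Restricted Boltzmann machine with ±1 units, trained by CD_1 (Eqs. (23)-(28))."""

    def __init__(self, n_visible=100, n_hidden=64, seed=0):
        rng = np.random.default_rng(seed)
        self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))  # W_{ia}
        self.b_v = np.zeros(n_visible)                              # b^{(v)}_i
        self.b_h = np.zeros(n_hidden)                               # b^{(h)}_a
        self.rng = rng

    def mean_h(self, v):                 # Eqs. (17)/(25): <h_a> = tanh(sum_i W_ia v_i + b_a)
        return np.tanh(v @ self.W + self.b_h)

    def mean_v(self, h):                 # Eqs. (18)/(26): <v_i> = tanh(sum_a W_ia h_a + b_i)
        return np.tanh(h @ self.W.T + self.b_v)

    def cd1_epoch(self, data, lr=0.1):
        """One CD_1 update using the data/model averages of Eqs. (27)-(28)."""
        N = data.shape[0]
        h1 = self.mean_h(data)           # h^{A(1)}
        v1 = self.mean_v(h1)             # v^{A(1)}
        h2 = self.mean_h(v1)             # h^{A(2)}
        self.W   += lr * (data.T @ h1 - v1.T @ h2) / N   # Eq. (24): <v h>_data - <v h>_model
        self.b_v += lr * (data.mean(0) - v1.mean(0))
        self.b_h += lr * (h1.mean(0) - h2.mean(0))

    def flow_step(self, v):
        """One step {v^(n)} -> {v^(n+1)} of the RBM flow (30), sampling ±1 spins
        with probabilities (1 ± <h>)/2 and (1 ± <v>)/2."""
        h = np.where(self.rng.random(self.b_h.shape[0]) < (1 + self.mean_h(v)) / 2, 1, -1)
        return np.where(self.rng.random(self.b_v.shape[0]) < (1 + self.mean_v(h)) / 2, 1, -1)

# Usage sketch (hypothetical variable names): train a type V RBM, then iterate the flow.
# rbm = IsingRBM(n_visible=100, n_hidden=64)
# for epoch in range(5000):
#     rbm.cd1_epoch(training_configs)    # training_configs: shape (25000, 100), entries ±1
# v = initial_config.copy()              # one configuration, shape (100,)
# for n in range(1000):
#     v = rbm.flow_step(v)
```

Iterating flow_step over a set of initial configurations gives the sequence (30), whose temperature distribution is then measured by the supervised network of Sec. 2.4.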

2.4 Temperature measurement by a supervised-learning NN

Next we design a neural network (NN) to measure the temperature of spin configurations. The NN for supervised learning has three layers, with one hidden layer in the middle (see Fig. 3).

Figure 3: Three-layer neural network for supervised learning, with an input layer {z^{(1)}_i}, a hidden layer {z^{(2)}_a} and an output layer {z^{(3)}_μ}.

The input layer {z^{(1)}_i} consists of L² = 100 neurons, into which we input spin configurations of the Ising model. The output layer {z^{(3)}_μ} has 25 neurons, which correspond to the 25 different temperatures that we want to measure. The number of neurons in the hidden layer {z^{(2)}_a} is set to 64. We train this three-layer NN with a set of spin configurations, each of which carries a label of temperature; this is therefore supervised learning. As input data to train the NN, we use the same N = 25000 configurations which were used to train the RBM (footnote 6).

Footnote 6: In order to check the performance of the NN, namely to see how precisely the machine can measure the temperature of a new set of configurations, we use the other configurations that were prepared for generating the sequence of probability distributions of the RBM in Sec. 2.3. We show the results of this performance check in Sec. 3.1.

The training of the NN is carried out as follows. Denote the input data as

  Z^{(1)}_{Ai} = \sigma^A_i   (31)

where A = 1, ..., N, and the \sigma^A are the spin configurations {σ_{x,y} = ±1} of Sec. 2.1. The input data are transformed to Z^{(2)}_{Aa} in the hidden layer by the following nonlinear transformation:

  Z^{(2)}_{Aa} = f\Big( \sum_{i=1}^{100} Z^{(1)}_{Ai} W^{(1)}_{ia} + b^{(1)}_a \Big) =: f\big( U^{(1)}_{Aa} \big)   (32)

where W^{(1)}_{ia} is a weight matrix and b^{(1)}_a is a bias. The activation function f(x) is chosen as

f(x) = tanh(x). Z^{(2)}_{Aa} is then transformed to Z^{(3)}_{A\mu} in the output layer, which corresponds to the label, namely the temperature, of each configuration. The output Z^{(3)}_{A\mu} is given by

  Z^{(3)}_{A\mu} = g\Big( \sum_{a=1}^{64} Z^{(2)}_{Aa} W^{(2)}_{a\mu} + b^{(2)}_\mu \Big) =: g\big( U^{(2)}_{A\mu} \big)   (33)

where W^{(2)}_{a\mu} and b^{(2)}_\mu are another weight matrix and bias. The function g is the softmax function,

  g\big( U^{(2)}_{A\mu} \big) = \frac{\exp U^{(2)}_{A\mu}}{\sum_{\nu=1}^{25} \exp U^{(2)}_{A\nu}},   (34)

so that Z^{(3)}_{A\mu} can be regarded as a probability, since \sum_\mu Z^{(3)}_{A\mu} = 1 is satisfied for each configuration A. Thus the NN transforms an input spin configuration Z^{(1)}_{Ai} into the probability Z^{(3)}_{A\mu} for the configuration to take the μ-th output value (i.e., temperature).

Each of the input configurations Z^{(1)}_{Ai} is generated by the MMC method at a temperature T, where T takes one of the 25 discrete values T = (\nu - 1)/4, (\nu = 1, ..., 25). If the A-th configuration is labelled by ν, we want the NN to give an output Z^{(3)}_{A\mu} as close as possible to the one-hot representation

  d^{(\nu)}_A = (0, ..., 0, 1, 0, ..., 0)_A \quad \text{(with the 1 in the } \nu\text{-th entry)},   (35)

i.e., its μ-th component is given by d^{(\nu)}_{A\mu} = \delta_{\mu\nu}. It can be interpreted as the probability of the configuration A to take the μ-th output. The task of the supervised training is then to minimize the cross entropy, which is equivalent to the KL divergence between the desired probability d^{(\nu)}_{A\mu} and the output probability Z^{(3)}_{A\mu}. The loss function is thus given by the cross entropy,

  E_A = \mathrm{KL}\big(d^{(\nu)}_{A\mu} \,\|\, Z^{(3)}_{A\mu}\big) = -\sum_\mu d^{(\nu)}_{A\mu} \log Z^{(3)}_{A\mu}.   (36)

Then, using the method of back propagation, we update the values of the weights and biases from the lower to the upper stream:

  W^{(l)} \to W^{(l)}_{\rm new} = W^{(l)} + \delta W^{(l)}, \qquad b^{(l)} \to b^{(l)}_{\rm new} = b^{(l)} + \delta b^{(l)}.   (37)

The variations \delta W^{(l)}, \delta b^{(l)} at the lower stream are given by

  \delta W^{(2)}_{a\mu} = -\frac{\epsilon}{N} \sum_A (Z^{(2)})^T_{aA} \Delta^{(3)}_{A\mu}, \qquad \delta b^{(2)}_\mu = -\frac{\epsilon}{N} \sum_A \Delta^{(3)}_{A\mu}   (38)

where \Delta^{(3)}_{A\mu} = Z^{(3)}_{A\mu} - d^{(\nu)}_{A\mu}. The learning rate ε is set to 0.1. Then, using these lower-stream variations, we change the upper-stream weights and biases as

  \delta W^{(1)}_{ia} = -\frac{\epsilon}{N} \sum_A (Z^{(1)})^T_{iA} \Delta^{(2)}_{Aa}, \qquad \delta b^{(1)}_a = -\frac{\epsilon}{N} \sum_A \Delta^{(2)}_{Aa}   (39)

where

  \Delta^{(2)}_{Aa} = \sum_\mu \Delta^{(3)}_{A\mu} (W^{(2)})^T_{\mu a} \, f'\big( U^{(1)}_{Aa} \big).   (40)

We repeat this update procedure many times (7500 epochs) to train the NN and obtain suitably adjusted values of the weights and biases.

Finally, we note how we measure the temperature of a configuration. If the size of a configuration generated at temperature T were large enough, the trained NN would reproduce the temperature of the configuration quite faithfully. However, our configurations are small, with only L = 10. Thus we instead use an ensemble of many spin configurations and measure the temperature distribution of the configurations. The supervised learning gives us this probability distribution of temperature.

3 Numerical results

In this section we present our numerical results for the flows generated by the unsupervised RBM, and discuss their relation with the renormalization group flow of the Ising model. Our main results on the RBM flows are given in Sec. 3.2.

3.1 Supervised learning for temperature measurement

Before discussing the unsupervised RBM, let us first see how we trained the NN to measure temperature. In Fig. 4, we plot the behavior of the loss function (36) as we iterate the updates of the weights and biases (37). The blue (lower) line shows the training error, namely the value of the loss function (36) after iterations of training using the training configurations. It decreases continuously, even after 7500 epochs. On the other hand, the red (upper) line shows the test error, namely the value of the loss function for additional configurations which are not used for the training. This also decreases at first, but after about 6000 epochs it becomes almost constant. After 7500 epochs, in fact, it turns to increase. This means the machine is becoming over-trained, and therefore we stopped the learning at 7500 epochs.
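For concreteness, a minimal NumPy sketch of the three-layer classifier of Sec. 2.4 is given below: a tanh hidden layer (32), softmax output (33)-(34), cross-entropy loss (36), and the gradient updates (37)-(40). The names, the weight initialization, and the use of full-batch updates are our assumptions; they are not specified in the text.

```python
import numpy as np

class TemperatureNet:
    """Three-layer network: 100 -> 64 (tanh) -> 25 (softmax), Eqs. (32)-(40)."""

    def __init__(self, n_in=100, n_hidden=64, n_out=25, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = 0.1 * rng.standard_normal((n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = 0.1 * rng.standard_normal((n_hidden, n_out))
        self.b2 = np.zeros(n_out)

    def forward(self, Z1):
        U1 = Z1 @ self.W1 + self.b1                      # Eq. (32)
        Z2 = np.tanh(U1)
        U2 = Z2 @ self.W2 + self.b2                      # Eq. (33)
        U2 -= U2.max(axis=1, keepdims=True)              # numerical stabilization of the softmax
        Z3 = np.exp(U2) / np.exp(U2).sum(axis=1, keepdims=True)   # Eq. (34)
        return Z2, Z3

    def train_epoch(self, Z1, labels, lr=0.1):
        """One gradient step on the cross entropy (36); labels are indices nu - 1 = 0..24."""
        N = Z1.shape[0]
        Z2, Z3 = self.forward(Z1)
        D = np.zeros_like(Z3)
        D[np.arange(N), labels] = 1.0                    # one-hot targets d^{(nu)}, Eq. (35)
        delta3 = Z3 - D                                  # Delta^{(3)}
        delta2 = (delta3 @ self.W2.T) * (1.0 - Z2**2)    # Delta^{(2)}, Eq. (40) with f = tanh
        self.W2 -= lr * Z2.T @ delta3 / N                # Eqs. (37)-(38)
        self.b2 -= lr * delta3.sum(axis=0) / N
        self.W1 -= lr * Z1.T @ delta2 / N                # Eq. (39)
        self.b1 -= lr * delta2.sum(axis=0) / N

    def temperature_distribution(self, configs):
        """Average the softmax outputs over an ensemble of configurations, giving
        the temperature distribution used to read off the RBM flow."""
        _, Z3 = self.forward(configs)
        return Z3.mean(axis=0)
```

Averaging the softmax outputs over an ensemble of configurations, as in temperature_distribution, corresponds to the temperature-distribution measurement described at the end of Sec. 2.4.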

Figure 4: Training error and test error (up to 7500 epochs).

Figure 5: Probability distributions of the measured temperatures for sets of configurations generated at T = 0, 2, 3, 6, respectively. The temperature of the configurations can be distinguished by looking at the shapes of the distributions.

In Fig. 5 we show the probability distributions of temperature that this NN measures. Here we use configurations at T = 0, 2, 3, 6 which were not used for the training. Though they are not sharply peaked at the temperatures at which the configurations were generated (footnote 7), each of them has a characteristic shape that differs from temperature to temperature. Thus it is possible to distinguish the temperature of the input configurations by looking at the shape of the probability distribution, even if these configurations were not used for the training of the NN.

Footnote 7: There are two reasons for this broadening of the distributions. One is the finiteness of the size of a configuration, N = L × L = 100. The other is the limited ability of the NN to measure temperature. If the size of a configuration were infinite, and if the ability to discriminate subtle differences between configurations at different temperatures were limitless, we would obtain a very sharp peak at the labelled temperature.

In the following, by using this NN, we measure the temperature of configurations that are

generated by the RBM flow.

3.2 Unsupervised RBM flows

Now we present the main results of the present paper, namely the flows generated by the unsupervised RBM. We sometimes call this the RBM flow. As discussed in the Introduction, if the RBM were similar to the conventional RG in the sense of possessing a coarse-graining function, the RBM flow would have to move away from the critical point T_c = 2.27. In order to check this, we construct three different types of unsupervised RBMs, which we call type V, type L and type H, using the method of Sec. 2.2. Each of them is trained by a different set of spin configurations generated at a different set of temperatures. We then generate flows of temperature distributions by using these trained RBMs, following the methods of Secs. 2.3 and 2.4.

Type V RBM: trained by configurations at T = {0, 0.25, 0.5, ..., 6}

First we construct the type V RBM, which is trained by configurations at temperatures ranging widely from low to high, T = 0, 0.25, ..., 6. This temperature range includes the temperature T = 2.25 near T_c. After training is completed, this unsupervised RBM will have learned features of spin configurations at these temperatures. Once the training is finished, we generate a sequence of reconstructed configurations as in Eq. (30), using the method of Sec. 2.3. For this, we prepare two different sets of initial configurations, one at T = 0 and another at T = 6. These initial configurations are not used for the training of the RBM. Then, using the supervised NN of Sec. 3.1, we measure the temperature and translate the flow of configurations into a flow of temperature distributions.

In Figs. 6 and 7, we plot the temperature distributions of configurations that are generated by iterating the RBM reconstruction of Sec. 2.3. The label "itr" in the legends denotes the number of iterations n performed by the unsupervised RBM. Fig. 6 shows a flow of temperature distributions starting from spin configurations generated at T = 0; Fig. 7 starts from T = 6. In all the figures, the black lines are the measured temperature distributions of the initial configurations (footnote 8). The colored lines show the temperature distributions of the reconstructed configurations {v^{(n)}_i} after various numbers of iterations. The left panels show the temperature distributions for small numbers of iterations (up to 10 in Fig. 6 and 50 in Fig. 7), while the right panels show larger numbers of iterations.

These results indicate that the critical temperature T_c is a stable fixed point of the flows in the type V RBM. This is apparently different from the naive expectation that the RBM flow should show the same behavior as the RG flow; indeed, it is in the opposite direction. From whichever temperature, T = 0 or T = 6, we start the RBM iteration, the peak of the temperature distribution approaches the critical point (T = 2.27).

Footnote 8: As discussed in footnote 7, these distributions are not sharply peaked at the temperature at which the configurations were generated.

Figure 6: Temperature distributions after various numbers of iterations of the type V RBM, which is trained by the configurations at T = 0, 0.25, ..., 6. The original configurations are generated at T = 0. After only several iterations, the temperature distribution is peaked around T_c and stabilizes there: T_c is a stable fixed point of the flow.

Figure 7: Temperature distributions after various numbers of iterations of the same RBM as in Fig. 6. The original configurations are generated at T = 6. After 50 iterations, the distribution stabilizes at T_c.

In order to confirm the above behavior, we provide another set of configurations, at T = 2.25, as initial configurations, and generate the flow of temperature with the same trained RBM. The flow of temperature distributions is shown in Fig. 8. We can see that the temperature distribution of the reconstructed configurations remains near the critical point, and never flows away from there (footnote 9).

Footnote 9: We also trained the RBM using configurations in a wider range of temperatures, T = 0, 0.25, ..., 10. The results are very similar, and the temperature distributions of the reconstructed configurations always approach the critical point.

Figure 8: Temperature distributions after various numbers of iterations of the same RBM as in Figs. 6 and 7. The original configurations are generated at T = 2.25. The distribution is stable around T_c.

If the process of the unsupervised RBM corresponded to coarse-graining of spin configurations, the temperature distributions of the reconstructed configurations would have to flow away from T_c. Though the direction of the flow is opposite to that of the RG flow, both flows share the property that the critical point T = T_c plays an important role in controlling them.

So far, in obtaining the results of Figs. 6, 7 and 8, we used an unsupervised RBM with 64 neurons in the hidden layer. We also trained other RBMs with different sizes of the hidden layer, using the same set of spin configurations. When the size of the hidden layer is smaller than (or equal to) that of the visible layer N_v = 100, namely N_h = 100, 81, 64, 36 or 16, we find that the temperature distribution approaches the critical point. One difference is that for smaller N_h the flow approaches T_c faster (i.e., the flow arrives at T_c within a smaller number of iterations).

In contrast, when the RBM has more than 100 neurons in the hidden layer, N_h > N_v, we obtain different results. Fig. 9 shows the case of N_h = 225 neurons. Up to about ten iterations, the measured temperature distribution behaves similarly to the case N_h ≤ 100, i.e., it approaches the critical temperature. Afterwards, however, it passes the critical point and flows away towards higher temperature. In the case of 400 neurons, it moves towards high temperature at a faster speed. This behavior suggests that, if the hidden layer has more than the necessary size, the NN tends to learn a lot of noisy fluctuations. Since configurations at higher temperatures are noise-like, the flow then goes away to high temperature. We come back to this conjecture in later sections.

Figure 9: Temperature distribution after various numbers of iterations of the type V RBM with 225 neurons in the hidden layer, i.e., N_h > N_v. The original configurations are generated at T = 0. The distribution has a peak at T = T_c after 10 iterations, but then moves towards T = ∞.

Figure 10: Flow of temperature distributions starting from T = 0 in the type H RBM. The type H RBM is trained by configurations at only T = 4, 4.25, ..., 6. The NN has N_h = 64 neurons (left) and N_h = 225 neurons (right) in the hidden layer. The speed of the flow is slower for the larger hidden layer.

Type H/L RBM: trained by configurations at higher/lower temperatures

Next we construct another type of RBM, which is trained by configurations at temperatures T = 4, 4.25, ..., 6, higher than the critical point T_c. We call it the type H RBM. The results for the flows of temperature distributions in the type H RBM are shown in Fig. 10.

Figure 11: Flow of temperature distributions starting from T = 6 in the type L RBM. The type L RBM is trained by configurations at only T = 0. N_h = 64 (left) and N_h = 225 (right).

In this case, the measured temperature passes the critical point and goes away towards higher temperature. This behavior is understandable, since the RBM must have learned only the features at higher temperatures. We also find that, if the number of neurons in the hidden layer is increased, the flow moves more slowly.

Finally, we construct the type L RBM, which is trained by configurations only at the lowest temperature, T = 0. Fig. 11 shows the numerical results for the flows in the type L RBM. Similarly to the type H RBM, the measured temperature passes the critical point, but it flows towards lower temperature instead of higher temperature. This is, of course, as expected, because the type L RBM must have learned the features of spin configurations at T = 0. In the type L RBM, as far as we have studied, the flow never goes back to higher temperature even for large N_h. This will be because the T = 0 configurations used for training do not contain any of the noisy fluctuations specific to high temperatures. It also suggests that the RBM does not learn features that are not contained in the configurations used for training.

Summaries and Conjectures

Here we first summarize the numerical results.

For the type V RBM:
- When N_h ≤ 100 = N_v, the measured temperature T approaches T_c (Figs. 6, 7 and 8). However, for N_h > 100 = N_v, the flow eventually goes away towards T = ∞ (Fig. 9).
- The speed of the flow is slower for a larger N_h.

For the type H/L RBM:
- The temperature T flows towards T = ∞ / T = 0, respectively (Figs. 10 and 11).

- The speed of the flow is slower for a larger N_h.

Here N_h and N_v are the numbers of hidden and visible neurons in the RBM. These behaviors are reflections of the properties of the weights and biases that the unsupervised RBMs have learned in the process of training. Understanding the above behaviors is thus equivalent to answering the question of what the unsupervised RBMs have learned in the training. The most important question is why the temperature approaches T_c in the type V RBM with N_h ≤ N_v, instead of, e.g., broadening over the whole region of temperatures from T = 0 to T = 6. Note that we taught the NN neither the critical temperature nor the presence of a phase transition. We simply trained the NN with configurations at various temperatures, from T = 0 to T = 6. Nevertheless, the numerical simulations show that the temperature distributions are peaked at T_c after some iterations of the RBM reconstruction. Thus we are forced to conclude that the RBM has automatically learned features specific to the critical temperature T_c.

An important feature at T_c is scale invariance. We have generated spin configurations at various temperatures by the Monte Carlo method, and each configuration has typical fluctuations specific to its temperature. At very high temperature, fluctuations are almost random at each lattice site and there are no correlations between spins at distant positions. At lower temperature they become correlated: the correlation length becomes larger as T → T_c and diverges at T_c. On the other hand, below T_c, spins are clustered, and in each domain all spins take σ_{x,y} = +1 or σ_{x,y} = −1. At low temperature the configurations have only big clusters, and as the temperature increases small-sized clusters appear. At T_c, spin configurations come to have clusters of various sizes in a scale-invariant way.

Now let us come back to the question of why the type V RBM generates a flow approaching T_c and does not randomize so as to broaden the temperature distribution over the whole region. We have trained the type V RBM using configurations at various temperatures with different cluster sizes, and in the process the machine must have simultaneously acquired features of various temperatures. Consequently, the process of RBM reconstruction adds the various features that the machine has learned to a reconstructed configuration. If only a single feature at a specific temperature were added to the reconstructed configuration, the distribution would come to have a peak at this temperature. But this cannot happen, because various features of different temperatures are added to a single configuration by the iterations of the reconstruction process. One may then ask whether there is a configuration that is stable under additions of features at various different T. Our first conjecture is that the set of configurations at T_c is a stabilizer (and even more an attractor) of the type V RBM with N_h ≤ N_v. This must be due to the scale-invariant properties of the configurations at T_c. Namely, since these configurations are scale invariant, they have the features of various temperatures simultaneously, and consequently they can be the stabilizer of this RBM. This sounds plausible, since scale invariance means that the configurations contain various different characteristic length scales. However, we notice that this does not mean that the RBM has forgotten the features of configurations away from

the critical point. Rather, it means that the RBM has learned the features of all temperatures simultaneously. Nor does it mean that the configurations at T = T_c have exerted an especially strong influence on the machine in the process of training. This can be confirmed as follows. Suppose we train an RBM with configurations at temperatures excluding those nearest T_c, namely with configurations at all temperatures except T = 2.25 and 2.5. We found in the numerical simulations that this RBM still generates a flow towards the critical point, even though we did not provide configurations at T ≈ T_c. Therefore we can say that the type V RBM has learned the features of all the temperatures, and that configurations at T_c are special because they contain all the features of the various temperatures.

Our second conjecture, which is related to the behavior of the type V RBM with N_h > N_v, is that RBMs with an unnecessarily large hidden layer tend to learn many irrelevant features. In the present case, these are the noisy fluctuations of configurations at high temperatures. High-temperature configurations have only short-distance correlations, whose behavior is similar to the typical behavior of noise. The conjecture is partially supported by the similarity of the RBM flows between the type V RBM with N_h > N_v and the type H RBM: both RBM flows converge on T = ∞. This similarity indicates that the NN with a larger N_h may have learned too many noise-like features of configurations at higher temperatures. These considerations suggest that a moderate size of the hidden layer, N_h < N_v, is the most efficient for properly extracting the features.

4 Analysis of the weight matrix

In the previous section, we showed our numerical results for the flows generated by unsupervised RBMs, and proposed two conjectures. One is that the scale-invariant T = T_c configurations are stabilizers of the type V RBM flow. The other is that an RBM with an unnecessarily large hidden layer, N_h > N_v, tends to learn too much irrelevant noise. In this section, to further understand the theoretical basis of the feature extraction and to give supporting evidence for our conjectures, we analyze various properties of the weight matrices and biases of the trained RBMs. In particular, we study properties of W W^T by looking at spin correlations in Sec. 4.2, magnetization in Sec. 4.3, and the eigenvalue spectrum in Sec. 4.4.

4.1 Why W W^T is important

All the information that the machine has learned is contained in the weights W_{ia} and the biases b^{(v)}_i, b^{(h)}_a. Since the biases typically have smaller values than the weights (at least in the present situations), we will concentrate on the weight matrix W_{ia} (i = 1, ..., N_v = L²; a = 1, ..., N_h) in the following.

Let us first note that the weight matrix W_{ia} transforms as

  W_{ia} \to W'_{ia} = \sum_{j,b} U_{ij} W_{jb} (V^T)_{ba}   (41)

under transformations (footnote 10) that exchange the basis of neurons in the visible layer (U_{ij}) and in the hidden layer (V_{ab}). Since the choice of basis in the hidden layer is arbitrary, the relevant information in the visible layer is stored in combinations of W_{ia} that are invariant under the transformations V_{ab}. The simplest such combination is the product

  (W W^T)_{ij} = \sum_a W_{ia} W_{ja}.   (42)

It is an N_v × N_v = 100 × 100 matrix, independent of the size N_h. Its properties nevertheless depend on N_h, because the rank of W W^T is at most min(N_v, N_h). Thus, if N_h < N_v, the weight matrix is strongly constrained; e.g. the unit matrix W W^T = 1 is not allowed.

Footnote 10: Since the spin variables on each lattice site are restricted to take the values ±1, the matrices U_{ij} and V_{ab} are elements of the symmetric group, not of the orthogonal Lie group.

This simplest product (42) plays an important role in the dynamics of the flow generated by the RBM. This can be seen as follows. If the biases are ignored, the conditional probability (15) and the expectation value (17) for h_a in the background of v_i become

  p(\{h_a\}|\{v_i\}) = \prod_a \frac{e^{\sum_i v_i W_{ia} h_a}}{2\cosh\big(\sum_i v_i W_{ia}\big)}, \qquad \langle h_a \rangle = \tanh\Big(\sum_i v_i W_{ia}\Big).   (43)

In p({h_a}|{v_i}), the combination \sum_i v_i W_{ia} =: B_a can be regarded as an external magnetic field for h_a. Thus these two variables, B_a and h_a, tend to correlate with each other: the probability p({h_a}|{v_i}) becomes larger when they have the same sign. Moreover, for |B_a| < 1, ⟨h_a⟩ is approximated by B_a, and we can roughly identify the two variables,

  h_a \simeq B_a := \sum_i v_i W_{ia}.   (44)

This is usually not a good approximation, since the weights can take larger values, but let us assume it for the moment. (For a large value of |B_a|, h_a saturates at h_a = B_a/|B_a|.) Suppose that the input configuration is given by {v^{(0)}_i} = {σ^A_i}. If Eq. (44) is employed, we have h^{(1)}_a = B^{(0)}_a = \sum_i v^{(0)}_i W_{ia}. Then the conditional probability (16) in the background of h^{(1)}_a, with the bias set to zero,

  p(\{v_i\}|\{h^{(1)}_a\}) = \prod_i \frac{e^{\sum_a v_i W_{ia} h^{(1)}_a}}{2\cosh\big(\sum_a W_{ia} h^{(1)}_a\big)},   (45)

can be approximated as

  p(\{v_i\}|\{h^{(1)}_a\}) \simeq \prod_i \frac{e^{v_i \sum_j (W W^T)_{ij} v^{(0)}_j}}{2\cosh\big(\sum_j (W W^T)_{ij} v^{(0)}_j\big)}.   (46)

The RBM learns the input data {v^{(0)}_i} so that the probability distribution p reproduces the probability distribution of the initial data, q({v_i}) = \frac{1}{N}\sum_A \prod_i \delta(v_i - \sigma^A_i). Therefore, the training of the RBM is performed so as to enhance the value of \sum_{A}\sum_{i,j} \sigma^{A(0)}_i (W W^T)_{ij} \sigma^{A(0)}_j. This means that W is chosen so that (W W^T)_{ij} reflects the spin correlations of the input configurations {σ^A_i} at sites i and j.

In this simplified discussion, the learning of the RBM proceeds through the combination W W^T. Of course, we have neglected the nonlinearity of the neural network, and the above statement cannot be justified as it stands. Nevertheless, we will find below that the analysis of W W^T is quite useful for understanding how the RBM works.

4.2 Spin correlations in W W^T

In Fig. 12, we plot the values of the matrix elements of W W^T. The three panels correspond to RBMs with different sizes of N_h. We can see that they have large values in the diagonal and near-diagonal elements. Note that the spin variables in the visible layer, σ_{x,y} with x, y = 1, ..., L = 10, are lined up as (σ_{1,1}, σ_{1,2}, ..., σ_{1,L}, σ_{2,1}, ..., σ_{2,L}, σ_{3,1}, ..., σ_{L,L}) and renamed (σ_1, σ_2, ..., σ_{N_v}). Hence the lattice points i and j of σ_i (i = 1, ..., L²) are adjacent to each other when j = i ± 1 or j = i ± L. In the following, we mostly discuss the type V RBM unless otherwise stated.

Figure 12: Elements of W W^T when the hidden layer has 16 (left), 100 (center), 400 (right) neurons.

As discussed above, the product of weight matrices W W^T must reflect the correlations between the spin variables of the input configurations used for the training of the RBM. The strongest correlation in \sum_{i,j} v^{(0)}_i (W W^T)_{ij} v^{(0)}_j is of course the diagonal component, i = j. Thus we

expect that the matrix W W^T will have large diagonal components. Indeed, such behavior can be seen in Fig. 12. In particular, for N_h = 400 > N_v = 100 (the rightmost panel), W W^T is clearly close to a diagonal matrix. The same is almost true for the case N_h = 100 = N_v (the middle panel). However, for N_h = 16 < N_v = 100 (the leftmost panel), it differs from a unit matrix, and off-diagonal components of (W W^T)_{ij} also have large values, in particular at j = i + 1 and j = i + 2. This behavior must be a reflection of the spin correlations of the input configurations (footnote 11). It is also a reflection of the fact that the rank of W W^T is at most N_h, so that W W^T cannot be a unit matrix if N_h < N_v. Thus, even though less information can be stored in the weight matrix for a smaller number of hidden neurons, the relevant information of the spin correlations is better encoded in the weight matrix of the RBM with N_h < N_v than in the RBM with larger N_h. One then wonders why such relevant information is lost in the RBM with N_h > N_v. This question may be related to our second conjecture, proposed at the end of Sec. 3.2, that the RBM with very large N_h learns too much irrelevant information, namely the noise in the input configurations. It is interesting, and a bit surprising, that the RBM with fewer hidden neurons seems to learn the relevant information of the spin correlations more efficiently.

Footnote 11: Off-diagonal components at j = i + L or j = i + 2L are also large, which corresponds to correlations along the y-direction. The large off-diagonal components at j = i + 1 and j = i + 2 correspond to correlations along the x-direction.

In order to further confirm the relation between the correlations in the combination W W^T and the spin correlations of the input configurations, we study the structure of the weight matrices of other types of RBMs. In Fig. 13, we plot the behavior of the off-diagonal components of W W^T for various RBMs. Each RBM is trained by configurations at a single temperature, T = 0 (type L), T = 2, T = 3 and T = 6, respectively. The size of the hidden layer is set to N_h = 16. For comparison, we also plot the behavior of the off-diagonal components for the type V RBM. Fig. 13 shows that the correlations in W W^T decay more rapidly at higher temperature, which is consistent with the expected behavior of the spin correlations. Therefore, the RBM seems to learn correctly the correlation length, or the size of the clusters, which becomes smaller at higher temperature. Furthermore, we find that, for the type V RBM that has learned all temperatures T = 0, ..., 6, the off-diagonal elements decrease with a decay rate between those of the T = 2 and T = 3 cases. This indicates that the type V RBM has acquired features similar to those of the configurations around T_c = 2.27. It is consistent with the numerical results of Figs. 6, 7 and 8, and gives another piece of circumstantial evidence supporting the first conjecture in Sec. 3.2.

4.3 Magnetization and singular value decomposition (SVD)

Information about the weight matrix W can be inferred by using the method of singular value decomposition (see, e.g., [26, 27]). Suppose that the matrix W W^T has eigenvalues λ_a

(a = 1, ..., N_v) with corresponding eigenvectors u_a:

  W W^T u_a = \lambda_a u_a.   (47)

Decomposing an input configuration vector v^{(0)} in terms of the eigenvectors u_a as v^{(0)} = \sum_a c_a u_a, with the normalization condition \sum_a (c_a)^2 = 1, we can rewrite v^{(0)T} W W^T v^{(0)} as

  v^{(0)T} W W^T v^{(0)} = \sum_a c_a^2 \lambda_a.   (48)

Thus, if a vector v^{(0)} contains more components with larger eigenvalues of W W^T, the quantity v^{(0)T} W W^T v^{(0)} becomes larger.

Figure 13: Averaged values of the off-diagonal components of W W^T (normalized by the diagonal components). Each colored line corresponds to an RBM that has learned configurations at a single temperature, T = 0, 2, 3, 6, respectively. The black line (the middle line) shows the behavior of the type V RBM that has learned all the temperatures T = 0, ..., 6.

Fig. 14 shows the values of v^{(0)T} W W^T v^{(0)} averaged over the 1000 configurations {v^{(0)}} at each temperature. For comparison between different RBMs, we subtracted the values at T = 6. The figure shows a large change near the critical point, which is reminiscent of the magnetization of the Ising model. Since v^{(0)T} W W^T v^{(0)} should contain more information than the magnetization itself, the behavior cannot be exactly the same. But it is quite intriguing that Fig. 14 shows behavior similar to the magnetization (footnote 12). It might be because this quantity contains much information about the lower temperatures after the subtraction of the values at higher temperature (footnote 13).

Footnote 12: This behavior indicates that the principal eigenvectors with large eigenvalues might be related to the magnetization, and that information about the phase transition is surely imported into the weight matrix. We therefore investigated properties of the eigenvectors, but so far we have not obtained any physically reasonable picture. We want to come back to this problem in future work.

Footnote 13: It suggests that the subtraction may correspond to removing the contributions of the features specific to higher temperature.

Figure 14: Averaged values of v^{(0)T} W W^T v^{(0)} over the 1000 input configurations at each temperature. Different colors correspond to type V RBMs with different numbers of hidden neurons N_h. In this figure, the values at T = 6 are subtracted for comparison between different RBMs.

In order to see the properties of v^{(0)T} W W^T v^{(0)} beyond the magnetization-like behavior of Fig. 14, we plot the same quantities without subtracting the values at T = 6. Fig. 15 shows the two cases N_h = 64 and N_h = 225. These figures show that, at high temperature, the RBM with large N_h in the right panel has larger components along the principal eigenvectors than the RBM with small N_h in the left panel. This difference must have caused the different behaviors of the RBM flows shown in Fig. 6 (N_h = 64) and Fig. 9 (N_h = 225): the former RBM flow approaches the critical temperature T_c, while the latter eventually goes towards higher temperature. The difference between the two cases indicates that the RBM with larger N_h has learned more of the characteristic features of high temperatures than the RBM with fewer N_h. Does the RBM with small N_h then fail to learn the features of high temperatures? Which RBM is more adequate for feature extraction? Although it is difficult to answer which is more adequate without specifying what we want the machine to learn, we believe that the RBM with N_h < N_v properly learns all the features of the various temperatures, while the RBM with N_h > N_v has learned too many irrelevant features of high temperature. This is nothing but the second conjecture in Sec. 3.2, and it is supported by the behavior of the correlations in W W^T discussed in Sec. 4.2.
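The diagnostics used in this section are straightforward to compute from a trained weight matrix. The following NumPy sketch, with our own function names and assuming the row-major lattice ordering of Sec. 4.2, evaluates W W^T of Eq. (42), the averaged off-diagonal elements at separation d along the x-direction as in Fig. 13, and the ensemble average of v^T W W^T v of Eq. (48) used in Figs. 14 and 15.

```python
import numpy as np

def ww_t(W):
    """Basis-invariant product (W W^T)_{ij} of Eq. (42); shape (N_v, N_v)."""
    return W @ W.T

def offdiagonal_profile(W, L=10, max_d=5):
    """Average of (W W^T)_{i, i+d}, normalized by the diagonal, for separations d
    along the x-direction, as plotted in Fig. 13."""
    M = ww_t(W)
    diag = np.mean(np.diag(M))
    profile = []
    for d in range(1, max_d + 1):
        vals = [M[i, i + d] for i in range(M.shape[0] - d)
                if (i % L) + d < L]           # keep pairs on the same lattice row
        profile.append(np.mean(vals) / diag)
    return np.array(profile)

def quadratic_form_average(W, configs):
    """Average of v^T (W W^T) v over an ensemble of configurations, Eq. (48) / Fig. 14."""
    M = ww_t(W)
    return np.mean(np.einsum('ai,ij,aj->a', configs, M, configs))

# Usage sketch (hypothetical variable names): compare RBMs trained at single temperatures
# profile_T0 = offdiagonal_profile(rbm_T0.W)   # decays slowly (large clusters)
# profile_T6 = offdiagonal_profile(rbm_T6.W)   # decays quickly (noise-like)
# m_like = [quadratic_form_average(rbm_V.W, configs_at_T) for configs_at_T in config_sets]
```

The eigenvalue decomposition needed for Eq. (47) can be obtained from the same matrix with np.linalg.eigh(ww_t(W)).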


More information

Module 9. Lecture 6. Duality in Assignment Problems

Module 9. Lecture 6. Duality in Assignment Problems Module 9 1 Lecture 6 Dualty n Assgnment Problems In ths lecture we attempt to answer few other mportant questons posed n earler lecture for (AP) and see how some of them can be explaned through the concept

More information

Errors for Linear Systems

Errors for Linear Systems Errors for Lnear Systems When we solve a lnear system Ax b we often do not know A and b exactly, but have only approxmatons  and ˆb avalable. Then the best thng we can do s to solve ˆx ˆb exactly whch

More information

Uncertainty and auto-correlation in. Measurement

Uncertainty and auto-correlation in. Measurement Uncertanty and auto-correlaton n arxv:1707.03276v2 [physcs.data-an] 30 Dec 2017 Measurement Markus Schebl Federal Offce of Metrology and Surveyng (BEV), 1160 Venna, Austra E-mal: markus.schebl@bev.gv.at

More information

Linear Approximation with Regularization and Moving Least Squares

Linear Approximation with Regularization and Moving Least Squares Lnear Approxmaton wth Regularzaton and Movng Least Squares Igor Grešovn May 007 Revson 4.6 (Revson : March 004). 5 4 3 0.5 3 3.5 4 Contents: Lnear Fttng...4. Weghted Least Squares n Functon Approxmaton...

More information

FREQUENCY DISTRIBUTIONS Page 1 of The idea of a frequency distribution for sets of observations will be introduced,

FREQUENCY DISTRIBUTIONS Page 1 of The idea of a frequency distribution for sets of observations will be introduced, FREQUENCY DISTRIBUTIONS Page 1 of 6 I. Introducton 1. The dea of a frequency dstrbuton for sets of observatons wll be ntroduced, together wth some of the mechancs for constructng dstrbutons of data. Then

More information

Hopfield networks and Boltzmann machines. Geoffrey Hinton et al. Presented by Tambet Matiisen

Hopfield networks and Boltzmann machines. Geoffrey Hinton et al. Presented by Tambet Matiisen Hopfeld networks and Boltzmann machnes Geoffrey Hnton et al. Presented by Tambet Matsen 18.11.2014 Hopfeld network Bnary unts Symmetrcal connectons http://www.nnwj.de/hopfeld-net.html Energy functon The

More information

Structure and Drive Paul A. Jensen Copyright July 20, 2003

Structure and Drive Paul A. Jensen Copyright July 20, 2003 Structure and Drve Paul A. Jensen Copyrght July 20, 2003 A system s made up of several operatons wth flow passng between them. The structure of the system descrbes the flow paths from nputs to outputs.

More information

Kernel Methods and SVMs Extension

Kernel Methods and SVMs Extension Kernel Methods and SVMs Extenson The purpose of ths document s to revew materal covered n Machne Learnng 1 Supervsed Learnng regardng support vector machnes (SVMs). Ths document also provdes a general

More information

Dynamical Systems and Information Theory

Dynamical Systems and Information Theory Dynamcal Systems and Informaton Theory Informaton Theory Lecture 4 Let s consder systems that evolve wth tme x F ( x, x, x,... That s, systems that can be descrbed as the evoluton of a set of state varables

More information

V.C The Niemeijer van Leeuwen Cumulant Approximation

V.C The Niemeijer van Leeuwen Cumulant Approximation V.C The Nemejer van Leeuwen Cumulant Approxmaton Unfortunately, the decmaton procedure cannot be performed exactly n hgher dmensons. For example, the square lattce can be dvded nto two sublattces. For

More information

For now, let us focus on a specific model of neurons. These are simplified from reality but can achieve remarkable results.

For now, let us focus on a specific model of neurons. These are simplified from reality but can achieve remarkable results. Neural Networks : Dervaton compled by Alvn Wan from Professor Jtendra Malk s lecture Ths type of computaton s called deep learnng and s the most popular method for many problems, such as computer vson

More information

Introduction to Vapor/Liquid Equilibrium, part 2. Raoult s Law:

Introduction to Vapor/Liquid Equilibrium, part 2. Raoult s Law: CE304, Sprng 2004 Lecture 4 Introducton to Vapor/Lqud Equlbrum, part 2 Raoult s Law: The smplest model that allows us do VLE calculatons s obtaned when we assume that the vapor phase s an deal gas, and

More information

3.1 Expectation of Functions of Several Random Variables. )' be a k-dimensional discrete or continuous random vector, with joint PMF p (, E X E X1 E X

3.1 Expectation of Functions of Several Random Variables. )' be a k-dimensional discrete or continuous random vector, with joint PMF p (, E X E X1 E X Statstcs 1: Probablty Theory II 37 3 EPECTATION OF SEVERAL RANDOM VARIABLES As n Probablty Theory I, the nterest n most stuatons les not on the actual dstrbuton of a random vector, but rather on a number

More information

Lectures - Week 4 Matrix norms, Conditioning, Vector Spaces, Linear Independence, Spanning sets and Basis, Null space and Range of a Matrix

Lectures - Week 4 Matrix norms, Conditioning, Vector Spaces, Linear Independence, Spanning sets and Basis, Null space and Range of a Matrix Lectures - Week 4 Matrx norms, Condtonng, Vector Spaces, Lnear Independence, Spannng sets and Bass, Null space and Range of a Matrx Matrx Norms Now we turn to assocatng a number to each matrx. We could

More information

The Order Relation and Trace Inequalities for. Hermitian Operators

The Order Relation and Trace Inequalities for. Hermitian Operators Internatonal Mathematcal Forum, Vol 3, 08, no, 507-57 HIKARI Ltd, wwwm-hkarcom https://doorg/0988/mf088055 The Order Relaton and Trace Inequaltes for Hermtan Operators Y Huang School of Informaton Scence

More information

The Feynman path integral

The Feynman path integral The Feynman path ntegral Aprl 3, 205 Hesenberg and Schrödnger pctures The Schrödnger wave functon places the tme dependence of a physcal system n the state, ψ, t, where the state s a vector n Hlbert space

More information

Singular Value Decomposition: Theory and Applications

Singular Value Decomposition: Theory and Applications Sngular Value Decomposton: Theory and Applcatons Danel Khashab Sprng 2015 Last Update: March 2, 2015 1 Introducton A = UDV where columns of U and V are orthonormal and matrx D s dagonal wth postve real

More information

Supplementary Notes for Chapter 9 Mixture Thermodynamics

Supplementary Notes for Chapter 9 Mixture Thermodynamics Supplementary Notes for Chapter 9 Mxture Thermodynamcs Key ponts Nne major topcs of Chapter 9 are revewed below: 1. Notaton and operatonal equatons for mxtures 2. PVTN EOSs for mxtures 3. General effects

More information

5 The Rational Canonical Form

5 The Rational Canonical Form 5 The Ratonal Canoncal Form Here p s a monc rreducble factor of the mnmum polynomal m T and s not necessarly of degree one Let F p denote the feld constructed earler n the course, consstng of all matrces

More information

1 GSW Iterative Techniques for y = Ax

1 GSW Iterative Techniques for y = Ax 1 for y = A I m gong to cheat here. here are a lot of teratve technques that can be used to solve the general case of a set of smultaneous equatons (wrtten n the matr form as y = A), but ths chapter sn

More information

Note on EM-training of IBM-model 1

Note on EM-training of IBM-model 1 Note on EM-tranng of IBM-model INF58 Language Technologcal Applcatons, Fall The sldes on ths subject (nf58 6.pdf) ncludng the example seem nsuffcent to gve a good grasp of what s gong on. Hence here are

More information

Notes on Frequency Estimation in Data Streams

Notes on Frequency Estimation in Data Streams Notes on Frequency Estmaton n Data Streams In (one of) the data streamng model(s), the data s a sequence of arrvals a 1, a 2,..., a m of the form a j = (, v) where s the dentty of the tem and belongs to

More information

Difference Equations

Difference Equations Dfference Equatons c Jan Vrbk 1 Bascs Suppose a sequence of numbers, say a 0,a 1,a,a 3,... s defned by a certan general relatonshp between, say, three consecutve values of the sequence, e.g. a + +3a +1

More information

STAT 309: MATHEMATICAL COMPUTATIONS I FALL 2018 LECTURE 16

STAT 309: MATHEMATICAL COMPUTATIONS I FALL 2018 LECTURE 16 STAT 39: MATHEMATICAL COMPUTATIONS I FALL 218 LECTURE 16 1 why teratve methods f we have a lnear system Ax = b where A s very, very large but s ether sparse or structured (eg, banded, Toepltz, banded plus

More information

Markov Chain Monte Carlo Lecture 6

Markov Chain Monte Carlo Lecture 6 where (x 1,..., x N ) X N, N s called the populaton sze, f(x) f (x) for at least one {1, 2,..., N}, and those dfferent from f(x) are called the tral dstrbutons n terms of mportance samplng. Dfferent ways

More information

This model contains two bonds per unit cell (one along the x-direction and the other along y). So we can rewrite the Hamiltonian as:

This model contains two bonds per unit cell (one along the x-direction and the other along y). So we can rewrite the Hamiltonian as: 1 Problem set #1 1.1. A one-band model on a square lattce Fg. 1 Consder a square lattce wth only nearest-neghbor hoppngs (as shown n the fgure above): H t, j a a j (1.1) where,j stands for nearest neghbors

More information

CHAPTER 5 NUMERICAL EVALUATION OF DYNAMIC RESPONSE

CHAPTER 5 NUMERICAL EVALUATION OF DYNAMIC RESPONSE CHAPTER 5 NUMERICAL EVALUATION OF DYNAMIC RESPONSE Analytcal soluton s usually not possble when exctaton vares arbtrarly wth tme or f the system s nonlnear. Such problems can be solved by numercal tmesteppng

More information

NON-CENTRAL 7-POINT FORMULA IN THE METHOD OF LINES FOR PARABOLIC AND BURGERS' EQUATIONS

NON-CENTRAL 7-POINT FORMULA IN THE METHOD OF LINES FOR PARABOLIC AND BURGERS' EQUATIONS IJRRAS 8 (3 September 011 www.arpapress.com/volumes/vol8issue3/ijrras_8_3_08.pdf NON-CENTRAL 7-POINT FORMULA IN THE METHOD OF LINES FOR PARABOLIC AND BURGERS' EQUATIONS H.O. Bakodah Dept. of Mathematc

More information

CSci 6974 and ECSE 6966 Math. Tech. for Vision, Graphics and Robotics Lecture 21, April 17, 2006 Estimating A Plane Homography

CSci 6974 and ECSE 6966 Math. Tech. for Vision, Graphics and Robotics Lecture 21, April 17, 2006 Estimating A Plane Homography CSc 6974 and ECSE 6966 Math. Tech. for Vson, Graphcs and Robotcs Lecture 21, Aprl 17, 2006 Estmatng A Plane Homography Overvew We contnue wth a dscusson of the major ssues, usng estmaton of plane projectve

More information

4DVAR, according to the name, is a four-dimensional variational method.

4DVAR, according to the name, is a four-dimensional variational method. 4D-Varatonal Data Assmlaton (4D-Var) 4DVAR, accordng to the name, s a four-dmensonal varatonal method. 4D-Var s actually a drect generalzaton of 3D-Var to handle observatons that are dstrbuted n tme. The

More information

Inner Product. Euclidean Space. Orthonormal Basis. Orthogonal

Inner Product. Euclidean Space. Orthonormal Basis. Orthogonal Inner Product Defnton 1 () A Eucldean space s a fnte-dmensonal vector space over the reals R, wth an nner product,. Defnton 2 (Inner Product) An nner product, on a real vector space X s a symmetrc, blnear,

More information

Workshop: Approximating energies and wave functions Quantum aspects of physical chemistry

Workshop: Approximating energies and wave functions Quantum aspects of physical chemistry Workshop: Approxmatng energes and wave functons Quantum aspects of physcal chemstry http://quantum.bu.edu/pltl/6/6.pdf Last updated Thursday, November 7, 25 7:9:5-5: Copyrght 25 Dan Dll (dan@bu.edu) Department

More information

Assortment Optimization under MNL

Assortment Optimization under MNL Assortment Optmzaton under MNL Haotan Song Aprl 30, 2017 1 Introducton The assortment optmzaton problem ams to fnd the revenue-maxmzng assortment of products to offer when the prces of products are fxed.

More information

A how to guide to second quantization method.

A how to guide to second quantization method. Phys. 67 (Graduate Quantum Mechancs Sprng 2009 Prof. Pu K. Lam. Verson 3 (4/3/2009 A how to gude to second quantzaton method. -> Second quantzaton s a mathematcal notaton desgned to handle dentcal partcle

More information

Mathematical Preparations

Mathematical Preparations 1 Introducton Mathematcal Preparatons The theory of relatvty was developed to explan experments whch studed the propagaton of electromagnetc radaton n movng coordnate systems. Wthn expermental error the

More information

Global Sensitivity. Tuesday 20 th February, 2018

Global Sensitivity. Tuesday 20 th February, 2018 Global Senstvty Tuesday 2 th February, 28 ) Local Senstvty Most senstvty analyses [] are based on local estmates of senstvty, typcally by expandng the response n a Taylor seres about some specfc values

More information

Some Comments on Accelerating Convergence of Iterative Sequences Using Direct Inversion of the Iterative Subspace (DIIS)

Some Comments on Accelerating Convergence of Iterative Sequences Using Direct Inversion of the Iterative Subspace (DIIS) Some Comments on Acceleratng Convergence of Iteratve Sequences Usng Drect Inverson of the Iteratve Subspace (DIIS) C. Davd Sherrll School of Chemstry and Bochemstry Georga Insttute of Technology May 1998

More information

Excess Error, Approximation Error, and Estimation Error

Excess Error, Approximation Error, and Estimation Error E0 370 Statstcal Learnng Theory Lecture 10 Sep 15, 011 Excess Error, Approxaton Error, and Estaton Error Lecturer: Shvan Agarwal Scrbe: Shvan Agarwal 1 Introducton So far, we have consdered the fnte saple

More information

763622S ADVANCED QUANTUM MECHANICS Solution Set 1 Spring c n a n. c n 2 = 1.

763622S ADVANCED QUANTUM MECHANICS Solution Set 1 Spring c n a n. c n 2 = 1. 7636S ADVANCED QUANTUM MECHANICS Soluton Set 1 Sprng 013 1 Warm-up Show that the egenvalues of a Hermtan operator  are real and that the egenkets correspondng to dfferent egenvalues are orthogonal (b)

More information

Psychology 282 Lecture #24 Outline Regression Diagnostics: Outliers

Psychology 282 Lecture #24 Outline Regression Diagnostics: Outliers Psychology 282 Lecture #24 Outlne Regresson Dagnostcs: Outlers In an earler lecture we studed the statstcal assumptons underlyng the regresson model, ncludng the followng ponts: Formal statement of assumptons.

More information

Physics 5153 Classical Mechanics. Principle of Virtual Work-1

Physics 5153 Classical Mechanics. Principle of Virtual Work-1 P. Guterrez 1 Introducton Physcs 5153 Classcal Mechancs Prncple of Vrtual Work The frst varatonal prncple we encounter n mechancs s the prncple of vrtual work. It establshes the equlbrum condton of a mechancal

More information

Week 5: Neural Networks

Week 5: Neural Networks Week 5: Neural Networks Instructor: Sergey Levne Neural Networks Summary In the prevous lecture, we saw how we can construct neural networks by extendng logstc regresson. Neural networks consst of multple

More information

Using deep belief network modelling to characterize differences in brain morphometry in schizophrenia

Using deep belief network modelling to characterize differences in brain morphometry in schizophrenia Usng deep belef network modellng to characterze dfferences n bran morphometry n schzophrena Walter H. L. Pnaya * a ; Ary Gadelha b ; Orla M. Doyle c ; Crstano Noto b ; André Zugman d ; Qurno Cordero b,

More information

Transfer Functions. Convenient representation of a linear, dynamic model. A transfer function (TF) relates one input and one output: ( ) system

Transfer Functions. Convenient representation of a linear, dynamic model. A transfer function (TF) relates one input and one output: ( ) system Transfer Functons Convenent representaton of a lnear, dynamc model. A transfer functon (TF) relates one nput and one output: x t X s y t system Y s The followng termnology s used: x y nput output forcng

More information

RG treatment of an Ising chain with long range interactions

RG treatment of an Ising chain with long range interactions RG treatment of an Isng chan wth long range nteractons Davd Ramrez Department of Physcs Undergraduate Massachusetts Insttute of Technology, Cambrdge, MA 039, USA An analyss of a one-dmensonal Isng model

More information

A Robust Method for Calculating the Correlation Coefficient

A Robust Method for Calculating the Correlation Coefficient A Robust Method for Calculatng the Correlaton Coeffcent E.B. Nven and C. V. Deutsch Relatonshps between prmary and secondary data are frequently quantfed usng the correlaton coeffcent; however, the tradtonal

More information

Explicit constructions of all separable two-qubits density matrices and related problems for three-qubits systems

Explicit constructions of all separable two-qubits density matrices and related problems for three-qubits systems Explct constructons of all separable two-qubts densty matrces and related problems for three-qubts systems Y. en-ryeh and. Mann Physcs Department, Technon-Israel Insttute of Technology, Hafa 2000, Israel

More information

Advanced Quantum Mechanics

Advanced Quantum Mechanics Advanced Quantum Mechancs Rajdeep Sensarma! sensarma@theory.tfr.res.n ecture #9 QM of Relatvstc Partcles Recap of ast Class Scalar Felds and orentz nvarant actons Complex Scalar Feld and Charge conjugaton

More information

PARTICIPATION FACTOR IN MODAL ANALYSIS OF POWER SYSTEMS STABILITY

PARTICIPATION FACTOR IN MODAL ANALYSIS OF POWER SYSTEMS STABILITY POZNAN UNIVE RSITY OF TE CHNOLOGY ACADE MIC JOURNALS No 86 Electrcal Engneerng 6 Volodymyr KONOVAL* Roman PRYTULA** PARTICIPATION FACTOR IN MODAL ANALYSIS OF POWER SYSTEMS STABILITY Ths paper provdes a

More information

Design and Optimization of Fuzzy Controller for Inverse Pendulum System Using Genetic Algorithm

Design and Optimization of Fuzzy Controller for Inverse Pendulum System Using Genetic Algorithm Desgn and Optmzaton of Fuzzy Controller for Inverse Pendulum System Usng Genetc Algorthm H. Mehraban A. Ashoor Unversty of Tehran Unversty of Tehran h.mehraban@ece.ut.ac.r a.ashoor@ece.ut.ac.r Abstract:

More information

CHAPTER III Neural Networks as Associative Memory

CHAPTER III Neural Networks as Associative Memory CHAPTER III Neural Networs as Assocatve Memory Introducton One of the prmary functons of the bran s assocatve memory. We assocate the faces wth names, letters wth sounds, or we can recognze the people

More information

Chapter 13: Multiple Regression

Chapter 13: Multiple Regression Chapter 13: Multple Regresson 13.1 Developng the multple-regresson Model The general model can be descrbed as: It smplfes for two ndependent varables: The sample ft parameter b 0, b 1, and b are used to

More information

MMA and GCMMA two methods for nonlinear optimization

MMA and GCMMA two methods for nonlinear optimization MMA and GCMMA two methods for nonlnear optmzaton Krster Svanberg Optmzaton and Systems Theory, KTH, Stockholm, Sweden. krlle@math.kth.se Ths note descrbes the algorthms used n the author s 2007 mplementatons

More information

Lecture 5.8 Flux Vector Splitting

Lecture 5.8 Flux Vector Splitting Lecture 5.8 Flux Vector Splttng 1 Flux Vector Splttng The vector E n (5.7.) can be rewrtten as E = AU (5.8.1) (wth A as gven n (5.7.4) or (5.7.6) ) whenever, the equaton of state s of the separable form

More information

Negative Binomial Regression

Negative Binomial Regression STATGRAPHICS Rev. 9/16/2013 Negatve Bnomal Regresson Summary... 1 Data Input... 3 Statstcal Model... 3 Analyss Summary... 4 Analyss Optons... 7 Plot of Ftted Model... 8 Observed Versus Predcted... 10 Predctons...

More information

STATISTICAL MECHANICAL ENSEMBLES 1 MICROSCOPIC AND MACROSCOPIC VARIABLES PHASE SPACE ENSEMBLES. CHE 524 A. Panagiotopoulos 1

STATISTICAL MECHANICAL ENSEMBLES 1 MICROSCOPIC AND MACROSCOPIC VARIABLES PHASE SPACE ENSEMBLES. CHE 524 A. Panagiotopoulos 1 CHE 54 A. Panagotopoulos STATSTCAL MECHACAL ESEMBLES MCROSCOPC AD MACROSCOPC ARABLES The central queston n Statstcal Mechancs can be phrased as follows: f partcles (atoms, molecules, electrons, nucle,

More information

Lecture 7: Boltzmann distribution & Thermodynamics of mixing

Lecture 7: Boltzmann distribution & Thermodynamics of mixing Prof. Tbbtt Lecture 7 etworks & Gels Lecture 7: Boltzmann dstrbuton & Thermodynamcs of mxng 1 Suggested readng Prof. Mark W. Tbbtt ETH Zürch 13 März 018 Molecular Drvng Forces Dll and Bromberg: Chapters

More information

COS 511: Theoretical Machine Learning

COS 511: Theoretical Machine Learning COS 5: Theoretcal Machne Learnng Lecturer: Rob Schapre Lecture #0 Scrbe: José Sões Ferrera March 06, 203 In the last lecture the concept of Radeacher coplexty was ntroduced, wth the goal of showng that

More information

Einstein-Podolsky-Rosen Paradox

Einstein-Podolsky-Rosen Paradox H 45 Quantum Measurement and Spn Wnter 003 Ensten-odolsky-Rosen aradox The Ensten-odolsky-Rosen aradox s a gedanken experment desgned to show that quantum mechancs s an ncomplete descrpton of realty. The

More information

IV. Performance Optimization

IV. Performance Optimization IV. Performance Optmzaton A. Steepest descent algorthm defnton how to set up bounds on learnng rate mnmzaton n a lne (varyng learnng rate) momentum learnng examples B. Newton s method defnton Gauss-Newton

More information

Problem Set 9 Solutions

Problem Set 9 Solutions Desgn and Analyss of Algorthms May 4, 2015 Massachusetts Insttute of Technology 6.046J/18.410J Profs. Erk Demane, Srn Devadas, and Nancy Lynch Problem Set 9 Solutons Problem Set 9 Solutons Ths problem

More information

Lecture 21: Numerical methods for pricing American type derivatives

Lecture 21: Numerical methods for pricing American type derivatives Lecture 21: Numercal methods for prcng Amercan type dervatves Xaoguang Wang STAT 598W Aprl 10th, 2014 (STAT 598W) Lecture 21 1 / 26 Outlne 1 Fnte Dfference Method Explct Method Penalty Method (STAT 598W)

More information

THERMAL PHASE TRANSITIONS AND GROUND STATE PHASE TRANSITIONS: THE COMMON FEATURES AND SOME INTERESTING MODELS

THERMAL PHASE TRANSITIONS AND GROUND STATE PHASE TRANSITIONS: THE COMMON FEATURES AND SOME INTERESTING MODELS THERMAL PHASE TRANSITIONS AND GROUND STATE PHASE TRANSITIONS: THE COMMON FEATURES AND SOME INTERESTING MODELS Ján Greguš Department of Physcs, Faculty of the Natural Scences, Unversty of Constantne the

More information

Feature Selection: Part 1

Feature Selection: Part 1 CSE 546: Machne Learnng Lecture 5 Feature Selecton: Part 1 Instructor: Sham Kakade 1 Regresson n the hgh dmensonal settng How do we learn when the number of features d s greater than the sample sze n?

More information

Conjugacy and the Exponential Family

Conjugacy and the Exponential Family CS281B/Stat241B: Advanced Topcs n Learnng & Decson Makng Conjugacy and the Exponental Famly Lecturer: Mchael I. Jordan Scrbes: Bran Mlch 1 Conjugacy In the prevous lecture, we saw conjugate prors for the

More information

Time-Varying Systems and Computations Lecture 6

Time-Varying Systems and Computations Lecture 6 Tme-Varyng Systems and Computatons Lecture 6 Klaus Depold 14. Januar 2014 The Kalman Flter The Kalman estmaton flter attempts to estmate the actual state of an unknown dscrete dynamcal system, gven nosy

More information

Lecture 4. Macrostates and Microstates (Ch. 2 )

Lecture 4. Macrostates and Microstates (Ch. 2 ) Lecture 4. Macrostates and Mcrostates (Ch. ) The past three lectures: we have learned about thermal energy, how t s stored at the mcroscopc level, and how t can be transferred from one system to another.

More information

Simulated Power of the Discrete Cramér-von Mises Goodness-of-Fit Tests

Simulated Power of the Discrete Cramér-von Mises Goodness-of-Fit Tests Smulated of the Cramér-von Mses Goodness-of-Ft Tests Steele, M., Chaselng, J. and 3 Hurst, C. School of Mathematcal and Physcal Scences, James Cook Unversty, Australan School of Envronmental Studes, Grffth

More information

1 Derivation of Rate Equations from Single-Cell Conductance (Hodgkin-Huxley-like) Equations

1 Derivation of Rate Equations from Single-Cell Conductance (Hodgkin-Huxley-like) Equations Physcs 171/271 -Davd Klenfeld - Fall 2005 (revsed Wnter 2011) 1 Dervaton of Rate Equatons from Sngle-Cell Conductance (Hodgkn-Huxley-lke) Equatons We consder a network of many neurons, each of whch obeys

More information

Department of Statistics University of Toronto STA305H1S / 1004 HS Design and Analysis of Experiments Term Test - Winter Solution

Department of Statistics University of Toronto STA305H1S / 1004 HS Design and Analysis of Experiments Term Test - Winter Solution Department of Statstcs Unversty of Toronto STA35HS / HS Desgn and Analyss of Experments Term Test - Wnter - Soluton February, Last Name: Frst Name: Student Number: Instructons: Tme: hours. Ads: a non-programmable

More information

Susceptibility and Inverted Hysteresis Loop of Prussian Blue Analogs with Orthorhombic Structure

Susceptibility and Inverted Hysteresis Loop of Prussian Blue Analogs with Orthorhombic Structure Commun. Theor. Phys. 58 (202) 772 776 Vol. 58, No. 5, November 5, 202 Susceptblty and Inverted Hysteress Loop of Prussan Blue Analogs wth Orthorhombc Structure GUO An-Bang (ÁËǑ) and JIANG We ( å) School

More information

Lecture Note 3. Eshelby s Inclusion II

Lecture Note 3. Eshelby s Inclusion II ME340B Elastcty of Mcroscopc Structures Stanford Unversty Wnter 004 Lecture Note 3. Eshelby s Incluson II Chrs Wenberger and We Ca c All rghts reserved January 6, 004 Contents 1 Incluson energy n an nfnte

More information

Temperature. Chapter Heat Engine

Temperature. Chapter Heat Engine Chapter 3 Temperature In prevous chapters of these notes we ntroduced the Prncple of Maxmum ntropy as a technque for estmatng probablty dstrbutons consstent wth constrants. In Chapter 9 we dscussed the

More information

Comparison of Regression Lines

Comparison of Regression Lines STATGRAPHICS Rev. 9/13/2013 Comparson of Regresson Lnes Summary... 1 Data Input... 3 Analyss Summary... 4 Plot of Ftted Model... 6 Condtonal Sums of Squares... 6 Analyss Optons... 7 Forecasts... 8 Confdence

More information