Scale-invariant Feature Extraction of Neural Network and Renormalization Group Flow


KEK-TH-2029
arXiv: v1 [hep-th] 22 Jan 2018

Scale-invariant Feature Extraction of Neural Network and Renormalization Group Flow

Satoshi Iso a,b, Shotaro Shiba a and Sumito Yokoo a,b

a Theory Center, High Energy Accelerator Research Organization (KEK),
b Graduate University for Advanced Studies (SOKENDAI),
Tsukuba, Ibaraki, Japan

Abstract

Theoretical understanding of how a deep neural network (DNN) extracts features from input images is still unclear, but it is widely believed that the extraction is performed hierarchically through a process of coarse-graining. This reminds us of the basic concept of the renormalization group (RG) in statistical physics. In order to explore possible relations between DNN and RG, we use the restricted Boltzmann machine (RBM) applied to the Ising model and construct a flow of model parameters (in particular, temperature) generated by the RBM. We show that the unsupervised RBM trained by spin configurations at various temperatures from T = 0 to T = 6 generates a flow along which the temperature approaches the critical value T_c = 2.27. This behavior is opposite to the typical RG flow of the Ising model. By analyzing various properties of the weight matrices of the trained RBM, we discuss why it flows towards T_c and how the RBM learns to extract features of spin configurations.

1 Introduction

Machine learning has attracted interdisciplinary interest as the core method of artificial intelligence, particularly of big data science, and is now widely used to discriminate subtle images by extracting specific features hidden in complicated input data. A deep neural network (DNN), which is motivated by human brains, is one of the well-known algorithms [1]. Despite its enormous successes, it is still unclear why DNN works so well and how it can efficiently extract specific features. In discriminating images, we first provide samples of input images with assigned labels, such as a cat or a dog, and then train the neural network (NN) so as to correctly predict the labels of new, previously unseen, input images: this is supervised learning, and its ability of prediction depends on how many relevant features the NN can extract. On the other hand, in unsupervised learning algorithms, a NN is trained without assigning labels to the data, but trained so as to generate output images that are as close to the input ones as possible. If the NN is successfully trained to reconstruct the input data, it must have acquired specific features of the input data. With this in mind, unsupervised learning is often adopted for pre-training of supervised NNs.

How can a DNN efficiently extract features? Specific features characteristic of input data usually have hierarchical structures. An image of a cat can still be identified as an animal in a very low-resolution image, but one may not be able to distinguish it from a dog. Thus it is plausible that the depth of a neural network reflects such a hierarchy of features. Namely, the DNN learns low-level (microscopic) characteristics in the upper stream of the network and gradually extracts higher-level (macroscopic) characteristics as the input data flow downstream. In other words, the initial data get coarse-grained towards the output. This viewpoint is reminiscent of the renormalization group (RG) in statistical physics and quantum field theories, and various thoughts and studies based on this analogy have been given [2-9]. In particular, in a seminal paper [4], Mehta and Schwab proposed an explicit mapping between the RG and the restricted Boltzmann machine (RBM) [1, 10-14].

RG is the most important concept and technology for understanding critical phenomena in statistical physics, and it also plays an essential role in constructively defining quantum field theories on the lattice. It is based on the idea (proved by Kenneth Wilson [15]) that the long-distance macroscopic behavior of a many-body system is universally described by relevant operators (relevant information) around a fixed point, and is not affected by microscopic details in the continuum limit. Through the reduction of degrees of freedom in RG, the relevant information is emphasized while other, irrelevant information is discarded. In particular, suppose that the statistical model is described by a set of parameters {\lambda_\alpha}, and that these parameters are mapped to a different set {\tilde{\lambda}_\alpha} by an RG transformation (footnote 1). Repeating such RG transformations, we can draw a flow diagram in the parameter space of the statistical model,

  \{\lambda_\alpha\} \to \{\tilde{\lambda}_\alpha\} \to \{\tilde{\tilde{\lambda}}_\alpha\} \to \cdots.   (1)

Footnote 1: In order to describe the RG transformation exactly, infinitely many parameters need to be introduced. But it can usually be well approximated by a finite number of parameters.

These RG flows control the behavior of the statistical model near the critical point, where a second-order phase transition occurs.

The simplest version of the RBM is a NN consisting of two layers, a visible layer with variables {v_i = ±1} and a hidden layer with variables {h_a = ±1}, that are coupled to each other through the Hamiltonian

  \Phi(\{v_i\},\{h_a\}) = -\Big( \sum_{i,a} W_{ia} v_i h_a + \sum_i b^{(v)}_i v_i + \sum_a b^{(h)}_a h_a \Big).   (2)

A probability distribution of a configuration {v_i, h_a} is given by

  p(\{v_i\},\{h_a\}) = \frac{1}{Z} e^{-\Phi(\{v_i\},\{h_a\})}   (3)

where we defined the partition function by Z = \sum_{\{v_i,h_a\}} e^{-\Phi(\{v_i\},\{h_a\})}. No intra-layer couplings are introduced in the RBM. Now suppose that the RBM has already been trained and the parameters of the Hamiltonian (2), namely {W_{ia}, b^{(v)}_i, b^{(h)}_a}, are already fixed through the process of training. The probability distribution p({v_i},{h_a}) also provides the following conditional probabilities for {h_a} (or {v_i}) with the other variables kept fixed:

  p(\{h_a\}|\{v_i\}) = \frac{p(\{h_a\},\{v_i\})}{\sum_{\{h_a\}} p(\{h_a\},\{v_i\})}   (4)

  p(\{v_i\}|\{h_a\}) = \frac{p(\{h_a\},\{v_i\})}{\sum_{\{v_i\}} p(\{h_a\},\{v_i\})}.   (5)

These conditional probabilities generate a flow of distributions, and consequently a flow of the parameters {\lambda_\alpha} of the corresponding statistical model. Suppose that we have a set of N (≫ 1) initial configurations {v_i = \sigma^A_i} (A = 1, ..., N), which are generated by a statistical model with parameters \lambda_\alpha, such as the Ising model at temperature T. In the large-N limit, the distribution function

  q_0(\{v_i\}) = \frac{1}{N} \sum_{A=1}^{N} \prod_i \delta(v_i - \sigma^A_i)   (6)

faithfully characterizes the statistical model with parameters \lambda_\alpha. Multiplying q_0({v_i}) by the conditional probabilities (4) and (5) iteratively, we can generate a flow of probability distributions as

  q_0(\{v_i\}) \to r_1(\{h_a\}) = \sum_{\{v_i\}} p(\{h_a\}|\{v_i\}) \, q_0(\{v_i\})   (7)

  r_1(\{h_a\}) \to q_1(\{v_i\}) = \sum_{\{h_a\}} p(\{v_i\}|\{h_a\}) \, r_1(\{h_a\})   (8)

and so on for q_n({v_i}) → r_{n+1}({h_a}) and r_{n+1}({h_a}) → q_{n+1}({v_i}). Let us focus on Eq. (7). If the probability distribution r_1({h_a}) is well approximated by the Boltzmann distribution of

the same statistical model with different parameters \tilde{\lambda}_\alpha, we can say that the RBM generates a transformation (footnote 2) from {\lambda_\alpha} to {\tilde{\lambda}_\alpha}. If more than two layers are stacked iteratively, we can obtain a flow of parameters as in Eq. (1). Another way to obtain a flow is to look at the transformations q_0({v_i}) → q_1({v_i}) → q_2({v_i}) → ⋯ and to translate this flow of probability distributions into a flow of parameters {\lambda_\alpha}. In the present paper, we consider the latter flow to discuss the relation with the RG.

Footnote 2: The situation is similar to that of footnote 1: infinitely many parameters are necessary to represent the probability distribution p({h_a}) in terms of the statistical model.

Mehta and Schwab [4] pointed out the similarity between the RG transformations of Eq. (1) and the above flows of parameters in the unsupervised RBM. But in order to show that the transformation of the parameters {\lambda_\alpha} in the RBM indeed generates the conventional RG transformation, it is necessary to show that the weight matrix W_{ia} and the biases b^{(v)}_i, b^{(h)}_a of the RBM are appropriately chosen so as to generate the correct RG transformation that performs a coarse-graining of the input configurations. In Ref. [4], a multi-layer RBM is employed as an unsupervised-learning NN, and the weights and the biases are chosen by minimizing the KL divergence (relative entropy) between the input probability distribution and the distribution reconstructed by integrating (marginalizing) over the hidden variables. The authors suggested the similarity by looking at the local spin structures in the hidden variables, but they did not show explicitly that the weights determined by the unsupervised learning actually generate a flow of RG transformations. The arguments of [4], and misconceptions in the literature, are criticized in Ref. [6].

In a wider context, the criticism is related to the following question: what determines whether a specific feature of the input data is relevant or not? In RG transformations of statistical models, long-wavelength (macroscopic) modes are highly respected, while short-wavelength modes are discarded as noise. In this way, RG transformations can extract the universal behavior of the model at long wavelengths. But, of course, this is so because we are interested in the macroscopic behavior of the system: if we were instead interested in short-wavelength physics, we would need to extract the opposite features of the model. Thus, we may say that the extraction of relevant features requires a pre-existing basis for judgment, and supervised learning is necessary to give such a basis to the machine. However, this does not mean that unsupervised learning has nothing to do with the RG. Even in unsupervised learning, a NN automatically notices and extracts some kind of features of the input data, and the flow generated by the trained NN reflects such features.

In the present paper, we investigate the relationship between the RBM and the RG by further studying the flows of distributions, Eqs. (7) and (8), that the unsupervised RBM generates. Notice that, in defining the flows (7) and (8), we need to specify how we have trained the RBM, because the training determines the properties of the weights and biases, and accordingly the behavior of the flow. In this paper we mostly use the following three different ways of training. One type of RBM (which we call type V) is trained by configurations at various temperatures from low to high. The two other types (type H and type L) are trained by configurations

only at high (or only at low) temperatures. We then translate these flows of probability distributions, defined by Eqs. (7) and (8), into flows of the temperature of the Ising model,

  T \to \tilde{T} \to \tilde{\tilde{T}} \to \cdots.   (9)

In order to measure the temperature, we prepare another NN, trained by supervised learning. The results of our numerical simulations lead to a surprising conclusion. In the type V RBM, which has adequately learned the features of configurations at various temperatures, we find that the temperature approaches the critical point, T → T_c, along the RBM flow. This behavior is opposite to the conventional RG flow of the Ising model.

The paper is organized as follows. In section 2, we explain the basic settings and the methods of our investigation. We prepare sample images of the spin configurations of the Ising model, and train RBMs with these configurations without assigning labels of temperature. We then construct flows of parameters (i.e., temperature) generated by the trained RBM (footnote 3). In section 3, we show various results of the numerical simulations, including the RBM flows of parameters. In section 4, we analyze properties of the weight matrices W_{ia} using the method of singular value decomposition. The final section is devoted to a summary and discussions. Our main results on the RBM flow, and our conjectures about the feature extraction of the unsupervised RBM, are given in Sec. 3.2.

Footnote 3: The two-dimensional Ising model is the simplest statistical model exhibiting a second-order phase transition, and there are many previous studies of the Ising model using machine learning. See e.g. [16-21].

2 Methods

We explain the various methods used in the numerical simulations to investigate relations between the unsupervised RBM and the RG of the Ising model. Though most methods in this section are standard and well known, we explain them in some detail to make the paper self-contained. In Sec. 2.3, we explain the central method of generating the RBM flows. Basic material on the RBM is given in Sec. 2.2. The other two sections, Secs. 2.1 and 2.4, can be skipped unless one is interested in how we generate the initial spin configurations and how we measure the temperature of a set of configurations.

2.1 Monte Carlo simulations of the Ising model

We first construct samples of configurations of the two-dimensional Ising model by using Monte Carlo simulations. The spin variables σ_{x,y} = ±1 are defined on a two-dimensional lattice of size L × L. The index (x, y) labels each lattice site and takes the values x, y = 0, 1, ..., L−1.

The Ising model Hamiltonian is given by

  H = -J \sum_{x,y=0}^{L-1} \sigma_{x,y} \left( \sigma_{x+1,y} + \sigma_{x-1,y} + \sigma_{x,y+1} + \sigma_{x,y-1} \right).   (10)

It describes a ferromagnetic model for J > 0 and an antiferromagnetic model for J < 0. We impose periodic boundary conditions on the spin variables,

  \sigma_{L,y} := \sigma_{0,y}, \quad \sigma_{-1,y} := \sigma_{L-1,y}, \quad \sigma_{x,L} := \sigma_{x,0}, \quad \sigma_{x,-1} := \sigma_{x,L-1}.   (11)

Spin configurations at temperature T are generated by the method of Metropolis Monte Carlo (MMC) simulation. In this method, we first generate a random configuration {σ_{x,y}}. We then choose one of the spins σ_{x,y} and flip it with probability

  p_{x,y} = \begin{cases} 1 & (\text{when } dE_{x,y} < 0) \\ e^{-dE_{x,y}/k_B T} & (\text{when } dE_{x,y} > 0) \end{cases}   (12)

where dE_{x,y} is the change of the energy of the system,

  dE_{x,y} = 2 J \sigma_{x,y} \left( \sigma_{x+1,y} + \sigma_{x-1,y} + \sigma_{x,y+1} + \sigma_{x,y-1} \right).   (13)

The flipping probability (12) satisfies the detailed balance condition P_{s \to s'} \rho_s = P_{s' \to s} \rho_{s'}, where \rho_s \propto e^{-E_s/k_B T} is the canonical distribution of the spin configuration s = {σ_{x,y}} at temperature T. Thus, after many iterations of flipping all the spins, the configuration approaches the equilibrium distribution at T. Since all physical quantities depend only on the combination J/k_B T, we can set the Boltzmann constant k_B and the interaction parameter J equal to 1 without loss of generality. In the following analysis, we set the lattice size to L² = 100 and repeat the MMC update procedure 100 L² = 10000 times to construct spin configurations. In our simulations, we generated spin configurations at the temperatures T = 0, 0.25, 0.5, ..., 6 (footnote 4). Some typical spin configurations are shown in Fig. 1.

Footnote 4: For T = 0, we practically set T = 10^{-6} in the numerical calculations.

Figure 1: Examples of spin configurations at temperatures T = 0, 2, 3, 6.
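As a concrete illustration of this sampling procedure, a minimal Python sketch of the Metropolis update for the L × L Ising model with periodic boundary conditions could look as follows. The function name, the random-number handling, and the spacing between stored samples are our own choices, not taken from the paper; the acceptance rule follows Eqs. (12)-(13) with J = k_B = 1.

```python
import numpy as np

def metropolis_configurations(L=10, T=2.0, n_samples=1000, n_flips_per_sample=None, seed=0):
    """Generate Ising spin configurations at temperature T by Metropolis Monte Carlo.

    A spin is flipped with probability 1 if dE < 0 and exp(-dE/T) otherwise,
    as in Eqs. (12)-(13) with J = k_B = 1.
    """
    rng = np.random.default_rng(seed)
    if n_flips_per_sample is None:
        n_flips_per_sample = 100 * L * L          # 100 L^2 updates, as in the text
    spins = rng.choice([-1, 1], size=(L, L))      # random initial configuration
    samples = []
    for _ in range(n_samples):
        for _ in range(n_flips_per_sample):
            x, y = rng.integers(0, L, size=2)
            # sum of the four nearest neighbours with periodic boundary conditions (11)
            nn = (spins[(x + 1) % L, y] + spins[(x - 1) % L, y]
                  + spins[x, (y + 1) % L] + spins[x, (y - 1) % L])
            dE = 2.0 * spins[x, y] * nn           # energy change of a flip, Eq. (13)
            if dE < 0 or rng.random() < np.exp(-dE / T):
                spins[x, y] *= -1
        samples.append(spins.copy().reshape(-1))  # store the flattened 10 x 10 configuration
    return np.array(samples)

# Example: 1000 configurations at T = 2 on a 10 x 10 lattice
# (for T = 0 one would pass a tiny positive value such as 1e-6, as the authors do)
configs_T2 = metropolis_configurations(L=10, T=2.0, n_samples=1000)
```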

2.2 Unsupervised learning of the RBM

Our main motivation in the present paper is to study whether the RBM is related to the RG in statistical physics. In this section, we review the basic algorithm of the RBM [1, 10-14], which is trained by the configurations constructed by the MMC method of Sec. 2.1. As explained in the Introduction, the RBM consists of two layers, as shown in the left panel of Fig. 2.

Figure 2: (a) Two-layer neural network of the RBM with a visible layer {v_i} and a hidden layer {h_a}. These two layers are coupled, but there are no intra-layer couplings. (b) The RBM generates reconstructed configurations from {v_i} to {ṽ_i} through the hidden configuration {h_a}.

The initial configurations {σ_{x,y}} of the Ising model, generated at various temperatures, are input into the visible layer {v_i}. The number of neurons in the visible layer is fixed at N_v = L² = 100 (i = 1, ..., N_v) to represent the spin configurations of the Ising model. The hidden layer, on the other hand, can have an arbitrary number of neurons, N_h. In the present paper, we consider 7 different sizes: N_h = 16, 36, 64, 81, 100, 225 and 400. The N_h spin variables in the hidden layer are thus given by {h_a} with a = 1, ..., N_h.

The RBM is a generative model of probability distributions based on Eq. (3). We first explain how we train the RBM by optimizing the weights W_{ia} and the biases b^{(v)}_i, b^{(h)}_a. Our goal is to represent the given probability distribution q_0({v_i}) of Eq. (6), as faithfully as possible, in terms of the model probability distribution defined by

  p(\{v_i\}) = \frac{1}{Z} \sum_{\{h_a\}} e^{-\Phi(\{v_i\},\{h_a\})}.   (14)

The partition function Z = \sum_{\{v_i, h_a\}} e^{-\Phi(\{v_i\},\{h_a\})} is difficult to evaluate, but summations over only one set of spin variables (e.g. over {v_i}) are easy to perform because of the absence of intra-layer couplings. This also allows the conditional probabilities (4) and (5) to be

rewritten as products of probability distributions of the individual spin variables:

  p(\{h_a\}|\{v_i\}) = \prod_a p(h_a|\{v_i\}), \qquad p(h_a|\{v_i\}) = \frac{1}{1 + \exp\big[-2 h_a \big(\sum_i W_{ia} v_i + b^{(h)}_a\big)\big]}   (15)

  p(\{v_i\}|\{h_a\}) = \prod_i p(v_i|\{h_a\}), \qquad p(v_i|\{h_a\}) = \frac{1}{1 + \exp\big[-2 v_i \big(\sum_a W_{ia} h_a + b^{(v)}_i\big)\big]}.   (16)

The expectation values of the spin variables in the hidden (or visible) layer, in the background of the spin configuration of the other layer, are then calculated as

  \langle h_a \rangle_{\{v_i\}} = \tanh\Big( \sum_i W_{ia} v_i + b^{(h)}_a \Big)   (17)

  \langle v_i \rangle_{\{h_a\}} = \tanh\Big( \sum_a W_{ia} h_a + b^{(v)}_i \Big).   (18)

The task is now to train the RBM so as to minimize the distance between the two probability distributions q({v_i}) and p({v_i}) by appropriately choosing the weights and the biases. This distance is the Kullback-Leibler (KL) divergence, or relative entropy, given by

  \mathrm{KL}(q\|p) = \sum_{\{v_i\}} q(\{v_i\}) \log\frac{q(\{v_i\})}{p(\{v_i\})} = \text{const.} - \sum_{\{v_i\}} q(\{v_i\}) \log p(\{v_i\}).   (19)

If the two probability distributions are equal, the KL divergence vanishes; otherwise it is positive. The derivatives of KL(q‖p) with respect to the weights W_{ia} and the biases b^{(v)}_i, b^{(h)}_a are given by

  \frac{\partial \mathrm{KL}(q\|p)}{\partial W_{ia}} = -\big( \langle v_i h_a \rangle_{\rm data} - \langle v_i h_a \rangle_{\rm model} \big), \quad
  \frac{\partial \mathrm{KL}(q\|p)}{\partial b^{(v)}_i} = -\big( \langle v_i \rangle_{\rm data} - \langle v_i \rangle_{\rm model} \big), \quad
  \frac{\partial \mathrm{KL}(q\|p)}{\partial b^{(h)}_a} = -\big( \langle h_a \rangle_{\rm data} - \langle h_a \rangle_{\rm model} \big),   (20)

where the averages are defined by

  \langle A(\{v_i\}) \rangle_{\rm data} = \sum_{\{v_i\}} q(\{v_i\}) A(\{v_i\})   (21)

  \langle A(\{v_i\},\{h_a\}) \rangle_{\rm model} = \sum_{\{v_i\},\{h_a\}} p(\{v_i\},\{h_a\}) A(\{v_i\},\{h_a\}),   (22)

and h_a in ⟨⋯⟩_data is replaced by ⟨h_a⟩_{\{v_i\}} of Eq. (17). In training the RBM, we change the weights and biases so that the KL divergence is reduced. Using the method of back

propagation [22], we update the values of the weights and biases as

  W_{ia} \to W^{\rm new}_{ia} = W_{ia} + \delta W_{ia}, \qquad b^{(v)}_i \to b^{(v),\rm new}_i = b^{(v)}_i + \delta b^{(v)}_i, \qquad b^{(h)}_a \to b^{(h),\rm new}_a = b^{(h)}_a + \delta b^{(h)}_a   (23)

where

  \delta W_{ia} = \epsilon \big( \langle v_i h_a \rangle_{\rm data} - \langle v_i h_a \rangle_{\rm model} \big), \quad
  \delta b^{(v)}_i = \epsilon \big( \langle v_i \rangle_{\rm data} - \langle v_i \rangle_{\rm model} \big), \quad
  \delta b^{(h)}_a = \epsilon \big( \langle h_a \rangle_{\rm data} - \langle h_a \rangle_{\rm model} \big).   (24)

Here ε denotes the learning rate, which we set to 0.1. The first terms ⟨⋯⟩_data are easy to calculate, but the second terms ⟨⋯⟩_model are difficult to evaluate, since they require knowledge of the full partition function Z. To avoid this difficulty, one can use the method of Gibbs sampling to approximately evaluate the expectation values ⟨⋯⟩_model. In practice we employ an even simpler method, called contrastive divergence (CD) [23-25]. The idea is very simple, and reminiscent of the mean-field approximation in statistical physics. Given the input data of visible spin configurations {v^{A(0)}_i = σ^A_i}, the expectation value of the hidden spin variable h_a can easily be calculated from Eq. (17). We write this expectation value as

  h^{A(1)}_a := \langle h_a \rangle_{\{v^{A(0)}_i\}} = \tanh\Big( \sum_i W_{ia} v^{A(0)}_i + b^{(h)}_a \Big).   (25)

Then, in this background of hidden spin configurations, the expectation value of v_i can again be easily calculated by using Eq. (18). We write it as

  v^{A(1)}_i := \langle v_i \rangle_{\{h^{A(1)}_a\}} = \tanh\Big( \sum_a W_{ia} h^{A(1)}_a + b^{(v)}_i \Big).   (26)

We then obtain h^{A(2)}_a = \langle h_a \rangle_{\{v^{A(1)}_i\}}, and so on. We can iterate this procedure many times and replace the second terms in Eq. (20) by the expectation values generated in this way. In the numerical simulations of the present paper, we adopt the simplest version of CD, called CD_1, which gives the following approximate formulas:

  \langle v_i \rangle_{\rm data} = \frac{1}{N}\sum_A \sigma^A_i, \qquad \langle h_a \rangle_{\rm data} = \frac{1}{N}\sum_A h^{A(1)}_a, \qquad \langle v_i h_a \rangle_{\rm data} = \frac{1}{N}\sum_A \sigma^A_i h^{A(1)}_a,   (27)

and

  \langle v_i \rangle_{\rm model} = \frac{1}{N}\sum_A v^{A(1)}_i, \qquad \langle h_a \rangle_{\rm model} = \frac{1}{N}\sum_A h^{A(2)}_a, \qquad \langle v_i h_a \rangle_{\rm model} = \frac{1}{N}\sum_A v^{A(1)}_i h^{A(2)}_a.   (28)

Here σ^A denotes each spin configuration {σ_{x,y}} generated by the method of Sec. 2.1. As input data to train the RBM, we generated 1000 spin configurations for each of the 25 different temperatures T = 0, 0.25, ..., 6; the index A then runs from 1 to N = 25000. In some cases, as we will see in Sec. 3.2, we use only a restricted set of configurations at high or at low temperatures; the index then runs over A = 1, ..., N = 1000 × (number of temperatures). We repeat the update procedure (23) many times (5000 epochs) to obtain adjusted values of the weights and biases. In this way we train the RBM using the set of configurations {v^{A(0)}_i = σ^A_i}, (A = 1, ..., N).

2.3 Generation of RBM flows

As discussed in the Introduction, once the RBM is trained and the weights and biases are fixed, the RBM generates a sequence of probability distributions (8). We then translate this sequence into a flow of parameters (i.e., temperature). In generating the sequence, the initial set of configurations should be prepared separately, in addition to the configurations that are used to train the RBM (footnote 5).

Footnote 5: Thus we generate spin configurations in addition to the configurations used for training the RBM.

We can also generate a flow of parameters in a slightly different way. For a specific configuration v_i = v^{(0)}_i, we can define a sequence of configurations following Eqs. (25) and (26) as

  \{v^{(0)}_i\} \to \{h^{(1)}_a\} \to \{v^{(1)}_i\} \to \{h^{(2)}_a\} \to \{v^{(2)}_i\} \to \cdots.   (29)

The right panel of Fig. 2 shows the generation of new configurations from {v_i} to {ṽ_i} through {h_a}. Since each value of v^{(n)}_i and h^{(n)}_a (for n > 0) is defined by an expectation value as in Eqs. (25) and (26), it does not take an integer value ±1 but a fractional value between ±1. In order to get a flow of spin configurations, we replace these fractional values by ±1 with probability (1 ± ⟨v^{(n)}_i⟩)/2 or (1 ± ⟨h^{(n)}_a⟩)/2. It turns out that this replacement is usually a good approximation, since the expectation values are likely to take values close to ±1 owing to the property of the trained weights, |W_{ia}| ≳ 1. In this way, we obtain a flow of spin configurations

  \{v^{(0)}_i\} \to \{v^{(1)}_i\} \to \{v^{(2)}_i\} \to \cdots \to \{v^{(n)}_i\} \to \cdots   (30)

starting from the initial configuration {v^{(0)}_i}. The flow of configurations is transformed into a flow of temperature distributions by using the method explained in Sec. 2.4.
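To make the training and flow-generation procedure concrete, here is a minimal NumPy sketch of the CD_1 update (23)-(28) and of the reconstruction flow (29)-(30). The class layout, the initialization scale, and the function names are our own assumptions; only the tanh expectation values and the CD_1 averages follow the equations above.

```python
import numpy as np

class IsingRBM:
    """Restricted Boltzmann machine with ±1 units, trained by CD_1 (Eqs. (23)-(28))."""

    def __init__(self, n_visible=100, n_hidden=64, seed=0):
        rng = np.random.default_rng(seed)
        self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))  # W_{ia}
        self.b_v = np.zeros(n_visible)                              # b^{(v)}_i
        self.b_h = np.zeros(n_hidden)                               # b^{(h)}_a
        self.rng = rng

    def mean_h(self, v):                 # Eqs. (17)/(25): <h_a> = tanh(sum_i W_ia v_i + b_a)
        return np.tanh(v @ self.W + self.b_h)

    def mean_v(self, h):                 # Eqs. (18)/(26): <v_i> = tanh(sum_a W_ia h_a + b_i)
        return np.tanh(h @ self.W.T + self.b_v)

    def cd1_epoch(self, data, lr=0.1):
        """One CD_1 update using the data/model averages of Eqs. (27)-(28)."""
        N = data.shape[0]
        h1 = self.mean_h(data)           # h^{A(1)}
        v1 = self.mean_v(h1)             # v^{A(1)}
        h2 = self.mean_h(v1)             # h^{A(2)}
        self.W   += lr * (data.T @ h1 - v1.T @ h2) / N   # Eq. (24): <v h>_data - <v h>_model
        self.b_v += lr * (data.mean(0) - v1.mean(0))
        self.b_h += lr * (h1.mean(0) - h2.mean(0))

    def flow_step(self, v):
        """One step {v^(n)} -> {v^(n+1)} of the RBM flow (30), sampling ±1 spins
        with probabilities (1 ± <h>)/2 and (1 ± <v>)/2."""
        h = np.where(self.rng.random(self.b_h.shape[0]) < (1 + self.mean_h(v)) / 2, 1, -1)
        return np.where(self.rng.random(self.b_v.shape[0]) < (1 + self.mean_v(h)) / 2, 1, -1)

# Usage sketch (hypothetical variable names): train a type V RBM, then iterate the flow.
# rbm = IsingRBM(n_visible=100, n_hidden=64)
# for epoch in range(5000):
#     rbm.cd1_epoch(training_configs)    # training_configs: shape (25000, 100), entries ±1
# v = initial_config.copy()              # one configuration, shape (100,)
# for n in range(1000):
#     v = rbm.flow_step(v)
```

Iterating flow_step over a set of initial configurations gives the sequence (30), whose temperature distribution is then measured by the supervised network of Sec. 2.4.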

2.4 Temperature measurement by a supervised-learning NN

Next we design a neural network (NN) to measure the temperature of spin configurations. The NN for supervised learning has three layers, with one hidden layer in the middle (see Fig. 3).

Figure 3: Three-layer neural network for supervised learning, with an input layer {z^{(1)}_i}, a hidden layer {z^{(2)}_a} and an output layer {z^{(3)}_μ}.

The input layer {z^{(1)}_i} consists of L² = 100 neurons, into which we input spin configurations of the Ising model. The output layer {z^{(3)}_μ} has 25 neurons, which correspond to the 25 different temperatures that we want to measure. The number of neurons in the hidden layer {z^{(2)}_a} is set to 64. We train this three-layer NN with a set of spin configurations, each of which carries a label of temperature; this is therefore supervised learning. As input data to train the NN, we use the same N = 25000 configurations which were used to train the RBM (footnote 6).

Footnote 6: In order to check the performance of the NN, namely to see how precisely the machine can measure the temperature of a new set of configurations, we use the other configurations that were prepared for generating the sequence of probability distributions of the RBM in Sec. 2.3. We show the results of this performance check in Sec. 3.1.

The training of the NN is carried out as follows. Denote the input data as

  Z^{(1)}_{Ai} = \sigma^A_i   (31)

where A = 1, ..., N, and the \sigma^A are the spin configurations {σ_{x,y} = ±1} of Sec. 2.1. The input data are transformed to Z^{(2)}_{Aa} in the hidden layer by the following nonlinear transformation:

  Z^{(2)}_{Aa} = f\Big( \sum_{i=1}^{100} Z^{(1)}_{Ai} W^{(1)}_{ia} + b^{(1)}_a \Big) =: f\big( U^{(1)}_{Aa} \big)   (32)

where W^{(1)}_{ia} is a weight matrix and b^{(1)}_a is a bias. The activation function f(x) is chosen as

f(x) = tanh(x). Z^{(2)}_{Aa} is then transformed to Z^{(3)}_{A\mu} in the output layer, which corresponds to the label, namely the temperature, of each configuration. The output Z^{(3)}_{A\mu} is given by

  Z^{(3)}_{A\mu} = g\Big( \sum_{a=1}^{64} Z^{(2)}_{Aa} W^{(2)}_{a\mu} + b^{(2)}_\mu \Big) =: g\big( U^{(2)}_{A\mu} \big)   (33)

where W^{(2)}_{a\mu} and b^{(2)}_\mu are another weight matrix and bias. The function g is the softmax function,

  g\big( U^{(2)}_{A\mu} \big) = \frac{\exp U^{(2)}_{A\mu}}{\sum_{\nu=1}^{25} \exp U^{(2)}_{A\nu}},   (34)

so that Z^{(3)}_{A\mu} can be regarded as a probability, since \sum_\mu Z^{(3)}_{A\mu} = 1 is satisfied for each configuration A. Thus the NN transforms an input spin configuration Z^{(1)}_{Ai} into the probability Z^{(3)}_{A\mu} for the configuration to take the μ-th output value (i.e., temperature).

Each of the input configurations Z^{(1)}_{Ai} is generated by the MMC method at a temperature T, where T takes one of the 25 discrete values T = (\nu - 1)/4, (\nu = 1, ..., 25). If the A-th configuration is labelled by ν, we want the NN to give an output Z^{(3)}_{A\mu} as close as possible to the one-hot representation

  d^{(\nu)}_A = (0, ..., 0, 1, 0, ..., 0)_A \quad \text{(with the 1 in the } \nu\text{-th entry)},   (35)

i.e., its μ-th component is given by d^{(\nu)}_{A\mu} = \delta_{\mu\nu}. It can be interpreted as the probability of the configuration A to take the μ-th output. The task of the supervised training is then to minimize the cross entropy, which is equivalent to the KL divergence between the desired probability d^{(\nu)}_{A\mu} and the output probability Z^{(3)}_{A\mu}. The loss function is thus given by the cross entropy,

  E_A = \mathrm{KL}\big(d^{(\nu)}_{A\mu} \,\|\, Z^{(3)}_{A\mu}\big) = -\sum_\mu d^{(\nu)}_{A\mu} \log Z^{(3)}_{A\mu}.   (36)

Then, using the method of back propagation, we update the values of the weights and biases from the lower to the upper stream:

  W^{(l)} \to W^{(l)}_{\rm new} = W^{(l)} + \delta W^{(l)}, \qquad b^{(l)} \to b^{(l)}_{\rm new} = b^{(l)} + \delta b^{(l)}.   (37)

The variations \delta W^{(l)}, \delta b^{(l)} at the lower stream are given by

  \delta W^{(2)}_{a\mu} = -\frac{\epsilon}{N} \sum_A (Z^{(2)})^T_{aA} \Delta^{(3)}_{A\mu}, \qquad \delta b^{(2)}_\mu = -\frac{\epsilon}{N} \sum_A \Delta^{(3)}_{A\mu}   (38)

where \Delta^{(3)}_{A\mu} = Z^{(3)}_{A\mu} - d^{(\nu)}_{A\mu}. The learning rate ε is set to 0.1. Then, using these lower-stream variations, we change the upper-stream weights and biases as

  \delta W^{(1)}_{ia} = -\frac{\epsilon}{N} \sum_A (Z^{(1)})^T_{iA} \Delta^{(2)}_{Aa}, \qquad \delta b^{(1)}_a = -\frac{\epsilon}{N} \sum_A \Delta^{(2)}_{Aa}   (39)

where

  \Delta^{(2)}_{Aa} = \sum_\mu \Delta^{(3)}_{A\mu} (W^{(2)})^T_{\mu a} \, f'\big( U^{(1)}_{Aa} \big).   (40)

We repeat this update procedure many times (7500 epochs) to train the NN and obtain suitably adjusted values of the weights and biases.

Finally, we note how we measure the temperature of a configuration. If the size of a configuration generated at temperature T were large enough, the trained NN would reproduce the temperature of the configuration quite faithfully. However, our configurations are small, with only L = 10. Thus we instead use an ensemble of many spin configurations and measure the temperature distribution of the configurations. The supervised learning gives us this probability distribution of temperature.

3 Numerical results

In this section we present our numerical results for the flows generated by the unsupervised RBM, and discuss their relation with the renormalization group flow of the Ising model. Our main results on the RBM flows are given in Sec. 3.2.

3.1 Supervised learning for temperature measurement

Before discussing the unsupervised RBM, let us first see how we trained the NN to measure temperature. In Fig. 4, we plot the behavior of the loss function (36) as we iterate the updates of the weights and biases (37). The blue (lower) line shows the training error, namely the value of the loss function (36) after iterations of training using the training configurations. It decreases continuously, even after 7500 epochs. On the other hand, the red (upper) line shows the test error, namely the value of the loss function for additional configurations which are not used for the training. This also decreases at first, but after about 6000 epochs it becomes almost constant. After 7500 epochs, in fact, it turns to increase. This means the machine is becoming over-trained, and therefore we stopped the learning at 7500 epochs.
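For concreteness, a minimal NumPy sketch of the three-layer classifier of Sec. 2.4 is given below: a tanh hidden layer (32), softmax output (33)-(34), cross-entropy loss (36), and the gradient updates (37)-(40). The names, the weight initialization, and the use of full-batch updates are our assumptions; they are not specified in the text.

```python
import numpy as np

class TemperatureNet:
    """Three-layer network: 100 -> 64 (tanh) -> 25 (softmax), Eqs. (32)-(40)."""

    def __init__(self, n_in=100, n_hidden=64, n_out=25, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = 0.1 * rng.standard_normal((n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = 0.1 * rng.standard_normal((n_hidden, n_out))
        self.b2 = np.zeros(n_out)

    def forward(self, Z1):
        U1 = Z1 @ self.W1 + self.b1                      # Eq. (32)
        Z2 = np.tanh(U1)
        U2 = Z2 @ self.W2 + self.b2                      # Eq. (33)
        U2 -= U2.max(axis=1, keepdims=True)              # numerical stabilization of the softmax
        Z3 = np.exp(U2) / np.exp(U2).sum(axis=1, keepdims=True)   # Eq. (34)
        return Z2, Z3

    def train_epoch(self, Z1, labels, lr=0.1):
        """One gradient step on the cross entropy (36); labels are indices nu - 1 = 0..24."""
        N = Z1.shape[0]
        Z2, Z3 = self.forward(Z1)
        D = np.zeros_like(Z3)
        D[np.arange(N), labels] = 1.0                    # one-hot targets d^{(nu)}, Eq. (35)
        delta3 = Z3 - D                                  # Delta^{(3)}
        delta2 = (delta3 @ self.W2.T) * (1.0 - Z2**2)    # Delta^{(2)}, Eq. (40) with f = tanh
        self.W2 -= lr * Z2.T @ delta3 / N                # Eqs. (37)-(38)
        self.b2 -= lr * delta3.sum(axis=0) / N
        self.W1 -= lr * Z1.T @ delta2 / N                # Eq. (39)
        self.b1 -= lr * delta2.sum(axis=0) / N

    def temperature_distribution(self, configs):
        """Average the softmax outputs over an ensemble of configurations, giving
        the temperature distribution used to read off the RBM flow."""
        _, Z3 = self.forward(configs)
        return Z3.mean(axis=0)
```

Averaging the softmax outputs over an ensemble of configurations, as in temperature_distribution, corresponds to the temperature-distribution measurement described at the end of Sec. 2.4.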

Figure 4: Training error and test error (up to 7500 epochs).

Figure 5: Probability distributions of the measured temperatures for sets of configurations generated at T = 0, 2, 3, 6, respectively. The temperature of the configurations can be distinguished by looking at the shapes of the distributions.

In Fig. 5 we show the probability distributions of temperature that this NN measures. Here we use configurations at T = 0, 2, 3, 6 which were not used for the training. Though they are not sharply peaked at the temperatures at which the configurations were generated (footnote 7), each of them has a characteristic shape that differs from temperature to temperature. Thus it is possible to distinguish the temperature of the input configurations by looking at the shape of the probability distribution, even if these configurations were not used for the training of the NN.

Footnote 7: There are two reasons for this broadening of the distributions. One is the finiteness of the size of a configuration, N = L × L = 100. The other is the limited ability of the NN to measure temperature. If the size of a configuration were infinite, and if the ability to discriminate subtle differences between configurations at different temperatures were limitless, we would obtain a very sharp peak at the labelled temperature.

In the following, by using this NN, we measure the temperature of configurations that are

generated by the RBM flow.

3.2 Unsupervised RBM flows

Now we present the main results of the present paper, namely the flows generated by the unsupervised RBM. We sometimes call this the RBM flow. As discussed in the Introduction, if the RBM were similar to the conventional RG in the sense of possessing a coarse-graining function, the RBM flow would have to move away from the critical point T_c = 2.27. In order to check this, we construct three different types of unsupervised RBMs, which we call type V, type L and type H, using the method of Sec. 2.2. Each of them is trained by a different set of spin configurations generated at a different set of temperatures. We then generate flows of temperature distributions by using these trained RBMs, following the methods of Secs. 2.3 and 2.4.

Type V RBM: trained by configurations at T = {0, 0.25, 0.5, ..., 6}

First we construct the type V RBM, which is trained by configurations at temperatures ranging widely from low to high, T = 0, 0.25, ..., 6. This temperature range includes the temperature T = 2.25 near T_c. After training is completed, this unsupervised RBM will have learned features of spin configurations at these temperatures. Once the training is finished, we generate a sequence of reconstructed configurations as in Eq. (30), using the method of Sec. 2.3. For this, we prepare two different sets of initial configurations, one at T = 0 and another at T = 6. These initial configurations are not used for the training of the RBM. Then, using the supervised NN of Sec. 3.1, we measure the temperature and translate the flow of configurations into a flow of temperature distributions.

In Figs. 6 and 7, we plot the temperature distributions of configurations that are generated by iterating the RBM reconstruction of Sec. 2.3. The label "itr" in the legends denotes the number of iterations n performed by the unsupervised RBM. Fig. 6 shows a flow of temperature distributions starting from spin configurations generated at T = 0; Fig. 7 starts from T = 6. In all the figures, the black lines are the measured temperature distributions of the initial configurations (footnote 8). The colored lines show the temperature distributions of the reconstructed configurations {v^{(n)}_i} after various numbers of iterations. The left panels show the temperature distributions for small numbers of iterations (up to 10 in Fig. 6 and 50 in Fig. 7), while the right panels show larger numbers of iterations.

These results indicate that the critical temperature T_c is a stable fixed point of the flows in the type V RBM. This is apparently different from the naive expectation that the RBM flow should show the same behavior as the RG flow; indeed, it is in the opposite direction. From whichever temperature, T = 0 or T = 6, we start the RBM iteration, the peak of the temperature distribution approaches the critical point (T = 2.27).

Footnote 8: As discussed in footnote 7, these distributions are not sharply peaked at the temperature at which the configurations were generated.

Figure 6: Temperature distributions after various numbers of iterations of the type V RBM, which is trained by the configurations at T = 0, 0.25, ..., 6. The original configurations are generated at T = 0. After only several iterations, the temperature distribution is peaked around T_c and stabilizes there: T_c is a stable fixed point of the flow.

Figure 7: Temperature distributions after various numbers of iterations of the same RBM as in Fig. 6. The original configurations are generated at T = 6. After 50 iterations, the distribution stabilizes at T_c.

In order to confirm the above behavior, we provide another set of configurations, at T = 2.25, as initial configurations, and generate the flow of temperature with the same trained RBM. The flow of temperature distributions is shown in Fig. 8. We can see that the temperature distribution of the reconstructed configurations remains near the critical point, and never flows away from there (footnote 9).

Footnote 9: We also trained the RBM using configurations in a wider range of temperatures, T = 0, 0.25, ..., 10. The results are very similar, and the temperature distributions of the reconstructed configurations always approach the critical point.

Figure 8: Temperature distributions after various numbers of iterations of the same RBM as in Figs. 6 and 7. The original configurations are generated at T = 2.25. The distribution is stable around T_c.

If the process of the unsupervised RBM corresponded to coarse-graining of spin configurations, the temperature distributions of the reconstructed configurations would have to flow away from T_c. Though the direction of the flow is opposite to that of the RG flow, both flows share the property that the critical point T = T_c plays an important role in controlling them.

So far, in obtaining the results of Figs. 6, 7 and 8, we used an unsupervised RBM with 64 neurons in the hidden layer. We also trained other RBMs with different sizes of the hidden layer, using the same set of spin configurations. When the size of the hidden layer is smaller than (or equal to) that of the visible layer N_v = 100, namely N_h = 100, 81, 64, 36 or 16, we find that the temperature distribution approaches the critical point. One difference is that for smaller N_h the flow approaches T_c faster (i.e., the flow arrives at T_c within a smaller number of iterations).

In contrast, when the RBM has more than 100 neurons in the hidden layer, N_h > N_v, we obtain different results. Fig. 9 shows the case of N_h = 225 neurons. Up to about ten iterations, the measured temperature distribution behaves similarly to the case N_h ≤ 100, i.e., it approaches the critical temperature. Afterwards, however, it passes the critical point and flows away towards higher temperature. In the case of 400 neurons, it moves towards high temperature at a faster speed. This behavior suggests that, if the hidden layer has more than the necessary size, the NN tends to learn a lot of noisy fluctuations. Since configurations at higher temperatures are noise-like, the flow then goes away to high temperature. We come back to this conjecture in later sections.

Figure 9: Temperature distribution after various numbers of iterations of the type V RBM with 225 neurons in the hidden layer, i.e., N_h > N_v. The original configurations are generated at T = 0. The distribution has a peak at T = T_c after 10 iterations, but then moves towards T = ∞.

Figure 10: Flow of temperature distributions starting from T = 0 in the type H RBM. The type H RBM is trained by configurations at only T = 4, 4.25, ..., 6. The NN has N_h = 64 neurons (left) and N_h = 225 neurons (right) in the hidden layer. The speed of the flow is slower for the larger hidden layer.

Type H/L RBM: trained by configurations at higher/lower temperatures

Next we construct another type of RBM, which is trained by configurations at temperatures T = 4, 4.25, ..., 6, higher than the critical point T_c. We call it the type H RBM. The results for the flows of temperature distributions in the type H RBM are shown in Fig. 10.

Figure 11: Flow of temperature distributions starting from T = 6 in the type L RBM. The type L RBM is trained by configurations at only T = 0. N_h = 64 (left) and N_h = 225 (right).

In this case, the measured temperature passes the critical point and goes away towards higher temperature. This behavior is understandable, since the RBM must have learned only the features at higher temperatures. We also find that, if the number of neurons in the hidden layer is increased, the flow moves more slowly.

Finally, we construct the type L RBM, which is trained by configurations only at the lowest temperature, T = 0. Fig. 11 shows the numerical results for the flows in the type L RBM. Similarly to the type H RBM, the measured temperature passes the critical point, but it flows towards lower temperature instead of higher temperature. This is, of course, as expected, because the type L RBM must have learned the features of spin configurations at T = 0. In the type L RBM, as far as we have studied, the flow never goes back to higher temperature even for large N_h. This will be because the T = 0 configurations used for training do not contain any of the noisy fluctuations specific to high temperatures. It also suggests that the RBM does not learn features that are not contained in the configurations used for training.

Summaries and Conjectures

Here we first summarize the numerical results.

For the type V RBM:
- When N_h ≤ 100 = N_v, the measured temperature T approaches T_c (Figs. 6, 7 and 8). However, for N_h > 100 = N_v, the flow eventually goes away towards T = ∞ (Fig. 9).
- The speed of the flow is slower for a larger N_h.

For the type H/L RBM:
- The temperature T flows towards T = ∞ / T = 0, respectively (Figs. 10 and 11).

- The speed of the flow is slower for a larger N_h.

Here N_h and N_v are the numbers of hidden and visible neurons in the RBM. These behaviors are reflections of the properties of the weights and biases that the unsupervised RBMs have learned in the process of training. Understanding the above behaviors is thus equivalent to answering the question of what the unsupervised RBMs have learned in the training. The most important question is why the temperature approaches T_c in the type V RBM with N_h ≤ N_v, instead of, e.g., broadening over the whole region of temperatures from T = 0 to T = 6. Note that we taught the NN neither the critical temperature nor the presence of a phase transition. We simply trained the NN with configurations at various temperatures, from T = 0 to T = 6. Nevertheless, the numerical simulations show that the temperature distributions are peaked at T_c after some iterations of the RBM reconstruction. Thus we are forced to conclude that the RBM has automatically learned features specific to the critical temperature T_c.

An important feature at T_c is scale invariance. We have generated spin configurations at various temperatures by the Monte Carlo method, and each configuration has typical fluctuations specific to its temperature. At very high temperature, fluctuations are almost random at each lattice site and there are no correlations between spins at distant positions. At lower temperature they become correlated: the correlation length becomes larger as T → T_c and diverges at T_c. On the other hand, below T_c, spins are clustered, and in each domain all spins take σ_{x,y} = +1 or σ_{x,y} = −1. At low temperature the configurations have only big clusters, and as the temperature increases small-sized clusters appear. At T_c, spin configurations come to have clusters of various sizes in a scale-invariant way.

Now let us come back to the question of why the type V RBM generates a flow approaching T_c and does not randomize so as to broaden the temperature distribution over the whole region. We have trained the type V RBM using configurations at various temperatures with different cluster sizes, and in the process the machine must have simultaneously acquired features of various temperatures. Consequently, the process of RBM reconstruction adds the various features that the machine has learned to a reconstructed configuration. If only a single feature at a specific temperature were added to the reconstructed configuration, the distribution would come to have a peak at this temperature. But this cannot happen, because various features of different temperatures are added to a single configuration by the iterations of the reconstruction process. One may then ask whether there is a configuration that is stable under additions of features at various different T. Our first conjecture is that the set of configurations at T_c is a stabilizer (and even more an attractor) of the type V RBM with N_h ≤ N_v. This must be due to the scale-invariant properties of the configurations at T_c. Namely, since these configurations are scale invariant, they have the features of various temperatures simultaneously, and consequently they can be the stabilizer of this RBM. This sounds plausible, since scale invariance means that the configurations contain various different characteristic length scales. However, we notice that this does not mean that the RBM has forgotten the features of configurations away from

the critical point. Rather, it means that the RBM has learned the features of all temperatures simultaneously. Nor does it mean that the configurations at T = T_c have exerted an especially strong influence on the machine in the process of training. This can be confirmed as follows. Suppose we train an RBM with configurations at temperatures excluding those nearest T_c, namely with configurations at all temperatures except T = 2.25 and 2.5. We found in the numerical simulations that this RBM still generates a flow towards the critical point, even though we did not provide configurations at T ≈ T_c. Therefore we can say that the type V RBM has learned the features of all the temperatures, and that configurations at T_c are special because they contain all the features of the various temperatures.

Our second conjecture, which is related to the behavior of the type V RBM with N_h > N_v, is that RBMs with an unnecessarily large hidden layer tend to learn many irrelevant features. In the present case, these are the noisy fluctuations of configurations at high temperatures. High-temperature configurations have only short-distance correlations, whose behavior is similar to the typical behavior of noise. The conjecture is partially supported by the similarity of the RBM flows between the type V RBM with N_h > N_v and the type H RBM: both RBM flows converge on T = ∞. This similarity indicates that the NN with a larger N_h may have learned too many noise-like features of configurations at higher temperatures. These considerations suggest that a moderate size of the hidden layer, N_h < N_v, is the most efficient for properly extracting the features.

4 Analysis of the weight matrix

In the previous section, we showed our numerical results for the flows generated by unsupervised RBMs, and proposed two conjectures. One is that the scale-invariant T = T_c configurations are stabilizers of the type V RBM flow. The other is that an RBM with an unnecessarily large hidden layer, N_h > N_v, tends to learn too much irrelevant noise. In this section, to further understand the theoretical basis of the feature extraction and to give supporting evidence for our conjectures, we analyze various properties of the weight matrices and biases of the trained RBMs. In particular, we study properties of W W^T by looking at spin correlations in Sec. 4.2, magnetization in Sec. 4.3, and the eigenvalue spectrum in Sec. 4.4.

4.1 Why W W^T is important

All the information that the machine has learned is contained in the weights W_{ia} and the biases b^{(v)}_i, b^{(h)}_a. Since the biases typically have smaller values than the weights (at least in the present situations), we will concentrate on the weight matrix W_{ia} (i = 1, ..., N_v = L²; a = 1, ..., N_h) in the following.

Let us first note that the weight matrix W_{ia} transforms as

  W_{ia} \to W'_{ia} = \sum_{j,b} U_{ij} W_{jb} (V^T)_{ba}   (41)

under transformations (footnote 10) that exchange the basis of neurons in the visible layer (U_{ij}) and in the hidden layer (V_{ab}). Since the choice of basis in the hidden layer is arbitrary, the relevant information in the visible layer is stored in combinations of W_{ia} that are invariant under the transformations V_{ab}. The simplest such combination is the product

  (W W^T)_{ij} = \sum_a W_{ia} W_{ja}.   (42)

It is an N_v × N_v = 100 × 100 matrix, independent of the size N_h. Its properties nevertheless depend on N_h, because the rank of W W^T is at most min(N_v, N_h). Thus, if N_h < N_v, the weight matrix is strongly constrained; e.g. the unit matrix W W^T = 1 is not allowed.

Footnote 10: Since the spin variables on each lattice site are restricted to take the values ±1, the matrices U_{ij} and V_{ab} are elements of the symmetric group, not of the orthogonal Lie group.

This simplest product (42) plays an important role in the dynamics of the flow generated by the RBM. This can be seen as follows. If the biases are ignored, the conditional probability (15) and the expectation value (17) for h_a in the background of v_i become

  p(\{h_a\}|\{v_i\}) = \prod_a \frac{e^{\sum_i v_i W_{ia} h_a}}{2\cosh\big(\sum_i v_i W_{ia}\big)}, \qquad \langle h_a \rangle = \tanh\Big(\sum_i v_i W_{ia}\Big).   (43)

In p({h_a}|{v_i}), the combination \sum_i v_i W_{ia} =: B_a can be regarded as an external magnetic field for h_a. Thus these two variables, B_a and h_a, tend to correlate with each other: the probability p({h_a}|{v_i}) becomes larger when they have the same sign. Moreover, for |B_a| < 1, ⟨h_a⟩ is approximated by B_a, and we can roughly identify the two variables,

  h_a \simeq B_a := \sum_i v_i W_{ia}.   (44)

This is usually not a good approximation, since the weights can take larger values, but let us assume it for the moment. (For a large value of |B_a|, h_a saturates at h_a = B_a/|B_a|.) Suppose that the input configuration is given by {v^{(0)}_i} = {σ^A_i}. If Eq. (44) is employed, we have h^{(1)}_a = B^{(0)}_a = \sum_i v^{(0)}_i W_{ia}. Then the conditional probability (16) in the background of h^{(1)}_a, with the bias set to zero,

  p(\{v_i\}|\{h^{(1)}_a\}) = \prod_i \frac{e^{\sum_a v_i W_{ia} h^{(1)}_a}}{2\cosh\big(\sum_a W_{ia} h^{(1)}_a\big)},   (45)

can be approximated as

  p(\{v_i\}|\{h^{(1)}_a\}) \simeq \prod_i \frac{e^{v_i \sum_j (W W^T)_{ij} v^{(0)}_j}}{2\cosh\big(\sum_j (W W^T)_{ij} v^{(0)}_j\big)}.   (46)

The RBM learns the input data {v^{(0)}_i} so that the probability distribution p reproduces the probability distribution of the initial data, q({v_i}) = \frac{1}{N}\sum_A \prod_i \delta(v_i - \sigma^A_i). Therefore, the training of the RBM is performed so as to enhance the value of \sum_{A}\sum_{i,j} \sigma^{A(0)}_i (W W^T)_{ij} \sigma^{A(0)}_j. This means that W is chosen so that (W W^T)_{ij} reflects the spin correlations of the input configurations {σ^A_i} at sites i and j.

In this simplified discussion, the learning of the RBM proceeds through the combination W W^T. Of course, we have neglected the nonlinearity of the neural network, and the above statement cannot be justified as it stands. Nevertheless, we will find below that the analysis of W W^T is quite useful for understanding how the RBM works.

4.2 Spin correlations in W W^T

In Fig. 12, we plot the values of the matrix elements of W W^T. The three panels correspond to RBMs with different sizes of N_h. We can see that they have large values in the diagonal and near-diagonal elements. Note that the spin variables in the visible layer, σ_{x,y} with x, y = 1, ..., L = 10, are lined up as (σ_{1,1}, σ_{1,2}, ..., σ_{1,L}, σ_{2,1}, ..., σ_{2,L}, σ_{3,1}, ..., σ_{L,L}) and renamed (σ_1, σ_2, ..., σ_{N_v}). Hence the lattice points i and j of σ_i (i = 1, ..., L²) are adjacent to each other when j = i ± 1 or j = i ± L. In the following, we mostly discuss the type V RBM unless otherwise stated.

Figure 12: Elements of W W^T when the hidden layer has 16 (left), 100 (center), 400 (right) neurons.

As discussed above, the product of weight matrices W W^T must reflect the correlations between the spin variables of the input configurations used for the training of the RBM. The strongest correlation in \sum_{i,j} v^{(0)}_i (W W^T)_{ij} v^{(0)}_j is of course the diagonal component, i = j. Thus we

expect that the matrix W W^T will have large diagonal components. Indeed, such behavior can be seen in Fig. 12. In particular, for N_h = 400 > N_v = 100 (the rightmost panel), W W^T is clearly close to a diagonal matrix. The same is almost true for the case N_h = 100 = N_v (the middle panel). However, for N_h = 16 < N_v = 100 (the leftmost panel), it differs from a unit matrix, and off-diagonal components of (W W^T)_{ij} also have large values, in particular at j = i + 1 and j = i + 2. This behavior must be a reflection of the spin correlations of the input configurations (footnote 11). It is also a reflection of the fact that the rank of W W^T is at most N_h, so that W W^T cannot be a unit matrix if N_h < N_v. Thus, even though less information can be stored in the weight matrix for a smaller number of hidden neurons, the relevant information of the spin correlations is better encoded in the weight matrix of the RBM with N_h < N_v than in the RBM with larger N_h. One then wonders why such relevant information is lost in the RBM with N_h > N_v. This question may be related to our second conjecture, proposed at the end of Sec. 3.2, that the RBM with very large N_h learns too much irrelevant information, namely the noise in the input configurations. It is interesting, and a bit surprising, that the RBM with fewer hidden neurons seems to learn the relevant information of the spin correlations more efficiently.

Footnote 11: Off-diagonal components at j = i + L or j = i + 2L are also large, which corresponds to correlations along the y-direction. The large off-diagonal components at j = i + 1 and j = i + 2 correspond to correlations along the x-direction.

In order to further confirm the relation between the correlations in the combination W W^T and the spin correlations of the input configurations, we study the structure of the weight matrices of other types of RBMs. In Fig. 13, we plot the behavior of the off-diagonal components of W W^T for various RBMs. Each RBM is trained by configurations at a single temperature, T = 0 (type L), T = 2, T = 3 and T = 6, respectively. The size of the hidden layer is set to N_h = 16. For comparison, we also plot the behavior of the off-diagonal components for the type V RBM. Fig. 13 shows that the correlations in W W^T decay more rapidly at higher temperature, which is consistent with the expected behavior of the spin correlations. Therefore, the RBM seems to learn correctly the correlation length, or the size of the clusters, which becomes smaller at higher temperature. Furthermore, we find that, for the type V RBM that has learned all temperatures T = 0, ..., 6, the off-diagonal elements decrease with a decay rate between those of the T = 2 and T = 3 cases. This indicates that the type V RBM has acquired features similar to those of the configurations around T_c = 2.27. It is consistent with the numerical results of Figs. 6, 7 and 8, and gives another piece of circumstantial evidence supporting the first conjecture in Sec. 3.2.

4.3 Magnetization and singular value decomposition (SVD)

Information about the weight matrix W can be inferred by using the method of singular value decomposition (see, e.g., [26, 27]). Suppose that the matrix W W^T has eigenvalues λ_a

(a = 1, ..., N_v) with corresponding eigenvectors u_a:

  W W^T u_a = \lambda_a u_a.   (47)

Decomposing an input configuration vector v^{(0)} in terms of the eigenvectors u_a as v^{(0)} = \sum_a c_a u_a, with the normalization condition \sum_a (c_a)^2 = 1, we can rewrite v^{(0)T} W W^T v^{(0)} as

  v^{(0)T} W W^T v^{(0)} = \sum_a c_a^2 \lambda_a.   (48)

Thus, if a vector v^{(0)} contains more components with larger eigenvalues of W W^T, the quantity v^{(0)T} W W^T v^{(0)} becomes larger.

Figure 13: Averaged values of the off-diagonal components of W W^T (normalized by the diagonal components). Each colored line corresponds to an RBM that has learned configurations at a single temperature, T = 0, 2, 3, 6, respectively. The black line (the middle line) shows the behavior of the type V RBM that has learned all the temperatures T = 0, ..., 6.

Fig. 14 shows the values of v^{(0)T} W W^T v^{(0)} averaged over the 1000 configurations {v^{(0)}} at each temperature. For comparison between different RBMs, we subtracted the values at T = 6. The figure shows a large change near the critical point, which is reminiscent of the magnetization of the Ising model. Since v^{(0)T} W W^T v^{(0)} should contain more information than the magnetization itself, the behavior cannot be exactly the same. But it is quite intriguing that Fig. 14 shows behavior similar to the magnetization (footnote 12). It might be because this quantity contains much information about the lower temperatures after the subtraction of the values at higher temperature (footnote 13).

Footnote 12: This behavior indicates that the principal eigenvectors with large eigenvalues might be related to the magnetization, and that information about the phase transition is surely imported into the weight matrix. We therefore investigated properties of the eigenvectors, but so far we have not obtained any physically reasonable picture. We want to come back to this problem in future work.

Footnote 13: It suggests that the subtraction may correspond to removing the contributions of the features specific to higher temperature.

Figure 14: Averaged values of v^{(0)T} W W^T v^{(0)} over the 1000 input configurations at each temperature. Different colors correspond to type V RBMs with different numbers of hidden neurons N_h. In this figure, the values at T = 6 are subtracted for comparison between different RBMs.

In order to see the properties of v^{(0)T} W W^T v^{(0)} beyond the magnetization-like behavior of Fig. 14, we plot the same quantities without subtracting the values at T = 6. Fig. 15 shows the two cases N_h = 64 and N_h = 225. These figures show that, at high temperature, the RBM with large N_h in the right panel has larger components along the principal eigenvectors than the RBM with small N_h in the left panel. This difference must have caused the different behaviors of the RBM flows shown in Fig. 6 (N_h = 64) and Fig. 9 (N_h = 225): the former RBM flow approaches the critical temperature T_c, while the latter eventually goes towards higher temperature. The difference between the two cases indicates that the RBM with larger N_h has learned more of the characteristic features of high temperatures than the RBM with fewer N_h. Does the RBM with small N_h then fail to learn the features of high temperatures? Which RBM is more adequate for feature extraction? Although it is difficult to answer which is more adequate without specifying what we want the machine to learn, we believe that the RBM with N_h < N_v properly learns all the features of the various temperatures, while the RBM with N_h > N_v has learned too many irrelevant features of high temperature. This is nothing but the second conjecture in Sec. 3.2, and it is supported by the behavior of the correlations in W W^T discussed in Sec. 4.2.
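The diagnostics used in this section are straightforward to compute from a trained weight matrix. The following NumPy sketch, with our own function names and assuming the row-major lattice ordering of Sec. 4.2, evaluates W W^T of Eq. (42), the averaged off-diagonal elements at separation d along the x-direction as in Fig. 13, and the ensemble average of v^T W W^T v of Eq. (48) used in Figs. 14 and 15.

```python
import numpy as np

def ww_t(W):
    """Basis-invariant product (W W^T)_{ij} of Eq. (42); shape (N_v, N_v)."""
    return W @ W.T

def offdiagonal_profile(W, L=10, max_d=5):
    """Average of (W W^T)_{i, i+d}, normalized by the diagonal, for separations d
    along the x-direction, as plotted in Fig. 13."""
    M = ww_t(W)
    diag = np.mean(np.diag(M))
    profile = []
    for d in range(1, max_d + 1):
        vals = [M[i, i + d] for i in range(M.shape[0] - d)
                if (i % L) + d < L]           # keep pairs on the same lattice row
        profile.append(np.mean(vals) / diag)
    return np.array(profile)

def quadratic_form_average(W, configs):
    """Average of v^T (W W^T) v over an ensemble of configurations, Eq. (48) / Fig. 14."""
    M = ww_t(W)
    return np.mean(np.einsum('ai,ij,aj->a', configs, M, configs))

# Usage sketch (hypothetical variable names): compare RBMs trained at single temperatures
# profile_T0 = offdiagonal_profile(rbm_T0.W)   # decays slowly (large clusters)
# profile_T6 = offdiagonal_profile(rbm_T6.W)   # decays quickly (noise-like)
# m_like = [quadratic_form_average(rbm_V.W, configs_at_T) for configs_at_T in config_sets]
```

The eigenvalue decomposition needed for Eq. (47) can be obtained from the same matrix with np.linalg.eigh(ww_t(W)).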


More information

Module 9. Lecture 6. Duality in Assignment Problems

Module 9. Lecture 6. Duality in Assignment Problems Module 9 1 Lecture 6 Dualty n Assgnment Problems In ths lecture we attempt to answer few other mportant questons posed n earler lecture for (AP) and see how some of them can be explaned through the concept

More information

Errors for Linear Systems

Errors for Linear Systems Errors for Lnear Systems When we solve a lnear system Ax b we often do not know A and b exactly, but have only approxmatons  and ˆb avalable. Then the best thng we can do s to solve ˆx ˆb exactly whch

More information

Uncertainty and auto-correlation in. Measurement

Uncertainty and auto-correlation in. Measurement Uncertanty and auto-correlaton n arxv:1707.03276v2 [physcs.data-an] 30 Dec 2017 Measurement Markus Schebl Federal Offce of Metrology and Surveyng (BEV), 1160 Venna, Austra E-mal: markus.schebl@bev.gv.at

More information

Linear Approximation with Regularization and Moving Least Squares

Linear Approximation with Regularization and Moving Least Squares Lnear Approxmaton wth Regularzaton and Movng Least Squares Igor Grešovn May 007 Revson 4.6 (Revson : March 004). 5 4 3 0.5 3 3.5 4 Contents: Lnear Fttng...4. Weghted Least Squares n Functon Approxmaton...

More information

FREQUENCY DISTRIBUTIONS Page 1 of The idea of a frequency distribution for sets of observations will be introduced,

FREQUENCY DISTRIBUTIONS Page 1 of The idea of a frequency distribution for sets of observations will be introduced, FREQUENCY DISTRIBUTIONS Page 1 of 6 I. Introducton 1. The dea of a frequency dstrbuton for sets of observatons wll be ntroduced, together wth some of the mechancs for constructng dstrbutons of data. Then

More information

Hopfield networks and Boltzmann machines. Geoffrey Hinton et al. Presented by Tambet Matiisen

Hopfield networks and Boltzmann machines. Geoffrey Hinton et al. Presented by Tambet Matiisen Hopfeld networks and Boltzmann machnes Geoffrey Hnton et al. Presented by Tambet Matsen 18.11.2014 Hopfeld network Bnary unts Symmetrcal connectons http://www.nnwj.de/hopfeld-net.html Energy functon The

More information

Structure and Drive Paul A. Jensen Copyright July 20, 2003

Structure and Drive Paul A. Jensen Copyright July 20, 2003 Structure and Drve Paul A. Jensen Copyrght July 20, 2003 A system s made up of several operatons wth flow passng between them. The structure of the system descrbes the flow paths from nputs to outputs.

More information

Kernel Methods and SVMs Extension

Kernel Methods and SVMs Extension Kernel Methods and SVMs Extenson The purpose of ths document s to revew materal covered n Machne Learnng 1 Supervsed Learnng regardng support vector machnes (SVMs). Ths document also provdes a general

More information

Dynamical Systems and Information Theory

Dynamical Systems and Information Theory Dynamcal Systems and Informaton Theory Informaton Theory Lecture 4 Let s consder systems that evolve wth tme x F ( x, x, x,... That s, systems that can be descrbed as the evoluton of a set of state varables

More information

V.C The Niemeijer van Leeuwen Cumulant Approximation

V.C The Niemeijer van Leeuwen Cumulant Approximation V.C The Nemejer van Leeuwen Cumulant Approxmaton Unfortunately, the decmaton procedure cannot be performed exactly n hgher dmensons. For example, the square lattce can be dvded nto two sublattces. For

More information

For now, let us focus on a specific model of neurons. These are simplified from reality but can achieve remarkable results.

For now, let us focus on a specific model of neurons. These are simplified from reality but can achieve remarkable results. Neural Networks : Dervaton compled by Alvn Wan from Professor Jtendra Malk s lecture Ths type of computaton s called deep learnng and s the most popular method for many problems, such as computer vson

More information

Introduction to Vapor/Liquid Equilibrium, part 2. Raoult s Law:

Introduction to Vapor/Liquid Equilibrium, part 2. Raoult s Law: CE304, Sprng 2004 Lecture 4 Introducton to Vapor/Lqud Equlbrum, part 2 Raoult s Law: The smplest model that allows us do VLE calculatons s obtaned when we assume that the vapor phase s an deal gas, and

More information

3.1 Expectation of Functions of Several Random Variables. )' be a k-dimensional discrete or continuous random vector, with joint PMF p (, E X E X1 E X

3.1 Expectation of Functions of Several Random Variables. )' be a k-dimensional discrete or continuous random vector, with joint PMF p (, E X E X1 E X Statstcs 1: Probablty Theory II 37 3 EPECTATION OF SEVERAL RANDOM VARIABLES As n Probablty Theory I, the nterest n most stuatons les not on the actual dstrbuton of a random vector, but rather on a number

More information

Lectures - Week 4 Matrix norms, Conditioning, Vector Spaces, Linear Independence, Spanning sets and Basis, Null space and Range of a Matrix

Lectures - Week 4 Matrix norms, Conditioning, Vector Spaces, Linear Independence, Spanning sets and Basis, Null space and Range of a Matrix Lectures - Week 4 Matrx norms, Condtonng, Vector Spaces, Lnear Independence, Spannng sets and Bass, Null space and Range of a Matrx Matrx Norms Now we turn to assocatng a number to each matrx. We could

More information

The Order Relation and Trace Inequalities for. Hermitian Operators

The Order Relation and Trace Inequalities for. Hermitian Operators Internatonal Mathematcal Forum, Vol 3, 08, no, 507-57 HIKARI Ltd, wwwm-hkarcom https://doorg/0988/mf088055 The Order Relaton and Trace Inequaltes for Hermtan Operators Y Huang School of Informaton Scence

More information

The Feynman path integral

The Feynman path integral The Feynman path ntegral Aprl 3, 205 Hesenberg and Schrödnger pctures The Schrödnger wave functon places the tme dependence of a physcal system n the state, ψ, t, where the state s a vector n Hlbert space

More information

Singular Value Decomposition: Theory and Applications

Singular Value Decomposition: Theory and Applications Sngular Value Decomposton: Theory and Applcatons Danel Khashab Sprng 2015 Last Update: March 2, 2015 1 Introducton A = UDV where columns of U and V are orthonormal and matrx D s dagonal wth postve real

More information

Supplementary Notes for Chapter 9 Mixture Thermodynamics

Supplementary Notes for Chapter 9 Mixture Thermodynamics Supplementary Notes for Chapter 9 Mxture Thermodynamcs Key ponts Nne major topcs of Chapter 9 are revewed below: 1. Notaton and operatonal equatons for mxtures 2. PVTN EOSs for mxtures 3. General effects

More information

5 The Rational Canonical Form

5 The Rational Canonical Form 5 The Ratonal Canoncal Form Here p s a monc rreducble factor of the mnmum polynomal m T and s not necessarly of degree one Let F p denote the feld constructed earler n the course, consstng of all matrces

More information

1 GSW Iterative Techniques for y = Ax

1 GSW Iterative Techniques for y = Ax 1 for y = A I m gong to cheat here. here are a lot of teratve technques that can be used to solve the general case of a set of smultaneous equatons (wrtten n the matr form as y = A), but ths chapter sn

More information

Note on EM-training of IBM-model 1

Note on EM-training of IBM-model 1 Note on EM-tranng of IBM-model INF58 Language Technologcal Applcatons, Fall The sldes on ths subject (nf58 6.pdf) ncludng the example seem nsuffcent to gve a good grasp of what s gong on. Hence here are

More information

Notes on Frequency Estimation in Data Streams

Notes on Frequency Estimation in Data Streams Notes on Frequency Estmaton n Data Streams In (one of) the data streamng model(s), the data s a sequence of arrvals a 1, a 2,..., a m of the form a j = (, v) where s the dentty of the tem and belongs to

More information

Difference Equations

Difference Equations Dfference Equatons c Jan Vrbk 1 Bascs Suppose a sequence of numbers, say a 0,a 1,a,a 3,... s defned by a certan general relatonshp between, say, three consecutve values of the sequence, e.g. a + +3a +1

More information

STAT 309: MATHEMATICAL COMPUTATIONS I FALL 2018 LECTURE 16

STAT 309: MATHEMATICAL COMPUTATIONS I FALL 2018 LECTURE 16 STAT 39: MATHEMATICAL COMPUTATIONS I FALL 218 LECTURE 16 1 why teratve methods f we have a lnear system Ax = b where A s very, very large but s ether sparse or structured (eg, banded, Toepltz, banded plus

More information

Markov Chain Monte Carlo Lecture 6

Markov Chain Monte Carlo Lecture 6 where (x 1,..., x N ) X N, N s called the populaton sze, f(x) f (x) for at least one {1, 2,..., N}, and those dfferent from f(x) are called the tral dstrbutons n terms of mportance samplng. Dfferent ways

More information

This model contains two bonds per unit cell (one along the x-direction and the other along y). So we can rewrite the Hamiltonian as:

This model contains two bonds per unit cell (one along the x-direction and the other along y). So we can rewrite the Hamiltonian as: 1 Problem set #1 1.1. A one-band model on a square lattce Fg. 1 Consder a square lattce wth only nearest-neghbor hoppngs (as shown n the fgure above): H t, j a a j (1.1) where,j stands for nearest neghbors

More information

CHAPTER 5 NUMERICAL EVALUATION OF DYNAMIC RESPONSE

CHAPTER 5 NUMERICAL EVALUATION OF DYNAMIC RESPONSE CHAPTER 5 NUMERICAL EVALUATION OF DYNAMIC RESPONSE Analytcal soluton s usually not possble when exctaton vares arbtrarly wth tme or f the system s nonlnear. Such problems can be solved by numercal tmesteppng

More information

NON-CENTRAL 7-POINT FORMULA IN THE METHOD OF LINES FOR PARABOLIC AND BURGERS' EQUATIONS

NON-CENTRAL 7-POINT FORMULA IN THE METHOD OF LINES FOR PARABOLIC AND BURGERS' EQUATIONS IJRRAS 8 (3 September 011 www.arpapress.com/volumes/vol8issue3/ijrras_8_3_08.pdf NON-CENTRAL 7-POINT FORMULA IN THE METHOD OF LINES FOR PARABOLIC AND BURGERS' EQUATIONS H.O. Bakodah Dept. of Mathematc

More information

CSci 6974 and ECSE 6966 Math. Tech. for Vision, Graphics and Robotics Lecture 21, April 17, 2006 Estimating A Plane Homography

CSci 6974 and ECSE 6966 Math. Tech. for Vision, Graphics and Robotics Lecture 21, April 17, 2006 Estimating A Plane Homography CSc 6974 and ECSE 6966 Math. Tech. for Vson, Graphcs and Robotcs Lecture 21, Aprl 17, 2006 Estmatng A Plane Homography Overvew We contnue wth a dscusson of the major ssues, usng estmaton of plane projectve

More information

4DVAR, according to the name, is a four-dimensional variational method.

4DVAR, according to the name, is a four-dimensional variational method. 4D-Varatonal Data Assmlaton (4D-Var) 4DVAR, accordng to the name, s a four-dmensonal varatonal method. 4D-Var s actually a drect generalzaton of 3D-Var to handle observatons that are dstrbuted n tme. The

More information

Inner Product. Euclidean Space. Orthonormal Basis. Orthogonal

Inner Product. Euclidean Space. Orthonormal Basis. Orthogonal Inner Product Defnton 1 () A Eucldean space s a fnte-dmensonal vector space over the reals R, wth an nner product,. Defnton 2 (Inner Product) An nner product, on a real vector space X s a symmetrc, blnear,

More information

Workshop: Approximating energies and wave functions Quantum aspects of physical chemistry

Workshop: Approximating energies and wave functions Quantum aspects of physical chemistry Workshop: Approxmatng energes and wave functons Quantum aspects of physcal chemstry http://quantum.bu.edu/pltl/6/6.pdf Last updated Thursday, November 7, 25 7:9:5-5: Copyrght 25 Dan Dll (dan@bu.edu) Department

More information

Assortment Optimization under MNL

Assortment Optimization under MNL Assortment Optmzaton under MNL Haotan Song Aprl 30, 2017 1 Introducton The assortment optmzaton problem ams to fnd the revenue-maxmzng assortment of products to offer when the prces of products are fxed.

More information

A how to guide to second quantization method.

A how to guide to second quantization method. Phys. 67 (Graduate Quantum Mechancs Sprng 2009 Prof. Pu K. Lam. Verson 3 (4/3/2009 A how to gude to second quantzaton method. -> Second quantzaton s a mathematcal notaton desgned to handle dentcal partcle

More information

Mathematical Preparations

Mathematical Preparations 1 Introducton Mathematcal Preparatons The theory of relatvty was developed to explan experments whch studed the propagaton of electromagnetc radaton n movng coordnate systems. Wthn expermental error the

More information

Global Sensitivity. Tuesday 20 th February, 2018

Global Sensitivity. Tuesday 20 th February, 2018 Global Senstvty Tuesday 2 th February, 28 ) Local Senstvty Most senstvty analyses [] are based on local estmates of senstvty, typcally by expandng the response n a Taylor seres about some specfc values

More information

Some Comments on Accelerating Convergence of Iterative Sequences Using Direct Inversion of the Iterative Subspace (DIIS)

Some Comments on Accelerating Convergence of Iterative Sequences Using Direct Inversion of the Iterative Subspace (DIIS) Some Comments on Acceleratng Convergence of Iteratve Sequences Usng Drect Inverson of the Iteratve Subspace (DIIS) C. Davd Sherrll School of Chemstry and Bochemstry Georga Insttute of Technology May 1998

More information

Excess Error, Approximation Error, and Estimation Error

Excess Error, Approximation Error, and Estimation Error E0 370 Statstcal Learnng Theory Lecture 10 Sep 15, 011 Excess Error, Approxaton Error, and Estaton Error Lecturer: Shvan Agarwal Scrbe: Shvan Agarwal 1 Introducton So far, we have consdered the fnte saple

More information

763622S ADVANCED QUANTUM MECHANICS Solution Set 1 Spring c n a n. c n 2 = 1.

763622S ADVANCED QUANTUM MECHANICS Solution Set 1 Spring c n a n. c n 2 = 1. 7636S ADVANCED QUANTUM MECHANICS Soluton Set 1 Sprng 013 1 Warm-up Show that the egenvalues of a Hermtan operator  are real and that the egenkets correspondng to dfferent egenvalues are orthogonal (b)

More information

Psychology 282 Lecture #24 Outline Regression Diagnostics: Outliers

Psychology 282 Lecture #24 Outline Regression Diagnostics: Outliers Psychology 282 Lecture #24 Outlne Regresson Dagnostcs: Outlers In an earler lecture we studed the statstcal assumptons underlyng the regresson model, ncludng the followng ponts: Formal statement of assumptons.

More information

Physics 5153 Classical Mechanics. Principle of Virtual Work-1

Physics 5153 Classical Mechanics. Principle of Virtual Work-1 P. Guterrez 1 Introducton Physcs 5153 Classcal Mechancs Prncple of Vrtual Work The frst varatonal prncple we encounter n mechancs s the prncple of vrtual work. It establshes the equlbrum condton of a mechancal

More information

Week 5: Neural Networks

Week 5: Neural Networks Week 5: Neural Networks Instructor: Sergey Levne Neural Networks Summary In the prevous lecture, we saw how we can construct neural networks by extendng logstc regresson. Neural networks consst of multple

More information

Using deep belief network modelling to characterize differences in brain morphometry in schizophrenia

Using deep belief network modelling to characterize differences in brain morphometry in schizophrenia Usng deep belef network modellng to characterze dfferences n bran morphometry n schzophrena Walter H. L. Pnaya * a ; Ary Gadelha b ; Orla M. Doyle c ; Crstano Noto b ; André Zugman d ; Qurno Cordero b,

More information

Transfer Functions. Convenient representation of a linear, dynamic model. A transfer function (TF) relates one input and one output: ( ) system

Transfer Functions. Convenient representation of a linear, dynamic model. A transfer function (TF) relates one input and one output: ( ) system Transfer Functons Convenent representaton of a lnear, dynamc model. A transfer functon (TF) relates one nput and one output: x t X s y t system Y s The followng termnology s used: x y nput output forcng

More information

RG treatment of an Ising chain with long range interactions

RG treatment of an Ising chain with long range interactions RG treatment of an Isng chan wth long range nteractons Davd Ramrez Department of Physcs Undergraduate Massachusetts Insttute of Technology, Cambrdge, MA 039, USA An analyss of a one-dmensonal Isng model

More information

A Robust Method for Calculating the Correlation Coefficient

A Robust Method for Calculating the Correlation Coefficient A Robust Method for Calculatng the Correlaton Coeffcent E.B. Nven and C. V. Deutsch Relatonshps between prmary and secondary data are frequently quantfed usng the correlaton coeffcent; however, the tradtonal

More information

Explicit constructions of all separable two-qubits density matrices and related problems for three-qubits systems

Explicit constructions of all separable two-qubits density matrices and related problems for three-qubits systems Explct constructons of all separable two-qubts densty matrces and related problems for three-qubts systems Y. en-ryeh and. Mann Physcs Department, Technon-Israel Insttute of Technology, Hafa 2000, Israel

More information

Advanced Quantum Mechanics

Advanced Quantum Mechanics Advanced Quantum Mechancs Rajdeep Sensarma! sensarma@theory.tfr.res.n ecture #9 QM of Relatvstc Partcles Recap of ast Class Scalar Felds and orentz nvarant actons Complex Scalar Feld and Charge conjugaton

More information

PARTICIPATION FACTOR IN MODAL ANALYSIS OF POWER SYSTEMS STABILITY

PARTICIPATION FACTOR IN MODAL ANALYSIS OF POWER SYSTEMS STABILITY POZNAN UNIVE RSITY OF TE CHNOLOGY ACADE MIC JOURNALS No 86 Electrcal Engneerng 6 Volodymyr KONOVAL* Roman PRYTULA** PARTICIPATION FACTOR IN MODAL ANALYSIS OF POWER SYSTEMS STABILITY Ths paper provdes a

More information

Design and Optimization of Fuzzy Controller for Inverse Pendulum System Using Genetic Algorithm

Design and Optimization of Fuzzy Controller for Inverse Pendulum System Using Genetic Algorithm Desgn and Optmzaton of Fuzzy Controller for Inverse Pendulum System Usng Genetc Algorthm H. Mehraban A. Ashoor Unversty of Tehran Unversty of Tehran h.mehraban@ece.ut.ac.r a.ashoor@ece.ut.ac.r Abstract:

More information

CHAPTER III Neural Networks as Associative Memory

CHAPTER III Neural Networks as Associative Memory CHAPTER III Neural Networs as Assocatve Memory Introducton One of the prmary functons of the bran s assocatve memory. We assocate the faces wth names, letters wth sounds, or we can recognze the people

More information

Chapter 13: Multiple Regression

Chapter 13: Multiple Regression Chapter 13: Multple Regresson 13.1 Developng the multple-regresson Model The general model can be descrbed as: It smplfes for two ndependent varables: The sample ft parameter b 0, b 1, and b are used to

More information

MMA and GCMMA two methods for nonlinear optimization

MMA and GCMMA two methods for nonlinear optimization MMA and GCMMA two methods for nonlnear optmzaton Krster Svanberg Optmzaton and Systems Theory, KTH, Stockholm, Sweden. krlle@math.kth.se Ths note descrbes the algorthms used n the author s 2007 mplementatons

More information

Lecture 5.8 Flux Vector Splitting

Lecture 5.8 Flux Vector Splitting Lecture 5.8 Flux Vector Splttng 1 Flux Vector Splttng The vector E n (5.7.) can be rewrtten as E = AU (5.8.1) (wth A as gven n (5.7.4) or (5.7.6) ) whenever, the equaton of state s of the separable form

More information

Negative Binomial Regression

Negative Binomial Regression STATGRAPHICS Rev. 9/16/2013 Negatve Bnomal Regresson Summary... 1 Data Input... 3 Statstcal Model... 3 Analyss Summary... 4 Analyss Optons... 7 Plot of Ftted Model... 8 Observed Versus Predcted... 10 Predctons...

More information

STATISTICAL MECHANICAL ENSEMBLES 1 MICROSCOPIC AND MACROSCOPIC VARIABLES PHASE SPACE ENSEMBLES. CHE 524 A. Panagiotopoulos 1

STATISTICAL MECHANICAL ENSEMBLES 1 MICROSCOPIC AND MACROSCOPIC VARIABLES PHASE SPACE ENSEMBLES. CHE 524 A. Panagiotopoulos 1 CHE 54 A. Panagotopoulos STATSTCAL MECHACAL ESEMBLES MCROSCOPC AD MACROSCOPC ARABLES The central queston n Statstcal Mechancs can be phrased as follows: f partcles (atoms, molecules, electrons, nucle,

More information

Lecture 7: Boltzmann distribution & Thermodynamics of mixing

Lecture 7: Boltzmann distribution & Thermodynamics of mixing Prof. Tbbtt Lecture 7 etworks & Gels Lecture 7: Boltzmann dstrbuton & Thermodynamcs of mxng 1 Suggested readng Prof. Mark W. Tbbtt ETH Zürch 13 März 018 Molecular Drvng Forces Dll and Bromberg: Chapters

More information

COS 511: Theoretical Machine Learning

COS 511: Theoretical Machine Learning COS 5: Theoretcal Machne Learnng Lecturer: Rob Schapre Lecture #0 Scrbe: José Sões Ferrera March 06, 203 In the last lecture the concept of Radeacher coplexty was ntroduced, wth the goal of showng that

More information

Einstein-Podolsky-Rosen Paradox

Einstein-Podolsky-Rosen Paradox H 45 Quantum Measurement and Spn Wnter 003 Ensten-odolsky-Rosen aradox The Ensten-odolsky-Rosen aradox s a gedanken experment desgned to show that quantum mechancs s an ncomplete descrpton of realty. The

More information

IV. Performance Optimization

IV. Performance Optimization IV. Performance Optmzaton A. Steepest descent algorthm defnton how to set up bounds on learnng rate mnmzaton n a lne (varyng learnng rate) momentum learnng examples B. Newton s method defnton Gauss-Newton

More information

Problem Set 9 Solutions

Problem Set 9 Solutions Desgn and Analyss of Algorthms May 4, 2015 Massachusetts Insttute of Technology 6.046J/18.410J Profs. Erk Demane, Srn Devadas, and Nancy Lynch Problem Set 9 Solutons Problem Set 9 Solutons Ths problem

More information

Lecture 21: Numerical methods for pricing American type derivatives

Lecture 21: Numerical methods for pricing American type derivatives Lecture 21: Numercal methods for prcng Amercan type dervatves Xaoguang Wang STAT 598W Aprl 10th, 2014 (STAT 598W) Lecture 21 1 / 26 Outlne 1 Fnte Dfference Method Explct Method Penalty Method (STAT 598W)

More information

THERMAL PHASE TRANSITIONS AND GROUND STATE PHASE TRANSITIONS: THE COMMON FEATURES AND SOME INTERESTING MODELS

THERMAL PHASE TRANSITIONS AND GROUND STATE PHASE TRANSITIONS: THE COMMON FEATURES AND SOME INTERESTING MODELS THERMAL PHASE TRANSITIONS AND GROUND STATE PHASE TRANSITIONS: THE COMMON FEATURES AND SOME INTERESTING MODELS Ján Greguš Department of Physcs, Faculty of the Natural Scences, Unversty of Constantne the

More information

Feature Selection: Part 1

Feature Selection: Part 1 CSE 546: Machne Learnng Lecture 5 Feature Selecton: Part 1 Instructor: Sham Kakade 1 Regresson n the hgh dmensonal settng How do we learn when the number of features d s greater than the sample sze n?

More information

Conjugacy and the Exponential Family

Conjugacy and the Exponential Family CS281B/Stat241B: Advanced Topcs n Learnng & Decson Makng Conjugacy and the Exponental Famly Lecturer: Mchael I. Jordan Scrbes: Bran Mlch 1 Conjugacy In the prevous lecture, we saw conjugate prors for the

More information

Time-Varying Systems and Computations Lecture 6

Time-Varying Systems and Computations Lecture 6 Tme-Varyng Systems and Computatons Lecture 6 Klaus Depold 14. Januar 2014 The Kalman Flter The Kalman estmaton flter attempts to estmate the actual state of an unknown dscrete dynamcal system, gven nosy

More information

Lecture 4. Macrostates and Microstates (Ch. 2 )

Lecture 4. Macrostates and Microstates (Ch. 2 ) Lecture 4. Macrostates and Mcrostates (Ch. ) The past three lectures: we have learned about thermal energy, how t s stored at the mcroscopc level, and how t can be transferred from one system to another.

More information

Simulated Power of the Discrete Cramér-von Mises Goodness-of-Fit Tests

Simulated Power of the Discrete Cramér-von Mises Goodness-of-Fit Tests Smulated of the Cramér-von Mses Goodness-of-Ft Tests Steele, M., Chaselng, J. and 3 Hurst, C. School of Mathematcal and Physcal Scences, James Cook Unversty, Australan School of Envronmental Studes, Grffth

More information

1 Derivation of Rate Equations from Single-Cell Conductance (Hodgkin-Huxley-like) Equations

1 Derivation of Rate Equations from Single-Cell Conductance (Hodgkin-Huxley-like) Equations Physcs 171/271 -Davd Klenfeld - Fall 2005 (revsed Wnter 2011) 1 Dervaton of Rate Equatons from Sngle-Cell Conductance (Hodgkn-Huxley-lke) Equatons We consder a network of many neurons, each of whch obeys

More information

Department of Statistics University of Toronto STA305H1S / 1004 HS Design and Analysis of Experiments Term Test - Winter Solution

Department of Statistics University of Toronto STA305H1S / 1004 HS Design and Analysis of Experiments Term Test - Winter Solution Department of Statstcs Unversty of Toronto STA35HS / HS Desgn and Analyss of Experments Term Test - Wnter - Soluton February, Last Name: Frst Name: Student Number: Instructons: Tme: hours. Ads: a non-programmable

More information

Susceptibility and Inverted Hysteresis Loop of Prussian Blue Analogs with Orthorhombic Structure

Susceptibility and Inverted Hysteresis Loop of Prussian Blue Analogs with Orthorhombic Structure Commun. Theor. Phys. 58 (202) 772 776 Vol. 58, No. 5, November 5, 202 Susceptblty and Inverted Hysteress Loop of Prussan Blue Analogs wth Orthorhombc Structure GUO An-Bang (ÁËǑ) and JIANG We ( å) School

More information

Lecture Note 3. Eshelby s Inclusion II

Lecture Note 3. Eshelby s Inclusion II ME340B Elastcty of Mcroscopc Structures Stanford Unversty Wnter 004 Lecture Note 3. Eshelby s Incluson II Chrs Wenberger and We Ca c All rghts reserved January 6, 004 Contents 1 Incluson energy n an nfnte

More information

Temperature. Chapter Heat Engine

Temperature. Chapter Heat Engine Chapter 3 Temperature In prevous chapters of these notes we ntroduced the Prncple of Maxmum ntropy as a technque for estmatng probablty dstrbutons consstent wth constrants. In Chapter 9 we dscussed the

More information

Comparison of Regression Lines

Comparison of Regression Lines STATGRAPHICS Rev. 9/13/2013 Comparson of Regresson Lnes Summary... 1 Data Input... 3 Analyss Summary... 4 Plot of Ftted Model... 6 Condtonal Sums of Squares... 6 Analyss Optons... 7 Forecasts... 8 Confdence

More information