On Autoencoders and Score Matching for Energy Based Models


Kevin Swersky*, Marc'Aurelio Ranzato†, David Buchman*, Benjamin M. Marlin*, Nando de Freitas*
*Department of Computer Science, University of British Columbia, Vancouver, BC V6T 1Z4, Canada
†Department of Computer Science, University of Toronto, Toronto, ON M5S 2G4, Canada

Appearing in Proceedings of the 28th International Conference on Machine Learning, Bellevue, WA, USA, 2011. Copyright 2011 by the author(s)/owner(s).

Abstract

We consider estimation methods for the class of continuous-data energy based models (EBMs). Our main result shows that estimating the parameters of an EBM using score matching when the conditional distribution over the visible units is Gaussian corresponds to training a particular form of regularized autoencoder. We show how different Gaussian EBMs lead to different autoencoder architectures, providing deep links between these two families of models. We compare the score matching estimator for the mPoT model, a particular Gaussian EBM, to several other training methods on a variety of tasks including image denoising and unsupervised feature extraction. We show that the regularization function induced by score matching leads to superior classification performance relative to a standard autoencoder. We also show that score matching yields classification results that are indistinguishable from better-known stochastic approximation maximum likelihood estimators.

1. Introduction

In this work, we consider a rich class of probabilistic models called energy based models (EBMs) (LeCun et al., 2006; Teh et al., 2003; Hinton, 2002). These models define a probability distribution through an exponentiated energy function. Markov Random Fields (MRFs) and Restricted Boltzmann Machines (RBMs) are the most common instances of such models and have a long history in particular application areas, including modeling natural images. Recently, more sophisticated latent variable EBMs for continuous data, including the PoT (Welling et al., 2003), mPoT (Ranzato et al., 2010b), mcRBM (Ranzato & Hinton, 2010), FoE (Schmidt et al., 2010) and others, have become popular models for learning representations of natural images as well as other sources of real-valued data. Such models, also called gated MRFs, leverage latent variables to represent higher order interactions between the input variables. In the very active research area of deep learning (Hinton et al., 2006), these models have been employed as elementary building blocks to construct hierarchical models that achieve very promising performance on several perceptual tasks (Ranzato & Hinton, 2010; Bengio, 2009).

Maximum likelihood estimation is the default parameter estimation approach for probabilistic models due to its optimal theoretical properties. Unfortunately, maximum likelihood estimation is computationally infeasible for many EBMs due to the presence of an intractable normalization term (the partition function) in the model probability. This term arises in EBMs because the exponentiated energies do not automatically integrate to unity, unlike directed models parameterized by products of locally normalized conditional distributions (Bayesian networks). Several alternative methods have been proposed to estimate the parameters of an EBM without the need for computing the partition function. One particularly interesting method is called score matching (SM) (Hyvärinen, 2005). The score matching objective function is constructed from an L2 loss on the difference between the derivatives of the log of the model and empirical distribution functions with respect to the inputs. Hyvärinen (2005) showed that this results in a cancellation of the partition function. Further manipulation yields an estimator that can be computed analytically and is provably consistent.

Autoencoder neural networks are another class of models that are often used to model high-dimensional real-valued data (Hinton & Zemel, 1994; Vincent et al., 2008; Vincent, 2011; Kingma & LeCun, 2010). Both EBMs and autoencoders are unsupervised models that can be thought of as learning to re-represent input data in a latent space. In contrast to probabilistic EBMs, autoencoders are deterministic and feed-forward. As a result, autoencoders can be trained to reconstruct their input through one or more hidden layers, they have fast feed-forward inference for hidden layer states, and all common training losses lead to computationally tractable model estimation methods. In order to learn better representations, autoencoders are often modified by tying the weights between the input and output layers to reduce the number of parameters, by including additional terms in the objective to bias learning toward sparse hidden unit activations, and by adding noise to the input data to increase robustness (Vincent et al., 2008; Vincent, 2011). Interestingly, Vincent (2011) showed that a particular kind of denoising autoencoder trained to minimize an L2 reconstruction error can be interpreted as a Gaussian RBM trained using Hyvärinen's score matching estimator.

In this paper, we apply score matching to a number of latent variable EBMs where the conditional distribution of the visible units given the hidden units is Gaussian. We show that the resulting estimation algorithms can be interpreted as minimizing a regularized L2 reconstruction error on the visible units. For Gaussian-binary RBMs, the reconstruction term corresponds to a standard autoencoder with tied weights. For the mPoT and mcRBM models, the reconstruction terms correspond to new autoencoder architectures that take into account the covariance structure of the inputs. This suggests a new way to derive novel autoencoder training criteria by applying score matching to the free energy of an EBM. We further generalize score matching to arbitrary EBMs with real-valued input units and show that this view leads to an intuitive interpretation of the regularization terms that appear in the score matching objective function.

2. Score Matching for Latent Energy Based Models

A latent variable energy based model defines a probability distribution over real-valued data vectors v ∈ V ⊆ R^{n_v} as follows:

P(v, h; \theta) = \frac{\exp(-E_\theta(v, h))}{Z(\theta)},  (1)

where h ∈ H ⊆ R^{n_h} are the latent variables, E_θ(v, h) is an energy function parameterized by θ ∈ Θ, and Z(θ) is the partition function. We refer to these models as latent energy based models. This general latent energy based model subsumes many specific models for real-valued data such as Boltzmann machines, exponential-family harmoniums (Welling et al., 2005), factored RBMs and Product of Student's T (PoT) models (Memisevic & Hinton, 2009; Ranzato & Hinton, 2010; Ranzato et al., 2010a;b).

The marginal distribution in terms of the free energy F_θ(v) is obtained by integrating out the hidden variables as seen below. Typically, but not always, this marginalization can be carried out analytically.

P(v; \theta) = \frac{\exp(-F_\theta(v))}{Z(\theta)}.  (2)

Maximum likelihood parameter estimation is difficult when Z(θ) is intractable. In EBMs the intractability of Z(θ) arises from the fact that it is a very high-dimensional integral that often lacks a closed form solution.
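As a concrete illustration (our own sketch, not code from the paper), the following NumPy snippet checks the analytic marginalization for a toy Gaussian-binary RBM, the model treated in Example 1 below. Note that the free energy is tractable here even though Z(θ) is never computed; all sizes and parameter values are arbitrary.

    import numpy as np
    from itertools import product

    # Toy Gaussian-binary RBM with sigma_i = 1 (see Example 1, Eqs. 6-7).
    rng = np.random.default_rng(0)
    n_v, n_h = 4, 3
    W = rng.normal(0.0, 0.1, (n_v, n_h))   # visible-hidden weights
    b = rng.normal(0.0, 0.1, n_h)          # hidden biases
    c = rng.normal(0.0, 0.1, n_v)          # visible biases

    def energy(v, h):
        return 0.5 * np.sum((c - v) ** 2) - v @ W @ h - b @ h

    def free_energy(v):
        # Analytic marginalization over the 2^{n_h} binary hidden states
        return 0.5 * np.sum((c - v) ** 2) - np.sum(np.log1p(np.exp(v @ W + b)))

    v = rng.normal(size=n_v)
    # Brute-force check: F(v) = -log sum_h exp(-E(v, h))
    brute = -np.logaddexp.reduce(
        [-energy(v, np.array(h, dtype=float)) for h in product([0, 1], repeat=n_h)])
    assert np.isclose(brute, free_energy(v))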
In such cases, stochastic algorithms can be applied to approximately maximize the likelihood, and a variety of algorithms have been described and evaluated in the literature (Swersky et al., 2010; Marlin et al., 2010), including contrastive divergence (CD) (Hinton, 2002), persistent contrastive divergence (PCD) (Younes, 1989; Tieleman, 2008), and fast persistent contrastive divergence (FPCD) (Tieleman & Hinton, 2009). However, these methods often require very careful hand-tuning of optimization-related parameters like step size, momentum, batch size and weight decay, which is complicated by the fact that the objective function cannot be computed.

The score matching estimator was proposed by Hyvärinen (2005) to overcome the intractability of Z(θ) when dealing with continuous data. The score matching objective function is defined through a score function applied to the empirical distribution p(v) and the model distribution p_θ(v). The score function for a generic distribution p(v) is given by ψ(p(v)) = ∂ log p(v)/∂v. For the model distribution,

\psi(p_\theta(v)) = -\frac{\partial F_\theta(v)}{\partial v} = -\int_h \frac{\partial E_\theta(v, h)}{\partial v}\, p_\theta(h|v)\, dh.

The full objective function is given below:

J(\theta) = \frac{1}{2} E_{p(v)}\left[\left\|\psi(p(v)) - \psi(p_\theta(v))\right\|^2\right].  (3)
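Continuing the sketch above, the model score depends only on the free energy, so it can be computed without ever touching Z(θ). A finite-difference check of the closed-form score for the toy Gaussian-binary RBM (again our own illustration):

    # psi(v) = -dF/dv for the toy Gaussian-binary RBM above
    def score(v):
        h_hat = 1.0 / (1.0 + np.exp(-(v @ W + b)))   # E[h_j | v]
        return (c - v) + W @ h_hat

    eps = 1e-6
    num_grad = np.array([
        (free_energy(v + eps * np.eye(n_v)[i]) -
         free_energy(v - eps * np.eye(n_v)[i])) / (2 * eps)
        for i in range(n_v)])
    assert np.allclose(score(v), -num_grad, atol=1e-5)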

The benefit of optimizing J(θ) is that Z(θ) cancels in the derivative of log p_θ(v), since it is constant with respect to each v_i. However, in the above form, J(θ) is still intractable due to the dependence on p(v). Hyvärinen (2005) shows that, under weak regularity conditions, J(θ) can be expressed in the following form, which can be tractably approximated by replacing the expectation over the empirical distribution with an empirical average over the training set:

J(\theta) = E_{p(v)} \sum_{i=1}^{n_v} \left[ \frac{1}{2} \psi_i(p_\theta(v))^2 + \frac{\partial \psi_i(p_\theta(v))}{\partial v_i} \right].  (4)

In theoretical situations where the regularity conditions on the derivatives of the empirical distribution are not satisfied, or in practical situations where a finite sample approximation to the expectation over the empirical distribution is used, a smoothed version of the score matching estimator may be of interest. Consider smoothing p(v) using a probabilistic kernel q_β(v|v′) with bandwidth parameter β > 0. We obtain a new distribution q_β(v) = ∫ q_β(v|v′) p(v′) dv′. Vincent (2011) showed that applying score matching to q_β(v) is equivalent to the following objective function, where q_β(v, v′) = q_β(v|v′) p(v′):

Q(\theta) = E_{q_\beta(v, v')}\left[ \frac{1}{2} \left\| \psi(q_\beta(v|v')) - \psi(p_\theta(v)) \right\|^2 \right].  (5)

For the case where q_β(v|v′) = N(v|v′, β²), i.e. a Gaussian smoothing kernel with variance β², this is equivalent to the regularized score matching objective proposed in (Kingma & LeCun, 2010). We refer to the objective given by Equation 5 as denoising score matching (SMD). Although SMD is intractable to evaluate analytically, we can again replace the integral over v′ by an empirical average over a finite sample of training data. We can then replace the integral over v by an empirical average over samples, which can easily be drawn from q_β(v|v′) for each training sample v′. Compared to PCD and CD, SM and SMD give tractable objective functions that can be used to monitor training progress. While SMD is not consistent, it does have significant computational advantages relative to SM (Vincent, 2011).

3. Applying and Interpreting Score Matching for Latent EBMs

We now derive score matching objectives for several commonly used EBMs. In order to apply score matching to a particular EBM, one simply needs an expression for the corresponding free energy.

Example 1: Score matching for Gaussian-binary RBMs. Here, the energy E_θ(v, h) is given by:

E_\theta(v, h) = -\sum_{i=1}^{n_v} \sum_{j=1}^{n_h} \frac{v_i}{\sigma_i^2} W_{ij} h_j - \sum_{j=1}^{n_h} b_j h_j + \sum_{i=1}^{n_v} \frac{(c_i - v_i)^2}{2\sigma_i^2},  (6)

where the parameters are θ = (W, σ, b, c) and h_j ∈ {0, 1}. This leads to the free energy F_θ(v):

F_\theta(v) = \sum_{i=1}^{n_v} \frac{(c_i - v_i)^2}{2\sigma_i^2} - \sum_{j=1}^{n_h} \log\left(1 + \exp\left(\sum_{i=1}^{n_v} \frac{v_i}{\sigma_i^2} W_{ij} + b_j\right)\right).  (7)

The corresponding score matching objective is:

J(\theta) = \frac{1}{N} \sum_{n=1}^{N} \sum_{i=1}^{n_v} \left[ \frac{1}{2\sigma_i^4} \left(c_i - v_{in} + \sum_{j=1}^{n_h} W_{ij} \hat{h}_{jn}\right)^2 - \frac{1}{\sigma_i^2} + \sum_{j=1}^{n_h} \frac{W_{ij}^2}{\sigma_i^4} \hat{h}_{jn} (1 - \hat{h}_{jn}) \right],  (8)

where \hat{h}_{jn} := \mathrm{sigm}\left(\sum_{i=1}^{n_v} \frac{v_{in}}{\sigma_i^2} W_{ij} + b_j\right) and \mathrm{sigm}(x) := \frac{1}{1 + \exp(-x)}.

For a standardized Normal model, with c = 0 and σ = 1, this objective reduces (up to an additive constant) to:

J(\theta) = \frac{1}{N} \sum_{n=1}^{N} \left[ \frac{1}{2} \sum_{i=1}^{n_v} \left(v_{in} - \sum_{j=1}^{n_h} W_{ij} \hat{h}_{jn}\right)^2 + \sum_{i=1}^{n_v} \sum_{j=1}^{n_h} W_{ij}^2 \hat{h}_{jn} (1 - \hat{h}_{jn}) \right].  (9)

The first term corresponds to the quadratic reconstruction error of an autoencoder with tied weights. From this we can see that this type of autoencoder, which researchers have previously treated as a different model, can in fact be explained by the application of the score matching estimation principle to Gaussian RBMs.
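To make the correspondence concrete, here is a minimal NumPy sketch of Equation 9 (our own illustration, not the authors' code). The first term is exactly the squared reconstruction error of a tied-weight autoencoder with encoder sigm(vW + b) and decoder W^T; the second term acts as a regularizer.

    import numpy as np

    def sigm(x):
        return 1.0 / (1.0 + np.exp(-x))

    def grbm_score_matching(V, W, b):
        # Eq. (9): SM objective for a standardized Gaussian-binary RBM
        # (c = 0, sigma = 1). V: (N, n_v) data; W: (n_v, n_h); b: (n_h,)
        H = sigm(V @ W + b)                        # hidden activations h_hat
        R = H @ W.T                                # tied-weight reconstructions
        recon = 0.5 * np.sum((V - R) ** 2, axis=1)         # autoencoder term
        penalty = (H * (1.0 - H)) @ (W ** 2).sum(axis=0)   # regularizer
        return np.mean(recon + penalty)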

Example 2: Score matching for the mcRBM. The energy E_θ(v, h^m, h^c) of the mcRBM model for each data point includes mean Bernoulli hidden units h^m_j ∈ {0, 1} and covariance Bernoulli hidden units h^c_k ∈ {0, 1}. The latter allow one to model correlations in the data v (Ranzato & Hinton, 2010; Ranzato et al., 2010a). To ease the notation, we will ignore the index n over the data. The energy for this model is:

E_\theta(v, h^m, h^c) = -\frac{1}{2} \sum_{f=1}^{n_f} \sum_{k=1}^{n_{h^c}} P_{fk} h^c_k \left(\sum_{i=1}^{n_v} C_{if} v_i\right)^2 - \sum_{i=1}^{n_v} \sum_{j=1}^{n_{h^m}} W_{ij} h^m_j v_i - \sum_{j=1}^{n_{h^m}} b^m_j h^m_j - \sum_{k=1}^{n_{h^c}} b^c_k h^c_k - \sum_{i=1}^{n_v} b^v_i v_i + \frac{1}{2} \sum_{i=1}^{n_v} v_i^2,  (10)

where θ = (b^v, b^m, b^c, P, W, C). This leads to the free energy F_θ(v):

F_\theta(v) = -\sum_{k=1}^{n_{h^c}} \log\left(1 + e^{\phi^c_k}\right) - \sum_{j=1}^{n_{h^m}} \log\left(1 + e^{\phi^m_j}\right) - \sum_{i=1}^{n_v} b^v_i v_i + \frac{1}{2} \sum_{i=1}^{n_v} v_i^2,  (11)

where \phi^c_k = \frac{1}{2} \sum_{f=1}^{n_f} P_{fk} \left(\sum_{i=1}^{n_v} C_{if} v_i\right)^2 + b^c_k and \phi^m_j = \sum_{i=1}^{n_v} W_{ij} v_i + b^m_j. The corresponding score matching objective is:

J(\theta) = \sum_{i=1}^{n_v} \left[ \frac{1}{2} \psi_i(p_\theta(v))^2 + \sum_{k=1}^{n_{h^c}} \left( \rho(\hat{h}^c_k) D_{ik}^2 + \hat{h}^c_k K_{ik} \right) + \sum_{j=1}^{n_{h^m}} \hat{h}^m_j (1 - \hat{h}^m_j) W_{ij}^2 - 1 \right],  (12)

where

\psi_i(p_\theta(v)) = \sum_{k=1}^{n_{h^c}} \hat{h}^c_k D_{ik} + \sum_{j=1}^{n_{h^m}} \hat{h}^m_j W_{ij} + b^v_i - v_i,
K_{ik} = \sum_{f=1}^{n_f} P_{fk} C_{if}^2,
D_{ik} = \sum_{f=1}^{n_f} P_{fk} C_{if} \left(\sum_{i'=1}^{n_v} C_{i'f} v_{i'}\right),
\hat{h}^c_k = \mathrm{sigm}(\phi^c_k), \qquad \hat{h}^m_j = \mathrm{sigm}(\phi^m_j), \qquad \rho(x) := x(1 - x).

Example 3: Score matching for the mPoT. The energy E_θ(v, h^m, h^c) of the mPoT model is:

E_\theta(v, h^m, h^c) = \sum_{k=1}^{n_{h^c}} \left[ h^c_k \left(1 + \frac{1}{2} \left(\sum_{i=1}^{n_v} C_{ik} v_i\right)^2\right) + (1 - \gamma) \log h^c_k \right] + \frac{1}{2} \sum_{i=1}^{n_v} v_i^2 - \sum_{i=1}^{n_v} b^v_i v_i - \sum_{j=1}^{n_{h^m}} h^m_j \sum_{i=1}^{n_v} W_{ij} v_i - \sum_{j=1}^{n_{h^m}} b^m_j h^m_j,  (13)

where θ = (γ, W, C, b^v, b^m), h^c is a vector of Gamma covariance latent variables, C is a filter bank and γ is a scalar parameter. This leads to the free energy F_θ(v):

F_\theta(v) = \sum_{k=1}^{n_{h^c}} \gamma \log\left(1 + \phi^c_k\right) - \sum_{j=1}^{n_{h^m}} \log\left(1 + e^{\phi^m_j}\right) - \sum_{i=1}^{n_v} b^v_i v_i + \frac{1}{2} \sum_{i=1}^{n_v} v_i^2,  (14)

where \phi^c_k = \frac{1}{2} \left(\sum_{i=1}^{n_v} C_{ik} v_i\right)^2 and \phi^m_j = \sum_{i=1}^{n_v} W_{ij} v_i + b^m_j. The corresponding score matching objective J(θ) is equivalent to the objective given in Equation 12 with the following redefinition of terms:

P = -I_{n_{h^c}},
\hat{h}^c_k = \gamma\, \varphi(\phi^c_k),  (15)
\hat{h}^m_j = \mathrm{sigm}(\phi^m_j),  (16)
\varphi(x) := \frac{1}{1 + x}, \qquad \rho(x) := \frac{x^2}{\gamma},

where I_{n_{h^c}} is the n_{h^c} × n_{h^c} identity matrix.

In each of these examples, we see that an objective emerges which seeks to minimize a form of regularized reconstruction error, and that the forms of these regularizers can end up being quite different. Rather than trying to interpret score matching on a case by case basis, we provide a general theorem for all latent EBMs to which score matching can be applied:

Theorem 1. The score matching objective, Equation (4), for a latent energy based model can be expressed succinctly in terms of either the free energy or expectations of the energy with respect to the conditional distribution p_θ(h|v). Specifically,

J(\theta) = E_{p(v)} \sum_{i=1}^{n_v} \left[ \frac{1}{2} \psi_i(p_\theta(v))^2 + \frac{\partial \psi_i(p_\theta(v))}{\partial v_i} \right]
         = E_{p(v)} \sum_{i=1}^{n_v} \left[ \frac{1}{2} \left( E_{p_\theta(h|v)}\left[ \frac{\partial E_\theta(v, h)}{\partial v_i} \right] \right)^2 + \mathrm{var}_{p_\theta(h|v)}\left[ \frac{\partial E_\theta(v, h)}{\partial v_i} \right] - E_{p_\theta(h|v)}\left[ \frac{\partial^2 E_\theta(v, h)}{\partial v_i^2} \right] \right].
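As a quick numerical sanity check of Theorem 1 (again our own sketch, with arbitrary sizes and values), consider the toy Gaussian-binary RBM with σ = 1 and c = 0. There p_θ(h|v) factorizes into independent Bernoullis, so the conditional mean, variance, and curvature terms are all available in closed form:

    import numpy as np

    rng = np.random.default_rng(2)
    n_v, n_h = 5, 3
    W = rng.normal(0.0, 0.2, (n_v, n_h))
    b = rng.normal(0.0, 0.1, n_h)
    v = rng.normal(size=n_v)

    h_hat = 1.0 / (1.0 + np.exp(-(v @ W + b)))    # E[h | v], Bernoulli means
    mean_dE = v - W @ h_hat                        # E[dE/dv_i | v]
    var_dE = (W ** 2) @ (h_hat * (1.0 - h_hat))    # var[dE/dv_i | v]
    curv = np.ones(n_v)                            # d^2 E / dv_i^2 = 1
    theorem1 = 0.5 * mean_dE ** 2 + var_dE - curv

    # Direct form: (1/2) psi_i^2 + d psi_i / dv_i, with psi = -dF/dv (Eq. 4)
    def psi(u):
        h = 1.0 / (1.0 + np.exp(-(u @ W + b)))
        return -(u - W @ h)

    eps = 1e-6
    dpsi = np.array([
        (psi(v + eps * np.eye(n_v)[i])[i] - psi(v - eps * np.eye(n_v)[i])[i]) / (2 * eps)
        for i in range(n_v)])
    assert np.allclose(theorem1, 0.5 * psi(v) ** 2 + dpsi, atol=1e-5)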

Corollary 1. If the energy function of a latent EBM E_θ(v, h) takes the following form:

E_\theta(v, h) = \frac{1}{2} (v - \mu(h))^T \Omega(h) (v - \mu(h)) + g(h),

where μ(h) is an arbitrary vector-valued function of length n_v, g(h) is an arbitrary scalar function, and Ω(h) is an n_v × n_v positive-definite matrix-valued function, then the vector-valued score function ψ(p_θ(v)) will be:

\psi(p_\theta(v)) = -E_{p_\theta(h|v)}\left[ \Omega(h) (v - \mu(h)) \right].

As a result, the score matching objective can be expressed as:

J(\theta) = E_{p(v)} \left[ \frac{1}{2} \left\| E_{p_\theta(h|v)}\left[ \Omega(h) (v - \mu(h)) \right] \right\|^2 + \sum_{i=1}^{n_v} \mathrm{var}_{p_\theta(h|v)}\left[ \left( \Omega(h) (v - \mu(h)) \right)_i \right] - E_{p_\theta(h|v)}\left[ \mathrm{tr}\, \Omega(h) \right] \right].

The proofs of Theorem 1 and Corollary 1 are straightforward, and can be found in an online appendix to this paper. Corollary 1 states that score matching applied to a Gaussian latent EBM will always result in a quadratic reconstruction term with penalties that minimize the variance of the reconstruction and maximize the expected curvature of the energy with respect to v. This shows that we can develop new autoencoder architectures in a principled way by simply starting with an EBM and applying score matching.

One further connection between the two models is that one step of gradient descent on the free energy F_θ(v) of an EBM corresponds to one feed-forward step of an autoencoder. To see this, consider the mPoT model. If we start at some visible configuration v and update a single dimension i:

v_i^{(t+1)} = v_i^{(t)} - \eta \frac{\partial F_\theta(v)}{\partial v_i} = v_i^{(t)} + \eta \left( \sum_{k=1}^{n_{h^c}} \hat{h}^c_k D_{ik} + \sum_{j=1}^{n_{h^m}} \hat{h}^m_j W_{ij} + b^v_i - v_i^{(t)} \right).

Then, setting η = 1, the v_i^{(t)} terms cancel and we get:

v_i^{(t+1)} = \sum_{k=1}^{n_{h^c}} \hat{h}^c_k D_{ik} + \sum_{j=1}^{n_{h^m}} \hat{h}^m_j W_{ij} + b^v_i.  (17)

This corresponds to the reconstruction produced by mPoT in its score matching objective. In general, an autoencoder reconstruction can be produced by taking a single step of gradient descent along the free energy of its corresponding EBM.

4. Experiments

In this section, we study several estimation methods applied to the mPoT model, including SM, SMD, CD, PCD, and FPCD, with the goal of uncovering differences in the characteristics of trained models due to variations in training methods. For our experiments, we used two datasets of images. The first dataset consists of 8,000 color image patches of size 16x16 pixels randomly extracted from the Berkeley segmentation dataset. We subtracted the per-patch means and applied PCA whitening. We retained 99% of the variance, corresponding to 105 eigenvectors. All estimation methods were applied to the mPoT model by training on mini-batches of size 128 for 100 epochs of stochastic gradient descent. The second dataset, named CIFAR 10 (Krizhevsky, 2009), consists of color images of size 32x32 pixels belonging to one of 10 categories. The task is to classify a set of 10,000 test images. CIFAR 10 is a subset of a larger dataset of tiny images (Torralba et al., 2008). Using a protocol established in previous work (Krizhevsky, 2009; Ranzato & Hinton, 2010), we built a training dataset of 8x8 color image patches from this larger dataset, ensuring there was no overlap with CIFAR 10. The preprocessing of the data is exactly the same as for the Berkeley dataset, but here we use approximately 800,000 image patches and perform only 10 epochs of training. For our experiments, we used the Theano package, and the mPoT code from Ranzato et al. (2010b).
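Before turning to the results, the following self-contained sketch (our own illustration; all shapes and parameter values are arbitrary assumptions) collects the mPoT quantities used in this section: the free energy of Equation 14, the activations of Equations 15 and 16, and a finite-difference check that one gradient step on F_θ(v) with η = 1 reproduces the Equation 17 reconstruction, using D_ik = -C_ik (Σ_i' C_i'k v_i') as implied by P = -I.

    import numpy as np

    def sigm(x):
        return 1.0 / (1.0 + np.exp(-x))

    def mpot_free_energy(v, C, W, b_v, b_m, gamma):
        # Eq. (14); C: (n_v, n_hc) covariance filters, W: (n_v, n_hm) mean filters
        phi_c = 0.5 * (C.T @ v) ** 2
        phi_m = W.T @ v + b_m
        return (gamma * np.sum(np.log1p(phi_c))
                - np.sum(np.log1p(np.exp(phi_m)))
                - b_v @ v + 0.5 * v @ v)

    def mpot_activations(v, C, W, b_m, gamma):
        h_c = gamma / (1.0 + 0.5 * (C.T @ v) ** 2)   # Eq. (15)
        h_m = sigm(W.T @ v + b_m)                    # Eq. (16)
        return h_c, h_m

    rng = np.random.default_rng(1)
    n_v, n_hc, n_hm = 6, 4, 5
    C = rng.normal(0.0, 0.3, (n_v, n_hc))
    W = rng.normal(0.0, 0.3, (n_v, n_hm))
    b_v = rng.normal(0.0, 0.1, n_v)
    b_m = rng.normal(0.0, 0.1, n_hm)
    gamma = 1.5
    v = rng.normal(size=n_v)

    h_c, h_m = mpot_activations(v, C, W, b_m, gamma)
    recon = -C @ (h_c * (C.T @ v)) + W @ h_m + b_v   # Eq. (17)

    eps = 1e-6
    grad = np.array([
        (mpot_free_energy(v + eps * np.eye(n_v)[i], C, W, b_v, b_m, gamma)
         - mpot_free_energy(v - eps * np.eye(n_v)[i], C, W, b_v, b_m, gamma)) / (2 * eps)
        for i in range(n_v)])
    assert np.allclose(v - grad, recon, atol=1e-4)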
4.1. Objective Function Analysis

From Corollary 1, we know that we can interpret score matching for mPoT as trading off reconstruction error, reconstruction variance and the expected curvature of the energy function with respect to the visible units. This experiment, using the Berkeley dataset, is designed to determine how these terms evolve over the course of training and to what degree their changes impact the final model. Figures 1(a) and 1(b) show the values of the three terms using non-noisy inputs at each training epoch, as well as the overall objective function (the sum of the three terms). Surprisingly, these results show that most of the training is involved with maximizing the expected curvature (corresponding to a lower negative curvature). In SM, each point is relatively isolated in v-space, meaning that the objective will try to make the distribution very peaked. In SMD, each point exists near a cloud of points, and so the distribution must be broader. From this perspective, SMD can be seen as a regularized version of SM that puts less emphasis on changing the expected curvature. This also seems to give SMD some room to reduce the reconstruction error. To examine the impact of regularization, we trained an autoencoder (AE) based on the mPoT model using the reconstruction given by Equation 17, which corresponds to SM without the variance and curvature terms. Figure 1(c) shows that simply optimizing the reconstruction leaves the curvature almost invariant, which agrees with the findings of Ranzato et al. (2007).

[Figure 1: (a), (b), (c) expected reconstruction error, reconstruction variance, and energy curvature over training for SM, SMD, and AE ("Total" is the sum of the three terms); (d) difference of free energy between noisy and test images; (e) MSE of denoised test images using mean-field; (f) MSE of denoised test images using Bayesian MAP.]

4.2. Denoising

In our next set of experiments, we compare models learned by each of the score matching estimators with models learned by the more commonly used stochastic estimators. For these experiments, we trained mPoT models corresponding to SM, SMD, FPCD, PCD, and CD. We compare the models in terms of the average free energy difference between natural image patches and patches corrupted by Gaussian noise. We also consider denoising natural image patches. (For convenience, both tasks were performed in the PCA domain; we use a standard deviation of 1 for the Gaussian noise in all cases.)

During training, we hope that the probability of natural images will increase while that of other images decreases. The free energy difference between natural and other images is equivalent to the log of their probability ratio, so we expect the free energy difference to increase during training as well. Figure 1(d) shows the difference in free energy between a test set of 10,000 image patches from the Berkeley dataset and the energy of the same images corrupted by noise. For most estimators, the free energy difference improves as training proceeds, as expected. Interestingly, SM and SMD exhibit completely opposite behaviors. SM seems to significantly increase the free energy difference relative to nearby noisy images, corresponding to a distribution that is peaked around natural images. SMD, on the other hand, actually decreases the free energy difference relative to nearby noisy images.

In the next experiment, we consider an image denoising task. We take an image patch v and add Gaussian white noise, obtaining a noisy patch ṽ.

We then apply each model to denoise each patch ṽ, obtaining a reconstruction v̂. The first denoising method, shown in Figure 1(e), computes a reconstruction v̂ by simulating one step of a Markov chain using a mean-field approximation. That is, we first compute ĥ^c_k and ĥ^m_j by Equations 15 and 16 using ṽ as the input; the reconstruction is then the expectation of the conditional distribution P_θ(v | ĥ^c, ĥ^m). The second method, shown in Figure 1(f), is the Bayesian MAP estimator:

\hat{v} = \arg\min_v F_\theta(v) + \lambda \|v - \tilde{v}\|^2,  (18)

where λ is a scalar representing how close the reconstruction should remain to the noisy input. We select λ by cross-validation. The results show that score matching achieves the minimum error using both denoising approaches; however, it quickly overfits as training proceeds. FPCD and PCD do not match the minimum error of SM and also overfit, albeit to a lesser extent. CD and SMD do not appear to overfit. However, we note that the minimum error obtained by SMD is significantly higher than the minimum error obtained by SM using both denoising methods. This is quite intuitive, since SMD is equivalent to estimating the model using a smoothed training distribution that shifts mass onto nearby noisy images.
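The MAP denoiser of Equation 18 has no closed form in general. A minimal sketch of one way to implement it, assuming plain gradient descent as the optimizer (the paper specifies only the objective and the cross-validated λ; the optimizer, step size, and iteration count here are our assumptions):

    import numpy as np

    def map_denoise(v_noisy, free_energy_grad, lam, n_steps=500, lr=0.01):
        # Eq. (18): v_hat = argmin_v F(v) + lam * ||v - v_noisy||^2,
        # minimized here by gradient descent from the noisy patch.
        v = v_noisy.copy()
        for _ in range(n_steps):
            v = v - lr * (free_energy_grad(v) + 2.0 * lam * (v - v_noisy))
        return v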

4.3. Feature Extraction and Classification

One of the primary uses for latent EBMs is to generate discriminative features. Table 1 shows the result of using each method to extract features on the benchmark CIFAR 10 dataset. We follow the protocol of Ranzato & Hinton (2010), with early stopping, and use a validation set to select regularization parameters. With the exception of AE, all methods appear to do well, and the differences between them are not statistically significant. AE, on the other hand, does significantly worse.

Table 1. Recognition accuracy on CIFAR 10.
  CD      PCD     FPCD    SM      SMD     AE
  64.6%   64.7%   65.5%   65.0%   64.7%   57.6%

[Figure 2: mPoT filters learned using different estimation methods: (a) mean filters, (b) covariance filters.]

Finally, we show examples of filters learned by each method. Figure 2(a) shows a random subset of mean filters corresponding to the columns of W, while Figure 2(b) shows a random subset of covariance filters corresponding to the columns of C. Interestingly, only FPCD and PCD show structure in the learned mean filters. In the covariance units, all methods except AE learn localized Gabor-like filters. It is well known that obtaining nice looking filters usually correlates with good performance, but it is not always clear what leads to these filters. We have shown here that one way to obtain good qualitative and quantitative performance is to focus on appropriately modeling the curvature of the energy with respect to v. In this context, the SM reconstruction and variance terms serve to ensure that the peaks of the distribution occur around the training cases.

5. Conclusion

By applying score matching to the energy space of a latent EBM, as opposed to the free energy space, we gain an intuitive interpretation of the score matching objective. We can always break the objective down into three terms corresponding to expectations under the conditional distribution of the hidden units: reconstruction, reconstruction variance, and curvature. We have determined that for the Gaussian-binary RBM, the reconstruction term will always correspond to an autoencoder with tied weights. While autoencoders and RBMs were previously considered to be related, but separate, models, this analysis shows that they can be interpreted as different estimators applied to the same underlying model. We also showed that one can derive novel autoencoders by applying score matching to more complex EBMs. This allows us to think about models in terms of EBMs before creating a corresponding autoencoder to leverage fast inference. Furthermore, this framework provides guidance on selecting principled regularization functions for autoencoder training, leading to improved representations. Our experiments show not only that score matching yields similar performance to existing estimation methods when applied to classification, but also that shaping the curvature of the energy appropriately may be important for generating good features. While this seems obvious for probabilistic EBMs, it has previously been difficult to apply to autoencoders because they were not thought of as having a corresponding energy function. Now that we know which statistics may be important to monitor during training, it would be interesting to see what happens when other heuristics, such as sparsity, are applied to help generate interpretable features.

References

Bengio, Y. Learning deep architectures for AI. Foundations and Trends in Machine Learning, 2(1):1-127, 2009.

Hinton, G.E. Training products of experts by minimizing contrastive divergence. Neural Computation, 14:1771-1800, 2002.

Hinton, G.E. and Zemel, R.S. Autoencoders, minimum description length and Helmholtz free energy. In Advances in Neural Information Processing Systems, pp. 3-10, 1994.

Hinton, G.E., Osindero, S., and Teh, Y.W. A fast learning algorithm for deep belief nets. Neural Computation, 18(7):1527-1554, 2006.

Hyvärinen, A. Estimation of non-normalized statistical models using score matching. Journal of Machine Learning Research, 6:695-709, 2005.

Kingma, D. and LeCun, Y. Regularized estimation of image statistics by score matching. In Advances in Neural Information Processing Systems, 2010.

Krizhevsky, A. Learning multiple layers of features from tiny images, 2009. MSc thesis, Dept. of Computer Science, University of Toronto.

LeCun, Y., Chopra, S., Hadsell, R., Ranzato, M., and Huang, F.J. A tutorial on energy-based learning. In Predicting Structured Data. MIT Press, 2006.

Marlin, B.M., Swersky, K., Chen, B., and de Freitas, N. Inductive principles for restricted Boltzmann machine learning. In Artificial Intelligence and Statistics, pp. 509-516, 2010.

Memisevic, R. and Hinton, G.E. Learning to represent spatial transformations with factored higher-order Boltzmann machines. Neural Computation, 21:1473-1492, 2009.

Ranzato, M. and Hinton, G.E. Modeling pixel means and covariances using factorized third-order Boltzmann machines. In IEEE Computer Vision and Pattern Recognition, pp. 2551-2558, 2010.

Ranzato, M., Boureau, Y.L., Chopra, S., and LeCun, Y. A unified energy-based framework for unsupervised learning. In Artificial Intelligence and Statistics, 2007.

Ranzato, M., Krizhevsky, A., and Hinton, G.E. Factored 3-way restricted Boltzmann machines for modeling natural images. In Artificial Intelligence and Statistics, pp. 621-628, 2010a.

Ranzato, M., Mnih, V., and Hinton, G.E. How to generate realistic images using gated MRF's. In Advances in Neural Information Processing Systems, pp. 2002-2010, 2010b.

Schmidt, U., Gao, Q., and Roth, S. A generative perspective on MRFs in low-level vision. In IEEE Computer Vision and Pattern Recognition, 2010.

Swersky, K., Chen, B., Marlin, B.M., and de Freitas, N. A tutorial on stochastic approximation algorithms for training restricted Boltzmann machines and deep belief nets. In Information Theory and Applications Workshop, pp. 1-10, 2010.

Teh, Y.W., Welling, M., Osindero, S., and Hinton, G.E. Energy-based models for sparse overcomplete representations. Journal of Machine Learning Research, 4:1235-1260, 2003.

Tieleman, T. Training restricted Boltzmann machines using approximations to the likelihood gradient. In International Conference on Machine Learning, pp. 1064-1071, 2008.

Tieleman, T. and Hinton, G.E. Using fast weights to improve persistent contrastive divergence. In International Conference on Machine Learning, 2009.

Torralba, A., Fergus, R., and Freeman, W.T. 80 million tiny images: A large dataset for non-parametric object and scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(11):1958-1970, 2008.

Vincent, P. A connection between score matching and denoising autoencoders. Neural Computation, to appear, 2011.

Vincent, P., Larochelle, H., Bengio, Y., and Manzagol, P.A. Extracting and composing robust features with denoising autoencoders. In International Conference on Machine Learning, pp. 1096-1103, 2008.

Welling, M., Hinton, G.E., and Osindero, S. Learning sparse topographic representations with products of Student-t distributions. In Advances in Neural Information Processing Systems, 2003.

Welling, M., Rosen-Zvi, M., and Hinton, G.E. Exponential family harmoniums with an application to information retrieval. In Advances in Neural Information Processing Systems, 2005.

Younes, L. Parametric inference for imperfectly observed Gibbsian fields. Probability Theory and Related Fields, 82(4):625-645, 1989.
