Gaussian-Bernoulli Deep Boltzmann Machine

KyungHyun Cho, Tapani Raiko and Alexander Ilin
Department of Information and Computer Science, Aalto University School of Science
Email: {kyunghyun.cho,tapani.raiko,alexander.ilin}@aalto.fi

Abstract—In this paper, we study a model that we call the Gaussian-Bernoulli deep Boltzmann machine (GDBM) and discuss potential improvements in training the model. GDBM is designed to be applicable to continuous data and it is constructed from the Gaussian-Bernoulli restricted Boltzmann machine (GRBM) by adding multiple layers of binary hidden neurons. The studied improvements of the learning algorithm for GDBM include parallel tempering, the enhanced gradient, an adaptive learning rate and layer-wise pretraining. We empirically show that they help avoid some of the common difficulties found in training deep Boltzmann machines, such as divergence of learning, the difficulty of choosing a suitable learning rate schedule, and the existence of meaningless higher layers.

I. INTRODUCTION

The deep Boltzmann machine (DBM) [1] is a recent extension of the simple restricted Boltzmann machine (RBM) in which several RBMs are stacked on top of each other. Unlike in other models, such as the deep belief network or deep autoencoders (see, e.g., [2], [3]), in a DBM each neuron in the intermediate layers receives both top-down and bottom-up signals, which facilitates the propagation of uncertainty during the inference procedure [1].

The original DBM is constructed such that each visible neuron represents a binary variable, that is, the DBM learns distributions over binary vectors. A popular approach to modeling real-valued data is normalizing each input variable to [0,1] and treating it as a probability (e.g., using the gray-scale value of a pixel as a probability [2], [4]). This approach is, however, restrictive, and it fits bounded variables best. In the original DBM [1], real-valued data are first transformed into binary codes by training a model called the Gaussian-Bernoulli RBM (GRBM) [2], and the DBM is learned for the binary codes extracted from the data. This approach showed promising results [1], [5], [6], but it may be beneficial to combine GRBM and DBM in a single model and allow their joint optimization.

In this paper, we study a Gaussian-Bernoulli deep Boltzmann machine (GDBM) which uses Gaussian units in the visible layer of the DBM. Even though deriving the stochastic gradient is rather easy for GDBM, the training procedure can easily run into problems without careful selection of the learning parameters. This is largely caused by the fact that GRBM is known to be difficult to tune, especially the variance parameters of the visible neurons (see, e.g., [7]). We propose an algorithm for training GDBM based on the improvements introduced for training GRBM in [8]. Also, we discuss the universal approximator property of GDBM.

The rest of the paper is organized as follows. The GDBM model is introduced in Section II. In Section III, we present the training algorithm and explain how to compute the terms of the stochastic gradient using mean-field approximations and parallel tempering sampling. In Section IV-A, we describe update rules that are invariant to the data representation and more robust to the initialization of the parameters. In Section IV-B, we describe how we adapt the learning rate using the ideas proposed in [9]. We first presented most of this work as an introduction to GDBMs in the NIPS Workshop on Deep Learning and Unsupervised Feature Learning [10]. However, that workshop has no proceedings, and therefore the paper presented there is not a proper publication. Since then, GDBM has been used in applications such as multimodal learning [11] and image denoising [12].
II. GAUSSIAN-BERNOULLI DEEP BOLTZMANN MACHINE

A GDBM with a single visible layer and L hidden layers is parameterized with weights W of the synaptic connections between the visible layer and the first hidden layer, weights W^{(l)} between layers l and l+1, biases b of the visible layer, biases b^{(l)} of each hidden layer l, and standard deviations σ_i of the visible neurons. For a state [v^T h^{(1)T} ... h^{(L)T}]^T, the energy is defined as

E(v, h^{(1)}, ..., h^{(L)} | θ) = \sum_{i=1}^{n_v} \frac{(v_i - b_i)^2}{2σ_i^2} - \sum_{i=1}^{n_v} \sum_{j=1}^{n_1} \frac{v_i}{σ_i^2} w_{ij} h_j^{(1)} - \sum_{l=1}^{L} \sum_{j=1}^{n_l} b_j^{(l)} h_j^{(l)} - \sum_{l=1}^{L-1} \sum_{j=1}^{n_l} \sum_{k=1}^{n_{l+1}} h_j^{(l)} w_{jk}^{(l)} h_k^{(l+1)},   (1)

and the corresponding probability is

p(v, {h^{(l)}}_{l=1}^{L} | θ) = \frac{1}{Z(θ)} \exp\left( -E(v, {h^{(l)}}_{l=1}^{L} | θ) \right),

where n_v and n_l are the numbers of neurons in the visible layer and the l-th hidden layer, respectively, and Z(θ) is the normalizing constant that depends on the parameters θ of the GDBM. Note that we use the GRBM parameterization from [8], including learning z_i = log σ_i^2 instead of σ_i directly. Note also that GRBM is a special case of GDBM with L = 1.

The states of the neurons in the same layer are independent of each other given the adjacent upper and lower layers.
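To make the parameterization concrete, the following minimal numpy sketch evaluates the energy of Eq. (1) for one state; the function name, the argument layout (a list of hidden-layer weight matrices and biases) and the use of numpy are our own assumptions for illustration, not anything prescribed by the model.

```python
import numpy as np

def gdbm_energy(v, hs, W0, Ws, b, bs, z):
    """Energy of one GDBM state as in Eq. (1), with binary hidden layers hs[0..L-1].

    v  : (n_v,) visible vector
    hs : list of hidden states, hs[l] has shape (n_{l+1},)
    W0 : (n_v, n_1) visible-to-first-hidden weights
    Ws : list of (n_{l+1}, n_{l+2}) weights between consecutive hidden layers
    b  : (n_v,) visible biases;  bs : list of hidden bias vectors
    z  : (n_v,) log-variances, sigma_i^2 = exp(z_i) (parameterization of [8])
    """
    sigma2 = np.exp(z)
    energy = np.sum((v - b) ** 2 / (2.0 * sigma2))        # quadratic visible term
    energy -= np.sum(((v / sigma2) @ W0) * hs[0])         # visible / first hidden layer
    for h, bias in zip(hs, bs):                           # hidden bias terms
        energy -= h @ bias
    for l, W in enumerate(Ws):                            # consecutive hidden layers
        energy -= hs[l] @ W @ hs[l + 1]
    return energy
```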

The conditional probability of a visible neuron is

p(v_i | h^{(1)}, θ) = N\left( v_i \,\Big|\, \sum_{j=1}^{n_1} h_j^{(1)} w_{ij} + b_i,\; σ_i^2 \right),

where N(· | μ, σ^2) is the probability density of the Normal distribution with mean μ and standard deviation σ, and the conditional probabilities of the hidden neurons are

P(h_j^{(1)} = 1 | v, h^{(2)}, θ) = f\left( \sum_{i=1}^{n_v} \frac{v_i}{σ_i^2} w_{ij} + \sum_{k=1}^{n_2} h_k^{(2)} w_{jk}^{(1)} + b_j^{(1)} \right),

P(h_j^{(l)} = 1 | h^{(l-1)}, h^{(l+1)}, θ) = f\left( \sum_{i=1}^{n_{l-1}} h_i^{(l-1)} w_{ij}^{(l-1)} + \sum_{k=1}^{n_{l+1}} h_k^{(l+1)} w_{jk}^{(l)} + b_j^{(l)} \right),

where f(·) is the sigmoid function and n_{L+1} = 0.

A. GDBM is a universal approximator

GRBM belongs to the family of mixtures of Gaussians (MoG) since its joint distribution can be factorized into p(v, h^{(1)}) = P(h^{(1)}) p(v | h^{(1)}), where P(h^{(1)}) gives the mixture coefficients and p(v | h^{(1)}) is Gaussian. MoGs are known to be universal approximators [13]. However, the center points of the exponentially many Gaussians in the data space are defined by only a linear number of parameters with respect to the number of hidden units, so not all MoGs can be written as a GRBM.

Given a MoG, we can transform it into a GRBM if we further constrain that exactly one of the hidden units is active at a time, that is, \sum_j h_j^{(1)} = 1. We set the columns of W to match the center points of the MoG and b to 0, and b^{(1)} is set such that the marginals P(h^{(1)}) match the mixing coefficients of the MoG. As the final step, we implement the restriction \sum_j h_j^{(1)} = 1 using another hidden layer. We set n_2 = n_1, w_{jk}^{(1)} = 3ω for all j = k and w_{jk}^{(1)} = -ω for all j ≠ k, b_k^{(2)} = -ω for each k, and further subtract ω from each b_j^{(1)}. As ω goes to infinity, it is clear that the probability of all states where h_k^{(2)} ≠ h_k^{(1)} or \sum_j h_j^{(1)} ≠ 1 goes to zero, and the remaining states implement the previous GRBM with the wanted restriction. We have thus shown that any MoG can be modeled with a GDBM, and GDBM is a universal approximator.

III. TRAINING GDBM

GDBM can be trained by stochastic maximization of the likelihood, where the likelihood function is computed by marginalizing out all the hidden neurons. For each parameter θ_i, the partial derivative of the log-likelihood function is

\frac{∂L}{∂θ_i} = \left\langle \frac{∂(-E(v^{(t)}, h | θ))}{∂θ_i} \right\rangle_d - \left\langle \frac{∂(-E(v, h | θ))}{∂θ_i} \right\rangle_m,   (2)

where ⟨·⟩_d and ⟨·⟩_m denote the expectation over the data distribution P(h | {v^{(t)}}, θ) and the model distribution P(v, h | θ), respectively, and {v^{(t)}} is the set of all training samples.

A. Computing expectation over the data distribution

Computing the first term of (2) is straightforward for restricted Boltzmann machines because in that model the hidden neurons are independent of each other given the visible neurons. However, this does not apply to GDBM and therefore one needs to use some sort of approximation. We employ the mean-field approximation that was used for training the binary DBM in [1]. In the mean-field approximation, the visible neurons are fixed to a training data sample, and the state of each hidden neuron h_j^{(l)} is described by its probability μ_j^{(l)} of being active, which is updated with the following fixed-point iterations:

μ_j^{(l)} ← f\left( \sum_{i=1}^{n_{l-1}} μ_i^{(l-1)} w_{ij}^{(l-1)} + \sum_{k=1}^{n_{l+1}} μ_k^{(l+1)} w_{jk}^{(l)} + b_j^{(l)} \right).

Note that n_0 = n_v, μ_i^{(0)} = v_i / σ_i^2, and the update rule for the top-most layer does not contain the summation over k = 1, ..., n_{l+1}. Using the mean-field approximation is known to introduce a bias. However, it is a computationally efficient scheme that captures only one mode of the posterior distribution, which can be desirable in many practical applications [1].
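As a sketch of these fixed-point iterations, the numpy fragment below clamps the visible layer to a data vector and repeatedly updates the mean-field activations of every hidden layer; the fixed iteration count, the 0.5 initialization and the variable names are our own assumptions rather than the paper's exact procedure.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mean_field(v, W0, Ws, bs, z, n_iters=25):
    """Mean-field posterior over the hidden layers of a GDBM with visibles clamped to v.

    mus[l] holds the probabilities mu^(l) of the l-th hidden layer being active;
    mu^(0) = v / sigma^2 plays the role of the bottom-up input (Section III-A),
    and the top layer receives no top-down term.
    """
    sigma2 = np.exp(z)
    mus = [np.full(b.shape, 0.5) for b in bs]       # start all activations at 0.5
    bottom = v / sigma2
    L = len(mus)
    for _ in range(n_iters):
        for l in range(L):
            below = bottom @ W0 if l == 0 else mus[l - 1] @ Ws[l - 1]
            above = mus[l + 1] @ Ws[l].T if l < L - 1 else 0.0   # no term for the top layer
            mus[l] = sigmoid(below + above + bs[l])
    return mus
```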
B. Computing expectation over the model distribution: Parallel Tempering approach

The second term of the gradient (2) can be computed using Markov-chain Monte-Carlo (MCMC) sampling. The original approach proposed in [4] uses persistent Gibbs sampling with only a few steps of sampling at each update. This is equivalent to persistent contrastive divergence (PCD) introduced for training RBM [15].

Unfortunately, persistent Gibbs sampling suffers from poor mixing of the samples, which results in trained models that may have probability mass in areas not represented in the training data (false modes) [16], [14], [1], [17]. In our experiments, we observed that learning can easily diverge when persistent Gibbs sampling is used (see Section V). We therefore use parallel tempering, recently proposed in the context of RBM and GRBM [16], [18], [8], as the sampling procedure for GDBM.

Parallel tempering overcomes the poor-mixing problem by maintaining multiple sampling chains with different temperatures. In a chain with a high temperature, particles are likely to explore the state space easily, whereas particles in the chains with low temperatures follow the target model distribution more closely.
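The swap move between neighboring chains is the standard replica-exchange step; the sketch below only assumes access to the unnormalized log-probability of a state under each tempered model (how those tempered models are constructed for a GDBM is specified next). The callable name, the chain ordering and the returned swap counts are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def swap_adjacent_chains(states, log_p_tempered, rng):
    """One round of replica-exchange swaps between adjacent tempered chains.

    states[k]            : current particle of chain k (chain 0 being the most diffuse)
    log_p_tempered(k, x) : unnormalized log-probability of x under the k-th tempered model
    Returns the number of accepted swaps for each adjacent pair; the paper's heuristic
    uses these counts to adapt the inverse temperatures.
    """
    n_swaps = np.zeros(len(states) - 1, dtype=int)
    for k in range(len(states) - 1):
        # standard Metropolis ratio for exchanging the particles of chains k and k+1
        log_ratio = (log_p_tempered(k, states[k + 1]) + log_p_tempered(k + 1, states[k])
                     - log_p_tempered(k, states[k]) - log_p_tempered(k + 1, states[k + 1]))
        if np.log(rng.uniform()) < log_ratio:
            states[k], states[k + 1] = states[k + 1], states[k]
            n_swaps[k] += 1
    return n_swaps
```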

We define the tempered distributions by varying the parameters θ of the original GDBM (1). We denote by θ_β the parameters of the intermediate models defined by inverse temperatures β, where β = 0 corresponds to the base (most diffuse) distribution and β = 1 corresponds to the target distribution defined by the original GDBM. Defining appropriate intermediate distributions is quite straightforward for binary RBM [16], [18], but it is not as trivial for models with real-valued visible units [8]. In this work, we use the tempering scheme defined by the following choice of θ_β:

b_{i,β} = β b_i + (1 - β) m_i,    b^{(l)}_β = β b^{(l)},    σ^2_{i,β} = β σ_i^2 + (1 - β) s_i^2,    W^{(l)}_β = β W^{(l)},

where m = [m_i]_{i=1}^{n_v} and s_i^2 are the means and variances estimated from the samples obtained from all the tempered distributions. Thus, the base distribution is the Gaussian distribution fitted to the samples from all the intermediate chains. The proposed scheme assures that swapping happens even if the target distribution diverges from the data distribution. According to our experiments, this results in more stable learning compared to the scheme proposed in [18].

Adapting the temperatures during learning can improve mixing and therefore facilitate learning [19]. In our experiments, we adapt the temperatures so as to keep the numbers of particle swaps between consecutive chains as equal as possible. This is done by adjusting the inverse temperatures {β_k, k = 1, ..., n_chains} after every swapping round using the following heuristic:

β_k^{(t)} ← η β_k^{(t-1)} + (1 - η) \frac{\sum_{j=1}^{k-1} n_j}{\sum_{j=1}^{n_chains - 1} n_j},

where η is a damping factor, n_j denotes the number of swaps between the chains defined by β_j and β_{j+1}, and the hottest chain is kept at the initial temperature, that is, β_1 = 0. This simple approach does not have much computational overhead and seems to improve learning, as we show in Section V. The proposed scheme of adapting the base distribution and the temperatures seems to work well in practice, although it may introduce a bias. The analysis of the possible biases of this approach is a question for further research.

IV. IMPROVING THE TRAINING PROCEDURE

In this section, we show how to improve the training of GDBM by adapting several ideas introduced for training RBMs and deep networks.

A. Enhanced gradient for GDBM

The enhanced gradient was introduced recently to make the update rules of binary Boltzmann machines invariant to the data representation [20], [9]. The gradient was derived by introducing bit-flipping transformations and making the update rule, which follows from (2), invariant to such transformations. It has been shown to improve learning of RBM by making the results less sensitive to the learning parameters and initialization.

The same ideas can be used for enhancing the gradient in models with Gaussian visible neurons, such as GRBM and GDBM. Instead of the bit-flipping transformations, one can transform the original model by shifting each visible unit as ṽ_i = v_i - λ_i and correcting the bias terms accordingly (b̃_i = b_i - λ_i and b̃_j^{(1)} = b_j^{(1)} + \sum_i λ_i w_{ij} / σ_i^2), which results in an equivalent model. Following the methodology of [9], one can select the shifting parameters such that the resulting gradients with respect to the weights and the biases do not contain the same terms. This yields the following update rules:

∇_e w_{ij} = Cov_d( v_i / σ_i^2, h_j^{(1)} ) - Cov_m( v_i / σ_i^2, h_j^{(1)} ),
∇_e w_{jk}^{(l)} = Cov_d( h_j^{(l)}, h_k^{(l+1)} ) - Cov_m( h_j^{(l)}, h_k^{(l+1)} ),
∇_e b_i = ∇b_i - \sum_j ⟨h_j^{(1)}⟩_dm ∇_e w_{ij},
∇_e b_j^{(1)} = ∇b_j^{(1)} - \sum_k ⟨h_k^{(2)}⟩_dm ∇_e w_{jk}^{(1)} - \sum_i ⟨v_i⟩_dm ∇_e w_{ij},
∇_e b_j^{(l)} = ∇b_j^{(l)} - \sum_k ⟨h_k^{(l+1)}⟩_dm ∇_e w_{jk}^{(l)} - \sum_i ⟨h_i^{(l-1)}⟩_dm ∇_e w_{ij}^{(l-1)}   (l > 1),

where the term containing ∇_e w^{(l)} is present only for l < L, ⟨h_j^{(l)}⟩_dm = ½⟨h_j^{(l)}⟩_d + ½⟨h_j^{(l)}⟩_m is the average activity of a neuron under the data and model distributions, ⟨v_i⟩_dm = ½⟨v_i/σ_i^2⟩_d + ½⟨v_i/σ_i^2⟩_m, and Cov_P(·,·) is the covariance of two variables under the distribution P, defined as Cov_P(v_i, h_j) = ⟨v_i h_j⟩_P - ⟨v_i⟩_P ⟨h_j⟩_P.
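The enhanced weight gradient above reduces to a difference of sample covariances, which is easy to compute from a data minibatch and from model samples. The sketch below is only an illustration for the bottom layer; the array shapes, function names and the use of mean-field activations for the data-side statistics are our own assumptions.

```python
import numpy as np

def enhanced_gradient_bottom(vd, h1d, vm, h1m, z):
    """Enhanced gradients for the visible-to-first-hidden weights and visible biases.

    vd, h1d : (N, n_v) data visibles and (N, n_1) mean-field first-layer activations
    vm, h1m : the same quantities for samples from the model distribution
    z       : (n_v,) log-variances, sigma^2 = exp(z)
    """
    def cov(x, h):                                   # sample covariance matrix
        return (x - x.mean(0)).T @ (h - h.mean(0)) / x.shape[0]

    sigma2 = np.exp(z)
    # grad_e w = Cov_d(v/sigma^2, h^(1)) - Cov_m(v/sigma^2, h^(1))
    grad_w = cov(vd / sigma2, h1d) - cov(vm / sigma2, h1m)
    # plain bias gradient and the <.>_dm average used by the enhanced rule
    grad_b = (vd / sigma2).mean(0) - (vm / sigma2).mean(0)
    h_dm = 0.5 * h1d.mean(0) + 0.5 * h1m.mean(0)
    grad_b_e = grad_b - grad_w @ h_dm                # grad_e b_i = grad b_i - sum_j <h_j>_dm grad_e w_ij
    return grad_w, grad_b_e
```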
B. Adaptive learning rate

The choice of the learning rate to be used with the stochastic gradient (2) greatly affects the training procedure [21], [22], [9]. In order to avoid the effect of choosing a learning rate and its scheduling, we adopt the strategy of automatic adaptation of the learning rate proposed in [9].

The adaptation is done based on an estimate of the likelihood computed using the identity

p(v_d | θ_η) = \frac{ p^*(v_d | θ_η) }{ Z(θ) \left\langle \frac{ p^*(v | θ_η) }{ p^*(v | θ) } \right\rangle_{p(v | θ)} },   (3)

where θ_η are the model parameters obtained by updating θ with learning rate η, p^* denotes an unnormalized pdf such that p(v | θ) = p^*(v | θ) / Z(θ), and the required expectation is approximated using samples from p(v | θ). The unnormalized probabilities p^*(v | θ) are obtained by integrating out the hidden neurons from the joint model, p^*(v | θ) = \sum_h p^*(v, h | θ), which yields a simple analytical form in the case of RBM or GRBM. However, explicit integration over the hidden neurons is not tractable for GDBM and therefore one has to employ some approximation. We use the approximation

\sum_h \exp( -E(v, h | θ) ) ≈ \exp( -E(v, μ | θ) ),

where μ is the mean-field approximation of the hidden activations, computed as discussed in Section III-A.
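A rough sketch of how the identity (3) can drive the adaptation: try a few candidate rates, estimate the resulting log-likelihood on a minibatch using model samples for the partition-function ratio, and keep the best one. Everything here (the candidate set, the dictionary-of-arrays parameter layout, the log_pstar callable returning an approximate unnormalized log-marginal) is an assumption for illustration, not the exact procedure of [9].

```python
import numpy as np

def pick_learning_rate(theta, grad, candidates, log_pstar, v_data, v_model):
    """Choose the learning rate maximizing the likelihood estimate of Eq. (3).

    theta, grad : dicts of parameters and of the corresponding gradients
    candidates  : candidate learning rates, e.g. (0.9 * eta, eta, 1.1 * eta)
    log_pstar   : callable (v, params) -> log p*(v | params); for a GDBM this is itself
                  approximated with the mean-field energy, as described in the text
    v_data      : minibatch used to score the candidates
    v_model     : samples from p(v | theta) used to estimate Z(theta_eta) / Z(theta)
    """
    base = np.array([log_pstar(v, theta) for v in v_model])
    best_eta, best_score = None, -np.inf
    for eta in candidates:
        theta_eta = {k: theta[k] + eta * grad[k] for k in theta}   # one gradient-ascent step
        # log < p*(v|theta_eta) / p*(v|theta) >_{p(v|theta)} via a log-sum-exp average
        ratios = np.array([log_pstar(v, theta_eta) for v in v_model]) - base
        log_z_ratio = np.logaddexp.reduce(ratios) - np.log(len(ratios))
        score = np.mean([log_pstar(v, theta_eta) for v in v_data]) - log_z_ratio
        if score > best_score:
            best_eta, best_score = eta, score
    return best_eta
```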

Fig. 1. The left plot shows the evolution of the inverse temperatures during learning while β_1 is fixed to 0. The middle and right plots show the number of swaps between each pair of consecutive chains at each update (swap) when the temperatures were adapted (middle) and when the temperatures were fixed at equally spaced levels (right).

Note that we use different portions of the data (mini-batches) for computing the stochastic gradient and for the adaptation of the learning rate, as was suggested in [9]. Therefore, the fixed-point iterations required for computing the mean-field approximation have to be run twice. However, initializing the mean-field values with samples from the model distribution seems to yield fast convergence. Also, making only a few fixed-point iterations (without convergence) seems to be enough to get stable behavior of the learning rate adaptation.

C. Layer-wise pretraining

Layer-wise pretraining is commonly used in deep networks to help obtain better models by initializing the weights sensibly [23]. DBM requires special care during the pretraining phase because the neurons in the intermediate layers receive signal from both the upper and the lower layers, unlike in deep belief networks [2]. Salakhutdinov proposed to cope with this problem by halving the pretrained weights in the intermediate layers and duplicating the visible and topmost layers during the pretraining [4]. The pretrained GRBM containing the visible layer has the following energy:

E(v, h^{(1)} | θ) = \sum_i \frac{(v_i - b_i)^2}{2 σ_i^2 / k_v} - k_v \sum_i \sum_j \frac{v_i}{σ_i^2} w_{ij} h_j^{(1)} - \sum_j c_j h_j^{(1)},

where k_v = 2 corresponds to duplicating the visible layer. Similarly, the topmost RBM during pretraining has the energy

E(h^{(L-1)}, h^{(L)} | θ) = - \sum_j b_j h_j^{(L-1)} - \sum_k (k_h c_k) h_k^{(L)} - \sum_j \sum_k h_j^{(L-1)} (k_h w_{jk}^{(L-1)}) h_k^{(L)},

where we also use k_h = 2.
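When the pretrained RBMs are composed into a single GDBM, the duplication above is compensated by halving the weights of the intermediate layers, as described in the text. The helper below sketches that assembly step under our reading of it; the function name, the list layout and the exact halving convention are our own assumptions.

```python
def assemble_gdbm_weights(pretrained_Ws):
    """Combine a stack of pretrained RBM weight matrices into GDBM weights.

    The bottom RBM is assumed to have been trained with the visible layer duplicated
    (k_v = 2) and the top RBM with the topmost layer duplicated (k_h = 2), so their
    weights are kept as they are; the intermediate weight matrices are halved so that
    each intermediate neuron receives roughly the same total input in the joint model
    as it did during pretraining.
    """
    Ws = [W.copy() for W in pretrained_Ws]
    for l in range(1, len(Ws) - 1):        # halve only the intermediate layers
        Ws[l] = 0.5 * Ws[l]
    return Ws
```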
V. EXPERIMENTS

We train the GDBM model on the Olivetti face dataset [24], which contains 400 face images of 40 people. We used the faces of one subset of the people for training and the faces of the remaining people as test samples. We test the model on the problem of reconstructing the left half of a face from its right half, and we compare the reconstruction results using the methodology presented in [11]. Note that the test set does not contain faces of people from the training set, for a better assessment of the generalization skills.

The dataset was initially normalized such that each pixel has zero mean and unit variance. We trained GDBM models with three hidden layers, where each hidden layer had 100 neurons. Unless specified otherwise, GDBM is trained using pretraining, the enhanced gradient and the adaptive learning rate. The size of a mini-batch and the number of samples from the model distribution were both set to 64. The initial learning rate and its upper bound were both set to 0.001 for pretraining, with a separate setting for the joint training of all the layers. A weight decay of 0.001 was used both during training GRBM and pretraining DBM. The reconstruction was performed by fixing the known half of the image and computing the mean-field approximation of the posterior distribution over all the unknown neurons, including the missing half of the image.

Adaptive Learning Rate. Our experiments confirmed that the learning rate was able to adapt automatically using the proposed strategy. The upper plot in Figure 3 shows that the algorithm very quickly finds the appropriate region of learning rates. After that, the learning rate slowly decreases, which is a desired property in stochastic optimization (see, e.g., [26]).

Parallel Tempering. Parallel tempering yields good results (see Fig. 4), while we were not able to achieve convergence of GDBM with PCD. Fig. 1 indicates that the proposed scheme of temperature adaptation results in an increased and consistent number of swaps among all consecutive pairs of tempered chains, and hence in better mixing.

Enhanced Gradient and Pretraining. One obvious way to check whether the higher layers of GDBM have learned any useful structure is to inspect the mean-field approximation values of the neurons in those layers given the training or test data samples. When no useful structure has been learned, most hidden neurons in the top layer converge near 0.5, which means that nearly no signal was received from the neighboring layers. On the other hand, when those neurons actually affect the modeling of the distribution, they converge to values close to either 0 or 1.

Fig. 2. The figures visualize the mean-field values of the hidden neurons at the different hidden layers (panels: hidden neurons at layers 1, 2 and 3) when the visible neurons are fixed to the test data samples. The top figures were obtained using the GDBM trained without any pretraining. The middle and bottom figures were obtained from the GDBMs trained with both the pretraining and the joint training, using either the traditional gradient or the enhanced gradient, respectively. All three GDBMs were trained for the same number of epochs.

One GDBM was trained without the layer-wise pretraining, starting from a random initialization. In this case it was clear that, except for the first hidden layer, no upper layer was able to learn any useful structure, as shown on the top row of Figure 2. Most of the approximated values of the hidden neurons in the second and third layers are near 0.5.

We trained the second GDBM by first initializing it with layer-wise pretraining. However, this time we did not use the enhanced gradient for either the pretraining or the joint training. Comparing with the top row of Figure 2, it is apparent from the figures of the middle row that the hidden neurons in the upper layers are approximated closer to 0 or 1 when the layers were pretrained. It suggests that the pretraining is important in the sense that it enables the upper layers to learn the structure of the data distribution. However, we were able to observe that many hidden neurons in the higher layers (see the hidden layer 2, for instance) do not contribute much to the modeling of the distribution, as they are either always inactive (= 0) or always active (= 1). This behavior was already discovered in the case of RBM, and it was shown that the enhanced gradient can resolve the problem [9]. Thus, we trained yet another GDBM, now using the enhanced gradient. The bottom row of Figure 2 clearly indicates that the enhanced gradient was able to address the problem. Now the hidden neurons in layer 3 respond differently to the distinct test samples and, by doing so, encourage the flow of uncertainty between the lower hidden layers and hidden layer 3, enabling the hidden neurons in layer 3 to capture the structure as well.

Comparison with other models. We trained principal component analysis (PCA) and GRBM on the same dataset. PCA used a small number of principal components, and GRBM had 100 hidden neurons. We limited the number of principal components because including more components resulted in stronger overfitting and larger reconstruction errors. The lower plot in Fig. 3 shows the evolution of the difference between the original test faces and the reconstructed faces for each model.

Fig. 4. Reconstructed test samples (original images, PCA reconstruction, GRBM reconstruction and GDBM reconstruction) using PCA, GRBM and GDBM.

Fig. 3. The evolution of the learning rate when the adaptive learning rate was used (left) and the reconstruction errors (RMSE) using three different methods: PCA, GRBM and GDBM (right).

It indicates that GRBM starts overfitting quickly, which results in an increase of the reconstruction error on the unseen faces. This is also evident from the reconstructed faces in Fig. 4. Ultimately, GRBM performs worse than PCA, as evidenced by both the RMSE and the visual inspection of the reconstructed faces. GDBM trained only with pretraining, using the identical GRBM training procedure, however, does not become overfitted and performs better than GRBM, which indicates that the additional hidden layers help it perform and generalize better. Furthermore, the joint training of GDBM by the proposed learning algorithm improves the performance significantly. It suggests that it is indeed important to jointly train all layers of GDBM in order to obtain a better generative model.

VI. DISCUSSION

In this paper, we described the Gaussian-Bernoulli deep Boltzmann machine and discussed its universal approximator property. Based on the learning algorithm for the binary DBM [1], we adapted three improvements, namely parallel tempering, the enhanced gradient and the adaptive learning rate, for training a GDBM. Through the experiments we were able to empirically show that a GDBM trained using these improvements can achieve good generative performance.

Although the results are not presented in this paper, using the same hyper-parameters as for learning faces we were able to train GDBM on other data sets such as NORB [27] and CIFAR-10 [7]. This clearly indicates that the proposed improvements make learning insensitive to the choice of the learning hyper-parameters and thus easier. However, the trained GDBMs were not able to produce any state-of-the-art classification accuracy. The discrepancy between the generative capability and the classification performance of GDBM is left for future research.

Recently, several novel approaches have been proposed for efficiently training DBM. Some of them are adaptive MCMC sampling [5], tempered transitions [14], using a separate set of recognition weights [6], the centering trick [28], the two-stage pretraining algorithm [29] and the metric-free natural gradient method [30]. It will be interesting to see how they perform

when they are used for training GDBMs, compared to the learning algorithm proposed in this paper.

REFERENCES

[1] R. Salakhutdinov and G. E. Hinton, "Deep Boltzmann machines," in Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS 2009), 2009.
[2] G. E. Hinton and R. Salakhutdinov, "Reducing the dimensionality of data with neural networks," Science, vol. 313, no. 5786, pp. 504-507, July 2006.
[3] Y. Bengio, "Learning deep architectures for AI," Foundations and Trends in Machine Learning, vol. 2, no. 1, pp. 1-127, 2009.
[4] B. Lake, R. Salakhutdinov, J. Gross, and J. Tenenbaum, "One shot learning of simple visual concepts," in Proceedings of the 33rd Annual Meeting of the Cognitive Science Society, 2011.
[5] R. Salakhutdinov, "Learning deep Boltzmann machines using adaptive MCMC," in Proceedings of the 27th International Conference on Machine Learning (ICML 2010), J. Fürnkranz and T. Joachims, Eds. Haifa, Israel: Omnipress, June 2010.
[6] R. Salakhutdinov and H. Larochelle, "Efficient learning of deep Boltzmann machines," in Proceedings of the 27th Conference on Uncertainty in Artificial Intelligence, 2011.
[7] A. Krizhevsky, "Learning multiple layers of features from tiny images," Computer Science Department, University of Toronto, Tech. Rep., 2009.
[8] K. Cho, A. Ilin, and T. Raiko, "Improved learning of Gaussian-Bernoulli restricted Boltzmann machines," in Proceedings of the International Conference on Artificial Neural Networks (ICANN 2011), 2011.
[9] K. Cho, T. Raiko, and A. Ilin, "Enhanced gradient and adaptive learning rate for training restricted Boltzmann machines," in Proceedings of the 28th International Conference on Machine Learning (ICML 2011). New York, NY, USA: ACM, June 2011.
[10] ——, "Gaussian-Bernoulli deep Boltzmann machine," in NIPS Workshop on Deep Learning and Unsupervised Feature Learning, Sierra Nevada, Spain, December 2011.
[11] N. Srivastava and R. Salakhutdinov, "Multimodal learning with deep Boltzmann machines," in Advances in Neural Information Processing Systems 25, P. Bartlett, F. Pereira, C. Burges, L. Bottou, and K. Weinberger, Eds., 2012.
[12] K. Cho, "Boltzmann machines and denoising autoencoders for image denoising," arXiv e-prints, Jan. 2013.
[13] D. M. Titterington, A. F. M. Smith, and U. E. Makov, Statistical Analysis of Finite Mixture Distributions. New York, London, Sydney: John Wiley & Sons, 1985.
[14] R. Salakhutdinov, "Learning in Markov random fields using tempered transitions," in Advances in Neural Information Processing Systems 22, Y. Bengio, D. Schuurmans, J. Lafferty, C. K. I. Williams, and A. Culotta, Eds., 2009.
[15] T. Tieleman, "Training restricted Boltzmann machines using approximations to the likelihood gradient," in Proceedings of the 25th International Conference on Machine Learning (ICML 2008). New York, NY, USA: ACM, 2008.
[16] G. Desjardins, A. Courville, Y. Bengio, P. Vincent, and O. Delalleau, "Parallel tempering for training of restricted Boltzmann machines," in Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, ser. JMLR Workshop and Conference Proceedings, Y.-W. Teh and M. Titterington, Eds. JMLR W&CP, 2010.
[17] K. Cho, "Improved learning algorithms for restricted Boltzmann machines," Master's thesis, Aalto University School of Science, 2011.
[18] K. Cho, T. Raiko, and A. Ilin, "Parallel tempering is efficient for learning restricted Boltzmann machines," in Neural Networks (IJCNN), The 2011 International Joint Conference on, 2011.
[19] G. Desjardins, A. Courville, and Y. Bengio, "Adaptive parallel tempering for stochastic maximum likelihood learning of RBMs," in NIPS Workshop on Deep Learning and Unsupervised Feature Learning, 2010.
[20] T. Raiko, K. Cho, and A. Ilin, "Enhanced gradient for learning Boltzmann machines (abstract)," in The Learning Workshop, Fort Lauderdale, Florida, April 2011.
[21] A. Fischer and C. Igel, "Empirical analysis of the divergence of Gibbs sampling based learning algorithms for restricted Boltzmann machines," in Proceedings of the International Conference on Artificial Neural Networks (ICANN 2011): Part III. Berlin, Heidelberg: Springer-Verlag, 2011.
[22] H. Schulz, A. Müller, and S. Behnke, "Investigating convergence of restricted Boltzmann machine learning," in NIPS Workshop on Deep Learning and Unsupervised Feature Learning, 2010.
[23] D. Erhan, Y. Bengio, A. Courville, P.-A. Manzagol, P. Vincent, and S. Bengio, "Why does unsupervised pre-training help deep learning?" Journal of Machine Learning Research, vol. 11, pp. 625-660, Mar. 2010.
[24] F. Samaria and A. Harter, "Parameterisation of a stochastic model for human face identification," in Proceedings of the Second IEEE Workshop on Applications of Computer Vision, Dec. 1994.
[25] H. Poon and P. Domingos, "Sum-product networks: A new deep architecture," in Proceedings of the 27th Conference on Uncertainty in Artificial Intelligence, 2011.
[26] H. Kushner and G. Yin, Stochastic Approximation and Recursive Algorithms and Applications. Springer, 2003.
[27] Y. LeCun, F. J. Huang, and L. Bottou, "Learning methods for generic object recognition with invariance to pose and lighting," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, vol. 2, 2004.
[28] G. Montavon and K.-R. Müller, "Deep Boltzmann machines and the centering trick," in Neural Networks: Tricks of the Trade, Reloaded, 2nd ed., ser. LNCS, G. Montavon, G. B. Orr, and K.-R. Müller, Eds. Springer, 2012.
[29] K. Cho, T. Raiko, A. Ilin, and J. Karhunen, "A two-stage pretraining algorithm for deep Boltzmann machines," in NIPS 2012 Workshop on Deep Learning and Unsupervised Feature Learning, Lake Tahoe, December 2012.
[30] G. Desjardins, R. Pascanu, A. Courville, and Y. Bengio, "Metric-free natural gradient for joint-training of Boltzmann machines," in Proceedings of the First International Conference on Learning Representations (ICLR 2013), 2013.
