An Effective Training Method For Deep Convolutional Neural Network


Yang Jiang 1*, Zeyang Dou 1*, Qun Hao 1,2, Jie Cao 1, Kun Gao 1, Xi Chen 3

1. School of Optoelectronics, Beijing Institute of Technology, Beijing, China; 2. Graduate School at Shenzhen, Tsinghua University, Tsinghua Campus, The University Town, Shenzhen, China; 3. BGI Research, Beishan Industrial Zone, Yantian District, Shenzhen, China. Correspondence: qhao@bit.edu.cn. *These authors contributed equally.

Abstract

In this paper, we propose the nonlinearity generation method to speed up and stabilize the training of deep convolutional neural networks. The proposed method modifies a family of activation functions into nonlinearity generators (NGs). NGs make the activation functions linear symmetric for their inputs to lower model capacity, and automatically introduce nonlinearity to enhance the capacity of the model during training. The proposed method can be considered an unusual form of regularization: the model parameters are obtained by training a relatively low-capacity model, which is relatively easy to optimize at the beginning, for only a few iterations, and these parameters are reused for the initialization of a higher-capacity model. We derive the upper and lower bounds of the variance of the weight variation, and show that the initial symmetric structure of NGs helps stabilize training. We evaluate the proposed method on different frameworks of convolutional neural networks over two object recognition benchmark tasks (CIFAR-10 and CIFAR-100). Experimental results showed that the proposed method allows us to (1) speed up the convergence of training, (2) allow for less careful weight initialization, (3) improve or at least maintain the performance of the model at negligible extra computational cost, and (4) easily train a very deep model.

Introduction

Convolutional neural networks (CNNs) have enabled the rapid development of a variety of applications, particularly ones related to visual recognition tasks [He et al. 2015, 2016]. There are two major reasons for this: the building of more powerful models, and the development of more efficient and robust training strategies. On the one hand, recent deep learning models are becoming increasingly complex owing to their increasing depth [He et al. 2016; Simonyan 2014; Szegedy 2015] and width [Zeiler 2014; Sermanet et al. 2013], and decreasing strides [Sermanet et al. 2013]; on the other hand, better generalization performance is obtained by using various regularization techniques [Ioffe 2015; Ba 2016; Salimans 2016], designing models of varying structure [He et al. 2016; Huang et al. 2016] and new nonlinear activation functions [Goodfellow et al. 2013; Agostinelli et al. 2014; Eisenach et al. 2016]. However, most of the aforementioned techniques follow the same underlying hypothesis: the model is highly nonlinear because many inputs directly fall into the nonlinear parts of the activation functions. Although the highly nonlinear structure improves model capacity, it leads to difficulties in training. Training problems have recently been partly addressed through carefully constructed initializations [Mishkin 2015; Salimans 2016] and batch normalization (BN) [Ioffe 2015]. However, these methods may lead to a loss of efficiency in the training of very deep neural networks. Even though ResNets outperform plain CNNs, they still require a warming-up trick to train a very deep network [He 2016]. Thus the challenges of training have not been completely solved. As a counterpart, deep neural networks with linear activation functions, which have relatively low model capacity, are relatively easy to optimize at the beginning of training, as we will show. Thus, we need to strike and maintain a balance between the difficulties in training and model capacity.
With regard to the possibility of combining the advantages of low-capacity and high-capacity models, one motivating example is the multigrid method [Bakhvalov 1966], used to accelerate the convergence of differential equation solvers using a hierarchy of discretizations. The main idea underlying the multigrid method is to speed up computation by using a coarse grid and interpolating a correction computed from it onto a fine grid. The model with low capacity can be considered a coarse approximation of the problem, and the high-capacity model corresponds to a fine approximation.

Similarly, unsupervised pre-training [Hinton 2006] and pre-training with shallow networks [Simonyan 2014] correspond to the use of models with relatively low capacity to coarsely fit the training data. The pre-trained parameters are then transferred to the complex model to carefully fit the dataset.

Figure 1: Examples of ReLU, LReLU and PReLU and their NG versions. (a) ReLU, LReLU and PReLU; (b) NG-ReLU, NG-LReLU and NG-PReLU, t = -1.

Figure 2: The values of t in layer 19 during the training procedure. The model is a 56-layer plain CNN.

Above all, it makes sense to combine the advantages of models of different capacity by first restricting capacity to coarsely fit the problem, and then endowing the model with greater capacity during the training procedure, hence enabling it to gradually perform better. Note that the symmetric structures of activation functions also make the training procedure stable, as we will show. Therefore we modify a family of activation functions, such as the Rectified Linear Unit (ReLU), Leaky ReLU (LReLU) and Parametric ReLU (PReLU), by introducing a trainable parameter t, which we call the nonlinearity generator (NG), to make the activation functions linear symmetric for the inputs at the initial stage. NGs then introduce nonlinearity during the training procedure, endowing the model with greater capacity. The introduced parameter can be easily incorporated into the backpropagation framework. The proposed method allows us to (1) speed up the convergence of training, (2) allow for less careful parameter initialization, (3) improve or at least maintain the performance of convolutional neural networks at negligible extra computational cost, and (4) easily train a very deep model.

Nonlinearity Generator Approach

We define the nonlinearity generator (NG) as follows:

NG(f(x_i)) = f(x_i - t_i) + t_i    (1)

where f is an activation function such as ReLU, LReLU or PReLU, x_i is the input of the activation function on the i-th node, and t_i is a trainable parameter controlling the linearity of the generator given the input distribution. Note that t_i is different for each node; we call equation (1) the element-wise version. We also consider a channel-wise variant: the different nodes in the same channel share the same t. If t increases, we say that the NG introduces greater nonlinearity, because this increases the probability that inputs of the activation functions fall into the nonlinear parts. If t is smaller than the minimum input, all inputs of the NG are in the linear area, making it a linear symmetric activation function for the inputs. Figure 1 compares the shapes of different activation functions (ReLU, LReLU and PReLU) and their NG versions (NG-ReLU, NG-LReLU and NG-PReLU). Given a preprocessed input image and some proper weight initialization, this property of the NG can guarantee that the model is almost linear at the initial stage. As we will show in the analysis, the capacity of this model is relatively low, making training relatively easy. Thus, difficulties in training are alleviated during the initial iterations.
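For concreteness, the channel-wise NG of Eq. (1), wrapped around ReLU, can be written in a few lines of PyTorch. This is a minimal sketch of ours, not the paper's released code; the class name NGReLU and the module structure are illustrative, and autograd differentiates through t automatically (cf. Eqs. (2)-(3) below).

import torch
import torch.nn as nn
import torch.nn.functional as F

class NGReLU(nn.Module):
    """Channel-wise NG-ReLU: NG(f(x)) = f(x - t) + t, one t per channel."""
    def __init__(self, num_channels, t0=-1.0):
        super().__init__()
        # A sufficiently negative t keeps all inputs in the linear region at
        # the start of training, giving the low-capacity initial model.
        self.t = nn.Parameter(torch.full((num_channels,), t0))

    def forward(self, x):
        t = self.t.view(1, -1, 1, 1)   # broadcast over (N, C, H, W)
        return F.relu(x - t) + t

# Drop-in replacement for nn.ReLU after a 16-channel convolution:
layer = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), NGReLU(16))
out = layer(torch.randn(2, 3, 32, 32))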

Figure 3: Training and test accuracy comparisons of different activation functions. (a) 56-layer ResNet with ReLUs and NG-ReLUs; (b) 56-layer ResNet with LReLUs and NG-LReLUs; (c) 56-layer ResNet with PReLUs and NG-PReLUs.

Figure 4: Weight variance comparisons of every layer. (a) 20-layer plain CNN; (b) 56-layer ResNet.

The parameter t can be easily optimized using the backpropagation algorithm. Consider the element-wise version for example; the derivative with respect to t_i simply follows the chain rule:

∂ε/∂t_i = (∂ε/∂f(x_i)) · (∂f(x_i)/∂t_i)    (2)

where ε represents the objective function, and

∂f(x_i)/∂t_i = 0 if x_i > t_i, and 1 if x_i ≤ t_i.    (3)

By using gradient descent, the NG can itself determine the degree of nonlinearity for each layer based on gradient information during the training process, endowing the model with greater capacity. Since the experiments have shown that the performances of the channel-wise and element-wise versions are comparable, we use the former because it introduces a very small number of extra parameters. We adopt the momentum method when updating the parameter t:

Δt_i = η Δt_i + γ ∂ε/∂t_i.    (4)
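To make Eqs. (2)-(4) concrete, here is a small NumPy sketch of the t-update for NG-ReLU. The variable names and the momentum and step-size values are illustrative assumptions; the paper's η and γ in Eq. (4) play the roles of mom and lr below.

import numpy as np

def dNG_dt(x, t):
    # Eq. (3): the derivative is 0 where x > t and 1 where x <= t.
    return np.where(x > t, 0.0, 1.0)

rng = np.random.default_rng(0)
x = rng.normal(size=8)           # inputs to one channel's activation
grad_out = rng.normal(size=8)    # upstream gradient dE/dNG(f(x))
t, delta_t = -1.0, 0.0           # the paper initializes t at -1

g = np.sum(grad_out * dNG_dt(x, t))   # Eq. (2); channel-wise t sums over nodes
mom, lr = 0.9, 0.01                   # assumed values for illustration
delta_t = mom * delta_t + lr * g      # Eq. (4): momentum accumulation
t -= delta_t                          # descend on the objective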

Analysis

NG Acts as a Regularizer

First, we use a toy example to show that an NG with strong nonlinearity improves the capacity of a model. We used a simple CNN without activation functions as a baseline to fit the test dataset CIFAR-10. The model had two convolutional layers, one dense layer without activation functions and a softmax layer. Each convolutional layer contained three 3×3 kernels, and the loss function was cross entropy. We used the stochastic gradient descent (SGD) optimizer to train the model. The initial learning rate was 1, and we divided it by 10 when the loss no longer decreased in value. The batch size was 128, and we adopted a momentum of 0.9, the same simple data augmentation as [He et al. 2016], and MSRA weight initialization [He et al. 2015]. We then added NG-ReLUs with different untrainable values of t for the two convolutional layers and made comparisons with the baseline model. Finally, we initially set t to -1 and made it trainable to test training performance. Table 1 shows the maximum training accuracies of the different models, where "None" means the baseline model without activation functions. We see that as t increased, so did the maximum training accuracy. For the NG with trainable t, the mean value of t increased by the end of training, and thus its performance was between that of t = -0.5 and t = -5. This toy example shows that the nonlinearity of the NG changes model capacity.

Table 1: Maximum training accuracy (%) for different values of t ("None" denotes the baseline model without activation functions).

Next, we experimentally show that a CNN with a less nonlinear NG is relatively easy to train. Our goal is to explore the critical depth, which is the depth beyond which the model does not converge. The test models were plain CNNs with NG-ReLUs, the different parameters t of which were untrainable. We then set t = -1 initially, made t a trainable parameter and tested the critical depth. The model structure was the same as [He et al. 2016] without BN, and the dataset was CIFAR-10. We used the same training strategy except that Xavier initialization [Glorot 2010] was used as the weight initialization for all the experiments. Table 2 shows the results.

Table 2: Critical depth of the plain CNN for different values of t.

We see that as t increased, the critical depth decreased, indicating increasing difficulty in training. As discussed previously, a greater value of t corresponds to higher model capacity, and thus models with lower capacity are relatively easy to train. Recalling the properties of the NG, it initially makes the model almost linear, and automatically enhances the model capacity during training. Thus, the NG can be considered an unusual form of regularization during the training procedure: parameters in the early stages are obtained by training a relatively low-capacity model for only a few iterations, and these parameters are reused for the initialization of a higher-capacity model. As all the inputs of the NG were in the linear area for a proper value of t, the model was almost linear in the initial stage; as the training continued, t was updated based on gradient information, hence endowing the model with greater capacity. We extracted t from the 56-layer plain CNN to see how it changed during the training procedure. Figure 2 shows the values of t in layer 19 for different epochs. We see that as the training proceeded, t became increasingly oscillatory, increasing the capacity of the model. An advantage of this strategy is that the model is less likely to overfit in the early stage of the training procedure, because we restrict the parameters to particular regions at first, following which they are expanded gradually to seek better parameters. As seen in Figure 3, although the training accuracies of ResNets with ReLUs, LReLUs and PReLUs increased steadily, their test accuracies oscillated. By comparison, the test accuracies of ResNets with NG-ReLUs, NG-LReLUs and NG-PReLUs were far more stable.

Symmetric Structure of the Activation Function Affects the Training Procedure

Many researchers have shown that an approximately symmetric structure of activation functions can speed up learning because it alleviates the mean shift problem. Mean shift correction pushes the off-diagonal blocks of the Fisher information matrix close to zero [He et al. 2015], rendering the SGD solver closer to natural gradient descent [Clevert 2015]. In this subsection, we further show that the variance of the weight variation may influence training as well. We have the following theorem.

Theorem 1: Given a fully-connected neural network with M layers, let W_l and Z_l be the weight and the input of layer l, and S_l = W_l Z_l. Then the variance of the weight variation ΔW_l has the following lower and upper bounds:

η² E²(∂ε/∂S_l) Var(Z_l) ≤ Var(ΔW_l) ≤ η² ( Var(∂ε/∂S_l) E²(Z_l) + Var(Z_l) Var(∂ε/∂S_l) + Var(Z_l) E²(∂ε/∂S_l) )    (5)

where η, ΔW_l and ε represent the learning rate, the weight variation in layer l and the objective function, respectively.

Proof: Please see the appendix.

This theorem shows that the variance of the weight variation in layer l is closely related to the expectation of the activations E(Z_l) and of the gradients E(∂ε/∂S_l). The asymmetric structure of an activation function may cause the mean shift problem [Clevert 2015]. Mean shift caused by the previous layer acts as a bias for the next layer. The more the units are correlated, the higher their mean shift [Clevert 2015].
The shift of activation expectations in turn may affect the expectation of the gradient. From Theorem 1, the mean shift problem makes the upper and lower bounds of the variance Var(ΔW_l) in different layers unstable, and thus raises the instability of the variation of the weights in different layers. The unstable variation of the weights would result in unstable weight variance during training, which hampers information flow [Glorot 2010], raising training problems when we train a very deep model.
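The bounds of Theorem 1 can be checked numerically. Below is a Monte-Carlo sketch that treats the activations Z and the gradients ∂ε/∂S as independent samples, as in the proof; the distribution parameters are arbitrary illustrations, not values from the paper.

import numpy as np

rng = np.random.default_rng(0)
eta = 0.1
z = rng.normal(0.3, 1.0, 2000)   # activations with a mean shift
g = rng.normal(0.1, 0.5, 2000)   # gradients dE/dS

dw = -eta * np.outer(g, z)       # all m*n weight updates for one layer
var_dw = np.var(dw)
lower = eta**2 * np.mean(g)**2 * np.var(z)
upper = eta**2 * (np.var(g) * np.mean(z)**2
                  + np.var(z) * np.var(g)
                  + np.var(z) * np.mean(g)**2)
print(lower, var_dw, upper)      # lower <= Var(dW) <= upper

Pulling the means of Z and ∂ε/∂S toward zero, as the symmetric NG does, drives both bounds toward η² Var(Z) Var(∂ε/∂S), which keeps Var(ΔW) stable across layers.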

Figure 5: Learning behavior comparisons with different initializations on CIFAR-10. (a), (b), (c), (g): plain CNNs; (d), (e), (f), (h): ResNets.

To stabilize the training procedure, we want to keep the two bounds stable. Because the proposed NG is symmetric with respect to the origin in its linear area, it can pull E(Z_l) and E(∂ε/∂S_l) close to zero, making the training procedure more stable. Figure 4 shows comparisons of the weight variance for the 20-layer plain CNNs and 56-layer ResNets. Compared with ReLU, the weight variance using NG-ReLU was more stable, which supports our analysis above.

Experiments

We tested our method on two models (a plain CNN without BN and a ResNet, whose structures were the same as [He et al. 2016] and are reported in the appendix). Our GPU was a GTX 1080 Ti. We focused on fair performance comparisons of the models, not on state-of-the-art results. Thus, we used the same strategies during training. We modified the activation functions of the models, i.e., ReLU, LReLU and PReLU, into NG-ReLU, NG-LReLU and NG-PReLU, respectively, and compared the model performances with their counterparts and with scaled exponential linear units (SELUs). The datasets were CIFAR-10 (color images in 10 classes, 50k train and 10k test) and CIFAR-100 (color images in 100 classes, 50k train and 10k test). We used the same simple data augmentation method as [He et al. 2016] for CIFAR-10 and CIFAR-100. t was initialized to -1 for all experiments. For the plain CNNs, the initial learning rate was 1, and we divided it by 10 during the training procedure when the test error no longer decreased. For the ResNets, the initial learning rate was 0.1, and we divided it by 10 at 80, 120 and 160 epochs, and terminated training at 200 epochs. For both models, we used a batch size of 128, a weight decay of 0.0005 and a momentum of 0.9.
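A sketch of this ResNet training schedule in PyTorch follows. The stand-in model is illustrative (a real run would use the 56-layer ResNet with NG activations); the milestones, momentum and batch settings come from the text above.

import torch
import torch.nn as nn

model = nn.Sequential(  # stand-in for a ResNet-56 with NG activations
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[80, 120, 160], gamma=0.1)  # divide lr by 10

for epoch in range(200):   # terminate training at 200 epochs
    pass                   # forward/backward passes over CIFAR-10 go here
    scheduler.step()       # advance the schedule once per epoch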

Note that we did not use other techniques, such as dropout [Srivastava et al. 2014], in order to reduce the effect of random factors on the experimental results to a minimum.

Learning Behavior

We evaluated our method on the plain CNNs and ResNets over CIFAR-10. The test models were 20-layer plain CNNs and 56-layer ResNets with different activation functions, i.e., ReLUs, LReLUs, PReLUs, their NG versions (NG-ReLUs, NG-LReLUs, NG-PReLUs) and SELUs. Three parameter initialization methods, Xavier initialization, MSRA initialization and orthogonal initialization [Saxe et al. 2013], were used to test the behavior of the models. Figure 5 shows the training accuracy comparisons of the different models. We see that the models with our method were insensitive to the different initializations, while the other models were not. In addition, the models with NGs converged much faster than their counterparts in all cases. For the ResNets, our methods were still robust to the initializations and converged much faster than their counterparts, while the other models were relatively sensitive to the initializations.

Model Performance

We tested and compared various models, including 44-layer and 56-layer plain CNNs and ResNets, on the CIFAR-10 and CIFAR-100 datasets. MSRA initialization was used. Note that the plain CNNs with ReLUs, LReLUs, PReLUs and SELUs did not converge when the depths were more than 20 layers, whereas the models with NGs still converged. Table 3 shows the test accuracy comparisons. Although our method first trained the models with relatively low capacity, they still produced better or at least comparable performance compared with their counterparts and SELUs in most cases.

Table 3: Test accuracy comparisons on different datasets ("-" means not convergent).

| Model | Dataset | ReLU | LReLU | PReLU | SELU | NG-ReLU | NG-LReLU | NG-PReLU |
| Plain-44 | CIFAR-10 | - | - | - | - | | 88.44% | 89.03% |
| Plain-56 | CIFAR-10 | - | - | - | - | | 87.56% | 87.95% |
| ResNet-44 | CIFAR-10 | | 92.71% | 93.3% | 92.66% | 93.48% | 93.18% | 93.33% |
| ResNet-56 | CIFAR-10 | | 92.18% | 93.8% | 92.57% | 93.90% | 93.56% | 93.49% |
| Plain-44 | CIFAR-100 | - | - | - | - | | 61.5% | 61.17% |
| Plain-56 | CIFAR-100 | - | - | - | - | | 58.30% | 61.38% |
| ResNet-44 | CIFAR-100 | | 71.9% | 79% | 71.11% | 71.8% | 71.69% | 71.99% |
| ResNet-56 | CIFAR-100 | | 71.34% | 72.3% | 79% | 71.90% | 72.06% | 72.8% |

Our method can be used with BN to further improve performance. We used CIFAR-10 as the test dataset and tested the performance of the plain CNNs with BN using ReLUs and NG-ReLUs, as shown in Table 4. The test accuracy using NG-ReLUs with BN was the highest in all experiments. Please see the appendix for more comparisons of the plain CNNs with BN.

Table 4: Test accuracy comparisons of plain CNNs with BN.

| Model depth | ReLU with BN | NG-ReLU | NG-ReLU with BN |
| 44 | | 89.58% | 89.98% |
| 56 | | 88.7% | 89.59% |

We also used a 20-layer wider ResNet, the structure of which is shown in the appendix, to test performance. The final test accuracy was 95.43%, better than the 1001-layer pre-activation ResNet in [He et al. 2016]. Although the NG introduces new training parameters, the extra computational cost is small. For the 56-layer plain CNN with BN, NG-ReLUs took 72 seconds on average to run an epoch, whereas ReLUs and PReLUs took 68 and 77 seconds on average. For the 56-layer ResNet, NG-ReLUs took 94 seconds on average per epoch, whereas ReLUs and PReLUs took 87 and 100 seconds on average per epoch.

Exploring Very Deep Models

We explored the critical depths of the plain CNNs with NGs. The model depth was gradually increased in integral multiples of 12 layers, in the same manner as [He et al. 2016]. The test dataset was CIFAR-10. Table 5 shows the results for three weight initializations. The critical depths for our method were much greater.

Table 5: Critical depth of different models.

| Model | Critical depth |
| SELU-MSRA | <20 |
| SELU-Xavier | 44 |
| SELU-Orthogonal | 68 |
| ReLU-MSRA | 20 |
| ReLU-Xavier | <20 |
| ReLU-Orthogonal | <20 |
| LReLU-MSRA | 20 |
| LReLU-Xavier | <20 |
| LReLU-Orthogonal | <20 |
| PReLU-MSRA | 20 |
| PReLU-Xavier | 20 |
| PReLU-Orthogonal | 20 |
| NG-ReLU-MSRA | 80 |
| NG-ReLU-Xavier | 116 |
| NG-ReLU-Orthogonal | 152 |
| NG-LReLU-MSRA | 68 |
| NG-LReLU-Xavier | 104 |
| NG-LReLU-Orthogonal | 128 |
| NG-PReLU-MSRA | 128 |
| NG-PReLU-Xavier | 128 |
| NG-PReLU-Orthogonal | 152 |

Conclusion

In this paper, we proposed the nonlinearity generation method, which begins training with a relatively low-capacity model and gradually improves model capacity. The proposed training method modifies a family of activation functions by introducing a trainable parameter t to make the activation functions linear symmetric for the inputs, which yields a model with relatively low capacity that is easy to optimize at the beginning. Nonlinearity is then introduced automatically during the training procedure to endow the model with greater capacity. The introduced parameters can be easily incorporated into training. The proposed method can be considered an unusual form of regularization during the training procedure: the parameters in the early stages are obtained by training a relatively low-capacity model for only a few iterations, and are reused for the initialization of a higher-capacity model. We derived the upper and lower bounds of the variance of the weight variation and showed that the symmetric structure of NGs helps stabilize training. Experiments showed that the proposed method speeds up the convergence of training, allows for less careful initialization, improves or at least maintains the performance of CNNs at negligible extra computational cost, and can be used with BN to further improve performance. Finally, we can easily train a very deep model with the proposed method.

References

Agostinelli F, Hoffman M, Sadowski P, et al. 2014. Learning activation functions to improve deep neural networks. arXiv preprint.

Ba J L, Kiros J R, Hinton G E. 2016. Layer normalization. arXiv preprint.

Bakhvalov N S. 1966. On the convergence of a relaxation method with natural constraints on the elliptic operator. USSR Computational Mathematics and Mathematical Physics, 6(5).

Clevert D A, Unterthiner T, Hochreiter S. 2015. Fast and accurate deep network learning by exponential linear units (ELUs). arXiv preprint.

Eisenach C, Wang Z, Liu H. 2016. Nonparametrically learning activation functions in deep neural nets. ICLR.

Glorot X, Bengio Y. 2010. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics.

Goodfellow I J, Warde-Farley D, Mirza M, et al. 2013. Maxout networks. arXiv preprint.

He K, Zhang X, Ren S, et al. 2015. Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In Proceedings of the IEEE International Conference on Computer Vision.

He K, Zhang X, Ren S, et al. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

He K, Zhang X, Ren S, et al. 2016. Identity mappings in deep residual networks. In Proceedings of the European Conference on Computer Vision. Springer.

Hinton G E, Salakhutdinov R R. 2006. Reducing the dimensionality of data with neural networks. Science, 313(5786).

Huang G, Liu Z, Weinberger K Q, et al. 2016. Densely connected convolutional networks. arXiv preprint.

Ioffe S, Szegedy C. 2015. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the International Conference on Machine Learning.

LeCun Y A, Bottou L, Orr G B, et al. 2012. Efficient backprop. In Neural Networks: Tricks of the Trade. Springer Berlin Heidelberg.

Mishkin D, Matas J. 2015. All you need is a good init. arXiv preprint.

Saxe A M, McClelland J L, Ganguli S. 2013. Exact solutions to the nonlinear dynamics of learning in deep linear neural networks. arXiv preprint.

Salimans T, Kingma D P. 2016. Weight normalization: A simple reparameterization to accelerate training of deep neural networks. In Proceedings of Advances in Neural Information Processing Systems.

Sermanet P, Eigen D, Zhang X, et al. 2013. OverFeat: Integrated recognition, localization and detection using convolutional networks. arXiv preprint.

Simonyan K, Zisserman A. 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint.

Srivastava N, Hinton G E, Krizhevsky A, et al. 2014. Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15(1).

Szegedy C, Liu W, Jia Y, et al. 2015. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition: 1-9.

Zeiler M D, Fergus R. 2014. Visualizing and understanding convolutional networks. In Proceedings of the European Conference on Computer Vision. Springer.

Appendix

1. Proof of Theorem 1

Proof: Recall the weight update formula for layer l:

Δw_{l,ij} = -η (∂ε/∂s_{l,i}) z_{l,j}

where i and j denote the i-th and j-th nodes of layer l+1 and layer l respectively, s_{l,i} = Σ_j w_{l,ij} z_{l,j}, z_{l,j} = f(s_{l-1,j}), f(·) is the activation function, and Δw_{l,ij} is the variation of the weight w_{l,ij}. The variance of the weight variation is

Var(ΔW_l) = (η²/(mn)) Σ_i Σ_j ( (∂ε/∂s_{l,i}) z_{l,j} - (1/(mn)) Σ_t Σ_k (∂ε/∂s_{l,t}) z_{l,k} )²

where m and n are the numbers of nodes in layer l+1 and layer l respectively. Writing ḡ = (1/m) Σ_i ∂ε/∂s_{l,i} and z̄ = (1/n) Σ_j z_{l,j}, the double sum factorizes as

Var(ΔW_l) = η² ( ((1/m) Σ_i (∂ε/∂s_{l,i})²)((1/n) Σ_j z_{l,j}²) - ḡ² z̄² ).

Using the inequality ((1/n) Σ_k a_k)² ≤ (1/n) Σ_k a_k², we have (1/m) Σ_i (∂ε/∂s_{l,i})² ≥ ḡ², and hence

Var(ΔW_l) ≥ η² ḡ² ( (1/n) Σ_j z_{l,j}² - z̄² ) = η² Var(Z_l) E²(∂ε/∂S_l),

which is the lower bound. On the other hand, decomposing

(∂ε/∂s_{l,i}) z_{l,j} - ḡ z̄ = (∂ε/∂s_{l,i} - ḡ) z̄ + (∂ε/∂s_{l,i})(z_{l,j} - z̄),

and noting that the cross term vanishes when summed over j, we obtain

Var(ΔW_l) = η² ( Var(∂ε/∂S_l) E²(Z_l) + ((1/m) Σ_i (∂ε/∂s_{l,i})²) Var(Z_l) )
          ≤ η² ( Var(∂ε/∂S_l) E²(Z_l) + Var(Z_l) Var(∂ε/∂S_l) + Var(Z_l) E²(∂ε/∂S_l) ),

which is the upper bound. This completes the proof.

2. Structure details of the test models

Plain CNN: The plain CNNs have 44 and 56 layers, with the structure shown in Table 1. [3×3, n] ×z denotes a convolutional layer with kernel size 3×3 and n kernels, repeated z times.

Table 1: Plain CNN structure.

| | 44-layer CNN | 56-layer CNN | Output size |
| input | | | 32×32 |
| conv | [3×3, 16] | [3×3, 16] | 32×32 |
| stage 1 | [3×3, 16] ×14 | [3×3, 16] ×18 | 32×32 |
| | max pooling | max pooling | 16×16 |
| stage 2 | [3×3, 32] ×14 | [3×3, 32] ×18 | 16×16 |
| | max pooling | max pooling | 8×8 |
| stage 3 | [3×3, 64] ×14 | [3×3, 64] ×18 | 8×8 |
| | global average pooling, softmax | | |

ResNet: The ResNets have 20, 44 and 56 layers, with the structure shown in Table 2. [Block, 3×3, n] ×z denotes the identity block (the same block used for CIFAR-10 in [He et al. 2016]) with kernel size 3×3 and n kernels, repeated z times; downsampling is performed with a stride of 2.

Table 2: ResNet structure.

| | 44-layer ResNet | 56-layer ResNet | 20-layer wider ResNet | Output size |
| input conv | [3×3, 16] | [3×3, 16] | [3×3, 128] | 32×32 |
| stage 1 | [Block, 3×3, 16] ×7 | [Block, 3×3, 16] ×9 | [Block, 3×3, 128] ×3 | 32×32 |
| stage 2 | [Block, 3×3, 32] ×7 | [Block, 3×3, 32] ×9 | [Block, 3×3, 256] ×3 | 16×16 |
| stage 3 | [Block, 3×3, 64] ×7 | [Block, 3×3, 64] ×9 | [Block, 3×3, 512] ×3 | 8×8 |
| | global average pooling, softmax | | | |
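The plain-CNN stack of Table 1 could be assembled as in the following PyTorch sketch. The repeat count k is forced by the layer arithmetic (1 input conv + 3k convs + 1 classifier layer: k = 14 gives 44 layers, k = 18 gives 56); plain ReLU keeps the sketch self-contained, whereas the paper's models swap in NG activations.

import torch.nn as nn

def conv_stage(c_in, c_out, k):
    layers = [nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU()]
    for _ in range(k - 1):
        layers += [nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU()]
    return layers

def plain_cnn(k=14, num_classes=10):
    return nn.Sequential(
        nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),   # input conv
        *conv_stage(16, 16, k), nn.MaxPool2d(2),
        *conv_stage(16, 32, k), nn.MaxPool2d(2),
        *conv_stage(32, 64, k),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(64, num_classes))                  # softmax applied in the loss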

3. Experiments for plain CNNs with batch normalization

3.1 Learning Behavior

We tested the 56-layer plain CNNs with batch normalization (BN) and made comparisons on the CIFAR-10 dataset. We used the same training strategies as in the paper. From Fig. 1 we see that, similar to the experiments with the plain CNNs without BN, the NGs were insensitive to the weight initializations, while the other models exhibited relatively different behaviors with different initializations. In addition, the models with NGs converged faster than their counterparts in most cases. The NG also enables a larger learning rate (lr): as shown in Fig. 2, the models with NG-ReLUs converge with very large learning rates, while the models with ReLUs do not.

Figure 1: Learning behavior comparisons with different initializations on CIFAR-10.

Figure 2: Learning behaviors of 56-layer plain CNNs with BN using different learning rates. (a) lr = 1; (b) lr = 1.5.
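The next subsection compares per-layer weight variances (Fig. 3). A small helper like the following (illustrative, not from the paper) can extract them from any convolutional model; the model name in the usage comment is hypothetical.

import torch.nn as nn

def layer_weight_variances(model):
    # One variance per convolutional layer, in forward order.
    return [float(m.weight.var()) for m in model.modules()
            if isinstance(m, nn.Conv2d)]

# Usage: variances = layer_weight_variances(plain_cnn_56)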

We extracted the weight variance of every layer for the 56-layer plain CNNs and made comparisons. Figure 3 shows the comparisons. Compared with ReLUs, the weight variance using NG-ReLUs is more stable, which supports our analysis in the paper.

Figure 3: Weight variance comparison of every layer for the plain CNNs. The test model is a 56-layer plain CNN with BN.

3.2 Model Performance

We tested and compared 44-layer and 56-layer plain CNNs on the CIFAR-10 and SVHN datasets. MSRA initialization was used. Table 3 shows the test accuracy comparisons. Similar to the results of the plain CNNs without BN, the NGs yield slightly better or comparable results.

Table 3: Test accuracy comparisons on different datasets.

| Model | Dataset | ReLU | LReLU | PReLU | SELU | NG-ReLU | NG-LReLU | NG-PReLU |
| Plain-44 CNN | CIFAR-10 | | 89.01% | 89.3% | 89.09% | 89.48% | 89.06% | 89.70% |
| Plain-56 CNN | CIFAR-10 | | 86.0% | 89.13% | 88.30% | 89.40% | 88.90% | 89.2% |
| Plain-44 CNN | SVHN | 95.99% | 95.80% | 96.30% | 95.39% | 96.07% | 95.91% | 96.31% |
| Plain-56 CNN | SVHN | 95.79% | 95.50% | 96.14% | 95.59% | 96.1% | 95.84% | 96.10% |

3.3 Exploring Very Deep Models

Finally, we explored the critical depths of the plain CNNs with BN. The test dataset was CIFAR-10. We could not test the learning behaviors of the plain CNNs with more than 248 layers because of GPU memory constraints, so the largest depth was 248 layers in this subsection. Table 4 shows the results for three weight initializations. The critical depths of NG-ReLU are much greater than those of ReLU.

Table 4: Critical depth of different models.

| Model | Critical depth |
| SELU-MSRA | 224 |
| SELU-Xavier | 248 |
| SELU-Orthogonal | 236 |
| ReLU-MSRA | 104 |
| ReLU-Xavier | 80 |
| ReLU-Orthogonal | 116 |
| LReLU-MSRA | 80 |
| LReLU-Xavier | 80 |
| LReLU-Orthogonal | 116 |
| PReLU-MSRA | 200 |
| PReLU-Xavier | 188 |
| PReLU-Orthogonal | 224 |
| NG-ReLU-MSRA | 248 |
| NG-ReLU-Xavier | 248 |
| NG-ReLU-Orthogonal | 236 |
| NG-LReLU-MSRA | 248 |
| NG-LReLU-Xavier | 248 |
| NG-LReLU-Orthogonal | 248 |
| NG-PReLU-MSRA | 248 |
| NG-PReLU-Xavier | 248 |
| NG-PReLU-Orthogonal | 248 |



More information

arxiv: v1 [cs.ne] 8 Apr 2016

arxiv: v1 [cs.ne] 8 Apr 2016 Norm-preservng Orthogonal Permutaton Lnear Unt Actvaton Functons (OPLU) 1 Artem Chernodub 2 and Dmtr Nowck 3 Insttute of MMS of NASU, Center for Cybernetcs, 42 Glushkova ave., Kev, Ukrane 03187 Abstract.

More information

IDENTIFICATION OF NONLINEAR SYSTEM VIA SVR OPTIMIZED BY PARTICLE SWARM ALGORITHM

IDENTIFICATION OF NONLINEAR SYSTEM VIA SVR OPTIMIZED BY PARTICLE SWARM ALGORITHM Journa of Theoretca and Apped Informaton Technoogy th February 3. Vo. 48 No. 5-3 JATIT & LLS. A rghts reserved. ISSN: 99-8645 www.att.org E-ISSN: 87-395 IDENTIFICATION OF NONLINEAR SYSTEM VIA SVR OPTIMIZED

More information

A Dissimilarity Measure Based on Singular Value and Its Application in Incremental Discounting

A Dissimilarity Measure Based on Singular Value and Its Application in Incremental Discounting A Dssmarty Measure Based on Snguar Vaue and Its Appcaton n Incrementa Dscountng KE Xaou Department of Automaton, Unversty of Scence and Technoogy of Chna, Hefe, Chna Ema: kxu@ma.ustc.edu.cn MA Lyao Department

More information

Why feed-forward networks are in a bad shape

Why feed-forward networks are in a bad shape Why feed-forward networks are n a bad shape Patrck van der Smagt, Gerd Hrznger Insttute of Robotcs and System Dynamcs German Aerospace Center (DLR Oberpfaffenhofen) 82230 Wesslng, GERMANY emal smagt@dlr.de

More information

CSC 411 / CSC D11 / CSC C11

CSC 411 / CSC D11 / CSC C11 18 Boostng s a general strategy for learnng classfers by combnng smpler ones. The dea of boostng s to take a weak classfer that s, any classfer that wll do at least slghtly better than chance and use t

More information

Decentralized Adaptive Control for a Class of Large-Scale Nonlinear Systems with Unknown Interactions

Decentralized Adaptive Control for a Class of Large-Scale Nonlinear Systems with Unknown Interactions Decentrazed Adaptve Contro for a Cass of Large-Scae onnear Systems wth Unknown Interactons Bahram Karm 1, Fatemeh Jahangr, Mohammad B. Menhaj 3, Iman Saboor 4 1. Center of Advanced Computatona Integence,

More information

Open Problem: The landscape of the loss surfaces of multilayer networks

Open Problem: The landscape of the loss surfaces of multilayer networks JMLR: Workshop and Conference Proceedngs vol 4: 5, 5 8th Annual Conference on Learnng Theory Open Problem: The landscape of the loss surfaces of multlayer networks Anna Choromanska Courant Insttute of

More information

NODAL PRICES IN THE DAY-AHEAD MARKET

NODAL PRICES IN THE DAY-AHEAD MARKET NODAL PRICES IN THE DAY-AHEAD MARET Fred Murphy Tempe Unversty AEG Meetng, Washngton, DC Sept. 7, 8 What we cover Two-stage stochastc program for contngency anayss n the day-ahead aucton. Fnd the LMPs

More information

Boostrapaggregating (Bagging)

Boostrapaggregating (Bagging) Boostrapaggregatng (Baggng) An ensemble meta-algorthm desgned to mprove the stablty and accuracy of machne learnng algorthms Can be used n both regresson and classfcaton Reduces varance and helps to avod

More information

Multigradient for Neural Networks for Equalizers 1

Multigradient for Neural Networks for Equalizers 1 Multgradent for Neural Netorks for Equalzers 1 Chulhee ee, Jnook Go and Heeyoung Km Department of Electrcal and Electronc Engneerng Yonse Unversty 134 Shnchon-Dong, Seodaemun-Ku, Seoul 1-749, Korea ABSTRACT

More information

Problem Set 9 Solutions

Problem Set 9 Solutions Desgn and Analyss of Algorthms May 4, 2015 Massachusetts Insttute of Technology 6.046J/18.410J Profs. Erk Demane, Srn Devadas, and Nancy Lynch Problem Set 9 Solutons Problem Set 9 Solutons Ths problem

More information

Multilayer Perceptron (MLP)

Multilayer Perceptron (MLP) Multlayer Perceptron (MLP) Seungjn Cho Department of Computer Scence and Engneerng Pohang Unversty of Scence and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjn@postech.ac.kr 1 / 20 Outlne

More information

Numerical Heat and Mass Transfer

Numerical Heat and Mass Transfer Master degree n Mechancal Engneerng Numercal Heat and Mass Transfer 06-Fnte-Dfference Method (One-dmensonal, steady state heat conducton) Fausto Arpno f.arpno@uncas.t Introducton Why we use models and

More information