Efficient Discriminative Convolution Using Fisher Weight Map

Hideki Nakayama
Graduate School of Information Science and Technology
The University of Tokyo, Tokyo, Japan

Abstract

Convolutional neural networks (CNNs) have been studied for a long time and have recently gained increasingly more attention. Deep CNNs in particular have achieved remarkably high performance on many visual recognition tasks owing to their high levels of flexibility. However, since CNNs require numerous parameters to be tuned via iterative operations through layers, their computational cost is immense. Moreover, they often require a huge number of training samples and technical tricks, such as unsupervised pre-training and heuristic tuning, to train successfully. In this work, we present a very simple method of layer-wise convolution. We obtain discriminative filters by using a Fisher weight map, which well separates convolved images between categories. This operation can be deterministically solved as a simple eigenvalue problem, and no back propagation or hyper-parameters are needed. Because our method is layer-wise and based on a simple eigenvalue problem, it is computationally efficient. It is also relatively stable with a moderate amount of training samples, and it is capable of learning densely from high-dimensional descriptors without dropping connections between neurons, which is a common practice in conventional implementations of CNNs. We demonstrate the promising performance of our method in extensive experiments on two datasets. Our network, used together with appropriate pooling and rectification techniques, achieved remarkably high performance, comparable to the state of the art.

1 Introduction

Deep neural networks are again gaining popularity in visual categorization tasks with the significant advances in computational power and the amount of available data [11, 15]. Among them, convolutional neural networks (CNNs) [18, 22, 27] are one of the most successful approaches and have established state-of-the-art records on many benchmarks [4, 19, 20]. CNNs limit the connections between layers to nearby neurons in sub-windows (receptive fields), inspired by biological analysis of the visual cortex [17]. Moreover, the weight parameters for convolution are shared by all sub-windows. Thus, CNNs substantially reduce the number of free parameters compared to fully-connected networks. However, CNNs still have a number of disadvantages. First, their computational cost is still immense, and they usually require special implementations using graphics processing units (GPUs) or cluster computers to achieve state-of-the-art results [4, 19].

Since it is occasionally difficult even for GPUs to train fully connected CNNs, connections are often randomly reduced so that training is feasible [3, 18]. CNNs are also prone to overfitting due to their high capacity. We generally require a huge number of training samples to successfully train these systems without the help of unsupervised pre-training [10, 24]. In addition, heuristic techniques such as dropout [16] are often incorporated, although their theoretical background has not been fully investigated.

In this work, we present a very simple but efficient method of convolution using a weight learning method called the Fisher weight map (FWM). With it, we can obtain weights for features that well separate convolved images between categories. This operation can be deterministically solved as a simple eigenvalue problem, and no hyper-parameters for back propagation (e.g., weight decay) are needed. Because our method is layer-wise and based on a simple eigenvalue problem, it is computationally efficient. It does not need unsupervised pre-training, and it is relatively stable with a moderate number of training samples. We demonstrate in extensive experiments that our convolution method can reasonably improve the performance of the original descriptors. Moreover, we found that the key to achieving the best performance is to use it with appropriate methods of pooling and rectification, which is another contribution of this work. Our overall network achieved surprisingly high performance, comparable to that of state-of-the-art networks, despite its simplicity.

2 Fisher weight map

The Fisher weight map (FWM) was originally proposed by Shinohara and Otsu [26] for computing spatial weights for individual pixels in images, and it has its roots in Eigenface [28] and Fisherface [1]. While Eigenface and Fisherface simply perform principal component analysis (PCA) or Fisher linear discriminant analysis (FLDA) on image vectors, FWM is designed for a 2-dimensional (matrix) representation in which each pixel has multiple feature channels. Specifically, FWM computes weights for each pixel such that they maximize the Fisher criterion of the global feature vector. Its core algorithm is based on FLDA, and it is a natural extension of the eigen weight map (EWM) [26], which is based on PCA. This idea has recently been revisited by Harada et al. [13, 14], who used FWM and other similar techniques to obtain weights for each region of spatial pyramids. Despite its substantially reduced dimensionality, their compact spatial pyramid representation was found to be as powerful as other state-of-the-art methods of pyramid matching.

In this work, we apply the FWM in the opposite direction, making it suitable for convolving local features. Although the mathematical basis of FWM comes directly from [26], the objective here is essentially different and has not been investigated before, making our contribution novel. More specifically, we compute weights for the features of local neighboring pixels in receptive fields that maximize the discriminative ability of the convolved image map (Section 3.1), while the objective of [26] is to compute discriminative global features.

3 Network architecture

We consider a multilayer feedforward network for image classification (Figure 1). At the k-th layer (L_k) of the network, each image is represented by m_k feature arrays of P_k × P_k pixels. This 2-dimensional array and its elements are often called a map¹ and neurons, respectively. For example, a raw color image (L_0) has three maps, each of which corresponds to the R, G, and B channels. We extract local features from image patches with standard descriptors that constitute the L_1 layer. These are further followed by pooling and convolution layers. Finally, a logistic regression classifier is trained using all neurons in the final layer as input. Our main contribution is the implementation of the convolution layers using weight map techniques, which we describe in the following.
¹ The definition of a map here is totally unrelated to that for the Fisher weight map or eigen weight map.
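To make the layer layout above concrete, here is a minimal sketch (our illustration, not code from the paper; NumPy arrays with arbitrary example sizes):

```python
import numpy as np

# Illustrative sizes only; these are not taken from the paper's experiments.
# Layer 0: a raw color image is m_0 = 3 maps (R, G, B) of P_0 x P_0 pixels.
m0, P0 = 3, 32
layer0 = np.random.rand(m0, P0, P0)

# Layer k: the same container holds m_k feature maps of P_k x P_k neurons.
mk, Pk = 100, 13
layer_k = np.random.rand(mk, Pk, Pk)

# f^(k)(x, y): the feature vector at one coordinate, one response per map
# (this is the quantity used in Section 3.1).
f_xy = layer_k[:, 5, 3]   # shape (m_k,)
```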

Figure 1: Illustration of our network (Layer 0: raw image; then patch descriptor, pooling, convolutions using the Fisher weight map, pooling & rectification, and a full connection by logistic regression to the output). No back propagation is necessary for training.

3.1 Convolution with weight maps

Figure 2 outlines our algorithm. Let f^{(k)}_{(x,y)} \in R^{m_k} denote the feature vector at coordinates (x, y) in layer L_k, each element of which corresponds to the response of one map. We first stack neighboring feature vectors for convolution in a receptive field. For an n × n convolution, we concatenate the n² vectors as

x^{(k)}_{(x,y)} = \left( f^{(k)T}_{(x-\delta, y-\delta)} \; f^{(k)T}_{(x-\delta+1, y-\delta)} \; \cdots \; f^{(k)T}_{(x+\delta, y-\delta)} \; \cdots \; f^{(k)T}_{(x,y)} \; \cdots \; f^{(k)T}_{(x+\delta, y+\delta)} \right)^T,   (1)

where \delta = \lfloor n/2 \rfloor and (x, y) are the new coordinates for convolved images. After this operation is densely applied, we obtain (P_k − n + 1) × (P_k − n + 1) stacked vectors. We place them together as a matrix,

X = \left( x^{(k)}_{(1,1)} \; x^{(k)}_{(2,1)} \; \cdots \; x^{(k)}_{(P_k-n+1, P_k-n+1)} \right).   (2)

Our objective is to find a linear projection, f^{(k+1)}_{(x,y)} = w^T x^{(k)}_{(x,y)} (w \in R^{m_k n^2}), that convolves all local features in a receptive field to derive a new map embedding informative local structures. Both EWM and FWM are formulated as an eigenvalue problem to achieve this end. We obtain the convolving projection that defines the next maps by using the top m_{k+1} eigenvectors:

\left( f^{(k+1)}_{(1,1)} \; f^{(k+1)}_{(2,1)} \; \cdots \; f^{(k+1)}_{(P_k-n+1, P_k-n+1)} \right) = \left( w_1 \; w_2 \; \cdots \; w_{m_{k+1}} \right)^T (X - \bar{X}),   (3)

where \bar{X} is the mean matrix of X over the entire training dataset.
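The stacking and projection of Eqs. (1)-(3) translate directly into array operations. The sketch below is our own illustration (the function names are ours); the eigenvector matrix W and the training mean X̄ are assumed to come from the EWM or FWM procedures described next:

```python
import numpy as np

def stack_receptive_fields(maps, n):
    """Eqs. (1)-(2): densely stack the n x n neighborhoods of all maps.

    maps: array of shape (m_k, P_k, P_k).
    Returns X of shape (m_k * n**2, (P_k - n + 1)**2), one stacked
    vector x^(k)_(x,y) per column.
    """
    mk, P, _ = maps.shape
    out = P - n + 1
    cols = []
    for y in range(out):
        for x in range(out):
            patch = maps[:, y:y + n, x:x + n]      # all maps, n x n window
            # Concatenate the n^2 per-pixel feature vectors f^(k), Eq. (1).
            cols.append(patch.transpose(1, 2, 0).reshape(-1))
    return np.stack(cols, axis=1)

def convolve_with_weight_map(maps, W, X_mean, n):
    """Eq. (3): project the centered stacked matrix with the top
    m_{k+1} eigenvectors W = (w_1 ... w_{m_{k+1}}) to get the next maps."""
    X = stack_receptive_fields(maps, n)
    out = maps.shape[1] - n + 1
    Z = W.T @ (X - X_mean)                          # (m_{k+1}, out**2)
    return Z.reshape(-1, out, out)
```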

Below, we describe the two weight map techniques that we tested in the experiments.

Figure 2: Convolution with weight maps (FWM or EWM).

Eigen weight map. Let z_i = X_i^T w denote a convolved map vector obtained via projection w. EWM finds the projection that maximizes the variance of z. The variance criterion J_E(w) is written as

J_E(w) = \frac{1}{N} \sum_{i=1}^{N} (z_i - \bar{z})^T (z_i - \bar{z})   (4)
       = w^T \left[ \frac{1}{N} \sum_{i=1}^{N} (X_i - \bar{X})(X_i - \bar{X})^T \right] w   (5)
       = w^T \Sigma_X w,   (6)

where N is the total number of training samples and \bar{z} is the mean vector of z. The optimal projection that maximizes J_E(w) under the constraint w^T w = 1 is obtained as an eigenvector of the following eigenvalue problem:

\Sigma_X w = \lambda w.   (7)
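Eq. (7) is a plain symmetric eigenproblem over the covariance of the centered stacked matrices. A minimal sketch (our code, not the authors'; names are ours):

```python
import numpy as np

def eigen_weight_map(X_list, m_out):
    """EWM, Eq. (7): top eigenvectors of Sigma_X w = lambda w.

    X_list : list of stacked matrices X_i, one per training image,
             each of shape (m_k * n**2, (P_k - n + 1)**2).
    m_out  : number of maps m_{k+1} in the next layer.
    Returns (W, X_mean), with W of shape (m_k * n**2, m_out).
    """
    X_mean = np.mean(X_list, axis=0)      # mean matrix over the training set
    d = X_list[0].shape[0]
    Sigma_X = np.zeros((d, d))
    for X in X_list:                      # Eqs. (5)-(6)
        D = X - X_mean
        Sigma_X += D @ D.T
    Sigma_X /= len(X_list)
    eigvals, eigvecs = np.linalg.eigh(Sigma_X)
    W = eigvecs[:, ::-1][:, :m_out]       # eigenvectors of the largest eigenvalues
    return W, X_mean
```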

Fisher weight map. FWM finds discriminative projections by maximizing the between-class distance of z. While EWM is an unsupervised method, FWM maximizes the Fisher criterion J_F(w) in a supervised manner to separate categories; therefore, it is expected to work better than EWM in terms of classification. Let \Sigma_W and \Sigma_B denote the within-class and between-class covariance matrices of z, respectively. More specifically,

\Sigma_W = \frac{1}{N} \sum_{j=1}^{C} \sum_{i=1}^{N_j} (z_i^{(j)} - \bar{z}^{(j)})(z_i^{(j)} - \bar{z}^{(j)})^T,   (8)

\Sigma_B = \frac{1}{N} \sum_{j=1}^{C} N_j (\bar{z}^{(j)} - \bar{z})(\bar{z}^{(j)} - \bar{z})^T,   (9)

where C is the number of categories, N_j is the number of training samples in class j, z_i^{(j)} is the i-th sample in class j, and \bar{z}^{(j)} is their mean. The traces of \Sigma_W and \Sigma_B can be written as

tr \Sigma_W = \frac{1}{N} \sum_{j=1}^{C} \sum_{i=1}^{N_j} (z_i^{(j)} - \bar{z}^{(j)})^T (z_i^{(j)} - \bar{z}^{(j)})   (10)
            = w^T \left[ \frac{1}{N} \sum_{j=1}^{C} \sum_{i=1}^{N_j} (X_i^{(j)} - \bar{X}^{(j)})(X_i^{(j)} - \bar{X}^{(j)})^T \right] w   (11)
            = w^T \tilde{\Sigma}_W w,   (12)

tr \Sigma_B = \frac{1}{N} \sum_{j=1}^{C} N_j (\bar{z}^{(j)} - \bar{z})^T (\bar{z}^{(j)} - \bar{z})   (13)
            = w^T \left[ \frac{1}{N} \sum_{j=1}^{C} N_j (\bar{X}^{(j)} - \bar{X})(\bar{X}^{(j)} - \bar{X})^T \right] w   (14)
            = w^T \tilde{\Sigma}_B w.   (15)

Therefore, the Fisher criterion is defined as

J_F(w) = \frac{tr \Sigma_B}{tr \Sigma_W} = \frac{w^T \tilde{\Sigma}_B w}{w^T \tilde{\Sigma}_W w}.   (16)

The optimal weights that maximize this criterion can be analytically obtained by solving the following generalized eigenvalue problem:

\tilde{\Sigma}_B w = \lambda \tilde{\Sigma}_W w.   (17)

If the dimensions of \tilde{\Sigma}_W are too large, we can apply EWM before FWM to reduce the dimensionality and cope with the singularity problem [26], although we have not done this in this study.
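Eq. (17) is a generalized symmetric eigenproblem, solvable for example with scipy.linalg.eigh. The sketch below is our own; the small ridge added to the within-class matrix is our assumption for numerical stability (the paper instead suggests EWM-based dimensionality reduction):

```python
import numpy as np
from scipy.linalg import eigh

def fisher_weight_map(X_list, labels, m_out, ridge=1e-6):
    """FWM, Eq. (17): top generalized eigenvectors of Sigma_B w = lambda Sigma_W w.

    X_list : list of stacked matrices X_i^(j), as in Section 3.1.
    labels : class index j for each training sample.
    ridge  : small diagonal term (our addition) to keep Sigma_W invertible.
    """
    labels = np.asarray(labels)
    N, d = len(X_list), X_list[0].shape[0]
    X_mean = np.mean(X_list, axis=0)
    Sw = np.zeros((d, d))                 # bracketed matrix of Eq. (11)
    Sb = np.zeros((d, d))                 # bracketed matrix of Eq. (14)
    for j in np.unique(labels):
        idx = np.where(labels == j)[0]
        Xj_mean = np.mean([X_list[i] for i in idx], axis=0)
        for i in idx:                     # within-class scatter
            D = X_list[i] - Xj_mean
            Sw += D @ D.T
        D = Xj_mean - X_mean              # between-class scatter
        Sb += len(idx) * (D @ D.T)
    Sw /= N
    Sb /= N
    eigvals, eigvecs = eigh(Sb, Sw + ridge * np.eye(d))
    W = eigvecs[:, ::-1][:, :m_out]       # largest generalized eigenvalues first
    return W, X_mean
```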

3.2 Descriptors

We test two descriptors for extracting layer-1 features from raw images. The first is the random filter, which convolves small image patches (e.g., 3 × 3 and 5 × 5) with random weights. The weights are set randomly between −0.05 and 0.05 in our experiments. Previous work has shown that deep networks with random weights can achieve reasonable performance despite their simplicity [18]. Random weights are also frequently used as the initial weights for deep CNNs.

The second is the K-means encoding proposed by Coates et al. [5, 8]. We first apply contrast normalization and zero component analysis (ZCA) whitening to image patches as preprocessing. We then generate a codebook using K-means, as in the standard bag-of-words approach [9], and use triangle encoding to encode patch features with the codebook.

3.3 Pooling

Pooling is an operation that summarizes the responses of neighboring neurons in a map by taking their statistical properties. In this way, we can systematically incorporate local translation invariance, which is crucially important for visual object recognition. This was often done in early work via simple sub-sampling [27]. Recent studies have commonly been based on dense pooling operations [4, 5, 29]. Here, we use average-pooling (AP) and max-pooling (MP), which have recently been theoretically analyzed by Boureau et al. [2].

3.4 Rectification

One of the keys to deep networks is to incorporate nonlinearity by applying an appropriate rectification function to the output of neurons. Many functions have been tested for this purpose, such as tanh and sigmoid [18, 22]. Recent state-of-the-art research has demonstrated that the rectified linear unit (ReLU), which simply takes R(x) = max(0, x) for input x, is surprisingly effective for deep networks, leading to high levels of performance and fast convergence [19, 23]. Although our architecture does not need backward propagation, we implemented ReLU to check its effectiveness. Motivated by Coates and Ng [7], we also tested the following two-dimensional rectification unit, which similarly exploits the negative activations obtained from convolution:

R_2(x) = \left( \max(0, x) \;\; \max(0, -x) \right)^T.   (18)

4 Experiment

We compared our methods using various pooling and rectification strategies on two benchmarks, i.e., the STL-10 [5] and MNIST [21] datasets. The notations for network architectures are:

- Rand(n, d): a d-dimensional random filter applied to n × n image patches,
- K_m(n, d): a K-means descriptor extracted from n × n image patches with d visual words,
- C(n, m): a convolutional layer with m maps and n × n receptive fields; C_E and C_F correspond to convolution with EWM and FWM, respectively,
- R, R_2: rectification units (ReLU, Section 3.4),
- AP[MP](n, s): average-pooling [max-pooling] over an n × n neighborhood, spacing s pixels apart, and
- AP[MP]_q: average-pooling [max-pooling] such that the pooled regions correspond to the quadrants (2 × 2 spatial regions) of the original raw images.

For example, Rand(5, 200)-R-AP(4, 4)-C_F(3, 100)-R-AP_q represents the following architecture: (1) 200 random filters, followed by (2) ReLU, (3) average pooling, (4) 100 maps of 3 × 3 convolution using FWM, (5) ReLU, and (6) average pooling. Using quadrants (AP_q, MP_q) as the final layer allows a fair comparison of the effectiveness of each step. All the experiments were done on a single desktop PC (Core K) with 32 GB of RAM.
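As an illustration of the pooling and rectification building blocks named above, here is a minimal NumPy sketch (our own; AP/MP with an n × n window and stride s, plus the R and R_2 units of Section 3.4):

```python
import numpy as np

def pool(maps, n, s, mode="avg"):
    """AP(n, s) / MP(n, s): pool n x n neighborhoods spaced s pixels apart.

    maps: (m, P, P) array of feature maps; returns an (m, Q, Q) array.
    """
    m, P, _ = maps.shape
    starts = range(0, P - n + 1, s)
    op = np.mean if mode == "avg" else np.max
    return np.stack([
        np.array([[op(ch[y:y + n, x:x + n]) for x in starts] for y in starts])
        for ch in maps])

def pool_quadrants(maps, mode="avg"):
    """AP_q / MP_q, read here as pooling over the four quadrants of a map."""
    P = maps.shape[1]
    return pool(maps, P // 2, P // 2, mode)

def relu(maps):
    """R: rectified linear unit, R(x) = max(0, x)."""
    return np.maximum(0.0, maps)

def relu2(maps):
    """R_2, Eq. (18): keep positive and negative activations as separate
    maps, doubling the number of maps."""
    return np.concatenate([np.maximum(0.0, maps), np.maximum(0.0, -maps)])
```

With these pieces, an architecture string such as Rand(5, 200)-R-AP(4, 4)-C_F(3, 100)-R-AP_q reads as a left-to-right composition of descriptor, rectification, pooling, and convolution stages.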

4.1 STL-10 dataset

The STL-10 dataset [5] is composed of color images of 10 object classes, such as airplanes, birds, and cars. This dataset was especially designed for investigating problems in unsupervised feature learning. While 100,000 unlabeled images are available for this purpose, there are only 100 labeled examples per class in each predefined fold. We ignored the unlabeled samples in this experiment and simply trained our system with the labeled examples, as in Gens and Domingos [12].

We first tested various combinations of convolution and pooling operations with K-means features using 200 visual words. Table 1 summarizes the results. We can obtain a standard bag-of-words representation (K_m(9, 200)-AP[MP]_q) by simply pooling the descriptors. Max-pooling achieved better performance than average-pooling for this dataset. After the convolution layer, however, average-pooling substantially improved performance over the non-convolution models, while max-pooling had no effect at all. Moreover, we found that ReLU after convolution consistently improved performance; the R_2 filter was found to be especially effective. These results indicate that it is crucial to use our convolution method with adequate pooling and rectification techniques. Both EWM and FWM outperformed the K_m(9, 200)-MP_q model when used with average-pooling and the R_2 filter. As expected, FWM, which is a discriminative convolution layer, achieved the best results.

Table 1: Classification rates on STL-10 (%) using various network architectures.

    Architecture                                  Accuracy
    K_m(9, 200)-AP_q                              46.7 ± 1.7
    K_m(9, 200)-MP_q                              53.4 ± 0.7

    Architecture                                  EWM          FWM
    K_m(9, 200)-AP(4, 2)-C(3, 100)-MP_q           45.2 ± –     – ± 1.6
    K_m(9, 200)-MP(4, 2)-C(3, 100)-MP_q           49.3 ± –     – ± 1.3
    K_m(9, 200)-AP(4, 2)-C(3, 100)-AP_q           45.2 ± –     – ± 1.1
    K_m(9, 200)-MP(4, 2)-C(3, 100)-AP_q           54.6 ± –     – ± 0.8
    K_m(9, 200)-AP(4, 2)-C(3, 100)-R-AP_q         50.9 ± –     – ± 0.7
    K_m(9, 200)-MP(4, 2)-C(3, 100)-R-AP_q         54.9 ± –     – ± 0.5
    K_m(9, 200)-MP(4, 2)-C(3, 100)-R_2-AP_q       56.9 ± –     – ± 0.6

We next fine-tuned our model with more features, as shown in Table 2. We found that our method of convolution could successfully learn weight parameters from more features without overfitting, although there were only 100 labeled examples per class. Moreover, we could further improve performance by using a finer grid for the last pooling layer (AP(4, 3)) instead of quadrants.

Table 2: Classification rates on STL-10 (%) using FWM with different dictionary sizes d and network configurations.

    Architecture                                    d = 200      d = –       d = 1000
    K_m(9, d)-MP_q                                  53.4 ± –     – ± –       – ± 0.6
    K_m(9, d)-MP(4, 2)-C_F(3, 100)-R-AP_q           59.0 ± –     – ± –       – ± 0.6
    K_m(9, d)-MP(4, 2)-C_F(3, 100)-R_2-AP_q         61.2 ± –     – ± –       – ± 0.4
    K_m(9, d)-MP(8, 4)-C_F(3, 100)-R-AP_q           60.4 ± –     – ± –       – ± 0.7
    K_m(9, d)-MP(8, 4)-C_F(3, 100)-R_2-AP_q         62.2 ± –     – ± –       – ± 0.4
    K_m(9, d)-MP(8, 4)-C_F(3, 100)-R_2-AP(4, 3)     – ± –        – ± –       – ± 0.7

We compare the scores for our method and previous ones in Table 3. Our best model outperformed all previously published scores in the literature by a large margin.

Table 3: Comparison of classification rates on STL-10 (%).

    1-layer Vector Quantization [7]                        54.9 ± –
    1-layer Sparse Coding [7]                              59.0 ± –
    3-layer Learned Receptive Field [6]                    60.1 ± 1.0
    Discriminative Sum-Product Network [12]                62.3 ± 1.0
    Ours, K_m(9, 1000)-MP(8, 4)-C_F(3, 100)-R_2-AP(4, 3)   – ± 0.7

4.2 MNIST dataset

MNIST is a dataset of tiny (28 × 28 pixel)² handwritten digits and has been a de-facto standard benchmark for the study of deep networks. There are 60,000 training samples and 10,000 testing samples in this dataset; approximately 6,000 training and 1,000 testing samples are specified per class (0 to 9).

We found that random filters worked well for this dataset. Table 4 summarizes classification errors using different combinations of convolution and pooling methods. It was somewhat surprising that just applying ReLU and average-pooling could greatly improve performance (Rand(5, 500)-R-AP_q). After convolution, however, performance fell without the help of ReLU. The results basically correspond to those for STL-10, but ReLU appears to be more important for this dataset. Also, unlike on STL-10, EWM does not improve performance over the original random features.

Table 4: Classification errors on MNIST (%) using various network architectures.

    Architecture                                  Error
    Rand(5, 500)-AP_q                             –
    Rand(5, 500)-MP_q                             1.02
    Rand(5, 500)-R-AP_q                           0.89
    Rand(5, 500)-R-MP_q                           1.02

    Architecture                                  EWM     FWM
    Rand(5, 500)-R-MP(3, 2)-C(3, 200)-AP_q        –       –
    Rand(5, 500)-R-AP(3, 2)-C(3, 200)-AP_q        –       –
    Rand(5, 500)-R-AP(3, 2)-C(3, 200)-R-AP_q      –       –
    Rand(5, 500)-R-AP(3, 2)-C(3, 200)-R_2-AP_q    –       –

We next investigated the effects of the size of the receptive field and the dimensionality of the features on convolution. Table 5 lists the results. The numbers in parentheses represent the number of features in a receptive field for each case. We observed that the key factor was the number of features in a receptive field, rather than its size.

Table 5: Classification errors on MNIST (%) using different sizes (n) and feature dimensionalities (d) of receptive fields, for the architecture Rand(5, d)-R-AP(3, 2)-C_F(n, 100)-R_2-AP_q.

² We append one column and one row with zeros so that images are 29 × 29 pixels.

Based on this, we increased the dimensionality of the features, fixing n = 3 (Table 6), and effectively obtained better scores.

Table 6: Classification errors on MNIST (%) with the best configurations, comparing filter dimensions d for the architectures Rand(5, d)-R-AP_q, Rand(5, d)-R-AP(3, 2)-C_F(3, 200)-R-AP_q, Rand(5, d)-R-AP(3, 2)-C_F(3, 200)-R_2-AP_q, and Rand(5, d)-R-AP(3, 2)-C_F(3, 500)-R_2-AP_q.

Table 7 summarizes our comparison with previous work. State-of-the-art work on this dataset has used training data augmented with various distortions of the original images. Because this was not of interest in this work, we compared our method with previous scores on the original raw dataset. To the best of our knowledge, the best previous score with this setup (no distortions) was that of Ciresan et al. [4], which was based on a combination of five CNNs. We achieved 0.44% using a single network with an FWM convolution layer with 500 maps. Zeiler and Fergus [29] achieved 0.47% using a novel pooling technique called stochastic pooling with CNNs. It would be interesting to implement stochastic pooling in our network, which we intend to do in future work.

Table 7: Comparison of classification errors on MNIST (%). We compare our method with previous work using the raw training dataset. (*) Ciresan et al. [4] achieved 0.23% using elastic distortions.

    Large CNN (unsup. pretraining) [25]                    0.60
    Large CNN (unsup. pretraining) [18]                    –
    CNN + Stochastic Pooling [29]                          0.47
    Multi-Column Deep Neural Network [4]                   0.46
    Ours, Rand(5, 1000)-R-AP(3, 2)-C_F(3, 500)-R_2-AP_q    0.44

5 Conclusion

We presented a simple but efficient method of layer-wise discriminative convolution based on the Fisher weight map. Our convolution method can be analytically solved, and no hyper-parameters for backward learning are needed. The experimental results revealed that our convolution layer can reasonably improve the performance of the original descriptors. However, we also found that using FWM for convolution alone is insufficient to obtain the best performance. Our method, used together with appropriate pooling methods and ReLU operations, achieved remarkably high levels of performance on the STL-10 and MNIST datasets, comparable to or better than those of state-of-the-art networks. The superior results on STL-10 especially indicate that our method can learn stably from a limited number of training examples. This is probably because our method is based on a simple eigenvalue problem and is capable of densely learning from high-dimensional descriptors without dropping connections between neurons. This fact indicates the importance of fully-connected convolutional networks, considering that connections are often reduced in many other CNN methods.

We intend to extend our method to handle hierarchical convolution in future work. Moreover, it would be interesting to initialize deep networks with our method and fine-tune them with state-of-the-art methods of gradient descent.

Acknowledgement

This work is partially supported by the Hoso Bunka Foundation.

References

[1] P. Belhumeur, J. Hespanha, and D. Kriegman. Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection. IEEE Trans. PAMI, 19(7):711-720, 1997.
[2] Y. Boureau, J. Ponce, and Y. LeCun. A theoretical analysis of feature pooling in visual recognition. In Proc. ICML, 2010.
[3] D. Ciresan, U. Meier, J. Masci, L. Gambardella, and J. Schmidhuber. Flexible, high performance convolutional neural networks for image classification. In Proc. IJCAI, 2011.
[4] D. Ciresan, U. Meier, and J. Schmidhuber. Multi-column deep neural networks for image classification. In Proc. IEEE CVPR, 2012.
[5] A. Coates, H. Lee, and A. Ng. An analysis of single-layer networks in unsupervised feature learning. In Proc. AISTATS, 2011.
[6] A. Coates and A. Ng. Selecting receptive fields in deep networks. In Proc. NIPS, 2011.
[7] A. Coates and A. Ng. The importance of encoding versus training with sparse coding and vector quantization. In Proc. ICML, 2011.
[8] A. Coates and A. Ng. Learning feature representations with K-means. Neural Networks: Tricks of the Trade, 2012.
[9] G. Csurka, C. R. Dance, L. Fan, J. Willamowski, and C. Bray. Visual categorization with bags of keypoints. In Proc. ECCV Workshop on Statistical Learning in Computer Vision, 2004.
[10] D. Erhan, Y. Bengio, A. Courville, P. A. Manzagol, and P. Vincent. Why does unsupervised pre-training help deep learning? Journal of Machine Learning Research, 11:625-660, 2010.
[11] K. Fukushima. Neocognitron for handwritten digit recognition. Neurocomputing, 51:161-180, 2003.
[12] R. Gens and P. Domingos. Discriminative learning of sum-product networks. In Proc. NIPS, 2012.
[13] T. Harada, H. Nakayama, and Y. Kuniyoshi. Improving local descriptors by embedding global and local spatial information. In Proc. ECCV, 2010.

[14] T. Harada, Y. Ushiku, Y. Yamashita, and Y. Kuniyoshi. Discriminative spatial pyramid. In Proc. IEEE CVPR, 2011.
[15] G. E. Hinton and R. Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, 313:504-507, 2006.
[16] G. E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov. Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint, 2012.
[17] D. H. Hubel and T. N. Wiesel. Receptive fields of single neurones in the cat's striate cortex. The Journal of Physiology, 148:574-591, 1959.
[18] K. Jarrett, K. Kavukcuoglu, M. A. Ranzato, and Y. LeCun. What is the best multi-stage architecture for object recognition? In Proc. IEEE ICCV, 2009.
[19] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In Proc. NIPS, 2012.
[20] Q. V. Le, M. A. Ranzato, M. Devin, G. S. Corrado, and A. Y. Ng. Building high-level features using large scale unsupervised learning. In Proc. ICML, 2012.
[21] Y. LeCun. The MNIST database of handwritten digits. URL http://yann.lecun.com/exdb/mnist/.
[22] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proc. of the IEEE, 1998.
[23] V. Nair and G. E. Hinton. Rectified linear units improve restricted Boltzmann machines. In Proc. ICML, 2010.
[24] M. A. Ranzato, F. J. Huang, Y. L. Boureau, and Y. LeCun. Unsupervised learning of invariant feature hierarchies with applications to object recognition. In Proc. IEEE CVPR, 2007.
[25] M. A. Ranzato, C. Poultney, S. Chopra, and Y. LeCun. Efficient learning of sparse representations with an energy-based model. In Proc. NIPS, 2006.
[26] Y. Shinohara and N. Otsu. Facial expression recognition using Fisher weight maps. In Proc. IEEE FG, 2004.
[27] P. Y. Simard, D. Steinkraus, and J. C. Platt. Best practices for convolutional neural networks applied to visual document analysis. In Proc. ICDAR, 2003.
[28] M. A. Turk and A. P. Pentland. Face recognition using eigenfaces. In Proc. IEEE CVPR, 1991.
[29] M. D. Zeiler and R. Fergus. Stochastic pooling for regularization of deep convolutional neural networks. arXiv preprint, 2013.
