Efficient Discriminative Convolution Using Fisher Weight Map

Hideki Nakayama
Graduate School of Information Science and Technology
The University of Tokyo, Tokyo, Japan

Abstract

Convolutional neural networks (CNNs) have been studied for a long time and have recently gained increasingly more attention. Deep CNNs in particular have achieved remarkably high performance on many visual recognition tasks owing to their high levels of flexibility. However, since CNNs require numerous parameters to be tuned via iterative operations through layers, their computational cost is immense. Moreover, they often require a huge number of training samples and technical tricks, such as unsupervised pre-training and heuristic tuning, to train successfully. In this work, we present a very simple method of layer-wise convolution. We obtain discriminative filters by using a Fisher weight map, which well separates convolved images between categories. This operation can be deterministically solved as a simple eigenvalue problem, and no back propagation or hyper-parameters are needed. Because our method is layer-wise and based on a simple eigenvalue problem, it is computationally efficient. It is also relatively stable with a moderate amount of training samples, and it is capable of learning densely from high-dimensional descriptors without dropping connections between neurons, which is a common practice in conventional implementations of CNNs. We demonstrate the promising performance of our method in extensive experiments on two datasets. Our network, used together with appropriate pooling and rectification techniques, achieved remarkably high performance, comparable to the state of the art.

1 Introduction

Deep neural networks are again gaining popularity in visual categorization tasks with the significant advances in computational power and the amount of available data [11, 15]. Among them, convolutional neural networks (CNNs) [18, 22, 27] are one of the most successful approaches and have established state-of-the-art records on many benchmarks [4, 19, 20]. CNNs limit the connections between layers to nearby neurons in sub-windows (receptive fields), inspired by biological analysis of the visual cortex [17]. Moreover, the weight parameters for convolution are shared by all sub-windows. Thus, CNNs substantially reduce the number of free parameters compared to fully-connected networks. However, CNNs still have a number of disadvantages. First, their computational cost is still immense, and they usually require special implementations using graphics processing units (GPUs) or cluster computers to achieve state-of-the-art results [4, 19].

Since it is occasionally difficult even for GPUs to train fully connected CNNs, connections are often randomly reduced so that training is feasible [3, 18]. CNNs are also prone to overfitting due to their high capacity. We generally require a huge number of training samples to successfully train these systems without the help of unsupervised pre-training [10, 24]. In addition, heuristic techniques such as dropout [16] are often incorporated, although their theoretical background has not been fully investigated.

In this work, we present a very simple but efficient method of convolution using a weight learning method called the Fisher weight map (FWM). With it, we can obtain weights for features that well separate convolved images between categories. This operation can be deterministically solved as a simple eigenvalue problem, and no hyper-parameters for back propagation (e.g., weight decay) are needed. Because our method is layer-wise and based on a simple eigenvalue problem, it is computationally efficient. It does not need unsupervised pre-training, and it is relatively stable with a moderate number of training samples. We demonstrate in extensive experiments that our convolution method can reasonably improve the performance of the original descriptors. Moreover, we found that the key to achieving the best performance is to use it with appropriate methods of pooling and rectification, which is another contribution of this work. Our overall network achieved surprisingly high performance, comparable to that of state-of-the-art networks, despite its simplicity.

2 Fisher weight map

The Fisher weight map (FWM) was originally proposed by Shinohara and Otsu [26] for computing spatial weights for individual pixels in images, and it has its roots in Eigenface [28] and Fisherface [1]. While Eigenface and Fisherface simply perform principal component analysis (PCA) or Fisher linear discriminant analysis (FLDA) on image vectors, FWM is designed for a 2-dimensional (matrix) representation in which each pixel has multiple feature channels. Specifically, FWM computes weights for each pixel such that they maximize the Fisher criterion of the global feature vector. Its core algorithm is based on FLDA, and it is a natural extension of the eigen weight map (EWM) [26], which is based on PCA. This idea has recently been revisited by Harada et al. [13, 14], who used FWM and other similar techniques to obtain weights for each region of spatial pyramids. Despite its substantially reduced dimensionality, their compact spatial pyramid representation was found to be as powerful as other state-of-the-art methods of pyramid matching.

In this work, we apply the FWM in the opposite direction, making it suitable for convolving local features. Although the mathematical basis of FWM comes directly from [26], the objective here is essentially different and has not been investigated before, making our contribution novel. More specifically, we compute weights for the features of local neighboring pixels in receptive fields that maximize the discriminative ability of the convolved image map (Section 3.1), while the objective of [26] is to compute discriminative global features.

3 Network architecture

We consider a multilayer feedforward network for image classification (Figure 1). At the k-th layer (L_k) of the network, each image is represented by m_k feature arrays of P_k × P_k pixels. This 2-dimensional array and its elements are often called a map¹ and neurons, respectively. For example, a raw color image (L_0) has three maps, each of which corresponds to the R, G, and B channels. We extract local features from image patches with standard descriptors that constitute the L_1 layer. These are further followed by pooling and convolution layers. Finally, a logistic regression classifier is trained using all neurons in the final layer as input. Our main contribution is the implementation of the convolution layers using weight map techniques, which we describe in the following.
¹ The definition of a map here is totally unrelated to that for the Fisher weight map or eigen weight map.
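To make the layer layout above concrete, here is a minimal sketch (our illustration, not code from the paper; NumPy arrays with arbitrary example sizes):

```python
import numpy as np

# Illustrative sizes only; these are not taken from the paper's experiments.
# Layer 0: a raw color image is m_0 = 3 maps (R, G, B) of P_0 x P_0 pixels.
m0, P0 = 3, 32
layer0 = np.random.rand(m0, P0, P0)

# Layer k: the same container holds m_k feature maps of P_k x P_k neurons.
mk, Pk = 100, 13
layer_k = np.random.rand(mk, Pk, Pk)

# f^(k)(x, y): the feature vector at one coordinate, one response per map
# (this is the quantity used in Section 3.1).
f_xy = layer_k[:, 5, 3]   # shape (m_k,)
```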

Figure 1: Illustration of our network (Layer 0: raw image; then patch descriptor, pooling, convolutions using the Fisher weight map, pooling & rectification, and a full connection by logistic regression to the output). No back propagation is necessary for training.

3.1 Convolution with weight maps

Figure 2 outlines our algorithm. Let f^{(k)}_{(x,y)} \in R^{m_k} denote the feature vector at coordinates (x, y) in layer L_k, each element of which corresponds to the response of one map. We first stack neighboring feature vectors for convolution in a receptive field. For an n × n convolution, we concatenate the n² vectors as

x^{(k)}_{(x,y)} = \left( f^{(k)T}_{(x-\delta, y-\delta)} \; f^{(k)T}_{(x-\delta+1, y-\delta)} \; \cdots \; f^{(k)T}_{(x+\delta, y-\delta)} \; \cdots \; f^{(k)T}_{(x,y)} \; \cdots \; f^{(k)T}_{(x+\delta, y+\delta)} \right)^T,   (1)

where \delta = \lfloor n/2 \rfloor and (x, y) are the new coordinates for convolved images. After this operation is densely applied, we obtain (P_k − n + 1) × (P_k − n + 1) stacked vectors. We place them together as a matrix,

X = \left( x^{(k)}_{(1,1)} \; x^{(k)}_{(2,1)} \; \cdots \; x^{(k)}_{(P_k-n+1, P_k-n+1)} \right).   (2)

Our objective is to find a linear projection, f^{(k+1)}_{(x,y)} = w^T x^{(k)}_{(x,y)} (w \in R^{m_k n^2}), that convolves all local features in a receptive field to derive a new map embedding informative local structures. Both EWM and FWM are formulated as an eigenvalue problem to achieve this end. We obtain the convolving projection that defines the next maps by using the top m_{k+1} eigenvectors:

\left( f^{(k+1)}_{(1,1)} \; f^{(k+1)}_{(2,1)} \; \cdots \; f^{(k+1)}_{(P_k-n+1, P_k-n+1)} \right) = \left( w_1 \; w_2 \; \cdots \; w_{m_{k+1}} \right)^T (X - \bar{X}),   (3)

where \bar{X} is the mean matrix of X over the entire training dataset.
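The stacking and projection of Eqs. (1)-(3) translate directly into array operations. The sketch below is our own illustration (the function names are ours); the eigenvector matrix W and the training mean X̄ are assumed to come from the EWM or FWM procedures described next:

```python
import numpy as np

def stack_receptive_fields(maps, n):
    """Eqs. (1)-(2): densely stack the n x n neighborhoods of all maps.

    maps: array of shape (m_k, P_k, P_k).
    Returns X of shape (m_k * n**2, (P_k - n + 1)**2), one stacked
    vector x^(k)_(x,y) per column.
    """
    mk, P, _ = maps.shape
    out = P - n + 1
    cols = []
    for y in range(out):
        for x in range(out):
            patch = maps[:, y:y + n, x:x + n]      # all maps, n x n window
            # Concatenate the n^2 per-pixel feature vectors f^(k), Eq. (1).
            cols.append(patch.transpose(1, 2, 0).reshape(-1))
    return np.stack(cols, axis=1)

def convolve_with_weight_map(maps, W, X_mean, n):
    """Eq. (3): project the centered stacked matrix with the top
    m_{k+1} eigenvectors W = (w_1 ... w_{m_{k+1}}) to get the next maps."""
    X = stack_receptive_fields(maps, n)
    out = maps.shape[1] - n + 1
    Z = W.T @ (X - X_mean)                          # (m_{k+1}, out**2)
    return Z.reshape(-1, out, out)
```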

Below, we describe the two weight map techniques that we tested in the experiments.

Figure 2: Convolution with weight maps (FWM or EWM).

Eigen weight map. Let z_i = X_i^T w denote a convolved map vector obtained via projection w. EWM finds the projection that maximizes the variance of z. The variance criterion J_E(w) is written as

J_E(w) = \frac{1}{N} \sum_{i=1}^{N} (z_i - \bar{z})^T (z_i - \bar{z})   (4)
       = w^T \left[ \frac{1}{N} \sum_{i=1}^{N} (X_i - \bar{X})(X_i - \bar{X})^T \right] w   (5)
       = w^T \Sigma_X w,   (6)

where N is the total number of training samples and \bar{z} is the mean vector of z. The optimal projection that maximizes J_E(w) under the constraint w^T w = 1 is obtained as an eigenvector of the following eigenvalue problem:

\Sigma_X w = \lambda w.   (7)
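Eq. (7) is a plain symmetric eigenproblem over the covariance of the centered stacked matrices. A minimal sketch (our code, not the authors'; names are ours):

```python
import numpy as np

def eigen_weight_map(X_list, m_out):
    """EWM, Eq. (7): top eigenvectors of Sigma_X w = lambda w.

    X_list : list of stacked matrices X_i, one per training image,
             each of shape (m_k * n**2, (P_k - n + 1)**2).
    m_out  : number of maps m_{k+1} in the next layer.
    Returns (W, X_mean), with W of shape (m_k * n**2, m_out).
    """
    X_mean = np.mean(X_list, axis=0)      # mean matrix over the training set
    d = X_list[0].shape[0]
    Sigma_X = np.zeros((d, d))
    for X in X_list:                      # Eqs. (5)-(6)
        D = X - X_mean
        Sigma_X += D @ D.T
    Sigma_X /= len(X_list)
    eigvals, eigvecs = np.linalg.eigh(Sigma_X)
    W = eigvecs[:, ::-1][:, :m_out]       # eigenvectors of the largest eigenvalues
    return W, X_mean
```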

Fisher weight map. FWM finds discriminative projections by maximizing the between-class distance of z. While EWM is an unsupervised method, FWM maximizes the Fisher criterion J_F(w) in a supervised manner to separate categories; therefore, it is expected to work better than EWM in terms of classification. Let \Sigma_W and \Sigma_B denote the within-class and between-class covariance matrices of z, respectively. More specifically,

\Sigma_W = \frac{1}{N} \sum_{j=1}^{C} \sum_{i=1}^{N_j} (z_i^{(j)} - \bar{z}^{(j)})(z_i^{(j)} - \bar{z}^{(j)})^T,   (8)

\Sigma_B = \frac{1}{N} \sum_{j=1}^{C} N_j (\bar{z}^{(j)} - \bar{z})(\bar{z}^{(j)} - \bar{z})^T,   (9)

where C is the number of categories, N_j is the number of training samples in class j, z_i^{(j)} is the i-th sample in class j, and \bar{z}^{(j)} is their mean. The traces of \Sigma_W and \Sigma_B can be written as

tr \Sigma_W = \frac{1}{N} \sum_{j=1}^{C} \sum_{i=1}^{N_j} (z_i^{(j)} - \bar{z}^{(j)})^T (z_i^{(j)} - \bar{z}^{(j)})   (10)
            = w^T \left[ \frac{1}{N} \sum_{j=1}^{C} \sum_{i=1}^{N_j} (X_i^{(j)} - \bar{X}^{(j)})(X_i^{(j)} - \bar{X}^{(j)})^T \right] w   (11)
            = w^T \tilde{\Sigma}_W w,   (12)

tr \Sigma_B = \frac{1}{N} \sum_{j=1}^{C} N_j (\bar{z}^{(j)} - \bar{z})^T (\bar{z}^{(j)} - \bar{z})   (13)
            = w^T \left[ \frac{1}{N} \sum_{j=1}^{C} N_j (\bar{X}^{(j)} - \bar{X})(\bar{X}^{(j)} - \bar{X})^T \right] w   (14)
            = w^T \tilde{\Sigma}_B w.   (15)

Therefore, the Fisher criterion is defined as

J_F(w) = \frac{tr \Sigma_B}{tr \Sigma_W} = \frac{w^T \tilde{\Sigma}_B w}{w^T \tilde{\Sigma}_W w}.   (16)

The optimal weights that maximize this criterion can be analytically obtained by solving the following generalized eigenvalue problem:

\tilde{\Sigma}_B w = \lambda \tilde{\Sigma}_W w.   (17)

If the dimensions of \tilde{\Sigma}_W are too large, we can apply EWM before FWM to reduce the dimensionality and cope with the singularity problem [26], although we have not done this in this study.
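Eq. (17) is a generalized symmetric eigenproblem, solvable for example with scipy.linalg.eigh. The sketch below is our own; the small ridge added to the within-class matrix is our assumption for numerical stability (the paper instead suggests EWM-based dimensionality reduction):

```python
import numpy as np
from scipy.linalg import eigh

def fisher_weight_map(X_list, labels, m_out, ridge=1e-6):
    """FWM, Eq. (17): top generalized eigenvectors of Sigma_B w = lambda Sigma_W w.

    X_list : list of stacked matrices X_i^(j), as in Section 3.1.
    labels : class index j for each training sample.
    ridge  : small diagonal term (our addition) to keep Sigma_W invertible.
    """
    labels = np.asarray(labels)
    N, d = len(X_list), X_list[0].shape[0]
    X_mean = np.mean(X_list, axis=0)
    Sw = np.zeros((d, d))                 # bracketed matrix of Eq. (11)
    Sb = np.zeros((d, d))                 # bracketed matrix of Eq. (14)
    for j in np.unique(labels):
        idx = np.where(labels == j)[0]
        Xj_mean = np.mean([X_list[i] for i in idx], axis=0)
        for i in idx:                     # within-class scatter
            D = X_list[i] - Xj_mean
            Sw += D @ D.T
        D = Xj_mean - X_mean              # between-class scatter
        Sb += len(idx) * (D @ D.T)
    Sw /= N
    Sb /= N
    eigvals, eigvecs = eigh(Sb, Sw + ridge * np.eye(d))
    W = eigvecs[:, ::-1][:, :m_out]       # largest generalized eigenvalues first
    return W, X_mean
```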

3.2 Descriptors

We test two descriptors for extracting layer-1 features from raw images. The first is the random filter, which convolves small image patches (e.g., 3 × 3 and 5 × 5) with random weights. The weights are set randomly between −0.05 and 0.05 in our experiments. Previous work has shown that deep networks with random weights can achieve reasonable performance despite their simplicity [18]. Random weights are also frequently used as the initial weights for deep CNNs.

The second is the K-means encoding proposed by Coates et al. [5, 8]. We first apply contrast normalization and zero component analysis (ZCA) whitening to image patches as preprocessing. We then generate a codebook using K-means, as in the standard bag-of-words approach [9], and use triangle encoding to encode patch features with the codebook.

3.3 Pooling

Pooling is an operation that summarizes the responses of neighboring neurons in a map by taking their statistical properties. In this way, we can systematically incorporate local translation invariance, which is crucially important for visual object recognition. This was often done in early work via simple sub-sampling [27]. Recent studies have commonly been based on dense pooling operations [4, 5, 29]. Here, we use average-pooling (AP) and max-pooling (MP), which have recently been theoretically analyzed by Boureau et al. [2].

3.4 Rectification

One of the keys to deep networks is to incorporate nonlinearity by applying an appropriate rectification function to the output of neurons. Many functions have been tested for this purpose, such as tanh and sigmoid [18, 22]. Recent state-of-the-art research has demonstrated that the rectified linear unit (ReLU), which simply takes R(x) = max(0, x) for input x, is surprisingly effective for deep networks, leading to high levels of performance and fast convergence [19, 23]. Although our architecture does not need backward propagation, we implemented ReLU to check its effectiveness. Motivated by Coates and Ng [7], we also tested the following two-dimensional rectification unit, which similarly exploits the negative activations obtained from convolution:

R_2(x) = \left( \max(0, x) \;\; \max(0, -x) \right)^T.   (18)

4 Experiment

We compared our methods using various pooling and rectification strategies on two benchmarks, i.e., the STL-10 [5] and MNIST [21] datasets. The notations for network architectures are:

- Rand(n, d): a d-dimensional random filter applied to n × n image patches,
- K_m(n, d): a K-means descriptor extracted from n × n image patches with d visual words,
- C(n, m): a convolutional layer with m maps and n × n receptive fields; C_E and C_F correspond to convolution with EWM and FWM, respectively,
- R, R_2: rectification units (ReLU, Section 3.4),
- AP[MP](n, s): average-pooling [max-pooling] over an n × n neighborhood, spacing s pixels apart, and
- AP[MP]_q: average-pooling [max-pooling] such that the pooled regions correspond to the quadrants (2 × 2 spatial regions) of the original raw images.

For example, Rand(5, 200)-R-AP(4, 4)-C_F(3, 100)-R-AP_q represents the following architecture: (1) 200 random filters, followed by (2) ReLU, (3) average pooling, (4) 100 maps of 3 × 3 convolution using FWM, (5) ReLU, and (6) average pooling. Using quadrants (AP_q, MP_q) as the final layer allows a fair comparison of the effectiveness of each step. All the experiments were done on a single desktop PC (Core K) with 32 GB of RAM.
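As an illustration of the pooling and rectification building blocks named above, here is a minimal NumPy sketch (our own; AP/MP with an n × n window and stride s, plus the R and R_2 units of Section 3.4):

```python
import numpy as np

def pool(maps, n, s, mode="avg"):
    """AP(n, s) / MP(n, s): pool n x n neighborhoods spaced s pixels apart.

    maps: (m, P, P) array of feature maps; returns an (m, Q, Q) array.
    """
    m, P, _ = maps.shape
    starts = range(0, P - n + 1, s)
    op = np.mean if mode == "avg" else np.max
    return np.stack([
        np.array([[op(ch[y:y + n, x:x + n]) for x in starts] for y in starts])
        for ch in maps])

def pool_quadrants(maps, mode="avg"):
    """AP_q / MP_q, read here as pooling over the four quadrants of a map."""
    P = maps.shape[1]
    return pool(maps, P // 2, P // 2, mode)

def relu(maps):
    """R: rectified linear unit, R(x) = max(0, x)."""
    return np.maximum(0.0, maps)

def relu2(maps):
    """R_2, Eq. (18): keep positive and negative activations as separate
    maps, doubling the number of maps."""
    return np.concatenate([np.maximum(0.0, maps), np.maximum(0.0, -maps)])
```

With these pieces, an architecture string such as Rand(5, 200)-R-AP(4, 4)-C_F(3, 100)-R-AP_q reads as a left-to-right composition of descriptor, rectification, pooling, and convolution stages.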

4.1 STL-10 dataset

The STL-10 dataset [5] is composed of color images of 10 object classes, such as airplanes, birds, and cars. This dataset was especially designed for investigating problems in unsupervised feature learning. While 100,000 unlabeled images are available for this purpose, there are only 100 labeled examples per class in each predefined fold. We ignored the unlabeled samples in this experiment and simply trained our system with the labeled examples, as in Gens and Domingos [12].

We first tested various combinations of convolution and pooling operations with K-means features using 200 visual words. Table 1 summarizes the results. We can obtain a standard bag-of-words representation (K_m(9, 200)-AP[MP]_q) by simply pooling the descriptors. Max-pooling achieved better performance than average-pooling for this dataset. After the convolution layer, however, average-pooling substantially improved performance over the non-convolution models, while max-pooling had no effect at all. Moreover, we found that ReLU after convolution consistently improved performance; the R_2 filter was found to be especially effective. These results indicate that it is crucial to use our convolution method with adequate pooling and rectification techniques. Both EWM and FWM outperformed the K_m(9, 200)-MP_q model when used with average-pooling and the R_2 filter. As expected, FWM, which is a discriminative convolution layer, achieved the best results.

Table 1: Classification rates on STL-10 (%) using various network architectures.

    Architecture                                  Accuracy
    K_m(9, 200)-AP_q                              46.7 ± 1.7
    K_m(9, 200)-MP_q                              53.4 ± 0.7

    Architecture                                  EWM          FWM
    K_m(9, 200)-AP(4, 2)-C(3, 100)-MP_q           45.2 ± –     – ± 1.6
    K_m(9, 200)-MP(4, 2)-C(3, 100)-MP_q           49.3 ± –     – ± 1.3
    K_m(9, 200)-AP(4, 2)-C(3, 100)-AP_q           45.2 ± –     – ± 1.1
    K_m(9, 200)-MP(4, 2)-C(3, 100)-AP_q           54.6 ± –     – ± 0.8
    K_m(9, 200)-AP(4, 2)-C(3, 100)-R-AP_q         50.9 ± –     – ± 0.7
    K_m(9, 200)-MP(4, 2)-C(3, 100)-R-AP_q         54.9 ± –     – ± 0.5
    K_m(9, 200)-MP(4, 2)-C(3, 100)-R_2-AP_q       56.9 ± –     – ± 0.6

We next fine-tuned our model with more features, as shown in Table 2. We found that our method of convolution could successfully learn weight parameters from more features without overfitting, although there were only 100 labeled examples per class. Moreover, we could further improve performance by using a finer grid for the last pooling layer (AP(4, 3)) instead of quadrants.

Table 2: Classification rates on STL-10 (%) using FWM with different dictionary sizes d and network configurations.

    Architecture                                    d = 200      d = –       d = 1000
    K_m(9, d)-MP_q                                  53.4 ± –     – ± –       – ± 0.6
    K_m(9, d)-MP(4, 2)-C_F(3, 100)-R-AP_q           59.0 ± –     – ± –       – ± 0.6
    K_m(9, d)-MP(4, 2)-C_F(3, 100)-R_2-AP_q         61.2 ± –     – ± –       – ± 0.4
    K_m(9, d)-MP(8, 4)-C_F(3, 100)-R-AP_q           60.4 ± –     – ± –       – ± 0.7
    K_m(9, d)-MP(8, 4)-C_F(3, 100)-R_2-AP_q         62.2 ± –     – ± –       – ± 0.4
    K_m(9, d)-MP(8, 4)-C_F(3, 100)-R_2-AP(4, 3)     – ± –        – ± –       – ± 0.7

We compare the scores for our method and previous ones in Table 3. Our best model outperformed all previously published scores in the literature by a large margin.

Table 3: Comparison of classification rates on STL-10 (%).

    1-layer Vector Quantization [7]                        54.9 ± –
    1-layer Sparse Coding [7]                              59.0 ± –
    3-layer Learned Receptive Field [6]                    60.1 ± 1.0
    Discriminative Sum-Product Network [12]                62.3 ± 1.0
    Ours, K_m(9, 1000)-MP(8, 4)-C_F(3, 100)-R_2-AP(4, 3)   – ± 0.7

4.2 MNIST dataset

MNIST is a dataset of tiny (28 × 28 pixel)² handwritten digits and has been a de-facto standard benchmark for the study of deep networks. There are 60,000 training samples and 10,000 testing samples in this dataset; approximately 6,000 training and 1,000 testing samples are specified per class (0 to 9).

We found that random filters worked well for this dataset. Table 4 summarizes classification errors using different combinations of convolution and pooling methods. It was somewhat surprising that just applying ReLU and average-pooling could greatly improve performance (Rand(5, 500)-R-AP_q). After convolution, however, performance fell without the help of ReLU. The results basically correspond to those for STL-10, but ReLU appears to be more important for this dataset. Also, unlike on STL-10, EWM does not improve performance over the original random features.

Table 4: Classification errors on MNIST (%) using various network architectures.

    Architecture                                  Error
    Rand(5, 500)-AP_q                             –
    Rand(5, 500)-MP_q                             1.02
    Rand(5, 500)-R-AP_q                           0.89
    Rand(5, 500)-R-MP_q                           1.02

    Architecture                                  EWM     FWM
    Rand(5, 500)-R-MP(3, 2)-C(3, 200)-AP_q        –       –
    Rand(5, 500)-R-AP(3, 2)-C(3, 200)-AP_q        –       –
    Rand(5, 500)-R-AP(3, 2)-C(3, 200)-R-AP_q      –       –
    Rand(5, 500)-R-AP(3, 2)-C(3, 200)-R_2-AP_q    –       –

We next investigated the effects of the size of the receptive field and the dimensionality of the features on convolution. Table 5 lists the results. The numbers in parentheses represent the number of features in a receptive field for each case. We observed that the key factor was the number of features in a receptive field, rather than its size.

Table 5: Classification errors on MNIST (%) using different sizes (n) and feature dimensionalities (d) of receptive fields, for the architecture Rand(5, d)-R-AP(3, 2)-C_F(n, 100)-R_2-AP_q.

² We append one column and one row with zeros so that images are 29 × 29 pixels.

Based on this, we increased the dimensionality of the features, fixing n = 3 (Table 6), and effectively obtained better scores.

Table 6: Classification errors on MNIST (%) with the best configurations, comparing filter dimensions d for the architectures Rand(5, d)-R-AP_q, Rand(5, d)-R-AP(3, 2)-C_F(3, 200)-R-AP_q, Rand(5, d)-R-AP(3, 2)-C_F(3, 200)-R_2-AP_q, and Rand(5, d)-R-AP(3, 2)-C_F(3, 500)-R_2-AP_q.

Table 7 summarizes our comparison with previous work. State-of-the-art work on this dataset has used training data augmented with various distortions of the original images. Because this was not of interest in this work, we compared our method with previous scores on the original raw dataset. To the best of our knowledge, the best previous score with this setup (no distortions) was that of Ciresan et al. [4], which was based on a combination of five CNNs. We achieved 0.44% using a single network with an FWM convolution layer with 500 maps. Zeiler and Fergus [29] achieved 0.47% using a novel pooling technique called stochastic pooling with CNNs. It would be interesting to implement stochastic pooling in our network, which we intend to do in future work.

Table 7: Comparison of classification errors on MNIST (%). We compare our method with previous work using the raw training dataset. (*) Ciresan et al. [4] achieved 0.23% using elastic distortions.

    Large CNN (unsup. pretraining) [25]                    0.60
    Large CNN (unsup. pretraining) [18]                    –
    CNN + Stochastic Pooling [29]                          0.47
    Multi-Column Deep Neural Network [4]                   0.46
    Ours, Rand(5, 1000)-R-AP(3, 2)-C_F(3, 500)-R_2-AP_q    0.44

5 Conclusion

We presented a simple but efficient method of layer-wise discriminative convolution based on the Fisher weight map. Our convolution method can be analytically solved, and no hyper-parameters for backward learning are needed. The experimental results revealed that our convolution layer can reasonably improve the performance of the original descriptors. However, we also found that using FWM for convolution alone is insufficient to obtain the best performance. Our method, used together with appropriate pooling methods and ReLU operations, achieved remarkably high levels of performance on the STL-10 and MNIST datasets, comparable to or better than those of state-of-the-art networks. The superior results on STL-10 especially indicate that our method can learn stably from a limited number of training examples. This is probably because our method is based on a simple eigenvalue problem and is capable of densely learning from high-dimensional descriptors without dropping connections between neurons. This fact indicates the importance of fully-connected convolutional networks, considering that connections are often reduced in many other CNN methods.

We intend to extend our method to handle hierarchical convolution in future work. Moreover, it would be interesting to initialize deep networks with our method and fine-tune them with state-of-the-art methods of gradient descent.

Acknowledgement

This work is partially supported by the Hoso Bunka Foundation.

References

[1] P. Belhumeur, J. Hespanha, and D. Kriegman. Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection. IEEE Trans. PAMI, 19(7):711-720, 1997.
[2] Y. Boureau, J. Ponce, and Y. LeCun. A theoretical analysis of feature pooling in visual recognition. In Proc. ICML, 2010.
[3] D. Ciresan, U. Meier, J. Masci, L. Gambardella, and J. Schmidhuber. Flexible, high performance convolutional neural networks for image classification. In Proc. IJCAI, 2011.
[4] D. Ciresan, U. Meier, and J. Schmidhuber. Multi-column deep neural networks for image classification. In Proc. IEEE CVPR, 2012.
[5] A. Coates, H. Lee, and A. Ng. An analysis of single-layer networks in unsupervised feature learning. In Proc. AISTATS, 2011.
[6] A. Coates and A. Ng. Selecting receptive fields in deep networks. In Proc. NIPS, 2011.
[7] A. Coates and A. Ng. The importance of encoding versus training with sparse coding and vector quantization. In Proc. ICML, 2011.
[8] A. Coates and A. Ng. Learning feature representations with K-means. Neural Networks: Tricks of the Trade, 2012.
[9] G. Csurka, C. R. Dance, L. Fan, J. Willamowski, and C. Bray. Visual categorization with bags of keypoints. In Proc. ECCV Workshop on Statistical Learning in Computer Vision, 2004.
[10] D. Erhan, Y. Bengio, A. Courville, P. A. Manzagol, and P. Vincent. Why does unsupervised pre-training help deep learning? Journal of Machine Learning Research, 11:625-660, 2010.
[11] K. Fukushima. Neocognitron for handwritten digit recognition. Neurocomputing, 51:161-180, 2003.
[12] R. Gens and P. Domingos. Discriminative learning of sum-product networks. In Proc. NIPS, 2012.
[13] T. Harada, H. Nakayama, and Y. Kuniyoshi. Improving local descriptors by embedding global and local spatial information. In Proc. ECCV, 2010.

[14] T. Harada, Y. Ushiku, Y. Yamashita, and Y. Kuniyoshi. Discriminative spatial pyramid. In Proc. IEEE CVPR, 2011.
[15] G. E. Hinton and R. Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, 313:504-507, 2006.
[16] G. E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov. Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint, 2012.
[17] D. H. Hubel and T. N. Wiesel. Receptive fields of single neurones in the cat's striate cortex. The Journal of Physiology, 148:574-591, 1959.
[18] K. Jarrett, K. Kavukcuoglu, M. A. Ranzato, and Y. LeCun. What is the best multi-stage architecture for object recognition? In Proc. IEEE ICCV, 2009.
[19] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In Proc. NIPS, 2012.
[20] Q. V. Le, M. A. Ranzato, M. Devin, G. S. Corrado, and A. Y. Ng. Building high-level features using large scale unsupervised learning. In Proc. ICML, 2012.
[21] Y. LeCun. The MNIST database of handwritten digits. URL http://yann.lecun.com/exdb/mnist/.
[22] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proc. of the IEEE, 1998.
[23] V. Nair and G. E. Hinton. Rectified linear units improve restricted Boltzmann machines. In Proc. ICML, 2010.
[24] M. A. Ranzato, F. J. Huang, Y. L. Boureau, and Y. LeCun. Unsupervised learning of invariant feature hierarchies with applications to object recognition. In Proc. IEEE CVPR, 2007.
[25] M. A. Ranzato, C. Poultney, S. Chopra, and Y. LeCun. Efficient learning of sparse representations with an energy-based model. In Proc. NIPS, 2006.
[26] Y. Shinohara and N. Otsu. Facial expression recognition using Fisher weight maps. In Proc. IEEE FG, 2004.
[27] P. Y. Simard, D. Steinkraus, and J. C. Platt. Best practices for convolutional neural networks applied to visual document analysis. In Proc. ICDAR, 2003.
[28] M. A. Turk and A. P. Pentland. Face recognition using eigenfaces. In Proc. IEEE CVPR, 1991.
[29] M. D. Zeiler and R. Fergus. Stochastic pooling for regularization of deep convolutional neural networks. arXiv preprint, 2013.
