Sparse Training Procedure for Kernel Neuron *

Jianhua XU, Xuegong ZHANG and Yanda LI

School of Mathematical and Computer Science, Nanjing Normal University, Nanjing 210097, Jiangsu Province, China, xujianhua@email.njnu.edu.cn

Department of Automation, Tsinghua University / State Key Laboratory of Intelligent Technology and Systems, Beijing 100084, China, zhangxg@mail.tsinghua.edu.cn

* This work is supported by the National Natural Science Foundation of China, project No. 6075007.

Abstract: The kernel neuron is the generalization of the classical McCulloch-Pitts neuron using Mercer kernels. In order to control the generalization ability and prune the structure of the kernel neuron, in this paper we construct a regularized risk functional that includes both an empirical risk functional and a Laplace regularization term. Based on the gradient descent method, a novel training algorithm is designed, which is referred to as the sparse training procedure for the kernel neuron. Such a procedure realizes the main ideas of kernel machines (e.g. support vector machines, kernel Fisher discriminant analysis, etc.): kernels, regularization (or large margin) and sparseness, and can deal with nonlinear classification and regression problems effectively.

Keywords: Kernel Neuron, Support Vector Machine, Sparseness, Regularization.

1. Introduction

In artificial neural networks the basic element is the McCulloch-Pitts (M-P) neuron [1]. Rosenblatt [2] proposed the first learnable procedure, the Perceptron, which could only deal with linearly separable cases as a simple linear classifier. In order to handle more complicated real-world problems, many models and their training procedures have been introduced, e.g. the back-propagation training method for the multilayer perceptron [3], the adaline with some nonlinear transform [4], and the radial basis function (RBF) net [5]. Recently, several kernel-based machines for nonlinear problems, such as support vector machines (SVM) [6-8], kernel Fisher discriminant analysis (KFD) [9], and the large margin kernel pocket algorithm [10], have been gaining more and more attention in nonlinear classifier design. There exist three attractive concepts behind them: the kernel idea, large margin or regularization, and sparseness. The kernel idea is an effective technique for realizing a nonlinear transform implicitly. XU et al. [11] introduced the kernel neuron by generalizing the M-P neuron through Mercer kernels, and constructed a simple training algorithm based on the gradient descent method. The kernel neuron and its training procedure can be considered as a unified framework for the three nonlinear techniques mentioned above in neural networks. The regularization technique developed by Tikhonov & Arsenin [12] handles ill-posed problems, and has been widely used in neural networks. It has been found that adding a proper regularization term to an objective functional can result in significant improvements in net generalization [13], and can also prune the structure of nets [14]. There are three usual regularization terms: the squared or Gaussian, the absolute or Laplace, and the normalized or Cauchy regularization term. With respect to the efficiency of supervised learning, Saito and Nakano [13] gave a detailed comparison of the three regularization terms and different learning algorithms, and pointed out that the combination of the squared regularization term and a second-order learning algorithm drastically improves the convergence and generalization ability. Williams [15] concluded that a Laplace regularization term is more appropriate than the Gaussian one from the viewpoint of net pruning. Ishikawa [16] used a Laplace regularizer to construct a simple but effective learning method, called structural learning with forgetting, in order to prune feedforward neural nets.
In this paper, in order to improve the generalization ability and to obtain a sparse discriminant or regression function for the kernel neuron, we add the Laplace regularization term to the original empirical risk functional defined in the paper of XU et al. [11]. Based on the gradient descent approach, a training algorithm is constructed. It is referred to as the sparse training procedure for the kernel neuron, and it realizes the three main ideas found in support vector machines. As a nonlinear technique, it can handle both nonlinear classification and regression problems effectively.

2. Definition of Kernel Neuron

This paper is devoted to two problems in neural networks: classification and regression. Let

{(x_1, y_1), (x_2, y_2), ..., (x_i, y_i), ..., (x_l, y_l)}    (1)

be a training set of l i.i.d. samples, where x_i ∈ R^n. For the classification problem with binary classes (ω_1, ω_2), suppose

y_i = +1 if x_i ∈ ω_1, and y_i = -1 if x_i ∈ ω_2,    (2)

while for the regression problem, suppose y_i ∈ R, i = 1, ..., l.

The classical M-P neuron is defined as

o(x) = f((w · x) + b)    (3)

where o is the output of the neuron, x is the input vector, and w and b are the weight vector and threshold respectively. The function f is the transfer function. For the M-P neuron, f is a hard limiting function, i.e. the sign function. In neural networks, f is a continuously differentiable and monotone function, e.g. a sigmoid function or a linear function.

In the paper of Xu et al. [11], a kernel version of the M-P neuron is defined as

o(x) = f( Σ_{i=1}^{l} α_i k(x, x_i) + β )    (4)

where α_i ∈ R, i = 1, 2, ..., l are the coefficients corresponding to the samples, and k(x, x_i) is a kernel function satisfying the Mercer conditions, e.g. the polynomial kernel, the RBF kernel or the two-layer neural network kernel [7, 8]. Note that, generally speaking, the input-output relationship of the kernel neuron is nonlinear. Only when the transfer function is linear and the kernel function is the linear kernel (namely k(x, x_i) = x · x_i) is the relationship linear, and then the kernel neuron can be considered as an equivalent form of the M-P neuron. The kernel neuron utilizes nonlinear kernels to realize the nonlinear transform from the original input vector space (R^n) to the real number space (R).

3. Sparse Training Procedure for Kernel Neuron

For the kernel neuron, XU et al. [11] defined an empirical risk functional and constructed a training algorithm based on the standard gradient descent scheme. Such a training procedure only realizes the kernel idea; in particular, it is difficult to control the generalization ability and to obtain a sparse decision function. Adding a proper regularization term to the risk functional to decay the connection weights is a simple way to prune weights without complicating the learning algorithm much [14]. In the kernel neuron, such pruning implies that a sparse representation will occur, i.e. many α_i will be close to zero. At the same time, the regularization method can also improve the generalization ability and the convergence of the training procedure. Thus, we define a regularized risk functional consisting of the empirical risk (the sum of squared errors between the actual and desired outputs) and a Laplace regularization term, that is,

E(α, β) = (1/2) Σ_{i=1}^{l} [y_i - o(x_i)]² + μ Σ_{i=1}^{l} |α_i|    (5)

where α = [α_1, ..., α_l] and μ is the regularization parameter. Our goal now is to construct an effective algorithm that finds the coefficient vector α and the threshold β minimizing the risk functional (5). This can still be done by the standard gradient descent scheme. The gradient of (5) is

∂E/∂α_m = -Σ_{i=1}^{l} [y_i - f(u_i)] f'(u_i) k(x_m, x_i) + μ sgn(α_m),  m = 1, 2, ..., l    (6)

∂E/∂β = -Σ_{i=1}^{l} [y_i - f(u_i)] f'(u_i)

where u_i = Σ_{j=1}^{l} α_j k(x_j, x_i) + β is the net input of the kernel neuron for sample x_i, f'(u_i) is the first derivative of f evaluated at u_i, and sgn is the sign function.
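To make the definitions above concrete, the following is a minimal NumPy sketch of the kernel neuron output (4), the regularized risk (5) and its gradient (6). The RBF kernel, the tanh transfer function, the default parameter values and all function names are illustrative assumptions of ours rather than choices prescribed by the paper.

```python
import numpy as np

def rbf_kernel(x, z, width=1.0):
    """RBF kernel k(x, z) = exp(-||x - z||^2 / (2 * width^2)); the default width is illustrative."""
    return np.exp(-np.sum((np.asarray(x) - np.asarray(z)) ** 2) / (2.0 * width ** 2))

def kernel_neuron_output(x, X, alpha, beta, f=np.tanh, kernel=rbf_kernel):
    """Eq. (4): o(x) = f(sum_i alpha_i k(x, x_i) + beta). Also returns the net input u."""
    u = sum(a * kernel(x, xi) for a, xi in zip(alpha, X)) + beta
    return f(u), u

def regularized_risk(X, y, alpha, beta, mu, f=np.tanh, kernel=rbf_kernel):
    """Eq. (5): half the sum of squared errors plus the Laplace term mu * sum_i |alpha_i|."""
    errors = [y_i - kernel_neuron_output(x_i, X, alpha, beta, f, kernel)[0]
              for x_i, y_i in zip(X, y)]
    return 0.5 * np.sum(np.square(errors)) + mu * np.sum(np.abs(alpha))

def risk_gradient(X, y, alpha, beta, mu, f=np.tanh,
                  df=lambda u: 1.0 - np.tanh(u) ** 2, kernel=rbf_kernel):
    """Eq. (6): partial derivatives of E with respect to each alpha_m and to beta."""
    g_alpha = np.zeros(len(X))
    g_beta = 0.0
    for x_i, y_i in zip(X, y):
        o_i, u_i = kernel_neuron_output(x_i, X, alpha, beta, f, kernel)
        common = (y_i - o_i) * df(u_i)
        g_alpha -= common * np.array([kernel(x_m, x_i) for x_m in X])
        g_beta -= common
    g_alpha += mu * np.sign(alpha)
    return g_alpha, g_beta
```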

Like the back-propagation training algorithm in feedforward neural networks, we also use single-sample correction and add a momentum term to the iterative procedure. Therefore, a novel iterative procedure for the kernel neuron can be written as:

Algorithm-1 (SKN-1):
1. Let t = 0 and let α_m(0) and β(0) be arbitrary.
2. Pick up some sample x_i.
3. Set t = t + 1.
4. Calculate u_i(t) = Σ_{j=1}^{l} α_j(t-1) k(x_j, x_i) + β(t-1) and o_i(t) = f(u_i(t)).
5. Calculate
   Δα_m(t) = λ_1 [y_i - f(u_i(t))] f'(u_i(t)) k(x_m, x_i) - λ_2 sgn(α_m(t-1)) + λ_3 Δα_m(t-1)
   Δβ(t) = λ_1 [y_i - f(u_i(t))] f'(u_i(t)) + λ_3 Δβ(t-1)
6. Update
   α_m(t) = α_m(t-1) + Δα_m(t)
   β(t) = β(t-1) + Δβ(t)
7. If Σ_m |Δα_m(t)| + |Δβ(t)| < ε or t ≥ t_max, stop; otherwise go to step 2.

Here m = 1, 2, ..., l, λ_1 is the learning rate, λ_2 = λ_1 μ, λ_3 denotes the momentum parameter, ε is a threshold used to stop the algorithm, and t_max is the maximal number of iterations. Ishikawa [16] pointed out that such a weight decay is constant, in contrast to exponential decay [17], so that unnecessary connections fade away. This means that a large number of parameters become close to zero and sparseness appears. In particular, when λ_2 = λ_3 = 0 this approach reduces to the simple training procedure for the kernel neuron [11].

Ishikawa [16] also advised a selective pruning procedure in which only those connection weights decay whose absolute values are below a threshold θ after the training procedure listed above. Accordingly, another regularized risk functional for the kernel neuron can be constructed as

E(α, β) = (1/2) Σ_{i=1}^{l} [y_i - o(x_i)]² + μ Σ_{i: |α_i| < θ} |α_i|    (7)

Such an idea can improve the goodness of fit of the model and decay the small values further, which means that an even sparser representation can occur. The novel sparse training procedure is then comprised of two steps, as follows:

Algorithm-2 (SKN-2):
1. Run Algorithm-1 listed above.
2. Run Algorithm-1 again, but including the threshold θ for the α_i, i.e. minimizing (7).

In this paper, we call the two training algorithms above the sparse training procedures 1 and 2 for the kernel neuron, or simply SKN-1 and SKN-2, respectively. As in the sparse LS-SVM [18], we refer to {α_1, ..., α_l} as the spectrum of the kernel neuron. Sparseness means that many components of the spectrum are very close to zero. If such components are forced to zero, the final discriminant function changes little. In this paper, a proper threshold δ is set to force some components to zero. If |α_i| ≥ δ, the corresponding sample or vector is still called a support vector. Now the final discriminant or regression function can be represented as

f(x) = f( Σ_{i: |α_i| ≥ δ} α_i k(x, x_i) + β )    (8)

In the case where we handle a binary classification problem, for a new input vector x, if f(x) > 0 then x ∈ ω_1, otherwise x ∈ ω_2. For the regression problem, we take f(x) as the regression result.
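The following is a compact sketch of how Algorithm-1 and the pruning threshold δ of Eq. (8) might be implemented in NumPy. The hyperparameter defaults, the random order in which samples are picked, and the helper names are assumptions made here for illustration; the kernel argument can be any Mercer kernel, e.g. the rbf_kernel from the sketch above.

```python
import numpy as np

def train_skn1(X, y, kernel, mu=0.01, lam1=0.1, lam3=0.5, eps=1e-4, t_max=10000,
               f=np.tanh, df=lambda u: 1.0 - np.tanh(u) ** 2, seed=0):
    """Single-sample gradient descent with Laplace weight decay and momentum (Algorithm-1)."""
    l = len(X)
    alpha, beta = np.zeros(l), 0.0            # step 1: arbitrary initial values
    d_alpha, d_beta = np.zeros(l), 0.0        # previous corrections, for the momentum term
    lam2 = lam1 * mu                          # lambda_2 = lambda_1 * mu
    rng = np.random.default_rng(seed)
    for t in range(1, t_max + 1):
        i = rng.integers(l)                   # step 2: pick up some sample x_i
        k_col = np.array([kernel(X[m], X[i]) for m in range(l)])
        u_i = float(np.dot(alpha, k_col) + beta)           # step 4: net input
        err = (y[i] - f(u_i)) * df(u_i)
        d_alpha = lam1 * err * k_col - lam2 * np.sign(alpha) + lam3 * d_alpha   # step 5
        d_beta = lam1 * err + lam3 * d_beta
        alpha, beta = alpha + d_alpha, beta + d_beta        # step 6
        if np.sum(np.abs(d_alpha)) + abs(d_beta) < eps:     # step 7: stopping criterion
            break
    return alpha, beta

def support_vector_indices(alpha, delta=1e-3):
    """Samples kept by the pruning threshold delta of Eq. (8); the delta value is illustrative."""
    return np.flatnonzero(np.abs(alpha) >= delta)
```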

4. Experiment Results and Analysis

To evaluate the performance of our new training procedures, we devised three artificial data sets: a linearly separable case, a nonlinear case with ten misclassified samples, and a nonlinear regression case.

The example in Fig. 1 is a linearly separable case, in which there are 79 samples of two classes (marked by crosses and points in the figure) that can be classified without error by several linear classifiers. Fig. 1 illustrates the separation lines obtained by KN [11], SKN-1, SKN-2 and the SVMlight method [19] with the linear kernel k(x, y) = x · y, where the circles indicate the support vectors (SVs).

Fig. 1: Some separation hyperplanes from different algorithms for the linear case.

Fig. 2 shows the corresponding spectra of these learning approaches. In KN (Fig. 2(a)), the spectrum is obviously not sparse. In Fig. 2(b) and (c), a large number of components are close to zero and sparseness appears in the decision function. Fig. 2(d) illustrates the spectrum of the SVMlight method [19].

Fig. 2: The spectra from different approaches for the linear case.

For the nonlinear problem, we designed an example in which ten samples are misclassified by a linear classifier. Fig. 3 shows the decision boundaries from KN, SKN-1, SKN-2 and SVMlight with the linear kernel. Note that there are two contradictory samples, i.e. x_i = x_j with y_i ≠ y_j. We find that the number of support vectors from our sparse training algorithms is smaller than that from SVMlight (C = 1000, other parameters at their default values). This example demonstrates that our algorithms work well for the nonlinear problem.

Fig. 3: The separating hyperplanes from different algorithms for the nonlinear case.

For the nonlinear regression problem, the function f(x) = (1 - x + 2x²) exp(-0.5x²) is used [13]. In the experiment, the values of x are randomly generated in the range from -4 to +4, the corresponding function values are computed, and Gaussian noise with zero mean and variance 0.04 is added. The total number of training samples is 30.
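For reference, the synthetic regression data described above could be generated as in the short sketch below; the uniform sampling of x, the random seed and the variable names are our own illustrative assumptions, and the target function is the reconstruction given in the text.

```python
import numpy as np

def target(x):
    # Regression target used in Section 4 (as reconstructed in the text above)
    return (1.0 - x + 2.0 * x ** 2) * np.exp(-0.5 * x ** 2)

rng = np.random.default_rng(0)                    # arbitrary seed, for reproducibility
x_train = rng.uniform(-4.0, 4.0, size=30)         # 30 inputs drawn from [-4, +4]
y_train = target(x_train) + rng.normal(0.0, np.sqrt(0.04), size=30)  # zero-mean noise, variance 0.04
```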

When a radial basis function kernel with width 1.0 is used, Fig. 4(a) shows the regression result from SKN-2, where the solid curve is the regression result, the dashed line is the true function, and the black points are the actual samples. In Fig. 4(b), the result comes from SVM with an RBF kernel (width 0.4), C = 100.0 and ε = 0.1. In both (a) and (b), the circles denote support vectors. The number of support vectors from SKN-2 is smaller than that from SVM.

Fig. 4: A nonlinear regression example with SKN-2 and SVM.

The three artificial examples above show that the sparse training procedures work well for both nonlinear classification and regression problems. It is possible that our methods obtain a sparser function than SVM.

5. Discussions and Conclusions

The kernel neuron is the nonlinear generalization of the neuron with kernels. In order to control the generalization ability and obtain a sparse decision function, two regularized risk functionals are defined for the kernel neuron, consisting of the empirical risk functional and a Laplace regularization term. Based on the gradient descent scheme, two sparse training procedures are developed, i.e. SKN-1 and SKN-2. The new methods can be regarded as a general-purpose nonlinear learning machine, since they can be applied to both nonlinear pattern recognition and regression problems. Experiments on artificial data sets show that they work well on both linearly separable and non-separable data, and also on the regression problem. For the three usual kinds of kernel functions, our kernel neuron and its sparse training procedures can achieve performances similar to those of the multi-layer perceptron (two-layer neural network kernel), the radial basis function net (RBF kernel) and the adaline with nonlinear preprocessors (polynomial kernel). Furthermore, SKN saves us from designing hidden layer nodes, clustering the centers, constructing the polynomial transform, etc.

References

[1] McCulloch W. S., Pitts W. A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics, 5, 115-133, 1943.
[2] Rosenblatt F. The perceptron: a probabilistic model for information storage and organization in the brain. Psychological Review, 65, 1958.
[3] Rumelhart D. E., Hinton G. E., Williams R. J. Learning representations by back-propagating errors. Nature, 323, 533-536, 1986.
[4] Specht D. F. Generation of polynomial discriminant functions for pattern recognition. IEEE Transactions on Electronic Computers, EC-16, 308-319, 1967.
[5] Theodoridis S., Koutroumbas K. Pattern Recognition. Academic Press, San Diego, 1999.
[6] Cortes C., Vapnik V. N. Support-vector networks. Machine Learning, 20(3), 273-297, 1995.
[7] Vapnik V. N. Statistical Learning Theory. Wiley, New York, 1998.
[8] Vapnik V. N. The Nature of Statistical Learning Theory (2nd ed.). Springer-Verlag, New York, 1999.
[9] Mika S., Rätsch G., Weston J., Schölkopf B., Müller K.-R. Fisher discriminant analysis with kernels. Neural Networks for Signal Processing IX, 41-48. IEEE Press, New York, 1999.
[10] Xu J., Zhang X., Li Y. Large margin kernel pocket algorithm. Proceedings of IJCNN 2001, 1480-1485, Washington DC, 2001.
[11] Xu J., Zhang X., Li Y. Kernel neuron and its training algorithm. Proceedings of the 8th International Conference on Neural Information Processing, Vol. 2, 861-866, Shanghai, China, Nov. 14-18, 2001, Fudan University Press.
[12] Tikhonov A. N., Arsenin V. Y. Solution of Ill-posed Problems. W. H. Winston, Washington DC, 1977.
[13] Saito K., Nakano R. Second-order learning algorithm with squared penalty term. Neural Computation, 12(3), 709-729, 2000.
[14] Reed R. Pruning algorithms - a survey. IEEE Transactions on Neural Networks, 4(5), 740-747, 1993.
[15] Williams P. M. Bayesian regularization and pruning using a Laplace prior. Neural Computation, 7(1), 117-143, 1995.
[16] Ishikawa M. Structural learning with forgetting. Neural Networks, 9(3), 509-521, 1996.
[17] Plaut D. C., Nowlan S. J., Hinton G. E. Experiments on learning by back propagation. Technical Report CMU-CS-86-126, Carnegie-Mellon University, 1986.
[18] Suykens J. A. K., Lukas L., Vandewalle J. Sparse least squares support vector machine classifiers. In 8th European Symposium on Artificial Neural Networks (ESANN 2000), 37-42, 2000.
[19] Joachims T. Making large-scale SVM learning practical. In Advances in Kernel Methods - Support Vector Learning, Schölkopf B., Burges C., and Smola A. (eds), 169-184, Cambridge, MA: MIT Press, 1999.