Neural Networks: Incremental learning for ν-Support Vector Regression


Neural Networks 67 (2015) 140–150

Contents lists available at ScienceDirect. Neural Networks. Journal homepage: www.elsevier.com/locate/neunet

Incremental learning for ν-Support Vector Regression

Bin Gu (a,b,c,d), Victor S. Sheng (e), Zhijie Wang (f), Derek Ho (g), Said Osman (h), Shuo Li (f,d)

(a) Jiangsu Engineering Center of Network Monitoring, Nanjing University of Information Science & Technology, Nanjing, PR China
(b) Jiangsu Collaborative Innovation Center on Atmospheric Environment and Equipment Technology, PR China
(c) School of Computer & Software, Nanjing University of Information Science & Technology, Nanjing, PR China
(d) Department of Medical Biophysics, University of Western Ontario, London, Ontario, Canada
(e) Department of Computer Science, University of Central Arkansas, Conway, AR, USA
(f) GE Health Care, London, Ontario, Canada
(g) Victoria Hospital, London Health Science Center, London, Ontario, Canada
(h) St. Joseph's Health Care, London, Ontario, Canada

Corresponding author at: Jiangsu Engineering Center of Network Monitoring, Nanjing University of Information Science & Technology, Nanjing, PR China, and School of Computer and Software, Nanjing University of Information Science and Technology, Nanjing, PR China. E-mail addresses: jsgubin@nuist.edu.cn (B. Gu), ssheng@uca.edu (V. S. Sheng), zhijie@ualberta.ca (Z. Wang), derek.ho@lhsc.on.ca (D. Ho), sdosman@hotmail.com (S. Osman), shuo.li@ge.com (S. Li).

Article history: Received 9 May 2014. Received in revised form 28 December 2014. Accepted 2 March 2015. Available online 6 April 2015.

Keywords: Incremental learning; Online learning; ν-Support Vector Regression; Support vector machine

Abstract: The ν-Support Vector Regression (ν-SVR) is an effective regression learning algorithm, which has the advantage of using a parameter ν to control the number of support vectors and to adjust the width of the tube automatically. However, compared to ν-Support Vector Classification (ν-SVC) (Schölkopf et al., 2000), ν-SVR introduces an additional linear term into its objective function. Thus, directly applying the accurate on-line ν-SVC algorithm (AONSVM) to ν-SVR will not generate an effective initial solution, and this is the main challenge in designing an incremental ν-SVR learning algorithm. To overcome this challenge, we propose a special procedure called initial adjustments in this paper. This procedure adjusts the weights of ν-SVC based on the Karush–Kuhn–Tucker (KKT) conditions to prepare an initial solution for the incremental learning. Combining the initial adjustments with the two steps of AONSVM produces an exact and effective incremental ν-SVR learning algorithm (INSVR). Theoretical analysis has proven the existence of the three key inverse matrices, which are the cornerstones of the three steps of INSVR (including the initial adjustments), respectively. The experiments on benchmark datasets demonstrate that INSVR can avoid the infeasible updating paths as far as possible, and successfully converges to the optimal solution. The results also show that INSVR is faster than batch ν-SVR algorithms with both cold and warm starts. © 2015 Elsevier Ltd. All rights reserved.

1. Introduction

In real-world regression tasks, such as time-series prediction (e.g. Cao and Tay (2003); Lu, Lee, and Chiu (2009)), training data is usually provided sequentially, in the extreme case one example at a time, which is an online scenario (Murata, 1998). Batch algorithms seem computationally wasteful in this case, as they retrain the learning model from scratch. Incremental learning algorithms are more capable here, because they incorporate additional training data without retraining the learning model from scratch (Laskov et al., 2006).

ν-Support Vector Regression (ν-SVR) (Schölkopf, Smola, Williamson, & Bartlett, 2000) is an interesting Support Vector Regression (SVR) algorithm, which can automatically adjust the parameter ϵ of the ϵ-insensitive loss function. Given a training sample set T = {(x_1, y_1), ..., (x_l, y_l)} with x_i ∈ R^d and y_i ∈ R, the ϵ-insensitive loss function used in SVR is defined as |y − f(x)|_ϵ = max{0, |y − f(x)| − ϵ} for a predicted value f(x) and a true output y; it does not penalize errors below some ϵ > 0, chosen a priori. Thus, the region of all (x, y) with |y − f(x)| ≤ ϵ is called the ϵ-tube (see Fig. 1).
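For concreteness, the ϵ-insensitive loss can be written as a one-liner. The following minimal Python sketch (NumPy-based; the function name is ours, not the paper's) mirrors the definition above:

```python
import numpy as np

def eps_insensitive_loss(y, f_x, eps):
    """Return max(0, |y - f(x)| - eps): errors inside the eps-tube cost nothing."""
    return np.maximum(0.0, np.abs(y - f_x) - eps)

# Example: with eps = 0.5, a residual of 0.3 costs nothing, a residual of 1.2 costs 0.7.
print(eps_insensitive_loss(np.array([1.0, 2.0]), np.array([1.3, 0.8]), 0.5))  # [0.  0.7]
```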

Schölkopf et al. (2000) considered the following primal problem:

$$\min_{w,\epsilon,b,\xi^{(*)}} \;\frac{1}{2}\langle w, w\rangle + C\Big(\nu\epsilon + \frac{1}{l}\sum_{i=1}^{l}(\xi_i + \xi_i^*)\Big) \quad (1)$$
$$\text{s.t. } (\langle w, \varphi(x_i)\rangle + b) - y_i \le \epsilon + \xi_i,\quad y_i - (\langle w, \varphi(x_i)\rangle + b) \le \epsilon + \xi_i^*,\quad \xi_i^{(*)} \ge 0,\; \epsilon \ge 0,\; i = 1,\ldots,l.$$

The corresponding dual is:

$$\min_{\alpha,\alpha^*} \;\frac{1}{2}\sum_{i,j=1}^{l}(\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*)K(x_i, x_j) - \sum_{i=1}^{l}(\alpha_i - \alpha_i^*)y_i \quad (2)$$
$$\text{s.t. } \sum_{i=1}^{l}(\alpha_i - \alpha_i^*) = 0,\quad 0 \le \alpha_i^{(*)} \le \frac{C}{l},\; i = 1,\ldots,l,\quad \sum_{i=1}^{l}(\alpha_i + \alpha_i^*) \le C\nu,$$

where, following Schölkopf et al. (2000), training samples x_i are mapped into a high dimensional reproducing kernel Hilbert space (RKHS) (Schölkopf & Smola, 2001) by the transformation function φ, K(x_i, x_j) = ⟨φ(x_i), φ(x_j)⟩, and ⟨·, ·⟩ denotes the inner product in the RKHS. (*) is a shorthand implying both the variables with and without asterisks. C is the regularization constant, and ν is the introduced proportion parameter with 0 ≤ ν ≤ 1, which lets one control the number of support vectors and errors. To be more precise, they proved that ν is an upper bound on the fraction of margin errors, and a lower bound on the fraction of support vectors. In addition, with probability 1, asymptotically, ν equals both fractions.

Compared with ϵ-Support Vector Regression (ϵ-SVR) (Smola & Schölkopf, 2003), ν-SVR introduces two complications: the first is that the box constraints are related to the size of the training sample set, and the second is that one more inequality constraint is introduced into the formulation. Compared with ν-Support Vector Classification (ν-SVC) (Schölkopf et al., 2000), ν-SVR introduces an additional linear term into the objective function of (2). To sum up, the formulation of ν-SVR is more complicated than the formulations of ϵ-SVR and ν-SVC.

Early studies on SVR mostly focused on solving large-scale problems. For example, Chang and Lin (2001, 2002) gave an SMO algorithm and implementation for training ϵ-SVR. Tsang, Kwok, and Zurada (2006) proposed core vector regression for training very large regression problems. Shalev-Shwartz, Singer, Srebro, and Cotter (2011) proposed a stochastic sub-gradient descent algorithm with explicit feature mapping for training ϵ-SVR. Ho and Lin (2012) and Wang and Lin (2014) proposed coordinate descent algorithms for linear L1 and L2 SVR. Due to the complications in the formulation of ν-SVR mentioned above, there are still no effective methods for solving incremental ν-SVR learning.

Let us turn to the exact incremental and decremental SVM algorithm (Cauwenberghs & Poggio, 2001) (hereinafter referred to as the C&P algorithm). Since the C&P algorithm was proposed by Cauwenberghs and Poggio in 2001, further studies have mainly focused on two aspects. One focuses on the C&P algorithm itself. For example, Gu, Wang, and Chen (2008) and Laskov et al. (2006) provided more detailed theoretical analysis for it. Gâlmeanu and Andonie (2008) addressed some implementation issues. Karasuyama and Takeuchi (2010) proposed an extended version which can update multiple samples simultaneously. The other aspect applies the C&P algorithm to solve other problems. For example, Gretton and Desobry (2003) and Laskov et al. (2006) applied it to implementing an incremental one-class SVM algorithm. Martin (2002) and Ma, Theiler, and Perkins (2003) introduced it to ϵ-SVR (Vapnik, 1998) and developed accurate on-line support vector regression (AOSVR). Recently, Gu et al. (2012) introduced the C&P algorithm to ν-SVC and proposed an effective accurate on-line ν-SVC algorithm (AONSVM), which includes the relaxed adiabatic incremental adjustments and the strict restoration adjustments. Further, Gu and Sheng (2013) proved the feasibility and finite convergence of AONSVM. Because a great resemblance exists between ν-SVR and ν-SVC, in this paper we wish to design an exact and effective incremental ν-SVR algorithm based on AONSVM. As ν-SVR has an additional linear term in the objective function compared with ν-SVC, directly applying AONSVM to ν-SVR will not generate an effective initial solution for the incremental ν-SVR learning.
To address this issue, we propose a new incremental ν-SVR algorithm (collectively called INSVR) based on AONSVM. In addition to the basic steps of AONSVM (i.e., the relaxed adiabatic incremental adjustments and the strict restoration adjustments), INSVR has a special adjusting process (i.e., the initial adjustments), which is used to address the complications of the ν-SVR formulation and to prepare the initial solution before the incremental learning. Through theoretical analysis, we show the existence of the three key inverse matrices, which are the cornerstones of the initial adjustments, the relaxed adiabatic incremental adjustments, and the strict restoration adjustments, respectively. The experiments on benchmark datasets demonstrate that INSVR avoids the infeasible updating paths as far as possible, and successfully converges to the optimal solution. The results also show that INSVR is faster than batch ν-SVR algorithms with both cold and warm starts.

The rest of this paper is organized as follows. In Section 2, we modify the formulation of ν-SVR and give its KKT conditions. The INSVR algorithm is presented in Section 3. The experimental setup, results and discussions are presented in Section 4. The last section gives some concluding remarks.

Notation: To make the notations easier to follow, we give a summary in the following list.

- α_i, g_i: the i-th elements of the vectors α and g.
- α_c, y_c, z_c: the weight, output, and label of the candidate extended sample (x_c, y_c, z_c).
- Δ: the amount of the change of each variable.
- ϵ, ϵ*: if z_i = 1, they stand for ϵ and ϵ*, respectively; otherwise, they will be ignored.
- Q_S: the submatrix of Q with the rows and columns indexed by S.
- Q_{\M²}: the submatrix of Q obtained by deleting the rows and columns indexed by M.
- Ř_{t·}, Ř_{·t}: the row and the column of a matrix Ř corresponding to the sample (x_t, y_t, z_t), respectively.
- 0, 1: the vectors having all elements equal to 0 and 1, respectively, with proper dimension.
- z_S, u_S: the |S|-dimensional column vectors with the i-th elements equal to z_i and z_i y_i, respectively.
- det(·): the determinant of a square matrix.
- cols(·): the number of columns of a matrix.
- rank(·): the rank of a matrix.

2. Modified formulation of ν-SVR

Obviously, the correlation between the box constraints and the size of the training sample set makes it difficult to design an incremental ν-SVR learning algorithm. To obtain an equivalent formulation, whose box constraints are independent of the size of the training sample set, we multiply the objective function of (1) by the size of the training sample set. Thus, we consider the following primal problem:

$$\min_{w,\epsilon,b,\xi^{(*)}} \;\frac{l}{2}\langle w, w\rangle + C\Big(l\nu\epsilon + \sum_{i=1}^{l}(\xi_i + \xi_i^*)\Big) \quad (3)$$
$$\text{s.t. } (\langle w, \varphi(x_i)\rangle + b) - y_i \le \epsilon + \xi_i,\quad y_i - (\langle w, \varphi(x_i)\rangle + b) \le \epsilon + \xi_i^*,\quad \xi_i^{(*)} \ge 0,\; \epsilon \ge 0,\; i = 1,\ldots,l.$$

It is easy to verify that the primal problem (3) is equivalent to the primal problem (1), and that ν is also an upper bound on the fraction of margin errors and a lower bound on the fraction of support vectors. The dual problem of (3) is:

$$\min_{\alpha,\alpha^*} \;\frac{1}{2l}\sum_{i,j=1}^{l}(\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*)K(x_i, x_j) - \sum_{i=1}^{l}(\alpha_i - \alpha_i^*)y_i \quad (4)$$
$$\text{s.t. } \sum_{i=1}^{l}(\alpha_i - \alpha_i^*) = 0,\quad 0 \le \alpha_i^{(*)} \le C,\; i = 1,\ldots,l,\quad \sum_{i=1}^{l}(\alpha_i + \alpha_i^*) \le Cl\nu.$$

Furthermore, we introduce Theorem 1, which concludes that for any given ν in (4), there are always optimal solutions which occur at the equality Σ(α_i + α_i*) = Clν. The original version of this theorem is proved in Chang and Lin (2002).

Theorem 1 (Chang & Lin, 2002). For the dual problem (4), ∀ν, there are always optimal solutions which occur at Σ_{i=1}^{l}(α_i + α_i*) = Clν.

According to Theorem 1, the inequality constraint Σ(α_i + α_i*) ≤ Clν in (4) can be treated as the equality Σ(α_i + α_i*) = Clν, which means that we can consider the following minimization problem instead of the dual (4):

$$\min_{\alpha,\alpha^*} \;\frac{1}{2l}\sum_{i,j=1}^{l}(\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*)K(x_i, x_j) - \sum_{i=1}^{l}(\alpha_i - \alpha_i^*)y_i \quad (5)$$
$$\text{s.t. } \sum_{i=1}^{l}(\alpha_i - \alpha_i^*) = 0,\quad 0 \le \alpha_i^{(*)} \le C,\; i = 1,\ldots,l,\quad \sum_{i=1}^{l}(\alpha_i + \alpha_i^*) = Cl\nu.$$

Furthermore, to present the minimization problem in a more compact form, we introduce the extended training sample set D = D⁻ ∪ D⁺, where D⁻ = {(x_i, y_i, z_i = −1)}_{i=1}^{l}, D⁺ = {(x_i, y_i, z_i = +1)}_{i=1}^{l}, and z_i is the label of the training sample (x_i, y_i). Thus, the minimization problem (5) can be further rewritten as:

$$\min_{\alpha} \;\frac{1}{2}\sum_{i,j=1}^{2l}\alpha_i\alpha_j Q_{ij} - \sum_{i=1}^{2l} z_i y_i\,\alpha_i \quad (6)$$
$$\text{s.t. } \sum_{i=1}^{2l} z_i\alpha_i = 0,\quad 0 \le \alpha_i \le C,\; i = 1,\ldots,2l,\quad \sum_{i=1}^{2l}\alpha_i = Cl\nu,$$

where Q is a positive semidefinite matrix with Q_{ij} = \frac{1}{l} z_i z_j K(x_i, x_j).

According to convex optimization theory (Boyd & Vandenberghe, 2004), the solution of the minimization problem (6) can also be obtained by minimizing the following convex quadratic objective function under constraints:

$$\min_{0\le\alpha\le C} W = \frac{1}{2}\sum_{i,j=1}^{2l}\alpha_i\alpha_j Q_{ij} - \sum_{i=1}^{2l} z_i y_i\,\alpha_i + b\sum_{i=1}^{2l} z_i\alpha_i + \epsilon\Big(\sum_{i=1}^{2l}\alpha_i - Cl\nu\Big), \quad (7)$$

where b and ϵ are Lagrangian multipliers. Then, by the KKT theorem (Karush, 1939), the first-order derivative of W leads to the following KKT conditions:

$$\frac{\partial W}{\partial b} = \sum_{i=1}^{2l} z_i\alpha_i = 0 \quad (8)$$
$$\frac{\partial W}{\partial \epsilon} = \sum_{i=1}^{2l}\alpha_i - Cl\nu = 0 \quad (9)$$
$$g_i = \frac{\partial W}{\partial \alpha_i} = \sum_{j=1}^{2l} Q_{ij}\alpha_j - z_i y_i + z_i b + \epsilon \;\begin{cases} \ge 0 & \text{for } \alpha_i = 0 \\ = 0 & \text{for } 0 < \alpha_i < C \\ \le 0 & \text{for } \alpha_i = C. \end{cases} \quad (10)$$

According to the value of the function g_i, the extended training sample set D is partitioned into three independent sets (see Fig. 1): (i) S = {i : g_i = 0, 0 < α_i < C}, the set S of margin support vectors strictly on the ϵ-tube; (ii) E = {i : g_i ≤ 0, α_i = C}, the set E of error support vectors exceeding the ϵ-tube; (iii) R = {i : g_i ≥ 0, α_i = 0}, the set R of the remaining vectors covered by the ϵ-tube.

Fig. 1. The partition of the training samples into three independent sets by the KKT conditions: (a) S. (b) E. (c) R.

Table 1
Two cases of conflicts between Eqs. (8) and (9) when z_S = ±1_S, under a small increment of α_c.

Label of margin support vectors | Label of the candidate sample | Conflict (yes/no)
+1 | +1 | No
+1 | −1 | Yes
−1 | +1 | Yes
−1 | −1 | No
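Returning to the partition above, the gradient g and the S/E/R split are straightforward to express in code. The sketch below is our own NumPy rendering of Eqs. (8)–(10), assuming Q, z, y are given as arrays (with the 1/l factor absorbed into Q as in Section 2); it is illustrative, not the paper's implementation:

```python
import numpy as np

def gradient(alpha, Q, z, y, b, eps):
    """g_i = sum_j Q_ij alpha_j - z_i y_i + z_i b + eps, cf. Eq. (10)."""
    return Q @ alpha - z * y + z * b + eps

def kkt_partition(alpha, C, tol=1e-8):
    """Split the indices of the extended set into margin support vectors (S),
    error support vectors (E), and the remaining vectors (R)."""
    S = [i for i, a in enumerate(alpha) if tol < a < C - tol]   # g_i = 0 expected
    E = [i for i, a in enumerate(alpha) if a >= C - tol]        # g_i <= 0 expected
    R = [i for i, a in enumerate(alpha) if a <= tol]            # g_i >= 0 expected
    return S, E, R
```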
3. Incremental ν-SVR algorithm

In this section, we focus on the incremental ν-SVR learning algorithm, especially for the minimization problem (6). If a new sample (x_new, y_new) is added into the training sample set T, there will be an increment in the extended training sample set D, which can be defined as D_new = {(x_new, y_new, +1), (x_new, y_new, −1)}. Initially, the weights of the samples in D_new are set to zero. If this assignment violates the KKT conditions, adjustments to the weights become necessary. As stated in Gu et al. (2012), an incremental ν-SVR algorithm must actually find an effective method for updating the weights without retraining from scratch whenever any constraint of the KKT conditions does not hold. The classical C&P algorithm gradually increases the weight α_c of the added sample (x_c, y_c), while rigorously ensuring that all the samples satisfy the KKT conditions. As proved in Gu and Sheng (2013) and Gu et al. (2008), there exists a feasible updating path achieving the optimal solution for the enlarged training sample set. Unfortunately, Gu et al. (2012) pointed out that this idea cannot hold for ν-SVC and ν-SVR. In this paper, we provide an in-depth

analysis for ν-SVR. As listed in Table 1, if z_S = ±1_S and the label of the candidate sample (x_c, y_c, z_c) is different from those of the margin support vectors in S, there exists a conflict (referred to as Conflict-1) between Eqs. (8) and (9) under a small increment of α_c. To address this issue, AONSVM introduced two special steps (i.e., the relaxed adiabatic incremental adjustments and the strict restoration adjustments). However, directly applying AONSVM to ν-SVR will not generate an effective initial solution for the incremental ν-SVR learning. Specifically, the objective of the initial solution is to retrieve the optimal solution of the minimization problem (6) when Q^l becomes Q^{l+1}. It is obvious that the initial strategy of AONSVM (i.e., setting g^{l+1} = g^l, b^{l+1} = b^l, ϵ^{l+1} = ϵ^l) does not apply to the incremental ν-SVR learning, due to the additional linear term in the objective function (2). Thus, a new procedure should be designed especially for tackling this issue.

Algorithm 1 Incremental ν-SVR algorithm INSVR
1: Read a new sample (x_new, y_new), and let D_new = {(x_new, y_new, +1), (x_new, y_new, −1)}.
2: Update g^{l+1} ← g^l, b^{l+1} ← b^l, ϵ^{l+1} ← ϵ^l, η ← 0.
3: while η ≠ 1 do
4:   Compute β̌ and γ̌ according to (16)–(17).
5:   Compute the maximal increment Δη^max according to (19).
6:   Update η, α, g, b, ϵ, S, E and R.
7:   Update the inverse matrix Ř according to (20)–(22).
8: end while
9: Compute the inverse matrix R̃ based on Ř.
10: Update D ← D ∪ D_new.
11: Initialize the weights of the samples in D_new as 0, and compute their values of the function g.
12: if ∃(x_c, y_c, z_c) ∈ D_new such that g_c < 0 then
13:   Use the relaxed adiabatic incremental adjustments.
14: end if
15: Use the strict restoration adjustments.
16: Compute the inverse matrix Ř based on R̂. {See Section 3.1.3.}

To address this issue, we propose the initial adjustments. The objective of this step is to initialize the solution of the minimization problem (6) before adding a new sample (x_new, y_new) into T, i.e., to retrieve the optimal solution of the minimization problem (6) when Q^l becomes Q^{l+1}. Our idea is to first set g^{l+1} = g^l, b^{l+1} = b^l, ϵ^{l+1} = ϵ^l, next impose a shrinkage η on z_i y_i, i = 1, ..., 2l, and then gradually increase η under the condition of rigorously keeping all samples satisfying the KKT conditions. This is repeated until η = 1. This procedure is described with pseudo code in lines 2–8 of Algorithm 1, and the details are expounded in Section 3.1.

Before presenting the whole incremental ν-SVR algorithm, we first introduce Theorem 2, which tells us that ϵ ≥ 0 after the initial adjustments.

Theorem 2. After the initial adjustments, it can be concluded that the Lagrangian multiplier ϵ must be greater than or equal to 0.

The detailed proof of Theorem 2 is given in Appendix A.1.

Initially, we set the weights of the two samples in D_new to zero. According to Theorem 2, it can be concluded that at most one sample (x_c, y_c, z_c) from D_new violates the KKT conditions (i.e., g_c < 0) after the initial adjustments. If there exists a sample (x_c, y_c, z_c) in D_new with g_c < 0, the relaxed adiabatic incremental adjustments will be used to make all samples satisfy the KKT conditions except the equality restriction (9). Finally, the strict restoration adjustments are used to restore the equality restriction (9). The three steps constitute the incremental ν-SVR algorithm INSVR (see Algorithm 1), which can find the optimal solution for the enlarged training sample set without retraining from scratch. To make this paper self-contained, we give a brief review of the relaxed adiabatic incremental adjustments and the strict restoration adjustments in Sections 3.2 and 3.3, respectively. Their details can be found in Gu et al. (2012). Section 3.4 proves the existence of the three key inverse matrices in INSVR.
3.1. Initial adjustments

To prepare the initial solution of the minimization problem (6) before adding a new sample (x_new, y_new) into T, our strategy is to first set g^{l+1} = g^l, b^{l+1} = b^l, ϵ^{l+1} = ϵ^l, next impose a shrinkage η on z_i y_i, i = 1, ..., 2l, and then gradually increase η under the condition of rigorously keeping all samples satisfying the KKT conditions, until η = 1.

During the initial adjustments, in order to keep all the samples satisfying the KKT conditions, we have the following linear system:

$$\sum_{j\in S} z_j\,\Delta\alpha_j = 0 \quad (11)$$
$$\sum_{j\in S} \Delta\alpha_j = 0 \quad (12)$$
$$\Delta g_i = \sum_{j\in S} Q_{ij}\,\Delta\alpha_j + z_i\,\Delta b + \Delta\epsilon - \Delta\eta\, z_i y_i = 0,\quad \forall i\in S, \quad (13)$$

where Δη, Δα_j, Δb, Δϵ and Δg_i denote the corresponding variations. If we define 1_S = [1, ..., 1]^T as the |S|-dimensional column vector with all ones, and let z_S = [z_{s_1}, ..., z_{s_{|S|}}]^T and u_S = [z_{s_1} y_{s_1}, ..., z_{s_{|S|}} y_{s_{|S|}}]^T, then the linear system (11)–(13) can be further rewritten as:

$$\check{Q}\,\Delta h = \begin{bmatrix} 0 \\ 0 \\ u_S \end{bmatrix}\Delta\eta, \qquad \check{Q} = \begin{bmatrix} 0 & 0 & z_S^T \\ 0 & 0 & \mathbf{1}_S^T \\ z_S & \mathbf{1}_S & Q_{SS} \end{bmatrix}, \qquad \Delta h = \begin{bmatrix} \Delta b \\ \Delta\epsilon \\ \Delta\alpha_S \end{bmatrix}. \quad (14)$$

Supposing Q̌ has the inverse matrix Ř (the detailed discussion about the invertibility of Q̌ is provided in Section 3.1.1), the linear relationship between Δh and Δη can be easily solved as follows:

$$\Delta h = \check{R}\begin{bmatrix} 0 \\ 0 \\ u_S \end{bmatrix}\Delta\eta = \begin{bmatrix} \check\beta^{b} \\ \check\beta^{\epsilon} \\ \check\beta_S \end{bmatrix}\Delta\eta. \quad (15)$$

Substituting (15) into (13), we get the linear relationship between Δg_i and Δη as follows:

$$\Delta g_i = \Big(\sum_{j\in S} Q_{ij}\,\check\beta_j + z_i\,\check\beta^{b} + \check\beta^{\epsilon} - z_i y_i\Big)\Delta\eta = \check\gamma_i\,\Delta\eta,\quad \forall i. \quad (16)$$

And obviously, ∀i ∈ S, we have γ̌_i = 0.
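For exposition, the directions in (15)–(16) can be obtained by one dense solve of the bordered system (14). The following NumPy sketch shows the computation (the function and variable names are ours; the paper instead maintains the inverse Ř incrementally, as Section 3.1.3 describes, and it assumes Q̌ is nonsingular):

```python
import numpy as np

def initial_direction(Q, z, y, S):
    """Solve the bordered system (14) for (db, deps, dalpha_S) per unit d_eta,
    and return gamma_check for all samples, cf. Eq. (16)."""
    zS, uS = z[S], z[S] * y[S]
    m = len(S)
    Q_check = np.zeros((m + 2, m + 2))
    Q_check[0, 2:], Q_check[2:, 0] = zS, zS          # z_S border (Eq. (11))
    Q_check[1, 2:], Q_check[2:, 1] = 1.0, 1.0        # all-ones border (Eq. (12))
    Q_check[2:, 2:] = Q[np.ix_(S, S)]
    rhs = np.concatenate(([0.0, 0.0], uS))
    beta = np.linalg.solve(Q_check, rhs)             # [beta_b, beta_eps, beta_S]
    b_dir, eps_dir, alpha_dir = beta[0], beta[1], beta[2:]
    gamma = Q[:, S] @ alpha_dir + z * b_dir + eps_dir - z * y   # zero on S
    return b_dir, eps_dir, alpha_dir, gamma
```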

3.1.1. Special cases in initial adjustments

Under the condition that Q̌ has the inverse matrix Ř, the initial adjustments can easily obtain the linear relationships between Δh and Δη, and between Δg_i and Δη, according to (15)–(16). In this section, we discuss how to determine these linear relationships when Q̌ becomes singular. We will show that Q̌ becomes a singular matrix in the following two situations²:

(i) The first is that z_S = ±1_S, i.e., the samples of S have just one kind of label. For example: (a) when z_S = +1_S, we have 1_S = z_S; (b) when z_S = −1_S, we have 1_S = −z_S. In these two cases, Q̌ is clearly a singular matrix.

(ii) The second is that |M⁺| > 1, where M⁺ is defined as M⁺ = {(x_i, y_i, −1) ∈ S : (x_i, y_i, +1) ∈ S}, which implies that the ϵ-tube becomes a 0-tube. Specifically, there then exist four samples indexed by i₁, i₂, k₁ and k₂, respectively, where (x_{i₁}, y_{i₁}) = (x_{i₂}, y_{i₂}), z_{i₁} = −z_{i₂}, and (x_{k₁}, y_{k₁}) = (x_{k₂}, y_{k₂}), z_{k₁} = −z_{k₂}. Then according to (13), we have Δg_{i₁} + Δg_{i₂} = Δg_{k₁} + Δg_{k₂}, which means Q̌_{i₁·} + Q̌_{i₂·} = Q̌_{k₁·} + Q̌_{k₂·}. In this case, it is easy to verify that Q̌ is a singular matrix.

If |M⁺| > 1, we define M as the contracted set obtained by deleting any one sample from M⁺; if |M⁺| ≤ 1, we define M as an empty set. Then, if M ≠ ∅, Q̌ is clearly a singular matrix. If z_S = ±1_S, we let M̃ = M ∪ {ϵ}, and otherwise M̃ = M. Thus, we have the contracted matrix Q̌_{\M̃²}, obtained from Q̌ by deleting the rows and columns indexed by M̃ (when ϵ ∈ M̃, this includes the row and column introduced by the multiplier ϵ). Theorem 4 shows that Q̌_{\M̃²} has the inverse matrix Ř under Assumption 1 (the details of Theorem 4 will be discussed in Section 3.4). Furthermore, we let Δϵ = 0 and Δα_M = 0; then the linear relationship between the remaining variables Δh_{\M̃} and Δη can be obtained similarly as follows:

$$\Delta h_{\setminus\tilde M} = \check{R}\,\big[0,\,0,\,u_{S\setminus M}^T\big]^T_{\setminus\tilde M}\,\Delta\eta = \big[\check\beta^{b},\,\check\beta^{\epsilon},\,\check\beta_{S\setminus M}^T\big]^T_{\setminus\tilde M}\,\Delta\eta. \quad (17)$$

Finally, letting β̌^ϵ = 0 when ϵ ∈ M̃, and β̌_M = 0, and substituting (17) into (13), we get the linear relationship between Δg_i and Δη as in (16).

3.1.2. Computing the maximal increment Δη^max

The principles of the initial adjustments cannot be used directly to obtain the new state in which all the samples satisfy the KKT conditions. To handle this problem, the main strategy is to compute the maximal increment Δη^max for each adjustment, such that a certain sample migrates among the sets S, R and E. Three cases must be considered to account for such structural changes:

(i) A certain α_i in S reaches a bound (an upper or a lower bound). Compute the sets I_S⁺ = {i ∈ S : β̌_i > 0} and I_S⁻ = {i ∈ S : β̌_i < 0}, where the samples with β̌_i = 0 are ignored due to their insensitivity to Δη. Thus the maximum possible weight updates are

$$\Delta\alpha_i^{max} = \begin{cases} C - \alpha_i, & \text{if } i \in I_S^{+} \\ -\alpha_i, & \text{if } i \in I_S^{-}, \end{cases} \quad (18)$$

and the maximal possible Δη_S, before a certain sample in S moves to R or E, is: Δη_S = min_{i ∈ I_S⁺ ∪ I_S⁻} Δα_i^max / β̌_i.

(ii) A certain g_i corresponding to a sample in R or E reaches zero. Compute the sets I_E = {i ∈ E : γ̌_i > 0} and I_R = {i ∈ R : γ̌_i < 0}, where samples with γ̌_i = 0 are again ignored because of their insensitivity to Δη. Thus the maximal possible Δη_{R,E}, before a certain sample in R or E migrates to S, is: Δη_{R,E} = min_{i ∈ I_E ∪ I_R} (−g_i / γ̌_i).

(iii) η reaches the upper bound 1. The maximal possible Δη_1 before reaching the upper bound is: Δη_1 = 1 − η.

Finally, the smallest of the three values constitutes the maximal increment of η:

$$\Delta\eta^{max} = \min\{\Delta\eta_S,\;\Delta\eta_{R,E},\;\Delta\eta_1\}. \quad (19)$$

After the critical adjustment quantity Δη^max is determined, we can update η, α, g, S, E and R, similarly to the approaches in Diehl and Cauwenberghs (2003).

3.1.3. Updating the inverse matrix Ř

Once the components of the set S change, the set M̃ and the state of z_S may also change. That is, a sample is either added to or removed from the set S, and the state of z_S may transform from z_S = ±1_S to z_S ≠ ±1_S.³ Accordingly, there are changes in Q̌_{\M̃²} and Ř. In this section, we describe the following rules for updating the inverse matrix Ř.

(i) If a sample (x_t, y_t, z_t) is added into S, the inverse matrix Ř can be expanded as follows:

$$\check{R} \leftarrow \begin{bmatrix} \check{R} & 0 \\ 0^T & 0 \end{bmatrix} + \frac{1}{\check\gamma_{tt}}\begin{bmatrix} \check\beta^{t} \\ 1 \end{bmatrix}\begin{bmatrix} \check\beta^{t} \\ 1 \end{bmatrix}^T, \quad (20)$$

where β̌^t = [β̌_t^b, β̌_t^ϵ, (β̌_{S\M}^t)^T]^T = −Ř[z_t, 1, Q_{(S\M)t}^T]^T, β̌_M^t = 0, and γ̌_{tt} = Σ_j β̌_j^t Q_{tj} + z_t β̌_t^b + β̌_t^ϵ + Q_{tt}.

(ii) If the state of z_S transforms from z_S = ±1_S into z_S ≠ ±1_S, then the inverse matrix Ř can be expanded as follows:

$$\check{R} \leftarrow \begin{bmatrix} \check{R} + \frac{1}{a}\,c\,c^T & -\frac{1}{a}\,c \\ -\frac{1}{a}\,c^T & \frac{1}{a} \end{bmatrix}, \quad (21)$$

where c = Ř c̃, a = −c̃^T Ř c̃, and c̃ = [0, 1_{S\setminus M}^T]^T is the column of Q̌ corresponding to ϵ.

(iii) If a sample (x_t, y_t, z_t) is removed from S, then the inverse matrix Ř can be contracted as follows:
$$\check{R} \leftarrow \check{R}_{\setminus tt} - \frac{\check{R}_{\cdot t}\,\check{R}_{t\cdot}}{\check{R}_{tt}}. \quad (22)$$

In summary, during the initial adjustments, the inverse matrix Ř can be updated as described above. In addition, after the strict restoration adjustments, we need to recompute the inverse matrix Ř for the next round of initial adjustments, which can be obtained from R̂ by the following steps: (i) compute the inverse matrix of Q^l_{SS} based on R̂ using the contracted rule, similarly to (22); (ii) update the inverse matrix of Q^{l+1}_{SS} by the rule R^{l+1} = \frac{l+1}{l} R^{l}; (iii) calculate the inverse matrix Ř for the next round of the initial adjustments by the expanded rule, similarly to (20).

² In this paper, we do not consider a training sample set T having duplicate training samples. A more general assumption on the dataset is that the margin support vectors are always linearly independent in the RKHS (collectively called Assumption 1). A more detailed explanation of Assumption 1 can be found in Gu and Sheng (2013).

³ When the set of margin support vectors with the label z_o becomes S_{z_o} = {(x_t, y_t, z_o)} after removing a sample with the label z_o from the set S, we have β̌_t = 0 according to the definition of the inverse matrix. This means that removing a sample from S will not lead to the transformation from z_S ≠ ±1_S to z_S = ±1_S.
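The two generic operations behind rules (20) and (22), growing and shrinking the inverse of a symmetric matrix by one row and column, can be sketched as follows (our NumPy rendering of the standard block-inverse identities, consistent in structure with the rules above; the variable names are ours):

```python
import numpy as np

def expand_inverse(R, q_col, q_tt):
    """Grow the inverse when a new row/column joins the bordered matrix,
    cf. rule (20). q_col stacks the new column's border and kernel entries."""
    beta = -R @ q_col                       # direction induced by the new entry
    gamma_tt = q_tt + q_col @ beta          # Schur complement (a scalar)
    v = np.append(beta, 1.0)
    R_new = np.zeros((R.shape[0] + 1, R.shape[1] + 1))
    R_new[:-1, :-1] = R
    return R_new + np.outer(v, v) / gamma_tt

def contract_inverse(R, t):
    """Shrink the inverse when row/column t leaves, cf. rule (22)."""
    keep = [i for i in range(R.shape[0]) if i != t]
    return R[np.ix_(keep, keep)] - np.outer(R[keep, t], R[t, keep]) / R[t, t]
```

Both updates cost O(|S|²) per step, which is the source of the O(l²) updating/downdating complexity noted in Section 3.4.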

3.2. Relaxed adiabatic incremental adjustments

Because Conflict-1 may arise between Eqs. (8) and (9) during the adjustments of α_c, the limitation imposed by Eq. (9) is removed in this step. Thus, during the incremental adjustment of α_c, in order to keep all the samples satisfying the KKT conditions except the restriction (9), we have the following linear system, under the condition that Δϵ = 0:

$$\sum_{j\in S} z_j\,\Delta\alpha_j + z_c\,\Delta\alpha_c = 0 \quad (23)$$
$$\Delta g_i = \sum_{j\in S} Q_{ij}\,\Delta\alpha_j + z_i\,\Delta b + Q_{ic}\,\Delta\alpha_c = 0,\quad \forall i\in S. \quad (24)$$

The linear system (23)–(24) can be written as:

$$\begin{bmatrix} 0 & z_S^T \\ z_S & Q_{SS} \end{bmatrix}\begin{bmatrix} \Delta b \\ \Delta\alpha_S \end{bmatrix} = -\begin{bmatrix} z_c \\ Q_{Sc} \end{bmatrix}\Delta\alpha_c. \quad (25)$$

Like the initial adjustments, we define M̃ = M ∪ {ϵ}. Thus, we have the contracted matrix Q̃_{\M̃²}. Corollary 5 shows that Q̃_{\M̃²} has an inverse (the details of the corollary will be discussed in Section 3.4). Let R̃ = (Q̃_{\M̃²})⁻¹; the linear relationship between Δb, Δα_S and Δα_c can be solved as follows:

$$\begin{bmatrix} \Delta b \\ \Delta\alpha_S \end{bmatrix} = -\tilde{R}\begin{bmatrix} z_c \\ Q_{Sc} \end{bmatrix}\Delta\alpha_c = \begin{bmatrix} \beta_c^{b} \\ \beta_c \end{bmatrix}\Delta\alpha_c. \quad (26)$$

Let β_M^c = 0, and substitute (26) into (24); we get the linear relationship between Δg_i and Δα_c as follows:

$$\Delta g_i = \Big(\sum_{j\in S} Q_{ij}\,\beta_j^{c} + z_i\,\beta_c^{b} + Q_{ic}\Big)\Delta\alpha_c = \gamma_i^{c}\,\Delta\alpha_c,\quad \forall i. \quad (27)$$

Obviously, ∀i ∈ S, we have γ_i^c = 0.

3.3. Strict restoration adjustments

After the relaxed adiabatic incremental adjustments, we need to adjust Σα_i to restore the equality Σ_i α_i = C(l+1)ν. For each adjustment of Σα_i, in order to keep all the samples satisfying the KKT conditions and to prevent the reoccurrence of the conflict (referred to as Conflict-2) between Eqs. (8) and (9) efficiently, we have the following linear system:

$$\sum_{j\in S} z_j\,\Delta\alpha_j = 0 \quad (28)$$
$$\sum_{j\in S} \Delta\alpha_j + \varepsilon\,\Delta\epsilon = \Delta\zeta \quad (29)$$
$$\Delta g_i = \sum_{j\in S} Q_{ij}\,\Delta\alpha_j + z_i\,\Delta b + \Delta\epsilon = 0,\quad \forall i\in S, \quad (30)$$

where ζ is the introduced variable for adjusting Σα_i, and ε is any negative number. εΔϵ is incorporated into (29) as an extra term; Gu et al. (2012) use this extra term to prevent the reoccurrence of Conflict-2 between Eqs. (8) and (9) efficiently. The linear system (28)–(30) can be further rewritten as:

$$\hat{Q}\begin{bmatrix} \Delta b \\ \Delta\epsilon \\ \Delta\alpha_S \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}\Delta\zeta, \qquad \hat{Q} = \begin{bmatrix} 0 & 0 & z_S^T \\ 0 & \varepsilon & \mathbf{1}_S^T \\ z_S & \mathbf{1}_S & Q_{SS} \end{bmatrix}. \quad (31)$$

Let Q̂_{\M²} be the contracted matrix of Q̂. Theorem 6 shows that Q̂_{\M²} has an inverse (the details of Theorem 6 will be discussed in Section 3.4). Let R̂ = (Q̂_{\M²})⁻¹; the linear relationship between Δb, Δϵ, Δα_S and Δζ can be obtained as follows:

$$\begin{bmatrix} \Delta b \\ \Delta\epsilon \\ \Delta\alpha_S \end{bmatrix} = \hat{R}\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}\Delta\zeta = \begin{bmatrix} \beta^{b} \\ \beta^{\epsilon} \\ \beta_S \end{bmatrix}\Delta\zeta. \quad (32)$$

From (32) and (29), we have Σ_{j∈S} Δα_j = (1 − εβ^ϵ)Δζ, which implies that the control of the adjustment of Σα_i is achieved through ζ. Finally, letting β_M = 0 and substituting (32) into (30), we get the linear relationship between Δg_i and Δζ as follows:

$$\Delta g_i = \Big(\sum_{j\in S} Q_{ij}\,\beta_j + z_i\,\beta^{b} + \beta^{\epsilon}\Big)\Delta\zeta = \gamma_i\,\Delta\zeta,\quad \forall i. \quad (33)$$

Obviously, ∀i ∈ S, we also have γ_i = 0.

3.4. Do the three key inverse matrices exist?

As stated above, the inverses of Q̌_{\M̃²}, Q̃_{\M̃²}, and Q̂_{\M²} are the cornerstones of the initial adjustments, the relaxed adiabatic incremental adjustments, and the strict restoration adjustments, respectively. In this section, we prove their existence under Assumption 1 through Theorem 4, Corollary 5, and Theorem 6.

Lemma 3. If A is a k × n matrix with rank k, and B is an n × n positive definite matrix, then ABA^T is also a positive definite matrix.

Lemma 3 can be easily proved by the Cholesky decomposition (Householder, 1974) and Sylvester's rank inequality (Householder, 1974).

Theorem 4. During the initial adjustments, if z_S = ±1_S, the determinant of Q̌_{\M̃²} is always less than 0; otherwise, it is always greater than 0.

We prove Theorem 4 in detail in Appendix A.2.

Corollary 5. During the relaxed adiabatic incremental adjustments, the determinant of Q̃_{\M̃²} is always less than 0.

Corollary 5 can be obtained easily according to Theorem 4.

Theorem 6. During the strict restoration adjustments, if ε < 0, then the determinant of Q̂_{\M²} is always greater than 0.

We prove Theorem 6 in detail in Appendix A.3.
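The sign claims of Theorems 4 and 6 and Corollary 5 can be checked numerically on a random instance. The following sketch (our own construction, for the case M = ∅ with a mixed-label z_S, so that z_S ≠ ±1_S) builds the three bordered matrices and verifies the determinant signs:

```python
import numpy as np

rng = np.random.default_rng(1)
m = 5                                                     # margin support vectors
z = rng.choice([-1.0, 1.0], m); z[0], z[1] = 1.0, -1.0    # force mixed labels
G = rng.standard_normal((m, m + 2)); Q = G @ G.T          # positive definite Q_SS
one = np.ones(m); eps_neg = -1.0

Q_check = np.block([[np.zeros((2, 2)), np.vstack([z, one])],
                    [np.vstack([z, one]).T, Q]])          # initial adjustments
Q_tilde = np.block([[np.zeros((1, 1)), z[None, :]],
                    [z[:, None], Q]])                     # relaxed adiabatic step
Q_hat = Q_check.copy(); Q_hat[1, 1] = eps_neg             # strict restoration step

print(np.linalg.det(Q_check) > 0,   # Theorem 4 (z_S != +/-1_S): positive
      np.linalg.det(Q_tilde) < 0,   # Corollary 5: negative
      np.linalg.det(Q_hat) > 0)     # Theorem 6 (eps < 0): positive
```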
In addition, it is not difficult to find that the time complexities of updating/downdating Ř, R̃, and R̂ are all O(l²), based on the rules in Section 3.1.3 and the inverse updating/downdating rules in Gu et al. (2012). Many references (Gunter & Zhu, 2007; Hastie, Rosset, Tibshirani, & Zhu, 2004; Wang, Yeung, & Lochovsky, 2008) have reported

that the average size of S does not increase with the size of the training set. Our experimental results also verified this point, which means that INSVR can efficiently handle large scale problems.

4. Experiments

4.1. Design of experiments

In order to demonstrate the effectiveness of INSVR, and to show the advantage of INSVR in terms of computational efficiency, we conduct a detailed experimental study. To demonstrate the effectiveness of INSVR (i.e., to show that INSVR is a workable and meaningful algorithm under Assumption 1), we investigate the existence of the two kinds of conflicts, the singularity of Q̌, and the convergence of INSVR, respectively. To validate the existence of the two kinds of conflicts, we count the two kinds of conflicts (i.e., Conflict-1 and Conflict-2; the details can be found in Gu et al. (2012)) during the relaxed adiabatic incremental adjustments and the strict restoration adjustments, respectively, over 50 trials. To investigate the singularity of Q̌, we count the two special cases (cf. Section 3.1.1, denoted as SC-1 and SC-2) during the initial adjustments and all three steps of INSVR, respectively, over 50 trials. To illustrate the fast convergence of INSVR empirically, we investigate the average numbers of iterations of the initial adjustments (IA), the relaxed adiabatic incremental adjustments (RAIA), and the strict restoration adjustments (SRA), respectively, over 20 trials. To show that INSVR has computational superiority over the batch learning algorithm (i.e., the Sequential Minimal Optimization (SMO) algorithm for ν-SVR) with both cold start and warm start, we provide an empirical analysis of them in terms of the scaling of run-time efficiency. It should be noted that INSVR and the SMO algorithm have the same generalization performance, because our INSVR obtains the exact solution of ν-SVR, and the SMO algorithm also does.

4.2. Implementation

We implement our proposed INSVR in MATLAB. Chang and Lin (2002) proposed a recognized SMO-type algorithm specially designed for batch ν-SVR training, which is implemented in C++ as a part of the LIBSVM software package (Chang & Lin, 2001). To compare the run-time on the same platform, we implemented the ν-SVR part of LIBSVM with both cold start and warm start in MATLAB (Chen, Lin, & Schölkopf, 2005). All experiments were performed on a 2.5 GHz Intel Core i5 machine with 8 GB RAM and the MATLAB 7 platform. For kernels, the linear kernel K(x₁, x₂) = ⟨x₁, x₂⟩, the polynomial kernel K(x₁, x₂) = (⟨x₁, x₂⟩ + 1)^d with d = 2, and the Gaussian kernel K(x₁, x₂) = exp(−‖x₁ − x₂‖²/2σ²) with σ = 0.707 are used in all the experiments. The parameter ε of the strict restoration adjustments is fixed at −1, because any negative value of ε does not change the updating path.⁴ The values of ν and C are fixed at 0.3 and 100, respectively, in all the experiments.

⁴ Like AONSVM, it is easy to verify that ε can determine Δζ, but is independent of the structural changes of the sets S, R and E.
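The three kernels of Section 4.2 are standard; for reference, a direct NumPy transcription (with the parameter values quoted above, which are our reading of the experimental setup) is:

```python
import numpy as np

def linear_kernel(x1, x2):
    """K(x1, x2) = <x1, x2>."""
    return np.dot(x1, x2)

def poly_kernel(x1, x2, d=2):
    """K(x1, x2) = (<x1, x2> + 1)^d, with d = 2 in the experiments."""
    return (np.dot(x1, x2) + 1.0) ** d

def gaussian_kernel(x1, x2, sigma=0.707):
    """K(x1, x2) = exp(-||x1 - x2||^2 / (2 sigma^2))."""
    diff = np.asarray(x1) - np.asarray(x2)
    return np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2))
```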
4.3. Datasets

Table 2 presents the nine benchmark datasets used in our experiments. These datasets are divided into two parts: the first five are small datasets, and the last four are larger datasets.

Table 2
Benchmark datasets used in the experiments.

Dataset | Max #training set | #attributes
Housing | 506 | 13
Forest Fires | 517 | 12
Auto MPG | 392 | 7
Triazines | 186 | 60
Concrete Compressive Strength | 1,030 | 8
Friedman | 5,000 | 10
Cpusmall | 8,192 | 12
Cadata | 20,640 | 8
YearPredictionMSD | 51,630 | 90

The first five datasets in Table 2 are Housing, Forest Fires, Auto MPG, Triazines, and Concrete Compressive Strength. They are from the UCI machine learning repository (Frank & Asuncion, 2010). Their sizes vary from 186 to 1,030. The sizes of the last four datasets range up to 51,630. Cpusmall, Cadata, and YearPredictionMSD are available at www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/regression.html. Friedman is an artificial dataset (Friedman, 1991). The input attributes (x₁, ..., x₁₀) are generated independently, each uniformly distributed over [0, 1]. The target is defined by

$$y = 10\sin(\pi x_1 x_2) + 20(x_3 - 0.5)^2 + 10x_4 + 5x_5 + \sigma(0, 1), \quad (34)$$

where σ(0, 1) is a noise term which is normally distributed with mean 0 and variance 1. Note that only x₁, ..., x₅ are used in (34), while x₆, ..., x₁₀ are noisy irrelevant input attributes.
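Since Friedman is synthetic, it can be regenerated directly from Eq. (34); a small NumPy sketch (our function name and seed handling) is:

```python
import numpy as np

def friedman_dataset(n, seed=0):
    """Generate the artificial Friedman data of Eq. (34):
    ten i.i.d. uniform inputs on [0, 1], of which only x1..x5 affect the target."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(0.0, 1.0, (n, 10))
    y = (10.0 * np.sin(np.pi * X[:, 0] * X[:, 1])
         + 20.0 * (X[:, 2] - 0.5) ** 2
         + 10.0 * X[:, 3]
         + 5.0 * X[:, 4]
         + rng.standard_normal(n))   # sigma(0,1): Gaussian noise, mean 0, variance 1
    return X, y
```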

4.4. Experimental results and discussion

When the training data sizes of the first six benchmark datasets are 100, 150, 200, 250, and 300, respectively, Table 3 presents the corresponding numbers of occurrences of Conflict-1 and Conflict-2. From this table, we find that the two kinds of conflicts happen with a high probability, especially Conflict-1. Thus, it is essential to handle the conflicts within an incremental ν-SVR learning algorithm. Our INSVR can avoid these conflicts successfully.

Table 3 also presents the numbers of occurrences of SC-1 and SC-2 on the first six benchmark datasets, where the training data size of each dataset is also set to 100, 150, 200, 250, and 300, respectively. From this table, we find that SC-1 happens with a higher probability than SC-2 does. Although SC-2 happens with a low probability, the possibility of its occurrence still cannot be excluded. Thus, it is very significant that INSVR handles these two special cases.

Table 3
The number of occurrences of Conflict-1, Conflict-2, SC-1 and SC-2 on the six benchmark datasets (Housing, Forest Fires, Auto MPG, Triazines, Concrete Compressive Strength, and Friedman) over 50 trials, for each training size and kernel. Note that L, P, and G are the abbreviations of the linear, polynomial and Gaussian kernels, respectively.

Fig. 2 presents the average numbers of iterations of IA, RAIA, and SRA, respectively, on the different benchmark datasets with different kernels. It is obvious that these three steps exhibit quick convergence for all benchmark datasets and kernels, especially IA. Combined with the results in Table 3, we can conclude that INSVR avoids the infeasible updating paths as far as possible, and successfully converges to the optimal solution with a fast convergence speed.

Fig. 2. Average numbers of iterations of IA, RAIA, and SRA on the different benchmark datasets. (a) Housing. (b) Forest Fires. (c) Auto MPG. (d) Triazines. (e) Concrete Compressive Strength. (f) Friedman. (g) Cpusmall. (h) Cadata. (i) YearPredictionMSD.

Fig. 3 compares the run-time of our INSVR and LibSVM with both cold start and warm start, on the different benchmark datasets with different kernels. The results demonstrate that our INSVR is generally much faster than the batch implementations using both cold start and warm start.

5. Concluding remarks

To design an exact incremental ν-SVR algorithm based on AONSVM, we propose a special procedure called initial adjustments for preparing the initial solution before the incremental learning. The initial adjustments and the two steps of AONSVM constitute INSVR. We also prove the existence of the three key inverse matrices, which are the cornerstones of INSVR. The experimental results demonstrate that INSVR can successfully converge to the exact optimal solution in a finite number of steps by avoiding

Fig. 3. Run-time of LibSVM (cold start), LibSVM (warm start) and INSVR (in seconds) on the different benchmark datasets. (a) Housing. (b) Forest Fires. (c) Auto MPG. (d) Triazines. (e) Concrete Compressive Strength. (f) Friedman. (g) Cpusmall. (h) Cadata. (i) YearPredictionMSD.

Conflict-1 and Conflict-2, and is faster than batch ν-SVR algorithms with both cold and warm starts.

Theoretically, decremental ν-SVR learning can also be designed in a similar manner. Based on the incremental and decremental ν-SVR algorithms, we can implement leave-one-out cross validation (Weston, 1999) and learning with limited memory (Laskov et al., 2006) efficiently. In the future, we also plan to implement approximate on-line ν-SVR learning based on large-scale SVR training algorithms, such as the stochastic sub-gradient descent algorithm (Shalev-Shwartz et al., 2011) and the coordinate descent algorithm (Ho & Lin, 2012; Wang & Lin, 2014), and to use the method to analyze images of synthetic aperture radar (Zhang, Wu, Nguyen, & Sun, 2014) and vehicles (Wen, Shao, Fang, & Xue, 2015).

Acknowledgments

This work was supported by the Project Funded by the Priority Academic Program Development (PAPD) of Jiangsu Higher Education Institutions, the U.S. National Science Foundation (IIS-1115417), and the National Natural Science Foundation of China (No. 62237).

Appendix. Proofs of theoretical works

A.1. Proof of Theorem 2

The initial adjustments retrieve the optimal solution of the following problem:

$$\min_{\alpha} \;\frac{1}{2(l+1)}\sum_{i,j=1}^{2l}\alpha_i\alpha_j\, z_i z_j K(x_i, x_j) - \sum_{i=1}^{2l} z_i y_i\,\alpha_i \quad (35)$$
$$\text{s.t. } \sum_{i=1}^{2l} z_i\alpha_i = 0,\quad 0 \le \alpha_i \le C,\; i = 1,\ldots,2l,\quad \sum_{i=1}^{2l}\alpha_i = C(l+1)\nu.$$

It is easy to verify that (35) is the dual of the following problem:

$$\min_{w,\epsilon,b,\xi^{(*)}} \;\frac{l+1}{2}\langle w, w\rangle + C\Big((l+1)\nu\epsilon + \sum_{i=1}^{l}(\xi_i + \xi_i^*)\Big) \quad (36)$$
$$\text{s.t. } (\langle w, \varphi(x_i)\rangle + b) - y_i \le \epsilon + \xi_i,\quad y_i - (\langle w, \varphi(x_i)\rangle + b) \le \epsilon + \xi_i^*,\quad \xi_i^{(*)} \ge 0,\; i = 1,\ldots,l.$$

Note that, compared with (3), the constraint ϵ ≥ 0 is absent from (36). As stated in (7), b and ϵ are Lagrangian multipliers of (35). Furthermore, according to the KKT conditions, we have the

relationships between the optimal solution of (36) and that of (35): w = \frac{1}{l+1}\sum_{i=1}^{2l}\alpha_i z_i\varphi(x_i), b = b, and ϵ = ϵ. According to the strong duality theorem (Boyd & Vandenberghe, 2004), the duality gap between (36) and (35) is equal to zero. Assume ϵ < 0 after the initial adjustments; then we could find a smaller value for the objective function in (36) by setting ϵ = 0, which implies a contradiction. This completes the proof.

A.2. Proof of Theorem 4

To prove this theorem, the special case |M⁺| = 0 is considered first. According to (17) and the definition of an inverse matrix, if S = {(x_t, y_t, z_t)}, we have β̌_t = Ř_{t·}[0, z_t y_t]^T = 0, which implies that S is always nonempty during the initial adjustments. Because |M⁺| = 0, it is easy to verify that ∀i, j ∈ S, if i ≠ j, then x_i ≠ ±x_j. According to Assumption 1, Q_SS is a positive definite matrix, so the inverse matrix Q_SS⁻¹ must exist, and Q_SS⁻¹ is also positive definite. If z_S ≠ ±1_S, let P = [z_S, 1_S]; otherwise, let P = z_S. It is easy to verify that rank(P) = 2 if z_S ≠ ±1_S, and otherwise rank(P) = 1. Then we have

$$\det\big(\check{Q}_{\setminus\tilde{M}^2}\big) = (-1)^{\,\mathrm{rank}(P)}\,\det(Q_{SS})\,\det\big(P^T Q_{SS}^{-1} P\big).$$

Because Q_SS is positive definite, det(Q_SS) > 0. From Lemma 3, P^T Q_SS^{-1} P is also a positive definite matrix, so det(P^T Q_SS^{-1} P) > 0. Hence the sign of det(Q̌_{\M̃²}) is (−1)^rank(P): negative when z_S = ±1_S (rank(P) = 1), and positive otherwise (rank(P) = 2). This completes the proof under the condition |M⁺| = 0.

Next, we consider the case |M⁺| > 0. We can construct an elementary transformation Q̌' of Q̌_{\M̃²}: for each sample i₁ ∈ M with its duplicate i₂ (i.e., (x_{i₁}, y_{i₁}) = (x_{i₂}, y_{i₂}) and z_{i₁} = −z_{i₂}), add the i₂-th row and column to the i₁-th row and column. Obviously, Q̌' has the same determinant as Q̌_{\M̃²}, and the same factorization argument as above applies to Q̌', using Lemma 3 and the corresponding matrix P'. When rank(P') = cols(P'), both det(Q') > 0 and det(P'^T Q'^{-1} P') > 0, and the determinant has the sign (−1)^rank(P'). When rank(P') < cols(P') (i.e., z_{S'} = ±1_{S'}), a further elementary transformation merging the rows and columns corresponding to b and ϵ reduces this case to the previous one. In each case the determinant factors into determinants of positive definite matrices, with the sign (−1)^rank(P) as claimed. This completes the proof.

A.3. Proof of Theorem 6

According to the Laplace expansion of the determinant of Q̂_{\M²} (Householder, 1974), we have

$$\det\big(\hat{Q}_{\setminus M^2}\big) = \det\big(\check{Q}_{\setminus\tilde{M}^2}\big) + \varepsilon\,\det\big(\tilde{Q}_{\setminus\tilde{M}^2}\big).$$

Based on the conclusions of Theorem 4 and Corollary 5, and the premise ε < 0, we have det(Q̂_{\M²}) > 0. This completes the proof.

References

Boyd, S., & Vandenberghe, L. (2004). Convex optimization. Cambridge University Press.
Cao, L. J., & Tay, F. E. H. (2003). Support vector machine with adaptive parameters in financial time series forecasting. IEEE Transactions on Neural Networks, 14(6).
Cauwenberghs, G., & Poggio, T. (2001). Incremental and decremental support vector machine learning. In Advances in neural information processing systems 13. MIT Press.
Chang, C.-C., & Lin, C.-J. (2001). LIBSVM: a library for support vector machines. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
Chang, C.-C., & Lin, C.-J. (2002). Training ν-support vector regression: Theory and algorithms. Neural Computation, 14(8).

Chen, P.-H., Lin, C.-J., & Schölkopf, B. (2005). A tutorial on ν-support vector machines. Applied Stochastic Models in Business and Industry, 21(2).
Diehl, C. P., & Cauwenberghs, G. (2003). SVM incremental learning, adaptation and optimization. In Proceedings of the 2003 international joint conference on neural networks.
Frank, A., & Asuncion, A. (2010). UCI machine learning repository. URL: http://archive.ics.uci.edu/ml.
Friedman, J. H. (1991). Multivariate adaptive regression splines (with discussion). The Annals of Statistics, 19(1).
Gâlmeanu, H., & Andonie, R. (2008). Implementation issues of an incremental and decremental SVM. In Proceedings of the 18th international conference on artificial neural networks, part I. Berlin, Heidelberg: Springer-Verlag.
Gretton, A., & Desobry, F. (2003). On-line one-class support vector machines. An application to signal segmentation. In Proceedings of the 2003 IEEE international conference on acoustics, speech, and signal processing, Vol. 2.
Gu, B., & Sheng, V. S. (2013). Feasibility and finite convergence analysis for accurate on-line ν-support vector machine. IEEE Transactions on Neural Networks and Learning Systems, 24(8).
Gu, B., Wang, J. D., & Chen, H. Y. (2008). On-line off-line ranking support vector machine and analysis. In Proceedings of the international joint conference on neural networks. IEEE Press.
Gu, B., Wang, J.-D., Yu, Y.-C., Zheng, G.-S., Huang, Y.-F., & Xu, T. (2012). Accurate on-line ν-support vector learning. Neural Networks, 27(1).
Gunter, L., & Zhu, J. (2007). Efficient computation and model selection for the support vector regression. Neural Computation, 19(6).
Hastie, T., Rosset, S., Tibshirani, R., & Zhu, J. (2004). The entire regularization path for the support vector machine. Journal of Machine Learning Research, 5.
Ho, C.-H., & Lin, C.-J. (2012). Large-scale linear support vector regression. Journal of Machine Learning Research, 13(1).
Householder, A. S. (1974). The theory of matrices in numerical analysis. New York: Dover.
Karasuyama, M., & Takeuchi, I. (2010). Multiple incremental decremental learning of support vector machines. IEEE Transactions on Neural Networks, 21(7).
Karush, W. (1939). Minima of functions of several variables with inequalities as side constraints. (M.Sc. dissertation). Chicago, Illinois: Dept. of Mathematics, Univ. of Chicago.
Laskov, P., Gehl, C., Krüger, S., & Müller, K.-R. (2006). Incremental support vector learning: Analysis, implementation and applications. Journal of Machine Learning Research, 7.
Lu, C.-J., Lee, T.-S., & Chiu, C.-C. (2009). Financial time series forecasting using independent component analysis and support vector regression. Decision Support Systems, 47(2).
Ma, J., Theiler, J., & Perkins, S. (2003). Accurate on-line support vector regression. Neural Computation, 15(11).
Martin, M. (2002). On-line support vector machine regression. In Proceedings of the 13th European conference on machine learning. London, UK: Springer-Verlag.
Murata, N. (1998). A statistical study of on-line learning. In On-line learning in neural networks. Cambridge University Press.
Schölkopf, B., & Smola, A. J. (2001). Learning with kernels: Support vector machines, regularization, optimization, and beyond. Cambridge, MA, USA: MIT Press.
Schölkopf, B., Smola, A. J., Williamson, R. C., & Bartlett, P. L. (2000). New support vector algorithms. Neural Computation, 12(5).
Shalev-Shwartz, S., Singer, Y., Srebro, N., & Cotter, A. (2011). Pegasos: Primal estimated sub-gradient solver for SVM. Mathematical Programming, 127(1).
Smola, A. J., & Schölkopf, B. (2003). A tutorial on support vector regression. Technical report, Statistics and Computing.
Tsang, I. W., Kwok, J. T.-Y., & Zurada, J. M. (2006). Generalized core vector machines. IEEE Transactions on Neural Networks, 17(5).
Vapnik, V. (1998). Statistical learning theory. New York, NY: John Wiley and Sons, Inc.
Wang, P.-W., & Lin, C.-J. (2014). Iteration complexity of feasible descent methods for convex optimization. Journal of Machine Learning Research, 15.
Wang, G., Yeung, D.-Y., & Lochovsky, F. H. (2008). A new solution path algorithm in support vector regression. IEEE Transactions on Neural Networks, 19(10).
Wen, X., Shao, L., Fang, W., & Xue, Y. (2015). Efficient feature selection and classification for vehicle detection. IEEE Transactions on Circuits and Systems for Video Technology, 25(3).
Weston, J. (1999). Leave-one-out support vector machines. In IJCAI.
Zhang, H., Wu, Q. M. J., Nguyen, T. M., & Sun, X. (2014). Synthetic aperture radar image segmentation by modified Student's t-mixture model. IEEE Transactions on Geoscience and Remote Sensing, 52(7).


More information

A parametric Linear Programming Model Describing Bandwidth Sharing Policies for ABR Traffic

A parametric Linear Programming Model Describing Bandwidth Sharing Policies for ABR Traffic parametrc Lnear Programmng Mode Descrbng Bandwdth Sharng Poces for BR Traffc I. Moschoos, M. Logothets and G. Kokknaks Wre ommuncatons Laboratory, Dept. of Eectrca & omputer Engneerng, Unversty of Patras,

More information

Research Article H Estimates for Discrete-Time Markovian Jump Linear Systems

Research Article H Estimates for Discrete-Time Markovian Jump Linear Systems Mathematca Probems n Engneerng Voume 213 Artce ID 945342 7 pages http://dxdoorg/11155/213/945342 Research Artce H Estmates for Dscrete-Tme Markovan Jump Lnear Systems Marco H Terra 1 Gdson Jesus 2 and

More information

IDENTIFICATION OF NONLINEAR SYSTEM VIA SVR OPTIMIZED BY PARTICLE SWARM ALGORITHM

IDENTIFICATION OF NONLINEAR SYSTEM VIA SVR OPTIMIZED BY PARTICLE SWARM ALGORITHM Journa of Theoretca and Apped Informaton Technoogy th February 3. Vo. 48 No. 5-3 JATIT & LLS. A rghts reserved. ISSN: 99-8645 www.att.org E-ISSN: 87-395 IDENTIFICATION OF NONLINEAR SYSTEM VIA SVR OPTIMIZED

More information

2E Pattern Recognition Solutions to Introduction to Pattern Recognition, Chapter 2: Bayesian pattern classification

2E Pattern Recognition Solutions to Introduction to Pattern Recognition, Chapter 2: Bayesian pattern classification E395 - Pattern Recognton Solutons to Introducton to Pattern Recognton, Chapter : Bayesan pattern classfcaton Preface Ths document s a soluton manual for selected exercses from Introducton to Pattern Recognton

More information

L-Edge Chromatic Number Of A Graph

L-Edge Chromatic Number Of A Graph IJISET - Internatona Journa of Innovatve Scence Engneerng & Technoogy Vo. 3 Issue 3 March 06. ISSN 348 7968 L-Edge Chromatc Number Of A Graph Dr.R.B.Gnana Joth Assocate Professor of Mathematcs V.V.Vannaperuma

More information

MMA and GCMMA two methods for nonlinear optimization

MMA and GCMMA two methods for nonlinear optimization MMA and GCMMA two methods for nonlnear optmzaton Krster Svanberg Optmzaton and Systems Theory, KTH, Stockholm, Sweden. krlle@math.kth.se Ths note descrbes the algorthms used n the author s 2007 mplementatons

More information

Semi-supervised Classification with Active Query Selection

Semi-supervised Classification with Active Query Selection Sem-supervsed Classfcaton wth Actve Query Selecton Jao Wang and Swe Luo School of Computer and Informaton Technology, Beng Jaotong Unversty, Beng 00044, Chna Wangjao088@63.com Abstract. Labeled samples

More information

The lower and upper bounds on Perron root of nonnegative irreducible matrices

The lower and upper bounds on Perron root of nonnegative irreducible matrices Journal of Computatonal Appled Mathematcs 217 (2008) 259 267 wwwelsevercom/locate/cam The lower upper bounds on Perron root of nonnegatve rreducble matrces Guang-Xn Huang a,, Feng Yn b,keguo a a College

More information

Optimal Guaranteed Cost Control of Linear Uncertain Systems with Input Constraints

Optimal Guaranteed Cost Control of Linear Uncertain Systems with Input Constraints Internatona Journa Optma of Contro, Guaranteed Automaton, Cost Contro and Systems, of Lnear vo Uncertan 3, no Systems 3, pp 397-4, wth Input September Constrants 5 397 Optma Guaranteed Cost Contro of Lnear

More information

COXREG. Estimation (1)

COXREG. Estimation (1) COXREG Cox (972) frst suggested the modes n whch factors reated to fetme have a mutpcatve effect on the hazard functon. These modes are caed proportona hazards (PH) modes. Under the proportona hazards

More information

Natural Language Processing and Information Retrieval

Natural Language Processing and Information Retrieval Natural Language Processng and Informaton Retreval Support Vector Machnes Alessandro Moschtt Department of nformaton and communcaton technology Unversty of Trento Emal: moschtt@ds.untn.t Summary Support

More information

Gaussian Processes and Polynomial Chaos Expansion for Regression Problem: Linkage via the RKHS and Comparison via the KL Divergence

Gaussian Processes and Polynomial Chaos Expansion for Regression Problem: Linkage via the RKHS and Comparison via the KL Divergence entropy Artce Gaussan Processes and Poynoma Chaos Expanson for Regresson Probem: Lnkage va the RKHS and Comparson va the KL Dvergence Lang Yan * ID, Xaojun Duan, Bowen Lu and Jn Xu Coege of Lbera Arts

More information

Cyclic Codes BCH Codes

Cyclic Codes BCH Codes Cycc Codes BCH Codes Gaos Feds GF m A Gaos fed of m eements can be obtaned usng the symbos 0,, á, and the eements beng 0,, á, á, á 3 m,... so that fed F* s cosed under mutpcaton wth m eements. The operator

More information

Lecture 3: Dual problems and Kernels

Lecture 3: Dual problems and Kernels Lecture 3: Dual problems and Kernels C4B Machne Learnng Hlary 211 A. Zsserman Prmal and dual forms Lnear separablty revsted Feature mappng Kernels for SVMs Kernel trck requrements radal bass functons SVM

More information

ECE559VV Project Report

ECE559VV Project Report ECE559VV Project Report (Supplementary Notes Loc Xuan Bu I. MAX SUM-RATE SCHEDULING: THE UPLINK CASE We have seen (n the presentaton that, for downlnk (broadcast channels, the strategy maxmzng the sum-rate

More information

we have E Y x t ( ( xl)) 1 ( xl), e a in I( Λ ) are as follows:

we have E Y x t ( ( xl)) 1 ( xl), e a in I( Λ ) are as follows: APPENDICES Aendx : the roof of Equaton (6 For j m n we have Smary from Equaton ( note that j '( ( ( j E Y x t ( ( x ( x a V ( ( x a ( ( x ( x b V ( ( x b V x e d ( abx ( ( x e a a bx ( x xe b a bx By usng

More information

Supporting Information

Supporting Information Supportng Informaton The neural network f n Eq. 1 s gven by: f x l = ReLU W atom x l + b atom, 2 where ReLU s the element-wse rectfed lnear unt, 21.e., ReLUx = max0, x, W atom R d d s the weght matrx to

More information

Which Separator? Spring 1

Which Separator? Spring 1 Whch Separator? 6.034 - Sprng 1 Whch Separator? Mamze the margn to closest ponts 6.034 - Sprng Whch Separator? Mamze the margn to closest ponts 6.034 - Sprng 3 Margn of a pont " # y (w $ + b) proportonal

More information

Logistic Regression. CAP 5610: Machine Learning Instructor: Guo-Jun QI

Logistic Regression. CAP 5610: Machine Learning Instructor: Guo-Jun QI Logstc Regresson CAP 561: achne Learnng Instructor: Guo-Jun QI Bayes Classfer: A Generatve model odel the posteror dstrbuton P(Y X) Estmate class-condtonal dstrbuton P(X Y) for each Y Estmate pror dstrbuton

More information

Chapter 6. Rotations and Tensors

Chapter 6. Rotations and Tensors Vector Spaces n Physcs 8/6/5 Chapter 6. Rotatons and ensors here s a speca knd of near transformaton whch s used to transforms coordnates from one set of axes to another set of axes (wth the same orgn).

More information

Feature Selection: Part 1

Feature Selection: Part 1 CSE 546: Machne Learnng Lecture 5 Feature Selecton: Part 1 Instructor: Sham Kakade 1 Regresson n the hgh dmensonal settng How do we learn when the number of features d s greater than the sample sze n?

More information

1 Convex Optimization

1 Convex Optimization Convex Optmzaton We wll consder convex optmzaton problems. Namely, mnmzaton problems where the objectve s convex (we assume no constrants for now). Such problems often arse n machne learnng. For example,

More information

The line method combined with spectral chebyshev for space-time fractional diffusion equation

The line method combined with spectral chebyshev for space-time fractional diffusion equation Apped and Computatona Mathematcs 014; 3(6): 330-336 Pubshed onne December 31, 014 (http://www.scencepubshnggroup.com/j/acm) do: 10.1164/j.acm.0140306.17 ISS: 3-5605 (Prnt); ISS: 3-5613 (Onne) The ne method

More information

Singular Value Decomposition: Theory and Applications

Singular Value Decomposition: Theory and Applications Sngular Value Decomposton: Theory and Applcatons Danel Khashab Sprng 2015 Last Update: March 2, 2015 1 Introducton A = UDV where columns of U and V are orthonormal and matrx D s dagonal wth postve real

More information

Application of Particle Swarm Optimization to Economic Dispatch Problem: Advantages and Disadvantages

Application of Particle Swarm Optimization to Economic Dispatch Problem: Advantages and Disadvantages Appcaton of Partce Swarm Optmzaton to Economc Dspatch Probem: Advantages and Dsadvantages Kwang Y. Lee, Feow, IEEE, and Jong-Bae Par, Member, IEEE Abstract--Ths paper summarzes the state-of-art partce

More information

CS 3710: Visual Recognition Classification and Detection. Adriana Kovashka Department of Computer Science January 13, 2015

CS 3710: Visual Recognition Classification and Detection. Adriana Kovashka Department of Computer Science January 13, 2015 CS 3710: Vsual Recognton Classfcaton and Detecton Adrana Kovashka Department of Computer Scence January 13, 2015 Plan for Today Vsual recognton bascs part 2: Classfcaton and detecton Adrana s research

More information

Chapter 6 Hidden Markov Models. Chaochun Wei Spring 2018

Chapter 6 Hidden Markov Models. Chaochun Wei Spring 2018 896 920 987 2006 Chapter 6 Hdden Markov Modes Chaochun We Sprng 208 Contents Readng materas Introducton to Hdden Markov Mode Markov chans Hdden Markov Modes Parameter estmaton for HMMs 2 Readng Rabner,

More information

A Class of Distributed Optimization Methods with Event-Triggered Communication

A Class of Distributed Optimization Methods with Event-Triggered Communication A Cass of Dstrbuted Optmzaton Methods wth Event-Trggered Communcaton Martn C. Mene Mchae Ubrch Sebastan Abrecht the date of recept and acceptance shoud be nserted ater Abstract We present a cass of methods

More information

The Order Relation and Trace Inequalities for. Hermitian Operators

The Order Relation and Trace Inequalities for. Hermitian Operators Internatonal Mathematcal Forum, Vol 3, 08, no, 507-57 HIKARI Ltd, wwwm-hkarcom https://doorg/0988/mf088055 The Order Relaton and Trace Inequaltes for Hermtan Operators Y Huang School of Informaton Scence

More information

ON THE BEHAVIOR OF THE CONJUGATE-GRADIENT METHOD ON ILL-CONDITIONED PROBLEMS

ON THE BEHAVIOR OF THE CONJUGATE-GRADIENT METHOD ON ILL-CONDITIONED PROBLEMS ON THE BEHAVIOR OF THE CONJUGATE-GRADIENT METHOD ON I-CONDITIONED PROBEM Anders FORGREN Technca Report TRITA-MAT-006-O Department of Mathematcs Roya Insttute of Technoogy January 006 Abstract We study

More information

Reactive Power Allocation Using Support Vector Machine

Reactive Power Allocation Using Support Vector Machine Reactve Power Aocaton Usng Support Vector Machne M.W. Mustafa, S.N. Khad, A. Kharuddn Facuty of Eectrca Engneerng, Unverst Teknoog Maaysa Johor 830, Maaysa and H. Shareef Facuty of Eectrca Engneerng and

More information

Module 3 LOSSY IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur

Module 3 LOSSY IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur Module 3 LOSSY IMAGE COMPRESSION SYSTEMS Verson ECE IIT, Kharagpur Lesson 6 Theory of Quantzaton Verson ECE IIT, Kharagpur Instructonal Objectves At the end of ths lesson, the students should be able to:

More information

10-701/ Machine Learning, Fall 2005 Homework 3

10-701/ Machine Learning, Fall 2005 Homework 3 10-701/15-781 Machne Learnng, Fall 2005 Homework 3 Out: 10/20/05 Due: begnnng of the class 11/01/05 Instructons Contact questons-10701@autonlaborg for queston Problem 1 Regresson and Cross-valdaton [40

More information

DISTRIBUTED PROCESSING OVER ADAPTIVE NETWORKS. Cassio G. Lopes and Ali H. Sayed

DISTRIBUTED PROCESSING OVER ADAPTIVE NETWORKS. Cassio G. Lopes and Ali H. Sayed DISTRIBUTED PROCESSIG OVER ADAPTIVE ETWORKS Casso G Lopes and A H Sayed Department of Eectrca Engneerng Unversty of Caforna Los Angees, CA, 995 Ema: {casso, sayed@eeucaedu ABSTRACT Dstrbuted adaptve agorthms

More information

Online Classification: Perceptron and Winnow

Online Classification: Perceptron and Winnow E0 370 Statstcal Learnng Theory Lecture 18 Nov 8, 011 Onlne Classfcaton: Perceptron and Wnnow Lecturer: Shvan Agarwal Scrbe: Shvan Agarwal 1 Introducton In ths lecture we wll start to study the onlne learnng

More information

Achieving Optimal Throughput Utility and Low Delay with CSMA-like Algorithms: A Virtual Multi-Channel Approach

Achieving Optimal Throughput Utility and Low Delay with CSMA-like Algorithms: A Virtual Multi-Channel Approach Achevng Optma Throughput Utty and Low Deay wth SMA-ke Agorthms: A Vrtua Mut-hanne Approach Po-Ka Huang, Student Member, IEEE, and Xaojun Ln, Senor Member, IEEE Abstract SMA agorthms have recenty receved

More information

ANSWERS. Problem 1. and the moment generating function (mgf) by. defined for any real t. Use this to show that E( U) var( U)

ANSWERS. Problem 1. and the moment generating function (mgf) by. defined for any real t. Use this to show that E( U) var( U) Econ 413 Exam 13 H ANSWERS Settet er nndelt 9 deloppgaver, A,B,C, som alle anbefales å telle lkt for å gøre det ltt lettere å stå. Svar er gtt . Unfortunately, there s a prntng error n the hnt of

More information

Greyworld White Balancing with Low Computation Cost for On- Board Video Capturing

Greyworld White Balancing with Low Computation Cost for On- Board Video Capturing reyword Whte aancng wth Low Computaton Cost for On- oard Vdeo Capturng Peng Wu Yuxn Zoe) Lu Hewett-Packard Laboratores Hewett-Packard Co. Pao Ato CA 94304 USA Abstract Whte baancng s a process commony

More information

Solution of a nonsymmetric algebraic Riccati equation from a one-dimensional multistate transport model

Solution of a nonsymmetric algebraic Riccati equation from a one-dimensional multistate transport model IMA Journa of Numerca Anayss (2011) 1, 145 1467 do:10.109/manum/drq04 Advance Access pubcaton on May 0, 2011 Souton of a nonsymmetrc agebrac Rccat equaton from a one-dmensona mutstate transport mode TIEXIANG

More information

Chapter 5. Solution of System of Linear Equations. Module No. 6. Solution of Inconsistent and Ill Conditioned Systems

Chapter 5. Solution of System of Linear Equations. Module No. 6. Solution of Inconsistent and Ill Conditioned Systems Numercal Analyss by Dr. Anta Pal Assstant Professor Department of Mathematcs Natonal Insttute of Technology Durgapur Durgapur-713209 emal: anta.bue@gmal.com 1 . Chapter 5 Soluton of System of Lnear Equatons

More information

NP-Completeness : Proofs

NP-Completeness : Proofs NP-Completeness : Proofs Proof Methods A method to show a decson problem Π NP-complete s as follows. (1) Show Π NP. (2) Choose an NP-complete problem Π. (3) Show Π Π. A method to show an optmzaton problem

More information

Errors for Linear Systems

Errors for Linear Systems Errors for Lnear Systems When we solve a lnear system Ax b we often do not know A and b exactly, but have only approxmatons  and ˆb avalable. Then the best thng we can do s to solve ˆx ˆb exactly whch

More information

Support Vector Machines

Support Vector Machines CS 2750: Machne Learnng Support Vector Machnes Prof. Adrana Kovashka Unversty of Pttsburgh February 17, 2016 Announcement Homework 2 deadlne s now 2/29 We ll have covered everythng you need today or at

More information

princeton univ. F 17 cos 521: Advanced Algorithm Design Lecture 7: LP Duality Lecturer: Matt Weinberg

princeton univ. F 17 cos 521: Advanced Algorithm Design Lecture 7: LP Duality Lecturer: Matt Weinberg prnceton unv. F 17 cos 521: Advanced Algorthm Desgn Lecture 7: LP Dualty Lecturer: Matt Wenberg Scrbe: LP Dualty s an extremely useful tool for analyzng structural propertes of lnear programs. Whle there

More information

DSCOVR: Randomized Primal-Dual Block Coordinate Algorithms for Asynchronous Distributed Optimization

DSCOVR: Randomized Primal-Dual Block Coordinate Algorithms for Asynchronous Distributed Optimization DSCOVR: Randomzed Prma-Dua Boc Coordnate Agorthms for Asynchronous Dstrbuted Optmzaton Ln Xao Mcrosoft Research AI Redmond, WA 9805, USA Adams We Yu Machne Learnng Department, Carnege Meon Unversty Pttsburgh,

More information

Support Vector Machines for Classification and Regression

Support Vector Machines for Classification and Regression ISIS Technca Report Support Vector Machnes for Cassfcaton and Regresson Steve Gunn 0 November 997 Contents Introducton 3 2 Support Vector Cassfcaton 4 2. The Optma Separatng Hyperpane...5 2.. Lneary Separabe

More information

Distributed Moving Horizon State Estimation of Nonlinear Systems. Jing Zhang

Distributed Moving Horizon State Estimation of Nonlinear Systems. Jing Zhang Dstrbuted Movng Horzon State Estmaton of Nonnear Systems by Jng Zhang A thess submtted n parta fufment of the requrements for the degree of Master of Scence n Chemca Engneerng Department of Chemca and

More information

Module 9. Lecture 6. Duality in Assignment Problems

Module 9. Lecture 6. Duality in Assignment Problems Module 9 1 Lecture 6 Dualty n Assgnment Problems In ths lecture we attempt to answer few other mportant questons posed n earler lecture for (AP) and see how some of them can be explaned through the concept

More information

Single-Facility Scheduling over Long Time Horizons by Logic-based Benders Decomposition

Single-Facility Scheduling over Long Time Horizons by Logic-based Benders Decomposition Sngle-Faclty Schedulng over Long Tme Horzons by Logc-based Benders Decomposton Elvn Coban and J. N. Hooker Tepper School of Busness, Carnege Mellon Unversty ecoban@andrew.cmu.edu, john@hooker.tepper.cmu.edu

More information

GENERATIVE AND DISCRIMINATIVE CLASSIFIERS: NAIVE BAYES AND LOGISTIC REGRESSION. Machine Learning

GENERATIVE AND DISCRIMINATIVE CLASSIFIERS: NAIVE BAYES AND LOGISTIC REGRESSION. Machine Learning CHAPTER 3 GENERATIVE AND DISCRIMINATIVE CLASSIFIERS: NAIVE BAYES AND LOGISTIC REGRESSION Machne Learnng Copyrght c 205. Tom M. Mtche. A rghts reserved. *DRAFT OF September 23, 207* *PLEASE DO NOT DISTRIBUTE

More information

Characterizing Probability-based Uniform Sampling for Surrogate Modeling

Characterizing Probability-based Uniform Sampling for Surrogate Modeling th Word Congress on Structura and Mutdscpnary Optmzaton May 9-4, 3, Orando, Forda, USA Characterzng Probabty-based Unform Sampng for Surrogate Modeng Junqang Zhang, Souma Chowdhury, Ache Messac 3 Syracuse

More information

C4B Machine Learning Answers II. = σ(z) (1 σ(z)) 1 1 e z. e z = σ(1 σ) (1 + e z )

C4B Machine Learning Answers II. = σ(z) (1 σ(z)) 1 1 e z. e z = σ(1 σ) (1 + e z ) C4B Machne Learnng Answers II.(a) Show that for the logstc sgmod functon dσ(z) dz = σ(z) ( σ(z)) A. Zsserman, Hlary Term 20 Start from the defnton of σ(z) Note that Then σ(z) = σ = dσ(z) dz = + e z e z

More information