Sequential Minimal Optimization for SVM with Pinball Loss


Xiaolin Huang (a,*), Lei Shi (b), Johan A.K. Suykens (a)

(a) KU Leuven, Department of Electrical Engineering (ESAT-STADIUS), B-3001 Leuven, Belgium
(b) School of Mathematical Sciences, Fudan University, Shanghai, P.R. China

Abstract

To pursue insensitivity to feature noise and stability under re-sampling, a new type of support vector machine (SVM) has been established by replacing the hinge loss in the classical SVM with the pinball loss; it is hence called pin-SVM. Although a different loss function is used, pin-SVM has a structure similar to that of the classical SVM. Specifically, the dual problem of pin-SVM is a quadratic programming problem with box constraints, for which the sequential minimal optimization (SMO) technique is applicable. In this paper, we establish SMO algorithms for pin-SVM and its sparse version. Numerical experiments on real-life data sets illustrate both the good performance of pin-SVMs and the effectiveness of the established SMO methods.

Keywords: support vector machine, pinball loss, sequential minimal optimization

This work was supported by: EU: the research leading to these results has received funding from the European Research Council under the European Union's Seventh Framework Programme (FP7/2007-2013) / ERC AdG A-DATADRIVE-B (290923); this paper reflects only the authors' views, and the Union is not liable for any use that may be made of the contained information. Research Council KUL: GOA/10/09 MaNet, CoE PFV/10/002 (OPTEC), BIL12/11T; PhD/Postdoc grants. Flemish Government: FWO projects G.0377.12 (Structured systems) and G.0884.11N (Tensor based data similarity); PhD/Postdoc grants. IWT: SBO POM (100031); PhD/Postdoc grants. iMinds Medical Information Technologies SBO 2014. Belgian Federal Science Policy Office: IUAP P7/19 (DYSCO, Dynamical systems, control and optimization, 2012-2017). L. Shi is also supported by the National Natural Science Foundation of China (11201079) and the Fundamental Research Funds for the Central Universities of China. Johan Suykens is a professor at KU Leuven, Belgium. (*) Corresponding author. Email addresses: huangxl06@mails.tsinghua.edu.cn (Xiaolin Huang), leishi@fudan.edu.cn (Lei Shi), johan.suykens@esat.kuleuven.be (Johan A.K. Suykens).

1. Introduction

Since being proposed in [1], [2], the support vector machine (SVM) has been widely applied and well studied, because of its fundamental statistical properties and good generalization capability. The basic idea of SVM is to maximize the margin between two classes by minimizing the regularization term. The margin is classically related to the closest points of the two sets, since the hinge loss is minimized. For a given sample set z = {x_i, y_i}_{i=1}^{m}, where x_i ∈ R^n and y_i ∈ {-1, +1}, the SVM with the hinge loss (C-SVM) in the primal space takes the following form,

    min_{w,b} (1/2)||w||^2 + C Σ_{i=1}^{m} L_hinge(1 - y_i(w^T φ(x_i) + b)),    (1)

where φ(x) is a feature mapping, L_hinge(u) = max{0, u} is the hinge loss, and C is the trade-off parameter between the margin width and the misclassification loss. Since the distance between the closest points is easily affected by noise on the features x, the classifier trained by C-SVM (1) is sensitive to feature noise and unstable under re-sampling. This phenomenon has been observed by many researchers, and several techniques have been designed to address it; see, e.g., [3]-[7]. An attractive method for enhancing the stability to feature noise is to replace the closest-distance measurement by a quantile distance. However, maximizing the quantile distance is non-convex. The well-known ν-support vector machine (ν-SVM, [8]) can be regarded as a convex approach to maximizing the quantile distance and has been successfully applied. In ν-SVM, the margin between the surfaces {x : y f(x) = ρ} is maximized.
Minimizing the hinge loss together with an additional term -νρ pushes ρ to the quantile value of y_i f(x_i), and the quantile level is controlled by ν. Recently, we established a new convex method in [9] by extending the hinge loss in C-SVM to the pinball loss. The pinball loss L_τ(u) is defined as

    L_τ(u) = u,     if u ≥ 0,
           = -τu,   if u < 0,

which can be regarded as a generalized l1 loss. In particular, when τ = 0, the pinball loss L_τ(u) reduces to the hinge loss. When a positive τ is used, minimizing the pinball loss yields a quantile value. This link has been well studied in quantile regression; see, e.g., [10], [11]. Motivated by this link, the pinball loss with a positive τ value was applied to classification tasks, and the related classification method can be formulated as

    min_{w,b} (1/2)||w||^2 + C Σ_{i=1}^{m} L_τ(1 - y_i(w^T φ(x_i) + b)),    (2)

which is called a support vector machine with the pinball loss (pin-SVM, [9]).
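As a small illustration of the loss functions above (not from the paper; the function names are ours), a NumPy sketch of the hinge loss and the pinball loss evaluated at u = 1 - y f(x):

```python
import numpy as np

def hinge_loss(u):
    """Hinge loss L_hinge(u) = max{0, u}, used in C-SVM (1)."""
    return np.maximum(0.0, u)

def pinball_loss(u, tau):
    """Pinball loss L_tau(u) = u for u >= 0 and -tau*u for u < 0, used in pin-SVM (2).

    With tau = 0 it reduces to the hinge loss; a positive tau also
    penalizes correctly classified points (u < 0)."""
    u = np.asarray(u, dtype=float)
    return np.where(u >= 0.0, u, -tau * u)

# Both losses are evaluated at u_i = 1 - y_i * f(x_i), e.g.
# pinball_loss(1 - y * f_x, tau=0.5).
```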

Unlike ν-SVM, pin-SVM pushes the surfaces that define the margin to quantile positions by also penalizing correctly classified sampling points. In classification tasks, the pinball loss L_τ has been proved to be calibrated, i.e., the minimizer of the pinball loss has the same sign as Prob{y = +1 | x} - Prob{y = -1 | x}. The preliminary experiments reported in [9] illustrate the stability of pin-SVM to feature noise. A model called sparse pin-SVM has been established for enhancing sparseness. The sparsity is obtained by introducing an ε-zone into the pinball loss, which results in the pinball loss with an ε-insensitive zone, denoted by L_τ^ε(u):

    L_τ^ε(u) = u - ε,          if u > ε,
             = 0,              if -ε/τ ≤ u ≤ ε,    (3)
             = -τ(u + ε/τ),    if u < -ε/τ.

When a training point falls into the interval [-ε/τ, ε], the corresponding dual variable is zero. In Fig. 1, we plot L_τ^ε(u) for several τ and ε values. When ε = 0, L_τ^ε(u) reduces to the pinball loss. Furthermore, if τ = 0, it reduces to the hinge loss.

Figure 1: Plots of the pinball loss with an ε-insensitive zone. τ = 0, ε = 0 corresponds to the hinge loss and is displayed by the solid line. When ε = 0, L_τ^ε(u) reduces to the pinball loss, as shown by the dashed lines. The dotted line gives the case τ = 0.3, ε = 0.2.

With properly selected parameters, pin-SVMs can perform better than C-SVM. However, pin-SVMs currently lack fast training algorithms, which is the target of this paper. Generally, we will train pin-SVMs in the dual space by sequential minimal optimization (SMO). SMO is one of the most popular methods for solving SVMs in the dual space. It is a decomposition method that always uses the smallest possible working set, which contains two dual variables and can be updated very efficiently. For C-SVM, the corresponding SMO algorithms can be found in [12]-[17]. The convergence behavior of SMO has also been well studied in [18]-[22]. In the following, we first investigate the dual problem of pin-SVM and establish an SMO method in Section 2. Section 3 gives the SMO algorithm for sparse pin-SVM. After that, we use the established SMO algorithms to train pin-SVMs on some real-life problems in Section 4. The numerical experiments confirm the good properties of pin-SVM trained with the proposed methods, which are promising tools in many applications, as summarized in Section 5.

2. Sequential Minimal Optimization for pin-SVM

2.1. Dual problem of pin-SVM

The dual problem of pin-SVM has been discussed in [9]. In the following, we first introduce the dual problem and then investigate the problem structure. In the primal space, pin-SVM (2) can be written as the following constrained quadratic programming (QP) problem,

    min_{w,b,ξ} (1/2) w^T w + Σ_{i=1}^{m} C_i ξ_i
    s.t. y_i [w^T φ(x_i) + b] ≥ 1 - ξ_i,          i = 1,...,m,    (4)
         y_i [w^T φ(x_i) + b] ≤ 1 + (1/τ) ξ_i,    i = 1,...,m,

where C_i could be different for different observations. The value of C_i is the weight on the loss related to (x_i, y_i), and one can take many considerations into account when setting it. For example, if (x_i, y_i) is an outlier or is heavily noise-polluted, one should choose a small C_i. One noticeable situation is that of unbalanced problems, for which the numbers of positive and negative labels differ. In this case, we prefer the following typical setting,

    C_i = C_0                                         if y_i = +1,
    C_i = (#{j : y_j = +1} / #{j : y_j = -1}) C_0     if y_i = -1,

where C_0 > 0 is a user-defined constant. In this paper, we always use this setting, which gives equal weights to both classes. The algorithms proposed in the rest of the paper also work for other parameter settings, and one can choose suitable C_i according to the application and prior knowledge.
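To illustrate the class-balancing choice of C_i described above, here is a minimal sketch (the helper name is ours) that assigns the per-sample weights from a label vector y ∈ {-1, +1}^m:

```python
import numpy as np

def per_sample_weights(y, C0):
    """Class-balanced weights: C_i = C0 for y_i = +1 and
    C_i = (#{y_j = +1} / #{y_j = -1}) * C0 for y_i = -1,
    so that both classes carry the same total weight."""
    y = np.asarray(y)
    n_pos = np.count_nonzero(y == 1)
    n_neg = np.count_nonzero(y == -1)
    C = np.full(y.shape, float(C0))
    C[y == -1] = (n_pos / n_neg) * C0
    return C
```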
We introduce the Lagrange multipliers α_i, β_i ≥ 0, which correspond to the constraints in (4). These variables should satisfy the following complementary slackness conditions,

    α_i (1 - ξ_i - y_i [w^T φ(x_i) + b]) = 0,          i = 1, 2,...,m,    (5)
    β_i (y_i [w^T φ(x_i) + b] - 1 - (1/τ) ξ_i) = 0,    i = 1, 2,...,m.

Considering the Lagrangian of (4) and the KKT conditions, we get the following dual problem for pin-SVM,

    min_{α,β} (1/2) Σ_{i=1}^{m} Σ_{j=1}^{m} (α_i - β_i) y_i K_ij y_j (α_j - β_j) - Σ_{i=1}^{m} (α_i - β_i)
    s.t. Σ_{i=1}^{m} y_i (α_i - β_i) = 0,                       (6)
         α_i + (1/τ) β_i = C_i,       i = 1, 2,...,m,
         α_i ≥ 0, β_i ≥ 0,            i = 1, 2,...,m,

where K corresponds to a positive definite kernel with K_ij = K(x_i, x_j) = φ(x_i)^T φ(x_j).
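For concreteness, a small sketch of the kernel matrix K and of the matrix Q_ij = y_i K_ij y_j that appears in the dual objectives; the RBF parameterization below is one common choice and, like the helper names, is our assumption rather than something fixed by the paper:

```python
import numpy as np

def rbf_kernel_matrix(X, sigma):
    """K_ij = exp(-||x_i - x_j||^2 / (2 * sigma^2)), a positive definite RBF kernel."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma ** 2))

def dual_hessian(K, y):
    """Q_ij = y_i * K_ij * y_j, the Hessian of the dual objectives (6), (7) and (14)."""
    y = np.asarray(y, dtype=float)
    return (y[:, None] * K) * y[None, :]
```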

After obtaining the solution of (6), we use the sign of the following function for classification:

    f(x) = Σ_{i=1}^{m} y_i (α_i - β_i) K(x, x_i) + b,

where b is computed according to the complementary slackness conditions, i.e., y_j f(x_j) = 1 for j ∈ {j : α_j ≠ 0, β_j ≠ 0}. We further introduce λ_i = α_i - β_i and eliminate the equality constraint α_i + (1/τ) β_i = C_i. Then an equivalent formulation of (6) can be posed as

    min_λ F(λ) = (1/2) Σ_{i=1}^{m} Σ_{j=1}^{m} λ_i y_i K_ij y_j λ_j - Σ_{i=1}^{m} λ_i
    s.t. Σ_{i=1}^{m} y_i λ_i = 0,                          (7)
         -τ C_i ≤ λ_i ≤ C_i,       i = 1, 2,...,m.

We again observe the relationship between pin-SVM and C-SVM in the dual space: pin-SVM with τ = 0 reduces to C-SVM. The optimization problem (7) is a quadratic program with box constraints. Therefore, we can update a part of the dual variables and keep the others unchanged, i.e., sequential minimal optimization (SMO, [12]-[17]) is applicable to train pin-SVM (7). The constraint -τ C_i ≤ λ_i ≤ C_i can be equivalently transformed into A_i ≤ y_i λ_i ≤ B_i, where

    A_i = -τ C_i if y_i = +1,   A_i = -C_i if y_i = -1,
    B_i = C_i    if y_i = +1,   B_i = τ C_i if y_i = -1.

For a given λ, the indices are divided into the following two sets,

    I_up(λ) = {i : y_i λ_i < B_i}   and   I_low(λ) = {i : y_i λ_i > A_i}.

The subscripts of the two sets imply that for a pair of observations i ∈ I_up(λ), j ∈ I_low(λ), one can always find a small positive scalar t such that increasing y_i λ_i by t and decreasing y_j λ_j by t keeps the solution feasible. Therefore, if λ is a minimizer, the following inequality should be met,

    -y_i g_i ≤ -y_j g_j,   where   g_i = y_i Σ_{j=1}^{m} y_j λ_j K_ij - 1

stands for the derivative of the objective function of (7) with respect to λ_i. Otherwise, if -y_i g_i > -y_j g_j, we can update λ_i and λ_j to obtain a strict decrease of the objective value of (7). Since the above inequality holds for any i ∈ I_up(λ) and j ∈ I_low(λ), a necessary condition for λ to be optimal for (7) can be written as: Σ_{i=1}^{m} y_i λ_i = 0 and

    there exists ρ ∈ R such that   max_{i ∈ I_up(λ)} -y_i g_i ≤ ρ ≤ min_{j ∈ I_low(λ)} -y_j g_j.    (8)

The corresponding condition for C-SVM has been widely applied in the SMO technique, see, e.g., [20] and [14]. When τ varies, I_up and I_low are different.

2.2. Dual variable update

Sequential minimal optimization starts from an initial feasible solution of (7) and updates λ until (8) is satisfied. The basic idea of SMO is that we only update the dual variables in a working set and leave the other variables unchanged. The extreme case is that only two variables are involved in each iteration, for which an explicit update formula exists. Denote the current solution by λ^old. Without loss of generality, we assume that i ∈ I_up(λ^old), j ∈ I_low(λ^old) are the variables in the working set and that they violate the optimality condition (8), i.e.,

    -y_i g_i^old > -y_j g_j^old.    (9)

Denote by u_ij the vector whose i-th component is y_i, whose j-th component is -y_j, and whose other components are zero. Then searching along u_ij brings an improvement for (7). Specifically, λ^old + ζ u_ij with a sufficiently small ζ > 0 is still feasible for (7). Moreover,

    F(λ^old + ζ u_ij) - F(λ^old) = -ζ (-y_i g_i^old + y_j g_j^old) + (ζ^2 / 2)(K_ii + K_jj - 2 K_ij).    (10)

From this formulation and (9), we know that the objective function of (7) can be decreased strictly. The best ζ, which gives the largest decrease of the objective function, is the minimizer of the following problem,

    min_{ζ ≥ 0}  -ζ (-y_i g_i^old + y_j g_j^old) + (ζ^2 / 2)(K_ii + K_jj - 2 K_ij)
    s.t.  y_i λ_i^old + ζ ≤ B_i,   y_j λ_j^old - ζ ≥ A_j.

For this one-dimensional QP, the optimal solution can be given explicitly by

    ζ = min{ B_i - y_i λ_i^old,  y_j λ_j^old - A_j,  (-y_i g_i^old + y_j g_j^old) / (K_ii + K_jj - 2 K_ij) }.

Correspondingly, the dual variables are updated to λ_i^new = λ_i^old + ζ y_i and λ_j^new = λ_j^old - ζ y_j. At the same time, the gradient vector is updated to

    g_l^new = g_l^old + ζ y_l (K_il - K_jl),   l = 1, 2,...,m.
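The following sketch implements this two-variable update for a chosen pair (i, j). The array names (lam for λ, g for the gradient, A and B for the bounds) are ours, and the step is the clipped ζ derived above:

```python
def smo_pair_update(i, j, lam, g, y, K, A, B):
    """One SMO step for the pin-SVM dual (7): move lam along u_ij
    (i-th entry y_i, j-th entry -y_j) by the clipped optimal step zeta.
    lam, g, y, A, B are NumPy vectors; K is the m-by-m kernel matrix."""
    denom = K[i, i] + K[j, j] - 2.0 * K[i, j]
    denom = max(denom, 1e-12)                      # guard against a degenerate pair
    zeta = (-y[i] * g[i] + y[j] * g[j]) / denom    # unconstrained minimizer along u_ij
    zeta = min(zeta, B[i] - y[i] * lam[i],         # keep A_i <= y_i lam_i <= B_i
               y[j] * lam[j] - A[j])               # keep A_j <= y_j lam_j <= B_j
    lam[i] += zeta * y[i]
    lam[j] -= zeta * y[j]
    g += zeta * y * (K[:, i] - K[:, j])            # g_l += zeta * y_l * (K_il - K_jl)
    return zeta
```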

2.3. Working set selection and initial solution

Above we discussed the update process for pin-SVM when i ∈ I_up(λ^old), j ∈ I_low(λ^old) are chosen for the working set. Before establishing the SMO algorithm for pin-SVM, we first consider the working set selection and the generation of the initial solution. The objective function of pin-SVM (7) is the same as that of C-SVM. Thus, the strategies for selecting two dual variables for C-SVM are applicable to pin-SVM. The simplest selection is the maximal violating pair, which has been discussed in [20]. For the current solution λ^old, we choose i and j as

    i = arg max_{l ∈ I_up(λ^old)} -y_l g_l^old   and   j = arg min_{l ∈ I_low(λ^old)} -y_l g_l^old.    (11)

This strategy is essentially the greedy choice based on the first-order approximation of F(λ^old + ζ u_ij) - F(λ^old). One can also consider the second-order working set selection proposed by [13]. That method is based on the second-order expansion (10): the quadratic gain should be maximized under the linear constraints. To quickly and heuristically find a good direction, we ignore the constraints, which gives the maximal gain

    (-y_i g_i^old + y_j g_j^old)^2 / (2 (K_ii + K_jj - 2 K_ij)).    (12)

One can choose i, j by maximizing (12), but this needs a pairwise comparison. Instead, we first use (11) to find i and then choose only j according to (12), which simply requires an element-wise comparison. This is also the strategy utilized for C-SVM in LIBSVM [17].

For the initialization, we use λ_i = -τ C_i. Recalling the setting of C_i in Section 2.1, one can verify that λ_i = -τ C_i gives a feasible solution of (7). When τ = 0, the initial solution is λ_i = 0, which is commonly used for C-SVM. If we know the optimal solution for pin-SVM with τ_1, denoted by λ(τ_1), then we can make a good guess for pin-SVM with τ_2. To observe the link between λ(τ_1) and λ(τ_2), we illustrate a simple classification task (two moons) in Fig. 2, where the red crosses and the green stars correspond to observations in class +1 and class -1, respectively. We use pin-SVM (7) to train the classifier. In this example, the same radial basis function (RBF) kernel and the same regularization parameter, but different τ values, are used. The surfaces {x : f(x) = -1, +1} are displayed in Fig. 2. According to the complementary slackness conditions, we know that

    y_i f(x_i) > 1  ⟹  i ∈ S_- = {j : λ_j = -τ C_j},
    i ∈ S_0 = {j : -τ C_j < λ_j < C_j}  ⟹  y_i f(x_i) = 1,
    y_i f(x_i) < 1  ⟹  i ∈ S_+ = {j : λ_j = C_j}.

Figure 2: Sampling points and classification results of pin-SVM. Points in class +1 and -1 are shown by green stars and red crosses, respectively. The surfaces {x : f(x) = 1} (blue lines) and {x : f(x) = -1} (black lines) for τ = 0, 0.05, 0.1 are displayed by solid, dash-dotted, and dotted lines, respectively.

In other words, the surfaces {x : f(x) = ±1} partition the training set into three parts. Most of the dual variables take the value -τ C_i or C_i; the remaining data are located on {x : f(x) = +1} or {x : f(x) = -1}. From Fig. 2, we observe that many points are located in the same part for the different τ values. Fig. 2 also illustrates that, with increasing τ, the surfaces f(x) = ±1 move towards the decision boundary. This can be observed as well from the primal form (2), whose optimality condition can be written as the existence of η_i ∈ [-τ, 1] such that

    w_j - Σ_{i ∈ S_+} C_i y_i φ_j(x_i) + τ Σ_{i ∈ S_-} C_i y_i φ_j(x_i) - Σ_{i ∈ S_0} η_i C_i y_i φ_j(x_i) = 0,   for all j.

This condition implies that generally a larger τ results in more data falling into S_-. Therefore, if τ_1 > τ_2 and the difference is not big, it is with high probability that λ_i(τ_2) = -τ_2 C_i if λ_i(τ_1) = -τ_1 C_i. Following this discussion, we suggest Algorithm 1 for generating the initial solution. By the proposed procedure, we find a new feasible solution, which is heuristically suitable for τ_2.
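Combining the selection rules (11)-(12), the initialization λ_i = -τ C_i, and the two-variable step derived above (repeated inline here), a compact and unoptimized training loop might look as follows. This is only a sketch under our own naming; Algorithm 2 below is the paper's own statement of the procedure.

```python
import numpy as np

def train_pin_svm(K, y, C, tau, tol=1e-6, max_iter=100000):
    """Minimal SMO loop for the pin-SVM dual (7) with box -tau*C_i <= lam_i <= C_i.

    K: (m, m) kernel matrix; y: labels in {-1, +1}; C: per-sample weights C_i."""
    y = np.asarray(y)
    C = np.asarray(C, dtype=float)
    lam = -tau * C                                    # feasible initial point (see text)
    A = np.where(y == 1, -tau * C, -C)                # A_i <= y_i*lam_i <= B_i
    B = np.where(y == 1, C, tau * C)
    g = y * (K @ (y * lam)) - 1.0                     # g_i = y_i * sum_j y_j lam_j K_ij - 1
    for _ in range(max_iter):
        I_up = np.where(y * lam < B)[0]
        I_low = np.where(y * lam > A)[0]
        i = I_up[np.argmax(-y[I_up] * g[I_up])]       # maximal violating index, rule (11)
        m_up = -y[i] * g[i]
        M_low = np.min(-y[I_low] * g[I_low])
        if m_up - M_low < tol:                        # stopping test from condition (8)
            break
        viol = I_low[-y[I_low] * g[I_low] < m_up]     # partners that violate (8) with i
        d = K[i, i] + K[viol, viol] - 2.0 * K[i, viol] + 1e-12
        j = viol[np.argmax((y[i] * g[i] - y[viol] * g[viol]) ** 2 / (2.0 * d))]  # rule (12)
        # explicit two-variable update along u_ij, clipped to the box
        denom = K[i, i] + K[j, j] - 2.0 * K[i, j] + 1e-12
        zeta = min((-y[i] * g[i] + y[j] * g[j]) / denom,
                   B[i] - y[i] * lam[i], y[j] * lam[j] - A[j])
        lam[i] += zeta * y[i]
        lam[j] -= zeta * y[j]
        g += zeta * y * (K[:, i] - K[:, j])           # g_l += zeta * y_l * (K_il - K_jl)
    b = 0.5 * (m_up + M_low)                          # offset, as in Algorithm 2 below
    return lam, b
```

The decision value for a new point x is then Σ_i y_i λ_i K(x, x_i) + b, and its sign gives the predicted class.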
When tuning the parameter τ, we need to train pin-SVM for a series of τ values, for which the above procedure can be applied. We now give the SMO algorithm for pin-SVM (7) in Algorithm 2, where e is a pre-defined accuracy and is set to 10^-6 in this paper.

3. SMO for Sparse pin-SVM

Pin-SVM can be regarded as an extension of C-SVM via introducing flexibility on τ. Since quantile distances are considered, pin-SVM is insensitive to feature noise and has shown better classification accuracy than C-SVM. In pin-SVM (7), the dual variables are categorized into three types: lower bounded support vectors (λ_i = -τ C_i), free support vectors (-τ C_i < λ_i < C_i), and upper bounded support vectors (λ_i = C_i). When τ = 0, pin-SVM reduces to C-SVM. Correspondingly, the lower bounded support vectors are zero and C-SVM enjoys sparseness.

Algorithm 1: Initialization for pin-SVM with τ_2 from λ(τ_1)
  Set S_-(τ_1) := {i : λ_i(τ_1) = -τ_1 C_i}, S_+(τ_1) := {i : λ_i(τ_1) = C_i};
  Let λ_i := -τ_2 C_i for i ∈ S_-(τ_1) and λ_i := C_i for i ∈ S_+(τ_1);
  Calculate the violation v := Σ_{i=1}^{m} y_i λ_i;
  if τ_2 > τ_1 then
    repeat
      select i from {i : y_i = sgn(v)} ∩ S_+(τ_1);
      set λ_i := max{C_i - v, -τ_2 C_i};
      update v := max{0, v - (1 + τ_2) C_i};
    until v = 0;
  else
    repeat
      select i from {i : y_i = -sgn(v)} ∩ S_-(τ_1);
      set λ_i := min{-τ_2 C_i + v, C_i};
      update v := max{0, v - (1 + τ_2) C_i};
    until v = 0;
  end
  Return λ as the initial solution for pin-SVM with τ_2.

Algorithm 2: SMO for pin-SVM
  Set λ_i := -τ C_i or use Algorithm 1 to generate λ;
  Calculate g_i := y_i Σ_{j=1}^{m} y_j λ_j K_ij - 1 and set
    A_i := -τ C_i if y_i = +1, -C_i if y_i = -1;  B_i := C_i if y_i = +1, τ C_i if y_i = -1;
  repeat
    I_up := {i : y_i λ_i < B_i}, I_low := {i : y_i λ_i > A_i};
    select i := arg max_{l ∈ I_up} -y_l g_l;
    select j := arg max_{l ∈ I_low} (y_i g_i - y_l g_l)^2 / (2 (K_ii + K_ll - 2 K_il));
    calculate the update length ζ := min{ B_i - y_i λ_i, y_j λ_j - A_j, (-y_i g_i + y_j g_j) / (K_ii + K_jj - 2 K_ij) };
    update λ_i := λ_i + y_i ζ and λ_j := λ_j - y_j ζ, and g_l := g_l + ζ y_l (K_il - K_jl), l = 1,...,m;
  until max_{i ∈ I_up} -y_i g_i - min_{j ∈ I_low} -y_j g_j < e;
  Calculate b := (1/2)(max_{i ∈ I_up} -y_i g_i + min_{j ∈ I_low} -y_j g_j).

To pursue sparseness for pin-SVM with a nonzero τ value, a loss function with an ε-insensitive zone is applied, and a sparse pin-SVM has been established in [9]. In the primal space, sparse pin-SVM can be posed as

    min_{w,b} (1/2)||w||^2 + Σ_{i=1}^{m} C_i L_τ^ε(1 - y_i(w^T φ(x_i) + b)),    (13)

where the pinball loss with an ε-insensitive zone, L_τ^ε(u), is defined in (3). The dual problem of (13) has been deduced in [9] and takes the following form,

    min_{λ,γ} (1/2) Σ_{i=1}^{m} Σ_{j=1}^{m} λ_i y_i K_ij y_j λ_j - Σ_{i=1}^{m} λ_i - ε Σ_{i=1}^{m} γ_i
    s.t. Σ_{i=1}^{m} y_i λ_i = 0,                                   (14)
         γ_i ≥ 0,                                  i = 1,...,m,
         -τ(C_i - γ_i) ≤ λ_i ≤ C_i - γ_i,          i = 1,...,m.

The possible range of the dual variable γ_i is 0 ≤ γ_i ≤ C_i. When γ_i takes the value C_i, the corresponding λ_i will be zero, which brings sparsity to pin-SVM. From the objective function of (14), one can see that a large ε will push γ_i close to C_i, i.e., more λ_i take the value zero. The last constraint in (14) can be viewed as a box constraint on λ_i, where the box depends on the other dual variable γ_i. Similarly to the discussion on pin-SVM (7), we can write -τ(C_i - γ_i) ≤ λ_i ≤ C_i - γ_i as A_i^γ ≤ y_i λ_i ≤ B_i^γ, where

    A_i^γ = -τ(C_i - γ_i) if y_i = +1,   A_i^γ = -(C_i - γ_i) if y_i = -1,
    B_i^γ = C_i - γ_i     if y_i = +1,   B_i^γ = τ(C_i - γ_i) if y_i = -1.

Then, for given γ and λ, we can define the following two sets,

    I_up^{λ,γ} = {i : y_i λ_i < B_i^γ or γ_i > 0}   and   I_low^{λ,γ} = {i : y_i λ_i > A_i^γ or γ_i > 0}.

Here γ_i > 0 guarantees that λ_i ± ζ is feasible for a sufficiently small scalar ζ. Then, necessary conditions for λ, γ to be optimal for (14) can be presented as follows: for a given γ, λ should satisfy

    max_{i ∈ I_up^{λ,γ}} -y_i g_i ≤ min_{j ∈ I_low^{λ,γ}} -y_j g_j   and   Σ_{i=1}^{m} y_i λ_i = 0;

for a given λ, γ should satisfy

    γ_i = min{ C_i + λ_i / τ, C_i - λ_i }.
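As a concrete rendering of the ε-insensitive loss (3) and of the closed-form γ above, a short sketch (our names; it assumes τ > 0, as required for the ε-zone):

```python
import numpy as np

def eps_pinball_loss(u, tau, eps):
    """L^eps_tau(u) from (3): u - eps for u > eps, 0 on [-eps/tau, eps],
    and -tau*(u + eps/tau) = -tau*u - eps for u < -eps/tau (tau > 0 assumed).
    With eps = 0 it reduces to the pinball loss."""
    u = np.asarray(u, dtype=float)
    return np.where(u > eps, u - eps,
                    np.where(u < -eps / tau, -tau * u - eps, 0.0))

def gamma_given_lambda(lam, C, tau):
    """Optimal gamma for a fixed lambda in the sparse dual (14):
    gamma_i = min{C_i + lam_i/tau, C_i - lam_i}.
    gamma_i = C_i forces lam_i = 0, which is the source of sparsity."""
    return np.minimum(C + lam / tau, C - lam)
```

For the initial point λ_i = -τ C_i used later, this formula gives γ_i = 0.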

Notice that in sparse pin-SVM (14) the gradient g_i is different from that in pin-SVM (7), since there is one additional degree of freedom on γ. Specifically, there are three situations. If λ_i = C_i - γ_i, then

    g_i = y_i Σ_{j=1}^{m} y_j λ_j K_ij - 1 + ε.

If λ_i = -τ(C_i - γ_i), then we have

    g_i = y_i Σ_{j=1}^{m} y_j λ_j K_ij - 1 - ε/τ.

Otherwise, i.e., for -τ(C_i - γ_i) < λ_i < C_i - γ_i, we have

    g_i = y_i Σ_{j=1}^{m} y_j λ_j K_ij - 1.

The above conditions are given separately for λ and γ. In sparse pin-SVM (14), λ and γ are coupled in the constraints, hence these conditions are necessary but not sufficient. However, to pursue an efficient solution method for (14), we apply the above necessary conditions to choose two data points for a working set. The selected dual variables are then modified and the others are left unchanged. Similarly to pin-SVM, the working set for sparse pin-SVM (14) contains at least two data points. Suppose that i, j are selected. Then, to update λ_i, λ_j, γ_i, γ_j, we solve the following QP problem,

    min_{λ_i, λ_j, γ_i, γ_j} (1/2) K_ii λ_i^2 + λ_i y_i K_ij y_j λ_j + (1/2) K_jj λ_j^2
                             + λ_i y_i Σ_{l ≠ i,j} y_l λ_l K_il + λ_j y_j Σ_{l ≠ i,j} y_l λ_l K_jl
                             - λ_i - λ_j - ε γ_i - ε γ_j
    s.t. y_i λ_i + y_j λ_j = - Σ_{l ≠ i,j} y_l λ_l,                      (15)
         γ_i ≥ 0, γ_j ≥ 0,
         -τ(C_i - γ_i) ≤ λ_i ≤ C_i - γ_i,   -τ(C_j - γ_j) ≤ λ_j ≤ C_j - γ_j.

When γ_i, γ_j are fixed, (15) reduces to a two-dimensional QP with one equality constraint, which has an explicit solution; this is the case for pin-SVM (7). However, in sparse pin-SVM, γ_i, γ_j and λ_i, λ_j are coupled and there is no explicit solution. Hence, we have to solve (15) to update λ_i, λ_j, γ_i, γ_j at each iteration. Solving (15) decreases the objective of (14). We should choose a reasonable working set according to the gain obtained by solving (15). This gain is at least as large as the gain obtained when keeping γ_i, γ_j unchanged. For fixed γ_i, γ_j, the gain is given by (10), from which we can estimate the gain for (15) and then select the working set by the following rule:

    i = arg max_{l ∈ I_up^{λ,γ}} -y_l g_l,   j = arg max_{l ∈ I_low^{λ,γ}} (y_i g_i - y_l g_l)^2 / (2 (K_ii + K_ll - 2 K_il)).

This selection strategy is similar to that for pin-SVM, but now it depends on γ. The initial solution for pin-SVM, λ_i = -τ C_i, is also feasible for sparse pin-SVM (14). Correspondingly, the initial γ_i is set to γ_i = min{C_i + λ_i/τ, C_i - λ_i}, according to the necessary optimality condition. The sequential minimal optimization for sparse pin-SVM (14) is summarized in Algorithm 3.

Algorithm 3: SMO for sparse pin-SVM
  Set λ_i := -τ C_i and γ_i := min{C_i + λ_i/τ, C_i - λ_i};
  Calculate g_i := y_i Σ_{j=1}^{m} y_j λ_j K_ij - 1;
  A_i^γ := -τ(C_i - γ_i) if y_i = +1, -(C_i - γ_i) if y_i = -1;
  B_i^γ := C_i - γ_i if y_i = +1, τ(C_i - γ_i) if y_i = -1;
  repeat
    I_up^{λ,γ} := {i : y_i λ_i < B_i^γ or γ_i > 0};
    I_low^{λ,γ} := {i : y_i λ_i > A_i^γ or γ_i > 0};
    select i := arg max_{l ∈ I_up^{λ,γ}} -y_l g_l;
    select j := arg max_{l ∈ I_low^{λ,γ}} (y_i g_i - y_l g_l)^2 / (2 (K_ii + K_ll - 2 K_il));
    solve (15) to update λ_i, λ_j, γ_i, γ_j;
    update A^γ, B^γ, and g_l, l = 1,...,m;
  until max_{i ∈ I_up^{λ,γ}} -y_i g_i - min_{j ∈ I_low^{λ,γ}} -y_j g_j < e;
  Calculate b := (1/2)(max_{i ∈ I_up^{λ,γ}} -y_i g_i + min_{j ∈ I_low^{λ,γ}} -y_j g_j).

4. Numerical Experiments

In the sections above, we gave the SMO algorithms for training pin-SVM (7) and sparse pin-SVM (14). In the following, we evaluate their performance on real-life data sets. There are two aspects of interest. First, we test whether SMO is effective for training pin-SVMs. Second, with an effective training method, we can consider more experiments and support the theoretical analysis in [9]. The sparsity of sparse pin-SVM is also considered. The data in these experiments are loaded from the UCI Repository of Machine Learning Datasets [23] and the LIBSVM data sets [17]. For some of these data, the training and test sets are provided. Otherwise, we randomly select m observations to train the classifier and use the remaining ones for testing. The problem dimension n, the number of training data m, and the number of test data T are summarized in Table 1. For pin-SVM (7), we use the RBF kernel and apply Algorithm 2 to train classifiers with different τ values. As the data size grows, caching the kernel matrix becomes more demanding.
In our experiments, when m ≥ 5000, we calculate the element K_ij only when it is needed, which reduces the caching but costs more time.

Table 1: Dimension, Training Data and Test Data Size. For each data set, the table lists the input dimension n, the number of training data m, and the number of test data T. Data sets: Spect, Monk1, Monk2, Haberman, Statlog, Monk3, Ionosphere, Transfusion, Pima, Breast, Splice, Spambase, Guide, Magic, IJCNN, Cod-RNA.

To make a fair comparison, we use λ_i = -τ C_i as the initial solution. If the number of training data is less than 10000, 10-fold cross-validation is utilized to tune the regularization coefficient C_0 and the bandwidth σ of the RBF kernel. Otherwise, we set C_0 = 1 and tune σ only. The training and test process is repeated 10 times. The average accuracy on the test sets, the standard deviation, and the average computing time are then reported in Table 2.

Table 2: Test Accuracy and Average Training Time. For each data set of Table 1, the table reports the average test accuracy (with standard deviation) and the average training time of pin-SVM for τ = 0, 0.1, 0.3, and 0.5.

We also illustrate the scalability of the proposed SMO algorithm by plotting the training time for different training data sizes. In Fig. 3 we plot the training time for the data set IJCNN. Notice that there is a sudden change at m = 5000, due to the different kernel computation strategies.

Figure 3: Training time of Algorithm 2 (τ = 0.1) for IJCNN for different training data sizes m: (a) m < 5000; (b) m ≥ 5000.

Both Table 2 and Fig. 3 illustrate that the proposed SMO method can train pin-SVM effectively. For different τ values, the computational time is similar and is not monotonic with respect to τ. In our method, pin-SVM is trained in the dual space, which corresponds to a QP with box constraints -τ C_i ≤ λ_i ≤ C_i. One can observe that τ controls the size of the feasible set. In the two extreme cases, i.e., when the box is large enough or very small, optimal solutions can be obtained easily. Therefore, although a larger τ is generally related to more training time, the difference is not significant; in some applications, a larger τ even corresponds to less training time. Generally, the proposed SMO for pin-SVM is effective and takes training time similar to SMO for C-SVM.

With a properly selected τ, pin-SVM provides better classification accuracy than C-SVM, but the sparseness is lost. If the problem size is not too large and sparseness is not the main target, then finding a suitable τ is meaningful for improving the classification accuracy. Moreover, we can use sparse pin-SVM (14) to enhance the sparsity. In the following, we set τ = 0.1 and apply Algorithm 3 for several different ε values. The training and test process is similar to the previous experiment, except that the parameters for sparse pin-SVM are tuned based on pin-SVM, since Algorithm 3 costs more time than Algorithm 2. In practice, if the time budget is not strict, one can tune the parameters based on sparse pin-SVM and improve the performance further. The average classification accuracy, the number of support vectors (in brackets), and the training time are reported in Table 3, where the results of C-SVM are given as well for reference. Compared with pin-SVM (7), sparse pin-SVM (14) enhances the sparsity but takes more training time. In Algorithm 3, the update formulation involves a four-dimensional QP problem. Although it can be solved effectively, its computation time is larger than that of the explicit update formulation in Algorithm 2. Roughly, Algorithm 3 needs 10 times more training time than Algorithm 2.
In C-SVM, the points with y_i f(x_i) > 1 are related to zero dual variables, and so are the points with -ε/τ < 1 - y_i f(x_i) < ε in sparse pin-SVM. Thus, the results of C-SVM are generally more sparse. But when the feature noise is heavy, it is worth considering Algorithm 3 to train sparse pin-SVM.

Table 3: Test Accuracy, Number of Nonzero Dual Variables, and Training Time for Sparse pin-SVM (τ = 0.1). For each data set considered, the table reports the test accuracy, the number of support vectors (in brackets), and the training time of C-SVM and of sparse pin-SVM with ε = 0.05, 0.10, and 0.20.

5. Conclusion

In this paper, sequential minimal optimization has been established for the support vector machine with the pinball loss. Since pin-SVM has the same problem structure as C-SVM, the corresponding SMO is related to that for C-SVM. We investigated the details and implemented SMO for pin-SVM. The SMO algorithm for training sparse pin-SVM was given as well. The proposed algorithms were then evaluated in numerical experiments, showing the effectiveness of training pin-SVMs. The proposed SMO algorithms make pin-SVMs promising tools in real-life applications, especially when the data are corrupted by feature noise.

Acknowledgment

The authors would like to thank Prof. Chih-Jen Lin of National Taiwan University for encouraging us to establish the SMO algorithm for pin-SVM. The authors are grateful to the anonymous reviewers for helpful comments.

References

[1] C. Cortes and V. Vapnik. Support-vector networks. Machine Learning, 20:273-297, 1995.
[2] V. Vapnik. Statistical Learning Theory. Wiley, New York, 1998.
[3] X. Zhang. Using class-center vectors to build support vector machines. In Proceedings of the IEEE Signal Processing Society Workshop, pages 3-11. IEEE, 1999.
[4] J. Bi and T. Zhang. Support vector classification with input data uncertainty. In Advances in Neural Information Processing Systems, volume 17, page 161. MIT Press, 2004.
[5] G.R.G. Lanckriet, L.E. Ghaoui, C. Bhattacharyya, and M.I. Jordan. A robust minimax approach to classification. The Journal of Machine Learning Research, 3:555-582, 2002.
[6] P.K. Shivaswamy, C. Bhattacharyya, and A.J. Smola. Second order cone programming approaches for handling missing and uncertain data. The Journal of Machine Learning Research, 7:1283-1314, 2006.
[7] H. Xu, C. Caramanis, and S. Mannor. Robustness and regularization of support vector machines. The Journal of Machine Learning Research, 10:1485-1510, 2009.
[8] B. Schölkopf, A.J. Smola, R.C. Williamson, and P.L. Bartlett. New support vector algorithms. Neural Computation, 12(5):1207-1245, 2000.
[9] X. Huang, L. Shi, and J.A.K. Suykens. Support vector machine classifier with pinball loss. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(5):984-997, 2014.
[10] R. Koenker. Quantile Regression. Cambridge University Press, 2005.
[11] I. Steinwart and A. Christmann. Estimating conditional quantiles with the help of the pinball loss. Bernoulli, 17(1):211-225, 2011.
[12] J.C. Platt. Fast training of support vector machines using sequential minimal optimization. In Advances in Kernel Methods - Support Vector Learning, pages 185-208. MIT Press, 1999.
[13] R.E. Fan, P.H. Chen, and C.J. Lin. Working set selection using second order information for training support vector machines. The Journal of Machine Learning Research, 6:1889-1918, 2005.
[14] L. Bottou and C.-J. Lin. Support vector machine solvers. In Large Scale Kernel Machines. MIT Press, 2007.
[15] Y. Torii and S. Abe. Decomposition techniques for training linear programming support vector machines. Neurocomputing, 72(4), 2009.
[16] J. Shawe-Taylor and S. Sun. A review of optimization methodologies in support vector machines. Neurocomputing, 74(17), 2011.
[17] C.C. Chang and C.J. Lin. LIBSVM: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2(3):27, 2011.
[18] C.C. Chang, C.W. Hsu, and C.J. Lin. The analysis of decomposition methods for support vector machines. IEEE Transactions on Neural Networks, 11(4):1003-1008, 2000.
[19] C.J. Lin. On the convergence of the decomposition method for support vector machines. IEEE Transactions on Neural Networks, 12(6):1288-1298, 2001.
[20] S.S. Keerthi and E.G. Gilbert. Convergence of a generalized SMO algorithm for SVM classifier design. Machine Learning, 46(1-3):351-360, 2002.
[21] D. Hush, P. Kelly, C. Scovel, and I. Steinwart. QP algorithms with guaranteed accuracy and run time for support vector machines. The Journal of Machine Learning Research, 7:733-769, 2006.
[22] J. López and J.R. Dorronsoro. Simple proof of convergence of the SMO algorithm for different SVM variants. IEEE Transactions on Neural Networks and Learning Systems, 23(7):1142-1147, 2012.
[23] A. Frank and A. Asuncion. UCI Machine Learning Repository, 2010.


More information

The Minimum Universal Cost Flow in an Infeasible Flow Network

The Minimum Universal Cost Flow in an Infeasible Flow Network Journal of Scences, Islamc Republc of Iran 17(2): 175-180 (2006) Unversty of Tehran, ISSN 1016-1104 http://jscencesutacr The Mnmum Unversal Cost Flow n an Infeasble Flow Network H Saleh Fathabad * M Bagheran

More information

Lecture 12: Discrete Laplacian

Lecture 12: Discrete Laplacian Lecture 12: Dscrete Laplacan Scrbe: Tanye Lu Our goal s to come up wth a dscrete verson of Laplacan operator for trangulated surfaces, so that we can use t n practce to solve related problems We are mostly

More information

CSE 252C: Computer Vision III

CSE 252C: Computer Vision III CSE 252C: Computer Vson III Lecturer: Serge Belonge Scrbe: Catherne Wah LECTURE 15 Kernel Machnes 15.1. Kernels We wll study two methods based on a specal knd of functon k(x, y) called a kernel: Kernel

More information

Additional Codes using Finite Difference Method. 1 HJB Equation for Consumption-Saving Problem Without Uncertainty

Additional Codes using Finite Difference Method. 1 HJB Equation for Consumption-Saving Problem Without Uncertainty Addtonal Codes usng Fnte Dfference Method Benamn Moll 1 HJB Equaton for Consumpton-Savng Problem Wthout Uncertanty Before consderng the case wth stochastc ncome n http://www.prnceton.edu/~moll/ HACTproect/HACT_Numercal_Appendx.pdf,

More information

17 Support Vector Machines

17 Support Vector Machines 17 We now dscuss an nfluental and effectve classfcaton algorthm called (SVMs). In addton to ther successes n many classfcaton problems, SVMs are responsble for ntroducng and/or popularzng several mportant

More information

Canonical transformations

Canonical transformations Canoncal transformatons November 23, 2014 Recall that we have defned a symplectc transformaton to be any lnear transformaton M A B leavng the symplectc form nvarant, Ω AB M A CM B DΩ CD Coordnate transformatons,

More information

Support Vector Machines

Support Vector Machines Separatng boundary, defned by w Support Vector Machnes CISC 5800 Professor Danel Leeds Separatng hyperplane splts class 0 and class 1 Plane s defned by lne w perpendcular to plan Is data pont x n class

More information

C4B Machine Learning Answers II. = σ(z) (1 σ(z)) 1 1 e z. e z = σ(1 σ) (1 + e z )

C4B Machine Learning Answers II. = σ(z) (1 σ(z)) 1 1 e z. e z = σ(1 σ) (1 + e z ) C4B Machne Learnng Answers II.(a) Show that for the logstc sgmod functon dσ(z) dz = σ(z) ( σ(z)) A. Zsserman, Hlary Term 20 Start from the defnton of σ(z) Note that Then σ(z) = σ = dσ(z) dz = + e z e z

More information

Multilayer Perceptron (MLP)

Multilayer Perceptron (MLP) Multlayer Perceptron (MLP) Seungjn Cho Department of Computer Scence and Engneerng Pohang Unversty of Scence and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjn@postech.ac.kr 1 / 20 Outlne

More information

Modified parallel multisplitting iterative methods for non-hermitian positive definite systems

Modified parallel multisplitting iterative methods for non-hermitian positive definite systems Adv Coput ath DOI 0.007/s0444-0-9262-8 odfed parallel ultsplttng teratve ethods for non-hertan postve defnte systes Chuan-Long Wang Guo-Yan eng Xue-Rong Yong Receved: Septeber 20 / Accepted: 4 Noveber

More information

The Multiple Classical Linear Regression Model (CLRM): Specification and Assumptions. 1. Introduction

The Multiple Classical Linear Regression Model (CLRM): Specification and Assumptions. 1. Introduction ECONOMICS 5* -- NOTE (Summary) ECON 5* -- NOTE The Multple Classcal Lnear Regresson Model (CLRM): Specfcaton and Assumptons. Introducton CLRM stands for the Classcal Lnear Regresson Model. The CLRM s also

More information

Scattering by a perfectly conducting infinite cylinder

Scattering by a perfectly conducting infinite cylinder Scatterng by a perfectly conductng nfnte cylnder Reeber that ths s the full soluton everywhere. We are actually nterested n the scatterng n the far feld lt. We agan use the asyptotc relatonshp exp exp

More information