Sequential Minimal Optimization for SVM with Pinball Loss
Xiaolin Huang (a,*), Lei Shi (b), Johan A.K. Suykens (a)

(a) KU Leuven, Department of Electrical Engineering (ESAT-STADIUS), B-3001 Leuven, Belgium
(b) School of Mathematical Sciences, Fudan University, Shanghai, P.R. China

Abstract

To pursue insensitivity to feature noise and stability to re-sampling, a new type of support vector machine (SVM) has been established by replacing the hinge loss in the classical SVM with the pinball loss; it is hence called pin-SVM. Though a different loss function is used, pin-SVM has a similar structure to the classical SVM. Specifically, the dual problem of pin-SVM is a quadratic programming problem with box constraints, for which the sequential minimal optimization (SMO) technique is applicable. In this paper, we establish SMO algorithms for pin-SVM and its sparse version. Numerical experiments on real-life data sets illustrate both the good performance of pin-SVMs and the effectiveness of the established SMO methods.

Keywords: support vector machine, pinball loss, sequential minimal optimization

1. Introduction

Since it was proposed in [1], [2], the support vector machine (SVM) has been widely applied and well studied, because of its fundamental statistical properties and good generalization capability. The basic idea of SVM is to maximize the margin between two classes by minimizing the regularization term. The margin is classically related to the closest points of the two sets, since the hinge loss is minimized. For a given sample set z = {x_i, y_i}_{i=1}^m, where x_i ∈ R^n and y_i ∈ {−1, +1}, the SVM with the hinge loss (C-SVM) in the primal space has the following form,

  min_{w,b} (1/2)‖w‖² + C Σ_{i=1}^m L_hinge(1 − y_i(wᵀφ(x_i) + b)),   (1)

where φ(x) is a feature mapping, L_hinge(u) = max{0, u} is the hinge loss, and C is the trade-off parameter between the margin width and the misclassification loss.

[Footnote: This work was supported by: EU: the research leading to these results has received funding from the European Research Council under the European Union's Seventh Framework Programme (FP7/2007-2013) / ERC AdG A-DATADRIVE-B (290923); this paper reflects only the authors' views, and the Union is not liable for any use that may be made of the contained information. Research Council KUL: GOA/10/09 MaNet, CoE PFV/10/002 (OPTEC), BIL12/11T; PhD/Postdoc grants. Flemish Government: FWO projects (Structured systems; G.088114N, Tensor based data similarity); PhD/Postdoc grants. IWT: SBO POM; PhD/Postdoc grants. iMinds Medical Information Technologies SBO 2014. Belgian Federal Science Policy Office: IUAP P7/19 (DYSCO, Dynamical systems, control and optimization, 2012-2017). L. Shi is also supported by the National Natural Science Foundation of China and the Fundamental Research Funds for the Central Universities of China. Johan Suykens is a professor at KU Leuven, Belgium. * Corresponding author. Email addresses: huangxl06@mails.tsinghua.edu.cn (Xiaolin Huang), leishi@fudan.edu.cn (Lei Shi), johan.suykens@esat.kuleuven.be (Johan A.K. Suykens).]

Since the distance between the closest points is easily affected by noise on the feature x, the classifier trained by C-SVM (1) is sensitive to feature noise and unstable with respect to re-sampling. This phenomenon has been observed by many researchers, and some techniques have been designed to address it, see, e.g., [3]-[7]. An attractive method for enhancing the stability to feature noise is to change the closest-distance measurement to the quantile distance. However, maximizing the quantile distance is non-convex. The well-known ν-support vector machine (ν-SVM, [8]) can be regarded as a convex approach for maximizing the quantile distance and has been successfully applied. In ν-SVM, the margin between the surfaces {x : yf(x) = ρ} is maximized. Minimizing the hinge loss together with an additional term −νρ pushes ρ to be the quantile value of y_i f(x_i), and the quantile level is controlled by ν. Recently, we established a new convex method in [9] by extending the hinge loss in C-SVM to the pinball loss. The pinball loss L_τ(u) is defined as

  L_τ(u) = u,    u ≥ 0,
         = −τu,  u < 0,

which can be regarded as a generalized l1 loss. In particular, when τ = 0, the pinball loss L_τ(u) reduces to the hinge loss. When a positive τ is used, minimizing the pinball loss results in the quantile value.
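The pinball loss defined above is a one-line function. As a minimal sketch (the function name and the NumPy vectorization are ours, not from the paper):

```python
import numpy as np

def pinball_loss(u, tau):
    """Pinball loss L_tau(u) = u for u >= 0 and -tau*u for u < 0.

    With tau = 0 the negative branch vanishes and the loss reduces to
    the hinge-style penalty max{0, u} used by C-SVM.
    """
    u = np.asarray(u, dtype=float)
    return np.where(u >= 0.0, u, -tau * u)
```

Applied to u = 1 − y(wᵀφ(x) + b), a positive τ keeps penalizing points that are classified correctly with a large margin, which is what pushes the margin surfaces towards quantile positions.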
This link has been well studied in quantile regression, see, e.g., [10], [11]. Motivated by this link, the pinball loss with a positive τ value was applied to classification tasks, and the related classification method can be formulated as

  min_{w,b} (1/2)‖w‖² + C Σ_{i=1}^m L_τ(1 − y_i(wᵀφ(x_i) + b)),   (2)

Preprint submitted to Elsevier, September 8, 2014
which is called the support vector machine with the pinball loss (pin-SVM, [9]). Unlike ν-SVM, pin-SVM pushes the surfaces that define the margin to quantile positions by also penalizing the correctly classified sampling points. In classification tasks, the pinball loss L_τ has been proved to be calibrated, i.e., the minimizer of the pinball loss has the same sign as Prob{y = +1 | x} − Prob{y = −1 | x}. The preliminary experiments reported in [9] illustrate the stability of pin-SVM to feature noise. A model called sparse pin-SVM has been established for enhancing the sparseness. The sparsity is obtained by introducing an ε-zone into the pinball loss, which results in the pinball loss with an ε-insensitive zone, denoted by L^ε_τ(u):

  L^ε_τ(u) = u − ε,          u > ε,
           = 0,              −ε/τ ≤ u ≤ ε,   (3)
           = −τ(u + ε/τ),    u < −ε/τ.

When a training point falls into the interval [−ε/τ, ε], the corresponding dual variable is zero. In Fig. 1, we plot L^ε_τ(u) for several τ and ε values. When ε = 0, L^ε_τ(u) reduces to the pinball loss. Furthermore, if τ = 0, it reduces to the hinge loss.

Figure 1: Plots of the pinball loss with an ε-insensitive zone. τ = 0, ε = 0 corresponds to the hinge loss and is displayed by the solid line. When ε = 0, L^ε_τ(u) reduces to the pinball loss (τ = 0.3 and τ = 0.1), as shown by the dashed lines. The dotted line gives the case τ = 0.3, ε = 0.2.

With properly selected parameters, pin-SVMs can perform better than C-SVM. However, pin-SVMs currently lack fast training algorithms, which is the target of this paper. Generally, we will train pin-SVMs in the dual space by sequential minimal optimization (SMO). SMO is one of the most popular methods for solving SVMs in the dual space. SMO is a kind of decomposition method that always uses the smallest possible working set, which contains two dual variables and can be updated very efficiently. For C-SVM, the corresponding SMO algorithms can be found in [12]-[17]. The convergence behavior of SMO has also been well studied in [18]-[22]. In the following, we first investigate the dual problem of pin-SVM and establish an SMO method in Section 2.
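Definition (3) can be sketched the same way (a hypothetical helper; the special case τ = 0, where the insensitive zone extends over all u ≤ ε, is handled separately to avoid dividing by zero):

```python
import numpy as np

def pinball_eps_loss(u, tau, eps):
    """Pinball loss with an eps-insensitive zone, eq. (3):
    u - eps above eps, 0 on [-eps/tau, eps], -tau*(u + eps/tau) below."""
    u = np.asarray(u, dtype=float)
    pos = np.where(u > eps, u - eps, 0.0)   # branch u > eps, zero otherwise
    if tau == 0.0:                          # zone becomes (-inf, eps]: shifted hinge
        return pos
    # note -tau*(u + eps/tau) = -tau*u - eps, continuous at u = -eps/tau
    return np.where(u < -eps / tau, -tau * u - eps, pos)
```

With ε = 0 this reduces to the pinball loss, and with τ = 0 additionally to the hinge loss, matching the cases shown in Fig. 1.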
Section 3 gives the SMO algorithm for sparse pin-SVM. After that, we use the established SMO algorithms to train pin-SVMs on some real-life problems in Section 4. The numerical experiments confirm the good properties of pin-SVM with the proposed methods, which will be promising tools in many applications, as summarized in Section 5.

2. Sequential Minimal Optimization for pin-SVM

2.1. Dual problem of pin-SVM

The dual problem of pin-SVM has been discussed in [9]. In the following, we first introduce the dual problem and then investigate the problem structure. In the primal space, pin-SVM (2) can be written as the following constrained quadratic programming (QP) problem,

  min_{w,b,ξ} (1/2)wᵀw + Σ_{i=1}^m C_i ξ_i
  s.t. y_i[wᵀφ(x_i) + b] ≥ 1 − ξ_i,      i = 1, ..., m,   (4)
       y_i[wᵀφ(x_i) + b] ≤ 1 + ξ_i/τ,    i = 1, ..., m,

where C_i could be different for different observations. The value of C_i is the weight on the loss related to (x_i, y_i), and one can consider many impacts when setting it. For example, if (x_i, y_i) is an outlier or is heavily noise-polluted, one should choose a small C_i. One noticeable situation is unbalanced problems, for which the numbers of positive and negative labels are not the same. In this case, we prefer the following typical setting,

  C_i = C_0,                                       i : y_i = 1,
  C_i = (#{j : y_j = 1} / #{j : y_j = −1}) C_0,    i : y_i = −1,   (5)

where C_0 > 0 is a user-defined constant. In this paper, we always use this setting, which gives equal weights to both classes. The algorithms proposed in the rest of the paper also work for other parameter settings. One can choose suitable C_i according to different applications and prior knowledge.

We introduce the Lagrange multipliers α_i, β_i ≥ 0, which correspond to the constraints in (4). These variables should satisfy the following complementary slackness conditions,

  α_i(1 − ξ_i − y_i[wᵀφ(x_i) + b]) = 0,      i = 1, 2, ..., m,
  β_i(y_i[wᵀφ(x_i) + b] − 1 − ξ_i/τ) = 0,    i = 1, 2, ..., m.

Considering the Lagrangian of (4) and the KKT conditions, we get the following dual problem for pin-SVM,

  min_{α,β} (1/2) Σ_{i=1}^m Σ_{j=1}^m (α_i − β_i) y_i K_ij y_j (α_j − β_j) − Σ_{i=1}^m (α_i − β_i)
  s.t. Σ_{i=1}^m y_i(α_i − β_i) = 0,   (6)
       α_i + β_i/τ = C_i,  i = 1, 2, ..., m,
       α_i ≥ 0, β_i ≥ 0,   i = 1, 2, ..., m,

where K corresponds to a positive definite kernel with K_ij = K(x_i, x_j) = φ(x_i)ᵀφ(x_j). After obtaining the solution of (6), we use the sign of the following function to do classification:

  f(x) = Σ_{i=1}^m y_i(α_i − β_i)K(x, x_i) + b,

where b is computed according to the complementary slackness conditions: y_j f(x_j) = 1 for j ∈ {j : α_j ≠ 0, β_j ≠ 0}.

We further introduce λ_i = α_i − β_i and eliminate the equality constraint α_i + β_i/τ = C_i. Then an equivalent formulation of (6) can be posed as

  min_λ F(λ) = (1/2) Σ_{i=1}^m Σ_{j=1}^m λ_i y_i K_ij y_j λ_j − Σ_{i=1}^m λ_i
  s.t. Σ_{i=1}^m y_i λ_i = 0,   (7)
       −τC_i ≤ λ_i ≤ C_i,  i = 1, 2, ..., m.

We again observe the relationship between pin-SVM and C-SVM in the dual space: pin-SVM with τ = 0 reduces to C-SVM. The optimization problem (7) is a quadratic program with box constraints. Therefore, we can update a part of the dual variables and keep the others unchanged, i.e., sequential minimal optimization (SMO, [12]-[17]) is applicable to train pin-SVM (7). The constraint −τC_i ≤ λ_i ≤ C_i can be equivalently transformed into A_i ≤ y_iλ_i ≤ B_i, where

  A_i = −τC_i,  y_i = 1,        B_i = C_i,    y_i = 1,
  A_i = −C_i,   y_i = −1,       B_i = τC_i,   y_i = −1.

For a given λ, the indices are divided into the following two sets,

  I_up(λ) = {i : y_iλ_i < B_i}  and  I_low(λ) = {i : y_iλ_i > A_i}.

The subscripts of the two sets imply that for a pair of observations i ∈ I_up(λ), j ∈ I_low(λ), one can always find a small positive scalar t such that the modified solution y_iλ_i + t, y_jλ_j − t remains feasible. Therefore, if λ is an optimizer, the following inequality should be met,

  −y_i g_i ≤ −y_j g_j,

where

  g_i = y_i Σ_{j=1}^m y_j λ_j K_ij − 1

stands for the derivative of the objective function of (7) with respect to λ_i. Otherwise, if −y_i g_i > −y_j g_j, we can update λ_i and λ_j to obtain a strict decrease in the objective value of (7). Since the above inequality holds for any i ∈ I_up(λ) and j ∈ I_low(λ), a necessary condition for λ being optimal to (7) can be written as: Σ_{i=1}^m y_iλ_i = 0, and

  ∃ρ ∈ R such that  max_{i∈I_up(λ)} −y_i g_i ≤ ρ ≤ min_{j∈I_low(λ)} −y_j g_j.   (8)

The corresponding condition for C-SVM has been widely applied in the SMO technique, see, e.g., [20] and [14]. When τ varies, I_up(λ) and I_low(λ) are different.

2.2. Dual variable update

Sequential minimal optimization starts from an initial feasible solution of (7) and updates λ until (8) is satisfied.
The basic idea of SMO is to update only the dual variables in a working set and leave the other variables unchanged. The extreme case is that only two variables are involved in each iteration, for which there exists an explicit update formulation. Denote the current solution by λ^old. Without loss of generality, we assume that i ∈ I_up(λ^old), j ∈ I_low(λ^old) are the variables in the working set. That means the two elements violate the optimality condition (8), i.e.,

  −y_i g_i^old > −y_j g_j^old.   (9)

Denote by u_ij a vector of which the i-th component is y_i, the j-th component is −y_j, and the others are zero. Then searching along u_ij will bring an improvement for (7). Specifically, λ^old + ζu_ij with a sufficiently small ζ > 0 will still be feasible for (7). Moreover,

  F(λ^old + ζu_ij) − F(λ^old) = −ζ((−y_i g_i^old) − (−y_j g_j^old)) + (ζ²/2)(K_ii + K_jj − 2K_ij).   (10)

From this formulation and (9), we know that the objective function of (7) can be decreased strictly. The best ζ, which gives the largest decrease of the objective function, is the minimizer of the following problem,

  min_{ζ≥0} −ζ((−y_i g_i^old) − (−y_j g_j^old)) + (ζ²/2)(K_ii + K_jj − 2K_ij)
  s.t. y_iλ_i^old + ζ ≤ B_i,
       y_jλ_j^old − ζ ≥ A_j.

For this 1-dimensional QP, the optimal solution can be explicitly given by

  ζ = min{ B_i − y_iλ_i^old,  y_jλ_j^old − A_j,  ((−y_i g_i^old) − (−y_j g_j^old)) / (K_ii + K_jj − 2K_ij) }.
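The clipped step length derived above is just a three-way minimum. A small sketch (argument names are ours: `d_i` stands for −y_i g_i^old and `ylam_i` for y_i λ_i^old):

```python
def step_length(d_i, d_j, ylam_i, ylam_j, B_i, A_j, K_ii, K_jj, K_ij):
    """Optimal zeta for a violating pair (i, j): the unconstrained
    minimizer (d_i - d_j) / (K_ii + K_jj - 2*K_ij), clipped so that
    y_i*lam_i + zeta <= B_i and y_j*lam_j - zeta >= A_j still hold."""
    zeta_free = (d_i - d_j) / (K_ii + K_jj - 2.0 * K_ij)
    return min(B_i - ylam_i, ylam_j - A_j, zeta_free)
```

Since (9) guarantees d_i > d_j, the returned ζ is positive whenever the pair is strictly inside its box, so each pair update strictly decreases the objective (10).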
Correspondingly, the dual variables are updated to λ_i^new = λ_i^old + ζy_i and λ_j^new = λ_j^old − ζy_j. At the same time, the gradient vector is updated to

  g_l^new = g_l^old + ζy_l K_il − ζy_l K_jl,  l = 1, 2, ..., m.

2.3. Working set selection and initial solution

Above we discussed the update process for pin-SVM when i ∈ I_up(λ^old), j ∈ I_low(λ^old) are chosen in the working set. Before establishing the SMO for pin-SVM, we first consider working set selection and initial solution generation. The objective function of pin-SVM (7) is the same as that of C-SVM. Thus, the strategies of selecting two dual variables for C-SVM are applicable to pin-SVM. The simplest selection is the maximal violating pair, which has been discussed in [20]. For the current solution λ^old, we choose i and j as

  i = arg max_{l ∈ I_up(λ^old)} −y_l g_l^old  and  j = arg min_{l ∈ I_low(λ^old)} −y_l g_l^old.   (11)

This strategy is essentially the greedy choice based on the first-order approximation of F(λ^old + ζu_ij) − F(λ^old). One can also consider the second-order working set selection proposed by [13]. That method is based on the second-order expansion (10). This quadratic gain should be maximized under the linear constraints. To quickly and heuristically find a good direction, we ignore the constraints and then can find the maximal gain easily:

  ((−y_i g_i^old) − (−y_j g_j^old))² / (2(K_ii + K_jj − 2K_ij)).   (12)

One can choose i, j by maximizing (12), but this needs pairwise comparison. Instead, we first use (11) to find i and then only choose j according to (12), which simply requires element comparison. This is also the strategy utilized for C-SVM in LIBSVM [17].

For the initialization, we use λ_i = −τC_i. Recalling (5) for the setting of C_i, one can verify that λ_i = −τC_i gives a feasible solution of (7). When τ = 0, the initial solution is λ = 0, which is commonly used for C-SVM. If we know the optimal solution for pin-SVM with τ_1, denoted by λ(τ_1), then we can have a good guess for pin-SVM with τ_2. To observe the link between λ(τ_1) and λ(τ_2), we illustrate a simple classification task (two moons) in Fig. 2, where the red crosses and the green stars correspond to observations in class +1 and class −1, respectively. We use pin-SVM (7) to train the classifier.

In this example, the same radial basis function (RBF) kernel and the same regularization parameter, but different τ values, are used. The surfaces {x : f(x) = −1, +1} are displayed in Fig. 2. According to the complementary slackness conditions, we know that

  y_i f(x_i) > 1,  i ∈ S_− = {j : λ_j = −τC_j},
  y_i f(x_i) = 1,  i ∈ S_0 = {j : −τC_j < λ_j < C_j},
  y_i f(x_i) < 1,  i ∈ S_+ = {j : λ_j = C_j}.

Figure 2: Sampling points (axes x(1), x(2)) and classification results of pin-SVM. Points in class +1 and −1 are shown by green stars and red crosses, respectively. The surfaces {x : f(x) = 1} (blue lines) and {x : f(x) = −1} (black lines) for τ = 0, 0.05, 0.1 are displayed by solid, dash-dotted, and dotted lines, respectively.

In other words, the surfaces {x : f(x) = ±1} partition the training set into three parts. Most of the dual variables take the values −τC_i or C_i. The remaining data are located on {x : f(x) = +1} or {x : f(x) = −1}. From Fig. 2, we observe that many points are located in the same part for different τ. Fig. 2 also illustrates that with increasing τ, the surfaces f(x) = ±1 move towards the decision boundary. This can be observed as well from the primal form (2), of which the optimality condition can be written as the existence of η_i ∈ [−τ, 1] such that

  w_j − Σ_{i∈S_+} C_i y_i φ_j(x_i) + τ Σ_{i∈S_−} C_i y_i φ_j(x_i) − Σ_{i∈S_0} η_i C_i y_i φ_j(x_i) = 0,  ∀j.

This condition implies that generally a larger τ results in more data falling into S_−. Therefore, if τ_1 > τ_2 and the difference is not big, it is with high probability that λ_i(τ_2) = −τ_2 C_i if λ_i(τ_1) = −τ_1 C_i. Following from this discussion, we suggest Algorithm 1 for the initial solution. By the proposed procedure, we find a new feasible solution, which is heuristically suitable for τ_2. When tuning the parameter τ, we need to train pin-SVM for a series of τ values, for which the above procedure can be applied. Now we give the SMO algorithm for pin-SVM (7) in Algorithm 2, where e is a pre-defined accuracy, set to 10⁻⁶ in this paper.

3. SMO for Sparse pin-SVM

Pin-SVM can be regarded as an extension of C-SVM via introducing flexibility on τ.
Since quantile distances are considered, pin-SVM is insensitive to feature noise and has shown better classification accuracy than C-SVM. In pin-SVM (7), the dual variables are categorized into three types: lower bounded support vectors (λ_i = −τC_i), free support vectors (−τC_i < λ_i < C_i), and upper bounded support vectors (λ_i = C_i). When τ = 0, pin-SVM reduces to C-SVM. Correspondingly, the lower bounded support vectors are zero and C-SVM has sparseness. To pursue sparseness for pin-SVM with a nonzero τ value, a loss function with an ε-insensitive zone was applied, and a sparse pin-SVM was established in [9].

Algorithm 1: Initialization for pin-SVM with τ_2 from λ(τ_1)
  Set S_−^(τ1) := {i : λ_i(τ_1) = −τ_1 C_i}, S_+^(τ1) := {i : λ_i(τ_1) = C_i};
  Let λ_i := −τ_2 C_i, i ∈ S_−^(τ1); λ_i := C_i, i ∈ S_+^(τ1); λ_i := λ_i(τ_1) otherwise;
  Calculate the violation v := Σ_{i=1}^m y_i λ_i;
  if τ_2 > τ_1 then
    repeat
      select i from {i : y_i = sign(v)} ∩ S_+^(τ1);
      set λ_i := max{C_i − |v|, −τ_2 C_i};
      update v := sign(v) max{0, |v| − (1 + τ_2)C_i};
    until v = 0;
  else
    repeat
      select i from {i : y_i = −sign(v)} ∩ S_−^(τ1);
      set λ_i := min{−τ_2 C_i + |v|, C_i};
      update v := sign(v) max{0, |v| − (1 + τ_2)C_i};
    until v = 0;
  end
  Return λ as the initial solution for pin-SVM with τ_2.

Algorithm 2: SMO for pin-SVM
  Set λ_i := −τC_i or use Algorithm 1 to generate λ;
  Calculate g_i := y_i Σ_{j=1}^m y_j λ_j K_ij − 1 and set
    A_i := −τC_i, y_i = 1; −C_i, y_i = −1;
    B_i := C_i, y_i = 1; τC_i, y_i = −1;
  repeat
    I_up := {i : y_iλ_i < B_i}, I_low := {i : y_iλ_i > A_i};
    select i := arg max_{l∈I_up} −y_l g_l;
    select j := arg max_{l∈I_low} ((−y_i g_i) − (−y_l g_l))² / (2(K_ii + K_ll − 2K_il));
    calculate the update length ζ := min{ B_i − y_iλ_i, y_jλ_j − A_j, ((−y_i g_i) − (−y_j g_j)) / (K_ii + K_jj − 2K_ij) };
    update λ_i := λ_i + y_iζ and λ_j := λ_j − y_jζ; g_l := g_l + ζy_l K_il − ζy_l K_jl, l = 1, ..., m;
  until max_{i∈I_up} −y_i g_i − min_{j∈I_low} −y_j g_j < e;
  Calculate b := (1/2)(max_{i∈I_up} −y_i g_i + min_{j∈I_low} −y_j g_j).

In the primal space, sparse pin-SVM can be posed as

  min_{w,b} (1/2)‖w‖² + Σ_{i=1}^m C_i L^ε_τ(1 − y_i(wᵀφ(x_i) + b)),   (13)

where the pinball loss with an ε-insensitive zone, L^ε_τ(u), is defined in (3). The dual problem of (13) has been deduced in [9] and takes the following form,

  min_{λ,γ} (1/2) Σ_{i=1}^m Σ_{j=1}^m λ_i y_i K_ij y_j λ_j − Σ_{i=1}^m λ_i − ε Σ_{i=1}^m γ_i
  s.t. Σ_{i=1}^m y_i λ_i = 0,   (14)
       γ_i ≥ 0,  i = 1, ..., m,
       −τ(C_i − γ_i) ≤ λ_i ≤ C_i − γ_i,  i = 1, ..., m.

The possible range of the dual variable γ_i is 0 ≤ γ_i ≤ C_i. When γ_i takes the value C_i, the corresponding λ_i will be zero, which brings sparsity to pin-SVM. From the objective function of (14), one can see that a large ε will push γ_i close to C_i, i.e., there are more zero λ values. The last constraint in (14) can be viewed as a box constraint on λ_i, where the box depends on another dual variable γ_i. Similarly to the discussion on pin-SVM (7), we can write −τ(C_i − γ_i) ≤ λ_i ≤ C_i − γ_i as A_i^γ ≤ y_iλ_i ≤ B_i^γ, where

  A_i^γ = −τ(C_i − γ_i),  y_i = 1,       B_i^γ = C_i − γ_i,     y_i = 1,
  A_i^γ = −(C_i − γ_i),   y_i = −1,      B_i^γ = τ(C_i − γ_i),  y_i = −1.
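Algorithm 2 condenses into a short NumPy sketch (our own illustrative code, not the paper's implementation: it assumes a precomputed kernel matrix `K`, labels `y` in {−1, +1}, and class-balanced weights `Cs` so that the start λ_i = −τC_i is feasible; it uses maximal-violating-pair selection rather than the second-order rule, and no kernel caching):

```python
import numpy as np

def smo_pin_svm(K, y, Cs, tau, e=1e-6, max_iter=100000):
    """SMO for the pin-SVM dual (7): min 1/2 lam' Q lam - sum(lam)
    s.t. sum(y*lam) = 0 and -tau*C_i <= lam_i <= C_i, with Q_ij = y_i K_ij y_j."""
    lam = -tau * Cs                          # feasible start (balanced weights)
    g = y * (K @ (y * lam)) - 1.0            # g_i = y_i sum_j y_j lam_j K_ij - 1
    A = np.where(y > 0, -tau * Cs, -Cs)      # box A_i <= y_i lam_i <= B_i
    B = np.where(y > 0, Cs, tau * Cs)
    for _ in range(max_iter):
        d = -(y * g)
        ylam = y * lam
        up = np.flatnonzero(ylam < B - 1e-12)
        low = np.flatnonzero(ylam > A + 1e-12)
        i = up[np.argmax(d[up])]             # maximal violating pair
        j = low[np.argmin(d[low])]
        if d[i] - d[j] < e:                  # stopping condition (8)
            break
        kappa = K[i, i] + K[j, j] - 2.0 * K[i, j]
        zeta = min((d[i] - d[j]) / kappa,    # clipped step length
                   B[i] - ylam[i], ylam[j] - A[j])
        lam[i] += y[i] * zeta
        lam[j] -= y[j] * zeta
        g += zeta * y * (K[:, i] - K[:, j])  # rank-two gradient update
    b = 0.5 * (d[i] + d[j])
    return lam, b
```

On a toy linearly separable set the resulting decision values Σ_j y_j λ_j K(x, x_j) + b reproduce the labels, and the equality constraint Σ y_i λ_i = 0 is preserved exactly by every pair update.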
Then for gven γ and, we can fnd the two followng sets, and I,γ up = : y < B γ or γ > 0}, I,γ = : y > A γ or γ > 0}. Here γ > 0 can guarantee that ± ζ s feasble for suffcently sall scalar ζ. Then, necessary condtons for, γ beng optal to (4) can be presented as follows: for a gven γ value, should satsfy: ax y g Iup,γ j I,γ γ y j gj, and y = 0; = for a gven value, γ should satsfy: γ = C + τ }, C. 5
6 Notce that n sparse pn-svm (4), the gradent g s dfferent fro that n pn-svm (6), snce there s one addtonal freedo on γ. Specfcally, there are three stuatons. If = C γ, then g = y y j j K j + ε. j= If = τ(c γ ), then we have g = y y j j K j ε τ. j= Otherwse,.e., τ(c γ ) < < C γ, we have g = y y j j K j. j= The above condtons are gven separately for and γ. For sparse pn-svm (4), and γ are coupled n the constrants. Hence these condtons are necessary but not suffcent. However, to pursue an effcent solvng ethod for (4), we apply the above necessary condton to choose two data ponts n a workng set. Then the selected dual varables are odfed and the others are unchanged. Slarly to pn-svm, the workng set for sparse pn- SVM (4) contans at least two data ponts. Suppose that, j are selected. Then to update,j, γ,j, we are to solve the followng QP proble,j,γ,j 2 K 2 + jy j K j y j j + 2 K jj 2 j + y y l l K l + j y j y l l K jl l =,j l =,j j εγ εγ j y + y j j = y l l, (5) γ 0, γ j 0, l =,j τ(c γ ) C γ, τ(c j γ j ) j C j γ j. When γ,j are fxed, (5) reduces to a 2-densonal QP wth one equalty constrant, whch has an explct soluton. Ths s the case for pn-svm (7). However, n sparse pn-svm, γ,j and,j are coupled and there s no explct soluton. Hence, we have to solve (5) to update,j, γ,j at each teraton. Solvng (5) decreases the objectve of (4). We should choose the reasonable workng set accordng to the gan of solvng (5). The gan s better than the case keepng γ,j unchanged. For the case γ,j fxed, the gan s (0), fro whch we can estate the gan for (5) and then select the workng set by the followng rule: j = arg ax l I,γ y lg up l, = arg ax l I,γ (y g y lg l )2 2(K +K ll 2K l ). 6 Ths selecton strategy s slar to that for pn-svm, but now t s dependent on γ. The ntal soluton for pn-svm = τc s also feasble to sparse pn-svm (4). Correspondngly, the ntal γ s set to be γ = C + τ, C }, whch s accordng to the necessary optal condton. 
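The closed-form γ update for fixed λ is a one-liner (a sketch with our own function name; τ > 0 is assumed so that the division is defined):

```python
import numpy as np

def optimal_gamma(lam, Cs, tau):
    """gamma_i = min{C_i + lam_i/tau, C_i - lam_i}: the largest gamma_i
    that keeps the box -tau*(C_i - gamma_i) <= lam_i <= C_i - gamma_i
    nonempty, which minimizes the -eps*sum(gamma) term in (14)."""
    return np.minimum(Cs + lam / tau, Cs - lam)
```

At the start λ_i = −τC_i this gives γ_i = 0, and any point whose λ_i reaches 0 gets γ_i = C_i, i.e., it drops out of the kernel expansion, which is exactly the sparsity mechanism described above.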
Now the sequential minimal optimization for sparse pin-SVM (14) is summarized in Algorithm 3.

Algorithm 3: SMO for sparse pin-SVM
  Set λ_i := −τC_i and γ_i := min{C_i + λ_i/τ, C_i − λ_i};
  Calculate g_i := y_i Σ_{j=1}^m y_jλ_jK_ij − 1;
    A_i^γ := −τ(C_i − γ_i), y_i = 1; −(C_i − γ_i), y_i = −1;
    B_i^γ := C_i − γ_i, y_i = 1; τ(C_i − γ_i), y_i = −1;
  repeat
    I_up^{λ,γ} := {i : y_iλ_i < B_i^γ or γ_i > 0};
    I_low^{λ,γ} := {i : y_iλ_i > A_i^γ or γ_i > 0};
    select i := arg max_{l∈I_up^{λ,γ}} −y_l g_l;
    select j := arg max_{l∈I_low^{λ,γ}} ((−y_i g_i) − (−y_l g_l))² / (2(K_ii + K_ll − 2K_il));
    solve (15) to update λ_i, λ_j, γ_i, γ_j;
    update A_i^γ, B_i^γ, and g_l, l = 1, ..., m;
  until max_{i∈I_up^{λ,γ}} −y_i g_i − min_{j∈I_low^{λ,γ}} −y_j g_j < e;
  Calculate b := (1/2)(max_{i∈I_up^{λ,γ}} −y_i g_i + min_{j∈I_low^{λ,γ}} −y_j g_j).

4. Numerical Experiments

In the above sections, we gave the SMO algorithms for training pin-SVM (7) and sparse pin-SVM (14). In the following, we evaluate their performance on real-life data sets. There are two aspects of concern. First, we test whether SMO is effective for training pin-SVMs. Second, with an effective training method, we can carry out more experiments and support the theoretical analysis in [9]. The sparsity of sparse pin-SVM is also considered. The data in these experiments are loaded from the UCI Repository of Machine Learning Datasets [23] and the LIBSVM data sets [17]. For some of these data, the training and test sets are provided. Otherwise, we randomly select m observations to train the classifier and use the remaining ones for testing. The problem dimension n, the number of training data m, and the number of test data T are summarized in Table 1.

Table 1: Dimension n, training data size m, and test data size T for the data sets Spect, Monk1, Monk2, Haberman, Statlog, Monk3, Ionosphere, Transfusion, Pima, Breast, Splice, Spambase, Guide, Magic, IJCNN, and Cod-RNA.

In pin-SVM (7), we use the RBF kernel and apply Algorithm 2 to train the classifiers with different τ values. As the data size grows, the cache for the kernel matrix becomes larger. In our experiments, when m ≥ 5000, we calculate an element K_ij only when needed, which reduces the caching but costs more time. To make a fair comparison, we use λ_i = −τC_i as the initial solution. If the number of training data is less than 10000, 10-fold cross-validation is utilized to tune the regularization coefficient C_0 and the bandwidth σ of the RBF kernel. Otherwise, we set C_0 = 1 and tune σ only. The training and test process is repeated 10 times. Then the average accuracy on the test sets, the standard deviation, and the average computing time are reported in Table 2.

Table 2: Test accuracy (mean ± standard deviation, %) and average training time (s) for pin-SVM with τ = 0, 0.1, 0.3, 0.5 on each data set.

We also illustrate the scalability of the proposed SMO algorithm by plotting the training time for different training data sizes. In Fig. 3 we plot the training time for the data set IJCNN. Notice that there is a sudden change at m = 5000, due to the different kernel computation strategies.

Figure 3: Training time of Algorithm 2 (τ = 0.1) for IJCNN for different training data sizes: (a) m < 5000; (b) m ≥ 5000.

Both Table 2 and Fig. 3 illustrate that the proposed SMO method can train pin-SVM effectively. For different τ values, the computational time is similar and is not monotonic with respect to τ. In our method, pin-SVM is trained in the dual space, which corresponds to a QP with box constraints −τC_i ≤ λ_i ≤ C_i. One can observe that τ controls the size of the feasible set. In two extreme cases, i.e., when the box is large enough or very small, optimal solutions can be obtained easily. Therefore, though a larger τ is generally related to more training time, the difference is not significant. In some applications, a larger τ even corresponds to less training time. Generally, the proposed SMO for pin-SVM is effective, and it takes similar training time as SMO for C-SVM.

With a properly selected τ, pin-SVM provides better classification accuracy than C-SVM, but the sparseness is lost. If the problem size is not too large and sparseness is not the main target, then finding a suitable τ is meaningful for improving the classification accuracy. Moreover, we can use sparse pin-SVM (14) to enhance the sparsity. In the following, we set τ = 0.1 and apply Algorithm 3 for several different ε values. The training and test process is similar to the previous experiment, except that the parameters for sparse pin-SVM are tuned based on pin-SVM, since Algorithm 3 costs more time than Algorithm 2. In practice, if the allowed time is not strict, one can tune the parameters based on sparse pin-SVM and improve the performance further. The average classification accuracy, the number of support vectors (in brackets), and the training time are reported in Table 3, where the results of C-SVM are given as well for reference.

Compared with pin-SVM (7), sparse pin-SVM (14) enhances the sparsity but takes more training time. In Algorithm 3, the update formulation involves a 4-dimensional QP problem. Though it can be solved effectively, its computation time is larger than that of the explicit update formulation in Algorithm 2. Roughly, Algorithm 3 needs 10 times more time than Algorithm 2. In C-SVM, the points with y_i f(x_i) > 1 are related to zero dual variables, and so are the points with 1 − ε < y_i f(x_i) < 1 + ε/τ in sparse pin-SVM. Thus, the results of C-SVM are generally more sparse. But when the feature noise is heavy, it is worth considering Algorithm 3 to train sparse pin-SVM.
Table 3: Test accuracy (%), number of nonzero dual variables (in brackets), and training time for C-SVM and for sparse pin-SVM (τ = 0.1) with ε = 0.05, 0.10, 0.20 on each data set.

5. Conclusion

In this paper, sequential minimal optimization has been established for the support vector machine with the pinball loss. Since pin-SVM has the same problem structure as C-SVM, the corresponding SMO is related to that for C-SVM. We investigated the details and implemented SMO for pin-SVM. The SMO for training sparse pin-SVM was given as well. The proposed algorithms were then evaluated in numerical experiments, showing the effectiveness of training pin-SVMs. The proposed SMO algorithms make pin-SVMs promising tools in real-life applications, especially when the data are corrupted by feature noise.

Acknowledgment

The authors would like to thank Prof. Chih-Jen Lin at National Taiwan University for encouraging us to establish the SMO algorithm for pin-SVM. The authors are grateful to the anonymous reviewers for helpful comments.

References

[1] C. Cortes and V. Vapnik. Support-vector networks. Machine Learning, 20:273-297, 1995.
[2] V. Vapnik. Statistical Learning Theory. Wiley, New York, 1998.
[3] X. Zhang. Using class-center vectors to build support vector machines. In Proceedings of the IEEE Signal Processing Society Workshop, IEEE, 1999.
[4] J. Bi and T. Zhang. Support vector classification with input data uncertainty. In Advances in Neural Information Processing Systems, volume 17, MIT Press, 2004.
[5] G.R.G. Lanckriet, L.E. Ghaoui, C. Bhattacharyya, and M.I. Jordan. A robust minimax approach to classification. The Journal of Machine Learning Research, 3:555-582, 2002.
[6] P.K. Shivaswamy, C. Bhattacharyya, and A.J. Smola. Second order cone programming approaches for handling missing and uncertain data. The Journal of Machine Learning Research, 7:1283-1314, 2006.
[7] H. Xu, C. Caramanis, and S. Mannor. Robustness and regularization of support vector machines. The Journal of Machine Learning Research, 10:1485-1510, 2009.
[8] B. Schölkopf, A.J. Smola, R.C. Williamson, and P.L. Bartlett. New support vector algorithms. Neural Computation, 12(5):1207-1245, 2000.
[9] X. Huang, L. Shi, and J.A.K. Suykens. Support vector machine classifier with pinball loss. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(5):984-997, 2014.
[10] R. Koenker. Quantile Regression. Cambridge University Press, 2005.
[11] I. Steinwart and A. Christmann. Estimating conditional quantiles with the help of the pinball loss. Bernoulli, 17(1):211-225, 2011.
[12] J.C. Platt. Fast training of support vector machines using sequential minimal optimization. In Advances in Kernel Methods: Support Vector Learning, MIT Press, 1999.
[13] R.E. Fan, P.H. Chen, and C.J. Lin. Working set selection using second order information for training support vector machines. The Journal of Machine Learning Research, 6:1889-1918, 2005.
[14] L. Bottou and C.-J. Lin. Support vector machine solvers. In Large Scale Kernel Machines, MIT Press, 2007.
[15] Y. Torii and S. Abe. Decomposition techniques for training linear programming support vector machines. Neurocomputing, 72, 2009.
[16] J. Shawe-Taylor and S. Sun. A review of optimization methodologies in support vector machines. Neurocomputing, 74, 2011.
[17] C.C. Chang and C.J. Lin. LIBSVM: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2(3):27, 2011.
[18] C.C. Chang, C.W. Hsu, and C.J. Lin. The analysis of decomposition methods for support vector machines. IEEE Transactions on Neural Networks, 11(4):1003-1008, 2000.
[19] C.J. Lin. On the convergence of the decomposition method for support vector machines. IEEE Transactions on Neural Networks, 12(6):1288-1298, 2001.
[20] S.S. Keerthi and E.G. Gilbert. Convergence of a generalized SMO algorithm for SVM classifier design. Machine Learning, 46(1-3):351-360, 2002.
[21] D. Hush, P. Kelly, C. Scovel, and I. Steinwart. QP algorithms with guaranteed accuracy and run time for support vector machines. The Journal of Machine Learning Research, 7:733-769, 2006.
[22] J. López and J.R. Dorronsoro. Simple proof of convergence of the SMO algorithm for different SVM variants. IEEE Transactions on Neural Networks and Learning Systems, 23(7):1142-1147, 2012.
[23] A. Frank and A. Asuncion. UCI Machine Learning Repository, 2010.
More informationLeast Squares Fitting of Data
Least Squares Fttng of Data Davd Eberly Geoetrc Tools, LLC http://www.geoetrctools.co/ Copyrght c 1998-2014. All Rghts Reserved. Created: July 15, 1999 Last Modfed: February 9, 2008 Contents 1 Lnear Fttng
More informationXII.3 The EM (Expectation-Maximization) Algorithm
XII.3 The EM (Expectaton-Maxzaton) Algorth Toshnor Munaata 3/7/06 The EM algorth s a technque to deal wth varous types of ncoplete data or hdden varables. It can be appled to a wde range of learnng probles
More informationGradient Descent Learning and Backpropagation
Artfcal Neural Networks (art 2) Chrstan Jacob Gradent Descent Learnng and Backpropagaton CSC 533 Wnter 200 Learnng by Gradent Descent Defnton of the Learnng roble Let us start wth the sple case of lnear
More informationLeast Squares Fitting of Data
Least Squares Fttng of Data Davd Eberly Geoetrc Tools, LLC http://www.geoetrctools.co/ Copyrght c 1998-2015. All Rghts Reserved. Created: July 15, 1999 Last Modfed: January 5, 2015 Contents 1 Lnear Fttng
More informationSupport Vector Machines
Support Vector Machnes Konstantn Tretyakov (kt@ut.ee) MTAT.03.227 Machne Learnng So far So far Supervsed machne learnng Lnear models Non-lnear models Unsupervsed machne learnng Generc scaffoldng So far
More informationMachine Learning. Support Vector Machines. Eric Xing. Lecture 4, August 12, Reading: Eric CMU,
Machne Learnng Support Vector Machnes Erc Xng Lecture 4 August 2 200 Readng: Erc Xng @ CMU 2006-200 Erc Xng @ CMU 2006-200 2 What s a good Decson Boundar? Wh e a have such boundares? Irregular dstrbuton
More informationAn Optimal Bound for Sum of Square Roots of Special Type of Integers
The Sxth Internatonal Syposu on Operatons Research and Its Applcatons ISORA 06 Xnang, Chna, August 8 12, 2006 Copyrght 2006 ORSC & APORC pp. 206 211 An Optal Bound for Su of Square Roots of Specal Type
More informationSupport Vector Machines
Support Vector Machnes Konstantn Tretyakov (kt@ut.ee) MTAT.03.227 Machne Learnng So far Supervsed machne learnng Lnear models Least squares regresson Fsher s dscrmnant, Perceptron, Logstc model Non-lnear
More informationy new = M x old Feature Selection: Linear Transformations Constraint Optimization (insertion)
Feature Selecton: Lnear ransforatons new = M x old Constrant Optzaton (nserton) 3 Proble: Gven an objectve functon f(x) to be optzed and let constrants be gven b h k (x)=c k, ovng constants to the left,
More informationMMA and GCMMA two methods for nonlinear optimization
MMA and GCMMA two methods for nonlnear optmzaton Krster Svanberg Optmzaton and Systems Theory, KTH, Stockholm, Sweden. krlle@math.kth.se Ths note descrbes the algorthms used n the author s 2007 mplementatons
More information1 Convex Optimization
Convex Optmzaton We wll consder convex optmzaton problems. Namely, mnmzaton problems where the objectve s convex (we assume no constrants for now). Such problems often arse n machne learnng. For example,
More informationBAYESIAN CURVE FITTING USING PIECEWISE POLYNOMIALS. Dariusz Biskup
BAYESIAN CURVE FITTING USING PIECEWISE POLYNOMIALS Darusz Bskup 1. Introducton The paper presents a nonparaetrc procedure for estaton of an unknown functon f n the regresson odel y = f x + ε = N. (1) (
More informationCOMP th April, 2007 Clement Pang
COMP 540 12 th Aprl, 2007 Cleent Pang Boostng Cobnng weak classers Fts an Addtve Model Is essentally Forward Stagewse Addtve Modelng wth Exponental Loss Loss Functons Classcaton: Msclasscaton, Exponental,
More informationXiangwen Li. March 8th and March 13th, 2001
CS49I Approxaton Algorths The Vertex-Cover Proble Lecture Notes Xangwen L March 8th and March 3th, 00 Absolute Approxaton Gven an optzaton proble P, an algorth A s an approxaton algorth for P f, for an
More informationOn the number of regions in an m-dimensional space cut by n hyperplanes
6 On the nuber of regons n an -densonal space cut by n hyperplanes Chungwu Ho and Seth Zeran Abstract In ths note we provde a unfor approach for the nuber of bounded regons cut by n hyperplanes n general
More informationSlobodan Lakić. Communicated by R. Van Keer
Serdca Math. J. 21 (1995), 335-344 AN ITERATIVE METHOD FOR THE MATRIX PRINCIPAL n-th ROOT Slobodan Lakć Councated by R. Van Keer In ths paper we gve an teratve ethod to copute the prncpal n-th root and
More informationprinceton univ. F 17 cos 521: Advanced Algorithm Design Lecture 7: LP Duality Lecturer: Matt Weinberg
prnceton unv. F 17 cos 521: Advanced Algorthm Desgn Lecture 7: LP Dualty Lecturer: Matt Wenberg Scrbe: LP Dualty s an extremely useful tool for analyzng structural propertes of lnear programs. Whle there
More informationPROBABILITY AND STATISTICS Vol. III - Analysis of Variance and Analysis of Covariance - V. Nollau ANALYSIS OF VARIANCE AND ANALYSIS OF COVARIANCE
ANALYSIS OF VARIANCE AND ANALYSIS OF COVARIANCE V. Nollau Insttute of Matheatcal Stochastcs, Techncal Unversty of Dresden, Gerany Keywords: Analyss of varance, least squares ethod, odels wth fxed effects,
More informationWhich Separator? Spring 1
Whch Separator? 6.034 - Sprng 1 Whch Separator? Mamze the margn to closest ponts 6.034 - Sprng Whch Separator? Mamze the margn to closest ponts 6.034 - Sprng 3 Margn of a pont " # y (w $ + b) proportonal
More informationDesigning Fuzzy Time Series Model Using Generalized Wang s Method and Its application to Forecasting Interest Rate of Bank Indonesia Certificate
The Frst Internatonal Senar on Scence and Technology, Islac Unversty of Indonesa, 4-5 January 009. Desgnng Fuzzy Te Seres odel Usng Generalzed Wang s ethod and Its applcaton to Forecastng Interest Rate
More informationLecture 3: Dual problems and Kernels
Lecture 3: Dual problems and Kernels C4B Machne Learnng Hlary 211 A. Zsserman Prmal and dual forms Lnear separablty revsted Feature mappng Kernels for SVMs Kernel trck requrements radal bass functons SVM
More informationNatural Language Processing and Information Retrieval
Natural Language Processng and Informaton Retreval Support Vector Machnes Alessandro Moschtt Department of nformaton and communcaton technology Unversty of Trento Emal: moschtt@ds.untn.t Summary Support
More informationSupport Vector Machines CS434
Support Vector Machnes CS434 Lnear Separators Many lnear separators exst that perfectly classfy all tranng examples Whch of the lnear separators s the best? + + + + + + + + + Intuton of Margn Consder ponts
More informationPreference and Demand Examples
Dvson of the Huantes and Socal Scences Preference and Deand Exaples KC Border October, 2002 Revsed Noveber 206 These notes show how to use the Lagrange Karush Kuhn Tucker ultpler theores to solve the proble
More informationDenote the function derivatives f(x) in given points. x a b. Using relationships (1.2), polynomials (1.1) are written in the form
SET OF METHODS FO SOUTION THE AUHY POBEM FO STIFF SYSTEMS OF ODINAY DIFFEENTIA EUATIONS AF atypov and YuV Nulchev Insttute of Theoretcal and Appled Mechancs SB AS 639 Novosbrs ussa Introducton A constructon
More informationLecture 20: November 7
0-725/36-725: Convex Optmzaton Fall 205 Lecturer: Ryan Tbshran Lecture 20: November 7 Scrbes: Varsha Chnnaobreddy, Joon Sk Km, Lngyao Zhang Note: LaTeX template courtesy of UC Berkeley EECS dept. Dsclamer:
More informationThe Parity of the Number of Irreducible Factors for Some Pentanomials
The Party of the Nuber of Irreducble Factors for Soe Pentanoals Wolfra Koepf 1, Ryul K 1 Departent of Matheatcs Unversty of Kassel, Kassel, F. R. Gerany Faculty of Matheatcs and Mechancs K Il Sung Unversty,
More informationSolutions to exam in SF1811 Optimization, Jan 14, 2015
Solutons to exam n SF8 Optmzaton, Jan 4, 25 3 3 O------O -4 \ / \ / The network: \/ where all lnks go from left to rght. /\ / \ / \ 6 O------O -5 2 4.(a) Let x = ( x 3, x 4, x 23, x 24 ) T, where the varable
More informationChapter 5. Solution of System of Linear Equations. Module No. 6. Solution of Inconsistent and Ill Conditioned Systems
Numercal Analyss by Dr. Anta Pal Assstant Professor Department of Mathematcs Natonal Insttute of Technology Durgapur Durgapur-713209 emal: anta.bue@gmal.com 1 . Chapter 5 Soluton of System of Lnear Equatons
More informationOnline Classification: Perceptron and Winnow
E0 370 Statstcal Learnng Theory Lecture 18 Nov 8, 011 Onlne Classfcaton: Perceptron and Wnnow Lecturer: Shvan Agarwal Scrbe: Shvan Agarwal 1 Introducton In ths lecture we wll start to study the onlne learnng
More informationADVANCED MACHINE LEARNING ADVANCED MACHINE LEARNING
1 ADVANCED ACHINE LEARNING ADVANCED ACHINE LEARNING Non-lnear regresson technques 2 ADVANCED ACHINE LEARNING Regresson: Prncple N ap N-dm. nput x to a contnuous output y. Learn a functon of the type: N
More informationThe Study of Teaching-learning-based Optimization Algorithm
Advanced Scence and Technology Letters Vol. (AST 06), pp.05- http://dx.do.org/0.57/astl.06. The Study of Teachng-learnng-based Optmzaton Algorthm u Sun, Yan fu, Lele Kong, Haolang Q,, Helongang Insttute
More information10-701/ Machine Learning, Fall 2005 Homework 3
10-701/15-781 Machne Learnng, Fall 2005 Homework 3 Out: 10/20/05 Due: begnnng of the class 11/01/05 Instructons Contact questons-10701@autonlaborg for queston Problem 1 Regresson and Cross-valdaton [40
More informationMachine Learning. Support Vector Machines. Eric Xing , Fall Lecture 9, October 6, 2015
Machne Learnng 0-70 Fall 205 Support Vector Machnes Erc Xng Lecture 9 Octoer 6 205 Readng: Chap. 6&7 C.B ook and lsted papers Erc Xng @ CMU 2006-205 What s a good Decson Boundar? Consder a nar classfcaton
More informationFinite Fields and Their Applications
Fnte Felds and Ther Applcatons 5 009 796 807 Contents lsts avalable at ScenceDrect Fnte Felds and Ther Applcatons www.elsever.co/locate/ffa Typcal prtve polynoals over nteger resdue rngs Tan Tan a, Wen-Feng
More informationThree Algorithms for Flexible Flow-shop Scheduling
Aercan Journal of Appled Scences 4 (): 887-895 2007 ISSN 546-9239 2007 Scence Publcatons Three Algorths for Flexble Flow-shop Schedulng Tzung-Pe Hong, 2 Pe-Yng Huang, 3 Gwoboa Horng and 3 Chan-Lon Wang
More informationSeveral generation methods of multinomial distributed random number Tian Lei 1, a,linxihe 1,b,Zhigang Zhang 1,c
Internatonal Conference on Appled Scence and Engneerng Innovaton (ASEI 205) Several generaton ethods of ultnoal dstrbuted rando nuber Tan Le, a,lnhe,b,zhgang Zhang,c School of Matheatcs and Physcs, USTB,
More informationLecture Notes on Linear Regression
Lecture Notes on Lnear Regresson Feng L fl@sdueducn Shandong Unversty, Chna Lnear Regresson Problem In regresson problem, we am at predct a contnuous target value gven an nput feature vector We assume
More information2E Pattern Recognition Solutions to Introduction to Pattern Recognition, Chapter 2: Bayesian pattern classification
E395 - Pattern Recognton Solutons to Introducton to Pattern Recognton, Chapter : Bayesan pattern classfcaton Preface Ths document s a soluton manual for selected exercses from Introducton to Pattern Recognton
More informationFeature Selection: Part 1
CSE 546: Machne Learnng Lecture 5 Feature Selecton: Part 1 Instructor: Sham Kakade 1 Regresson n the hgh dmensonal settng How do we learn when the number of features d s greater than the sample sze n?
More informationSupport Vector Machines
CS 2750: Machne Learnng Support Vector Machnes Prof. Adrana Kovashka Unversty of Pttsburgh February 17, 2016 Announcement Homework 2 deadlne s now 2/29 We ll have covered everythng you need today or at
More information1 Review From Last Time
COS 5: Foundatons of Machne Learnng Rob Schapre Lecture #8 Scrbe: Monrul I Sharf Aprl 0, 2003 Revew Fro Last Te Last te, we were talkng about how to odel dstrbutons, and we had ths setup: Gven - exaples
More informationStudy of Classification Methods Based on Three Learning Criteria and Two Basis Functions
Study of Classfcaton Methods Based on hree Learnng Crtera and wo Bass Functons Jae Kyu Suhr Abstract - hs paper nvestgates several classfcaton ethods based on the three learnng crtera and two bass functons.
More informationCSC 411 / CSC D11 / CSC C11
18 Boostng s a general strategy for learnng classfers by combnng smpler ones. The dea of boostng s to take a weak classfer that s, any classfer that wll do at least slghtly better than chance and use t
More information4 Column generation (CG) 4.1 Basics of column generation. 4.2 Applying CG to the Cutting-Stock Problem. Basic Idea of column generation
4 Colun generaton (CG) here are a lot of probles n nteger prograng where even the proble defnton cannot be effcently bounded Specfcally, the nuber of coluns becoes very large herefore, these probles are
More informationLinear Classification, SVMs and Nearest Neighbors
1 CSE 473 Lecture 25 (Chapter 18) Lnear Classfcaton, SVMs and Nearest Neghbors CSE AI faculty + Chrs Bshop, Dan Klen, Stuart Russell, Andrew Moore Motvaton: Face Detecton How do we buld a classfer to dstngush
More informationCHAPTER 5 NUMERICAL EVALUATION OF DYNAMIC RESPONSE
CHAPTER 5 NUMERICAL EVALUATION OF DYNAMIC RESPONSE Analytcal soluton s usually not possble when exctaton vares arbtrarly wth tme or f the system s nonlnear. Such problems can be solved by numercal tmesteppng
More informationCollaborative Filtering Recommendation Algorithm
Vol.141 (GST 2016), pp.199-203 http://dx.do.org/10.14257/astl.2016.141.43 Collaboratve Flterng Recoendaton Algorth Dong Lang Qongta Teachers College, Haou 570100, Chna, 18689851015@163.co Abstract. Ths
More informationModule 3 LOSSY IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur
Module 3 LOSSY IMAGE COMPRESSION SYSTEMS Verson ECE IIT, Kharagpur Lesson 6 Theory of Quantzaton Verson ECE IIT, Kharagpur Instructonal Objectves At the end of ths lesson, the students should be able to:
More informationLecture 10 Support Vector Machines. Oct
Lecture 10 Support Vector Machnes Oct - 20-2008 Lnear Separators Whch of the lnear separators s optmal? Concept of Margn Recall that n Perceptron, we learned that the convergence rate of the Perceptron
More informationIntegral Transforms and Dual Integral Equations to Solve Heat Equation with Mixed Conditions
Int J Open Probles Copt Math, Vol 7, No 4, Deceber 214 ISSN 1998-6262; Copyrght ICSS Publcaton, 214 www-csrsorg Integral Transfors and Dual Integral Equatons to Solve Heat Equaton wth Mxed Condtons Naser
More informationSupport Vector Machines. Jie Tang Knowledge Engineering Group Department of Computer Science and Technology Tsinghua University 2012
Support Vector Machnes Je Tang Knowledge Engneerng Group Department of Computer Scence and Technology Tsnghua Unversty 2012 1 Outlne What s a Support Vector Machne? Solvng SVMs Kernel Trcks 2 What s a
More informationYong Joon Ryang. 1. Introduction Consider the multicommodity transportation problem with convex quadratic cost function. 1 2 (x x0 ) T Q(x x 0 )
Kangweon-Kyungk Math. Jour. 4 1996), No. 1, pp. 7 16 AN ITERATIVE ROW-ACTION METHOD FOR MULTICOMMODITY TRANSPORTATION PROBLEMS Yong Joon Ryang Abstract. The optmzaton problems wth quadratc constrants often
More information1. Statement of the problem
Volue 14, 010 15 ON THE ITERATIVE SOUTION OF A SYSTEM OF DISCRETE TIMOSHENKO EQUATIONS Peradze J. and Tsklaur Z. I. Javakhshvl Tbls State Uversty,, Uversty St., Tbls 0186, Georga Georgan Techcal Uversty,
More informationThe Order Relation and Trace Inequalities for. Hermitian Operators
Internatonal Mathematcal Forum, Vol 3, 08, no, 507-57 HIKARI Ltd, wwwm-hkarcom https://doorg/0988/mf088055 The Order Relaton and Trace Inequaltes for Hermtan Operators Y Huang School of Informaton Scence
More informationLagrange Multipliers Kernel Trick
Lagrange Multplers Kernel Trck Ncholas Ruozz Unversty of Texas at Dallas Based roughly on the sldes of Davd Sontag General Optmzaton A mathematcal detour, we ll come back to SVMs soon! subject to: f x
More informationGadjah Mada University, Indonesia. Yogyakarta State University, Indonesia Karangmalang Yogyakarta 55281
Reducng Fuzzy Relatons of Fuzzy Te Seres odel Usng QR Factorzaton ethod and Its Applcaton to Forecastng Interest Rate of Bank Indonesa Certfcate Agus aan Abad Subanar Wdodo 3 Sasubar Saleh 4 Ph.D Student
More informationProblem Set 9 Solutions
Desgn and Analyss of Algorthms May 4, 2015 Massachusetts Insttute of Technology 6.046J/18.410J Profs. Erk Demane, Srn Devadas, and Nancy Lynch Problem Set 9 Solutons Problem Set 9 Solutons Ths problem
More informationModule 9. Lecture 6. Duality in Assignment Problems
Module 9 1 Lecture 6 Dualty n Assgnment Problems In ths lecture we attempt to answer few other mportant questons posed n earler lecture for (AP) and see how some of them can be explaned through the concept
More informationSolving Fuzzy Linear Programming Problem With Fuzzy Relational Equation Constraint
Intern. J. Fuzz Maeatcal Archve Vol., 0, -0 ISSN: 0 (P, 0 0 (onlne Publshed on 0 Septeber 0 www.researchasc.org Internatonal Journal of Solvng Fuzz Lnear Prograng Proble W Fuzz Relatonal Equaton Constrant
More informationINPUT-OUTPUT PAIRING OF MULTIVARIABLE PREDICTIVE CONTROL
INPUT-OUTPUT PAIRING OF MULTIVARIABLE PREDICTIVE CONTROL Lng-Cong Chen #, Pu Yuan*, Gu-L Zhang* *Unversty of Petroleu, P.O. Box 902 Beng 00083, Chna # GAIN Tech Co., P.O. Box 902ext.79, Beng 00083, Chna
More informationOn the Multicriteria Integer Network Flow Problem
BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 5, No 2 Sofa 2005 On the Multcrtera Integer Network Flow Problem Vassl Vasslev, Marana Nkolova, Maryana Vassleva Insttute of
More informationPower law and dimension of the maximum value for belief distribution with the max Deng entropy
Power law and dmenson of the maxmum value for belef dstrbuton wth the max Deng entropy Bngy Kang a, a College of Informaton Engneerng, Northwest A&F Unversty, Yanglng, Shaanx, 712100, Chna. Abstract Deng
More informationA Robust Method for Calculating the Correlation Coefficient
A Robust Method for Calculatng the Correlaton Coeffcent E.B. Nven and C. V. Deutsch Relatonshps between prmary and secondary data are frequently quantfed usng the correlaton coeffcent; however, the tradtonal
More informationLECTURE :FACTOR ANALYSIS
LCUR :FACOR ANALYSIS Rta Osadchy Based on Lecture Notes by A. Ng Motvaton Dstrbuton coes fro MoG Have suffcent aount of data: >>n denson Use M to ft Mture of Gaussans nu. of tranng ponts If
More informationOptimal Marketing Strategies for a Customer Data Intermediary. Technical Appendix
Optal Marketng Strateges for a Custoer Data Interedary Techncal Appendx oseph Pancras Unversty of Connectcut School of Busness Marketng Departent 00 Hllsde Road, Unt 04 Storrs, CT 0669-04 oseph.pancras@busness.uconn.edu
More informationISSN: ISO 9001:2008 Certified International Journal of Engineering and Innovative Technology (IJEIT) Volume 3, Issue 1, July 2013
ISSN: 2277-375 Constructon of Trend Free Run Orders for Orthogonal rrays Usng Codes bstract: Sometmes when the expermental runs are carred out n a tme order sequence, the response can depend on the run
More informationSolutions for Homework #9
Solutons for Hoewor #9 PROBEM. (P. 3 on page 379 n the note) Consder a sprng ounted rgd bar of total ass and length, to whch an addtonal ass s luped at the rghtost end. he syste has no dapng. Fnd the natural
More informationAN ANALYSIS OF A FRACTAL KINETICS CURVE OF SAVAGEAU
AN ANALYI OF A FRACTAL KINETIC CURE OF AAGEAU by John Maloney and Jack Hedel Departent of Matheatcs Unversty of Nebraska at Oaha Oaha, Nebraska 688 Eal addresses: aloney@unoaha.edu, jhedel@unoaha.edu Runnng
More informationFeature Selection in Multi-instance Learning
The Nnth Internatonal Symposum on Operatons Research and Its Applcatons (ISORA 10) Chengdu-Juzhagou, Chna, August 19 23, 2010 Copyrght 2010 ORSC & APORC, pp. 462 469 Feature Selecton n Mult-nstance Learnng
More informationOur focus will be on linear systems. A system is linear if it obeys the principle of superposition and homogenity, i.e.
SSTEM MODELLIN In order to solve a control syste proble, the descrptons of the syste and ts coponents ust be put nto a for sutable for analyss and evaluaton. The followng ethods can be used to odel physcal
More informationSimpleMKL. Abstract. Alain Rakotomamonjy LITIS EA 4108 Université de Rouen Saint Etienne du Rouvray, France. Francis R. Bach
Journal of Machne Learnng Research X (28) 1-34 Subtted 1/8; Revsed 8/8; Publshed XX/XX SpleMKL Alan Rakotoaonjy LITIS EA 418 Unversté de Rouen 768 Sant Etenne du Rouvray, France alan.rakotoaonjy@nsa-rouen.fr
More informationNested Support Vector Machines
1 Nested Support Vector Machnes *Gyen Lee, Student Meber, IEEE, and Clayton Scott, Meber, IEEE Abstract One-class and cost-senstve support vector achnes (SVMs) are state-of-the-art achne learnng ethods
More informationThe Minimum Universal Cost Flow in an Infeasible Flow Network
Journal of Scences, Islamc Republc of Iran 17(2): 175-180 (2006) Unversty of Tehran, ISSN 1016-1104 http://jscencesutacr The Mnmum Unversal Cost Flow n an Infeasble Flow Network H Saleh Fathabad * M Bagheran
More informationLecture 12: Discrete Laplacian
Lecture 12: Dscrete Laplacan Scrbe: Tanye Lu Our goal s to come up wth a dscrete verson of Laplacan operator for trangulated surfaces, so that we can use t n practce to solve related problems We are mostly
More informationCSE 252C: Computer Vision III
CSE 252C: Computer Vson III Lecturer: Serge Belonge Scrbe: Catherne Wah LECTURE 15 Kernel Machnes 15.1. Kernels We wll study two methods based on a specal knd of functon k(x, y) called a kernel: Kernel
More informationAdditional Codes using Finite Difference Method. 1 HJB Equation for Consumption-Saving Problem Without Uncertainty
Addtonal Codes usng Fnte Dfference Method Benamn Moll 1 HJB Equaton for Consumpton-Savng Problem Wthout Uncertanty Before consderng the case wth stochastc ncome n http://www.prnceton.edu/~moll/ HACTproect/HACT_Numercal_Appendx.pdf,
More information17 Support Vector Machines
17 We now dscuss an nfluental and effectve classfcaton algorthm called (SVMs). In addton to ther successes n many classfcaton problems, SVMs are responsble for ntroducng and/or popularzng several mportant
More informationCanonical transformations
Canoncal transformatons November 23, 2014 Recall that we have defned a symplectc transformaton to be any lnear transformaton M A B leavng the symplectc form nvarant, Ω AB M A CM B DΩ CD Coordnate transformatons,
More informationSupport Vector Machines
Separatng boundary, defned by w Support Vector Machnes CISC 5800 Professor Danel Leeds Separatng hyperplane splts class 0 and class 1 Plane s defned by lne w perpendcular to plan Is data pont x n class
More informationC4B Machine Learning Answers II. = σ(z) (1 σ(z)) 1 1 e z. e z = σ(1 σ) (1 + e z )
C4B Machne Learnng Answers II.(a) Show that for the logstc sgmod functon dσ(z) dz = σ(z) ( σ(z)) A. Zsserman, Hlary Term 20 Start from the defnton of σ(z) Note that Then σ(z) = σ = dσ(z) dz = + e z e z
More informationMultilayer Perceptron (MLP)
Multlayer Perceptron (MLP) Seungjn Cho Department of Computer Scence and Engneerng Pohang Unversty of Scence and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjn@postech.ac.kr 1 / 20 Outlne
More informationModified parallel multisplitting iterative methods for non-hermitian positive definite systems
Adv Coput ath DOI 0.007/s0444-0-9262-8 odfed parallel ultsplttng teratve ethods for non-hertan postve defnte systes Chuan-Long Wang Guo-Yan eng Xue-Rong Yong Receved: Septeber 20 / Accepted: 4 Noveber
More informationThe Multiple Classical Linear Regression Model (CLRM): Specification and Assumptions. 1. Introduction
ECONOMICS 5* -- NOTE (Summary) ECON 5* -- NOTE The Multple Classcal Lnear Regresson Model (CLRM): Specfcaton and Assumptons. Introducton CLRM stands for the Classcal Lnear Regresson Model. The CLRM s also
More informationScattering by a perfectly conducting infinite cylinder
Scatterng by a perfectly conductng nfnte cylnder Reeber that ths s the full soluton everywhere. We are actually nterested n the scatterng n the far feld lt. We agan use the asyptotc relatonshp exp exp
More information