Journal of Machine Learning Research X (2008) 1-34    Submitted 1/08; Revised 8/08; Published XX/XX

SimpleMKL

Alain Rakotomamonjy (alain.rakotomamonjy@insa-rouen.fr), LITIS EA 4108, Université de Rouen, 76800 Saint Etienne du Rouvray, France

Francis R. Bach (francis.bach@mines.org), INRIA - WILLOW Project-Team, Laboratoire d'Informatique de l'Ecole Normale Supérieure (CNRS/ENS/INRIA UMR 8548), 45, Rue d'Ulm, 75230 Paris, France

Stéphane Canu (stephane.canu@insa-rouen.fr), LITIS EA 4108, INSA de Rouen, 76801 Saint Etienne du Rouvray, France

Yves Grandvalet (yves.grandvalet@utc.fr), Idiap Research Institute, Centre du Parc, 1920 Martigny, Switzerland; Heudiasyc, CNRS/Université de Technologie de Compiègne (UMR 6599), 60205 Compiègne, France

Editor: Nello Cristianini

Abstract

Multiple kernel learning (MKL) aims at simultaneously learning a kernel and the associated predictor in supervised learning settings. For the support vector machine, an efficient and general multiple kernel learning algorithm, based on semi-infinite linear programming, has recently been proposed. This approach has opened new perspectives, since it makes MKL tractable for large-scale problems by iteratively using existing support vector machine code. However, it turns out that this iterative algorithm needs numerous iterations to converge towards a reasonable solution. In this paper, we address the MKL problem through a weighted 2-norm regularization formulation with an additional constraint on the weights that encourages sparse kernel combinations. Apart from learning the combination, we solve a standard SVM optimization problem, where the kernel is defined as a linear combination of multiple kernels. We propose an algorithm, named SimpleMKL, for solving this MKL problem and provide a new insight on MKL algorithms based on mixed-norm regularization by showing that the two approaches are equivalent. We show how SimpleMKL can be applied beyond binary classification, to problems like regression, clustering (one-class classification) or multiclass classification. Experimental results show that the proposed algorithm converges rapidly and that its efficiency compares favorably to other MKL algorithms. Finally, we illustrate the usefulness of MKL for some regressors based on wavelet kernels and on some model selection problems related to multiclass classification.

© 2008 Rakotomamonjy et al.

1. Introduction

During the last few years, kernel methods such as support vector machines (SVM) have proved to be efficient tools for solving learning problems like classification or regression (Schölkopf and Smola, 2001). For such tasks, the performance of the learning algorithm strongly depends on the data representation. In kernel methods, the data representation is implicitly chosen through the so-called kernel K(x, x'). This kernel actually plays two roles: it defines the similarity between two examples x and x', while defining an appropriate regularization term for the learning problem.

Let {x_i, y_i}_{i=1}^{l} be the learning set, where x_i belongs to some input space X and y_i is the target value for pattern x_i. For kernel algorithms, the solution of the learning problem is of the form

    f(x) = \sum_{i=1}^{l} \alpha_i^\star K(x, x_i) + b^\star,        (1)

where \alpha_i^\star and b^\star are coefficients to be learned from examples, while K(·, ·) is a given positive definite kernel associated with a reproducing kernel Hilbert space (RKHS) H.

In some situations, a machine learning practitioner may be interested in more flexible models. Recent applications have shown that using multiple kernels instead of a single one can enhance the interpretability of the decision function and improve performances (Lanckriet et al., 2004a). In such cases, a convenient approach is to consider that the kernel K(x, x') is actually a convex combination of basis kernels:

    K(x, x') = \sum_{m=1}^{M} d_m K_m(x, x'),   with   d_m \geq 0,  \sum_{m=1}^{M} d_m = 1,

where M is the total number of kernels. Each basis kernel K_m may either use the full set of variables describing x or subsets of variables stemming from different data sources (Lanckriet et al., 2004a). Alternatively, the kernels K_m can simply be classical kernels (such as Gaussian kernels) with different parameters. Within this framework, the problem of data representation through the kernel is then transferred to the choice of the weights d_m.

Learning both the coefficients \alpha_i and the weights d_m in a single optimization problem is known as the multiple kernel learning (MKL) problem. For binary classification, the MKL problem has been introduced by Lanckriet et al. (2004b), resulting in a quadratically constrained quadratic programming problem that rapidly becomes intractable as the number of learning examples or kernels becomes large. What makes this problem difficult is that it is actually a convex but non-smooth minimization problem. Indeed, Bach et al. (2004a) have shown that the MKL formulation of Lanckriet et al. (2004b) is actually the dual of an SVM problem in which the weight vector has been regularized according to a mixed (l2, l1)-norm instead of the classical squared l2-norm. Bach et al. (2004a) have considered a smoothed version of the problem, for which they proposed an SMO-like algorithm that makes it possible to tackle medium-scale problems. Sonnenburg et al. (2006) reformulate the MKL problem of Lanckriet et al. (2004b) as a semi-infinite linear program (SILP). The advantage of the latter formulation is that the algorithm addresses the problem by iteratively solving a classical SVM problem with a single kernel, for which many efficient toolboxes exist (Vishwanathan et al., 2003; Loosli et al., 2005; Chang and Lin, 2001), and a linear program whose number of constraints increases along with the iterations.

A very nice feature of this algorithm is that it can be extended to a large class of convex loss functions. For instance, Zien and Ong (2007) have proposed a multiclass MKL algorithm based on similar ideas.

In this paper, we present another formulation of the multiple kernel learning problem. We first depart from the primal formulation proposed by Bach et al. (2004a) and further used by Bach et al. (2004b) and Sonnenburg et al. (2006). Indeed, we replace the mixed-norm regularization by a weighted l2-norm regularization, where the sparsity of the linear combination of kernels is controlled by an l1-norm constraint on the kernel weights. This new formulation of MKL leads to a smooth and convex optimization problem. By using a variational formulation of the mixed-norm regularization, we show that our formulation is equivalent to the ones of Lanckriet et al. (2004b), Bach et al. (2004a) and Sonnenburg et al. (2006).

The main contribution of this paper is to propose an efficient algorithm, named SimpleMKL, for solving the MKL problem through a primal formulation involving a weighted l2-norm regularization. Our algorithm is simple, being essentially based on a gradient descent on the SVM objective value. We iteratively determine the combination of kernels by a gradient descent wrapping a standard SVM solver, which is SimpleSVM in our case. Our scheme is similar to the one of Sonnenburg et al. (2006), and both algorithms minimize the same objective function. However, they differ in that we use reduced gradient descent in the primal, whereas Sonnenburg et al.'s SILP relies on cutting planes. We will empirically show that our optimization strategy is more efficient, with new evidence confirming the preliminary results reported in Rakotomamonjy et al. (2007). Then, extensions of SimpleMKL to other supervised learning problems such as regression SVM, one-class SVM or multiclass SVM problems based on pairwise coupling are proposed. Although it is not the main purpose of the paper, we will also discuss the applicability of our approach to general convex loss functions.

This paper also presents several illustrations of the usefulness of our algorithm. For instance, in addition to the empirical efficiency comparison, we show, on an SVM regression problem involving wavelet kernels, that automatic learning of the kernels leads to far better performances. Then we depict how our MKL algorithm behaves on some multiclass problems.

The paper is organized as follows. Section 2 presents the functional setting of our MKL problem and its formulation. Details on the algorithm and a discussion of convergence and computational complexity are given in Section 3. Extensions of our algorithm to other SVM problems are discussed in Section 4, while experimental results dealing with computational complexity or with comparison with other model selection methods are presented in Section 5. A SimpleMKL toolbox based on Matlab code is available online; this toolbox is an extension of our SVM-KM toolbox (Canu et al., 2003).

2. Multiple Kernel Learning Framework

In this section, we present our MKL formulation and derive its dual. In the sequel, i and j are indices on examples, whereas m is the kernel index. In order to lighten notations, we omit to specify that summations on i and j go from 1 to l, and that summations on m go from 1 to M.

2.1 Functional framework

Before entering into the details of the MKL optimization problem, we first present the functional framework adopted for multiple kernel learning. Assume K_m, m = 1, ..., M, are M positive definite kernels on the same input space X, each of them being associated with an RKHS H_m endowed with an inner product ⟨·,·⟩_m. For any m, let d_m be a non-negative coefficient and H_m' be the Hilbert space derived from H_m as follows:

    H_m' = { f | f \in H_m :  \|f\|_{H_m} / d_m < \infty },

endowed with the inner product

    ⟨f, g⟩_{H_m'} = \frac{1}{d_m} ⟨f, g⟩_m.

In this paper, we use the convention that x/0 = 0 if x = 0, and \infty otherwise. This means that, if d_m = 0, then a function f belongs to the Hilbert space H_m' only if f = 0 in H_m. In such a case, H_m' is restricted to the null element of H_m.

Within this framework, H_m' is an RKHS with kernel K(x, x') = d_m K_m(x, x'), since for any f \in H_m',

    f(x) = ⟨f(·), K_m(x, ·)⟩_m = \frac{1}{d_m} ⟨f(·), d_m K_m(x, ·)⟩_m = ⟨f(·), d_m K_m(x, ·)⟩_{H_m'}.

Now, if we define H as the direct sum of the spaces H_m', that is, H = \bigoplus_{m=1}^{M} H_m', then a classical result on RKHS (Aronszajn, 1950) says that H is an RKHS with kernel

    K(x, x') = \sum_{m=1}^{M} d_m K_m(x, x').

Owing to this simple construction, we have built an RKHS H in which any function is a sum of functions belonging to the spaces H_m'. In our framework, MKL aims at determining the set of coefficients {d_m} within the learning process of the decision function. The multiple kernel learning problem can thus be envisioned as learning a predictor belonging to an adaptive hypothesis space endowed with an adaptive inner product. The forthcoming sections explain how we solve this problem.

2.2 Multiple kernel learning primal problem

In the SVM methodology, the decision function is of the form given in equation (1), where the optimal parameters \alpha_i^\star and b^\star are obtained by solving the dual of the following optimization problem:

    \min_{f,b,\xi}  \frac{1}{2}\|f\|_{H}^2 + C \sum_i \xi_i
    s.t.  y_i (f(x_i) + b) \geq 1 - \xi_i,   \xi_i \geq 0,   \forall i.

In the MKL framework, one looks for a decision function of the form f(x) + b = \sum_m f_m(x) + b, where each function f_m belongs to a different RKHS H_m associated with a kernel K_m. According to the functional framework above, and inspired by the multiple smoothing splines framework of Wahba (1990, chap. 10), we propose to address the MKL SVM problem by solving the following convex problem (see proof in the appendix), which will be referred to as the primal MKL problem:

    \min_{\{f_m\},b,\xi,d}  \frac{1}{2}\sum_m \frac{1}{d_m}\|f_m\|_{H_m}^2 + C \sum_i \xi_i
    s.t.  y_i \sum_m f_m(x_i) + y_i b \geq 1 - \xi_i,   \forall i
          \xi_i \geq 0,   \forall i                                          (2)
          \sum_m d_m = 1,   d_m \geq 0,   \forall m,

where each d_m controls the squared norm of f_m in the objective function. The smaller d_m is, the smoother f_m (as measured by \|f_m\|_{H_m}) should be. When d_m = 0, \|f_m\|_{H_m} also has to be equal to zero to yield a finite objective value. The l1-norm constraint on the vector d is a sparsity constraint that will force some d_m to be zero, thus encouraging sparse basis kernel expansions.

2.3 Connections with the mixed-norm regularization formulation of MKL

The MKL formulation introduced by Bach et al. (2004a) and further developed by Sonnenburg et al. (2006) consists in solving an optimization problem expressed in functional form as

    \min_{\{f_m\},b,\xi}  \frac{1}{2}\Big(\sum_m \|f_m\|_{H_m}\Big)^2 + C \sum_i \xi_i
    s.t.  y_i \sum_m f_m(x_i) + y_i b \geq 1 - \xi_i,   \xi_i \geq 0,   \forall i.        (3)

Note that the objective function of this problem is not smooth, since \|f_m\|_{H_m} is not differentiable at f_m = 0. However, what makes this formulation interesting is that the mixed-norm penalization of f = \sum_m f_m is a soft-thresholding penalizer that leads to a sparse solution, for which the algorithm performs kernel selection (Bach, 2008). We have stated in the previous section that our problem should also lead to sparse solutions. In the following, we show that formulations (2) and (3) are equivalent.

For this purpose, we simply show that the variational formulation of the mixed-norm regularization is equal to the weighted 2-norm regularization (a particular case of a more general equivalence proposed by Micchelli and Pontil, 2005). Indeed, by the Cauchy-Schwarz inequality, for any vector d on the simplex,

    \Big(\sum_m \|f_m\|_{H_m}\Big)^2 = \Big(\sum_m \frac{\|f_m\|_{H_m}}{d_m^{1/2}}\, d_m^{1/2}\Big)^2
    \leq \Big(\sum_m \frac{\|f_m\|_{H_m}^2}{d_m}\Big)\Big(\sum_m d_m\Big) = \sum_m \frac{\|f_m\|_{H_m}^2}{d_m},

where equality is met when d_m^{1/2} is proportional to \|f_m\|_{H_m}/d_m^{1/2}, that is,

    d_m = \frac{\|f_m\|_{H_m}}{\sum_q \|f_q\|_{H_q}},        (4)

which leads to

    \min_{d \geq 0,\ \sum_m d_m = 1}  \sum_m \frac{\|f_m\|_{H_m}^2}{d_m} = \Big(\sum_m \|f_m\|_{H_m}\Big)^2.        (5)

Hence, owing to this variational formulation, the non-smooth mixed-norm objective function of problem (3) has been turned into the smooth objective function of problem (2). Although the number of variables has increased, we will see that this problem can be solved more efficiently.

2.4 The MKL dual problem

The dual problem is a key point for deriving MKL algorithms and for studying their convergence properties. Since our primal problem (2) is equivalent to the one of Bach et al. (2004a), they lead to the same dual. However, our primal formulation being convex and differentiable, it provides a simple derivation of the dual that does not use conic duality. The Lagrangian of problem (2) is

    L = \frac{1}{2}\sum_m \frac{1}{d_m}\|f_m\|_{H_m}^2 + C \sum_i \xi_i
        + \sum_i \alpha_i \Big(1 - \xi_i - y_i \sum_m f_m(x_i) - y_i b\Big) - \sum_i \nu_i \xi_i
        + \lambda \Big(\sum_m d_m - 1\Big) - \sum_m \eta_m d_m,        (6)

where \alpha_i and \nu_i are the Lagrange multipliers of the constraints related to the usual SVM problem, whereas \lambda and \eta_m are associated with the constraints on d_m.

Setting to zero the gradient of the Lagrangian with respect to the primal variables gives the following optimality conditions:

    (a)  f_m(\cdot) = d_m \sum_i \alpha_i y_i K_m(\cdot, x_i),   \forall m,
    (b)  \sum_i \alpha_i y_i = 0,
    (c)  C - \alpha_i - \nu_i = 0,   \forall i,        (7)
    (d)  -\frac{1}{2}\frac{\|f_m\|_{H_m}^2}{d_m^2} + \lambda - \eta_m = 0,   \forall m.

We note again here that f_m(\cdot) has to go to 0 if the coefficient d_m vanishes. Plugging these optimality conditions into the Lagrangian gives the dual problem

    \max_{\alpha,\lambda}  \sum_i \alpha_i - \lambda
    s.t.  \sum_i \alpha_i y_i = 0,
          0 \leq \alpha_i \leq C,   \forall i,        (8)
          \frac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j K_m(x_i, x_j) \leq \lambda,   \forall m.

(Note that the formulation of Bach et al. (2004a) differs slightly, in that the kernels are weighted by some pre-defined coefficients that are not considered here.) This dual problem is difficult to optimize because of the last constraint. This constraint may be moved to the objective function, but the latter then becomes non-differentiable, causing new difficulties (Bach et al., 2004a). Hence, in the forthcoming section, we propose an approach based on the minimization of the primal. In this framework, we benefit from differentiability, which allows for an efficient derivation of an approximate primal solution, whose accuracy will be monitored by the duality gap.

3. Algorithm for solving the MKL primal problem

One possible approach for solving problem (2) is to use the alternate optimization algorithm applied by Grandvalet and Canu (1999, 2003) in another context. In the first step, problem (2) is optimized with respect to f_m, b and \xi, with d fixed. Then, in the second step, the weight vector d is updated to decrease the objective function of problem (2), with f_m, b and \xi being fixed. In Section 2.3, we showed that this second step can be carried out in closed form. However, this approach lacks convergence guarantees and may lead to numerical problems, in particular when some elements of d approach zero (Grandvalet, 1998). Note that these numerical problems can be handled by introducing a perturbed version of the alternate algorithm, as shown by Argyriou et al. (2008).

Instead of using an alternate optimization algorithm, we prefer to consider here the following constrained optimization problem:

    \min_{d}  J(d)   such that   \sum_{m=1}^{M} d_m = 1,   d_m \geq 0,        (9)

where

    J(d) = \min_{\{f_m\},b,\xi}  \frac{1}{2}\sum_m \frac{1}{d_m}\|f_m\|_{H_m}^2 + C \sum_i \xi_i
           s.t.  y_i \sum_m f_m(x_i) + y_i b \geq 1 - \xi_i,   \xi_i \geq 0,   \forall i.        (10)

We show below how to solve problem (9) on the simplex by a simple gradient method. We first note that the objective function J(d) is actually an optimal SVM objective value. We then discuss the existence and computation of the gradient of J(·), which is at the core of the proposed approach.

3.1 Computing the optimal SVM value and its derivatives

The Lagrangian of problem (10) is identical to the first line of equation (6). By setting to zero the derivatives of this Lagrangian with respect to the primal variables, we get conditions (7) (a) to (c), from which we derive the associated dual problem

    \max_{\alpha}  -\frac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j \sum_m d_m K_m(x_i, x_j) + \sum_i \alpha_i
    s.t.  \sum_i \alpha_i y_i = 0,   0 \leq \alpha_i \leq C,   \forall i,        (11)

which is identified as the standard SVM dual formulation using the combined kernel K(x_i, x_j) = \sum_m d_m K_m(x_i, x_j). The function J(d) is defined as the optimal objective value of problem (10). Because of strong duality, J(d) is also the objective value of the dual problem:

    J(d) = -\frac{1}{2}\sum_{i,j} \alpha_i^\star \alpha_j^\star y_i y_j \sum_m d_m K_m(x_i, x_j) + \sum_i \alpha_i^\star,        (12)

where \alpha^\star maximizes (11). Note that the objective value J(d) can be obtained by any SVM algorithm. Our method can thus take advantage of any progress in single kernel algorithms. In particular, if the SVM algorithm we use is able to handle large-scale problems, so will our MKL algorithm. Thus, the overall complexity of SimpleMKL is tied to that of the single kernel SVM algorithm.

From now on, we assume that each Gram matrix (K_m(x_i, x_j))_{i,j} is positive definite, with all eigenvalues greater than some \eta > 0 (to enforce this property, a small ridge may be added to the diagonal of the Gram matrices). This implies that, for any admissible value of d, the dual problem is strictly concave with convexity parameter \eta (Lemaréchal and Sagastizábal, 1997). In turn, this strict concavity ensures that \alpha^\star is unique, a characteristic that eases the analysis of the differentiability of J(·).

Existence and computation of derivatives of optimal value functions such as J(·) have been largely discussed in the literature. For our purpose, the appropriate reference is Theorem 4.1 in Bonnans and Shapiro (1998), which has already been applied by Chapelle et al. (2002) for tuning the squared-hinge loss SVM. This theorem is reproduced in the appendix for self-containedness.
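Since J(d) is nothing but a single-kernel SVM objective value on the combined kernel, it can be evaluated with any off-the-shelf solver. The following minimal sketch illustrates this in Python with scikit-learn's SVC on a precomputed kernel; it is an illustration only, not the authors' Matlab toolbox, and it assumes the array Ks of base Gram matrices (shape M x l x l) and labels y in {-1, +1} are given.

import numpy as np
from sklearn.svm import SVC

def svm_objective(Ks, d, y, C):
    """Evaluate J(d): solve the single-kernel SVM on K = sum_m d_m K_m and
    return the dual objective value together with the fitted model."""
    K = np.tensordot(d, Ks, axes=1)                    # combined Gram matrix
    svm = SVC(C=C, kernel="precomputed").fit(K, y)
    sv = svm.support_                                  # indices of the support vectors
    a = svm.dual_coef_[0]                              # a_i = y_i * alpha_i on the support vectors
    # J(d) = sum_i alpha_i - 1/2 sum_ij alpha_i alpha_j y_i y_j K(x_i, x_j)
    J = np.abs(a).sum() - 0.5 * a @ K[np.ix_(sv, sv)] @ a
    return J, svm

Any solver that exposes its dual variables could be substituted; the only point is that evaluating J(d) costs one standard SVM training.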

In a nutshell, the theorem says that differentiability of J(d) is ensured by the unicity of \alpha^\star and by the differentiability of the objective function that gives J(d). Furthermore, the derivatives of J(d) can be computed as if \alpha^\star did not depend on d. Thus, by simple differentiation of the dual function (11) with respect to d_m, we have

    \frac{\partial J}{\partial d_m} = -\frac{1}{2}\sum_{i,j} \alpha_i^\star \alpha_j^\star y_i y_j K_m(x_i, x_j),   \forall m.        (13)

We will see in the sequel that the applicability of this theorem can be extended to other SVM problems. Note that the complexity of the gradient computation is of the order of n_{SV}^2, with n_{SV} being the number of support vectors for the current d.

3.2 Reduced gradient algorithm

The optimization problem we have to deal with in (9) has a non-linear objective function with constraints over the simplex. With our positivity assumption on the kernel matrices, J(·) is convex and differentiable with Lipschitz gradient (Lemaréchal and Sagastizábal, 1997). The approach we use for solving this problem is a reduced gradient method, which converges for such functions (Luenberger, 1984).

Once the gradient of J(d) is computed, d is updated using a descent direction ensuring that the equality constraint and the non-negativity constraints on d are satisfied. We handle the equality constraint by computing the reduced gradient (Luenberger, 1984, Chap. 11). Let d_\mu be a non-zero entry of d; the reduced gradient of J(d), denoted \nabla_{red} J, has components

    [\nabla_{red} J]_m = \frac{\partial J}{\partial d_m} - \frac{\partial J}{\partial d_\mu},   \forall m \neq \mu,
    and   [\nabla_{red} J]_\mu = \sum_{m \neq \mu} \Big(\frac{\partial J}{\partial d_\mu} - \frac{\partial J}{\partial d_m}\Big).

We choose \mu to be the index of the largest component of the vector d, for better numerical stability (Bonnans, 2006).

The positivity constraints also have to be taken into account in the descent direction. Since we want to minimize J(·), -\nabla_{red} J is a descent direction. However, if there is an index m such that d_m = 0 and [\nabla_{red} J]_m > 0, using this direction would violate the positivity constraint for d_m. Hence, the descent direction for that component is set to 0. This gives the descent direction for updating d as

    D_m = 0                                                              if d_m = 0 and \frac{\partial J}{\partial d_m} - \frac{\partial J}{\partial d_\mu} > 0,
    D_m = -\frac{\partial J}{\partial d_m} + \frac{\partial J}{\partial d_\mu}       if d_m > 0 and m \neq \mu,        (14)
    D_\mu = \sum_{\nu \neq \mu,\ d_\nu > 0} \Big(\frac{\partial J}{\partial d_\nu} - \frac{\partial J}{\partial d_\mu}\Big)   for m = \mu.

The usual updating scheme is d \leftarrow d + \gamma D, where \gamma is the step size. Here, as detailed in Algorithm 1, we go one step beyond: once a descent direction D has been computed, we first look for the maximal admissible step size in that direction and check whether the objective value decreases or not. The maximal admissible step size corresponds to a component, say d_\nu, set to zero. If the objective value decreases, d is updated, we set D_\nu = 0 and normalize D to comply with the equality constraint. This procedure is repeated until the objective value stops decreasing.
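As a rough illustration of equations (13) and (14), the gradient and the simplex-constrained descent direction can be computed directly from the signed dual coefficients returned by the SVM solver. The sketch below uses NumPy and assumes that a (the signed coefficients y_i alpha_i restricted to the support vector indices sv) and the base Gram matrices Ks come from a fit such as the one sketched above.

import numpy as np

def mkl_gradient(Ks, sv, a):
    """Gradient of J(d), equation (13): dJ/dd_m = -1/2 a^T K_m[sv, sv] a."""
    return np.array([-0.5 * a @ Km[np.ix_(sv, sv)] @ a for Km in Ks])

def descent_direction(d, grad, tol=1e-12):
    """Reduced-gradient descent direction of equation (14) on the simplex."""
    mu = np.argmax(d)                    # reference component: the largest weight, for stability
    red = grad - grad[mu]                # reduced gradient
    D = -red
    D[(d <= tol) & (red > 0)] = 0.0      # do not push already-zero weights below zero
    D[mu] = 0.0
    D[mu] = -D.sum()                     # keep sum_m d_m constant along the direction
    return D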

Algorithm 1: SimpleMKL algorithm
    set d_m = 1/M for m = 1, ..., M
    while stopping criterion not met do
        compute J(d) using an SVM solver with K = \sum_m d_m K_m
        compute \partial J/\partial d_m for m = 1, ..., M, and the descent direction D of (14)
        set \mu = \arg\max_m d_m,  J' = 0,  d' = d,  D' = D
        while J' < J(d) do   {descent direction update}
            d = d',  D = D'
            \nu = \arg\min_{m | D_m < 0} (-d_m / D_m),   \gamma_{max} = -d_\nu / D_\nu
            d' = d + \gamma_{max} D,   D'_\mu = D_\mu - D_\nu,   D'_\nu = 0
            compute J' using an SVM solver with K = \sum_m d'_m K_m
        end while
        line search along D for \gamma \in [0, \gamma_{max}]   {calls an SVM solver for each trial value of \gamma}
        d \leftarrow d + \gamma D
    end while

At this point, we look for the optimal step size \gamma, which is determined by a one-dimensional line search, with a proper stopping criterion, such as Armijo's rule, to ensure global convergence.

In this algorithm, computing the descent direction and the line search are based on the evaluation of the objective function J(·), which requires solving an SVM problem. This may seem very costly but, for small variations of d, learning is very fast when the SVM solver is initialized with the previous values of \alpha (DeCoste and Wagstaff, 2000). Note that the gradient of the cost function is not computed after each update of the weight vector d. Instead, we take advantage of an easily updated descent direction as long as the objective value decreases. We will see in the numerical experiments that this approach saves a substantial amount of computation time compared to the usual update scheme, where the descent direction is recomputed after each update of d. Note that we have also investigated gradient projection algorithms (Bertsekas, 1999, Chap. 2.3), but this turned out to be slightly less efficient than the proposed approach, and we do not report these results.

The algorithm terminates when a stopping criterion is met. This stopping criterion can be based either on the duality gap, on the KKT conditions, on the variation of d between two consecutive steps or, even more simply, on a maximal number of iterations. Our implementation, based on the duality gap, is detailed in the forthcoming section.
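To fix ideas, here is a compact sketch of the outer loop, reusing the svm_objective, mkl_gradient and descent_direction helpers sketched above. It collapses Algorithm 1's direction-update inner loop and line search into a single backtracking search over gamma in (0, gamma_max], so it is a simplification of, not a substitute for, the procedure described in this section.

import numpy as np

def simple_mkl(Ks, y, C=100.0, d_init=None, max_iter=100, tol=1e-3):
    """Simplified SimpleMKL outer loop: reduced gradient descent on the kernel weights d."""
    M = len(Ks)
    d = np.full(M, 1.0 / M) if d_init is None else d_init.copy()
    J, svm = svm_objective(Ks, d, y, C)
    for _ in range(max_iter):
        grad = mkl_gradient(Ks, svm.support_, svm.dual_coef_[0])
        D = descent_direction(d, grad)
        if np.abs(D).max() < tol:                       # (near-)stationary point on the simplex
            break
        neg = D < -1e-12                                # components that decrease along D
        gamma_max = np.min(d[neg] / -D[neg]) if neg.any() else 1.0
        gamma, improved = gamma_max, False
        while gamma > 1e-8 * gamma_max:                 # simple backtracking on the step size
            d_new = np.clip(d + gamma * D, 0.0, None)
            d_new /= d_new.sum()                        # re-project onto the simplex
            J_new, svm_new = svm_objective(Ks, d_new, y, C)
            if J_new < J:
                d, J, svm, improved = d_new, J_new, svm_new, True
                break
            gamma *= 0.5
        if not improved:
            break
    return d, svm

In practice, the efficiency reported in Section 5 also relies on warm-starting the SVM solver with the previous alpha, which scikit-learn's SVC does not expose; the paper assumes a dedicated solver such as SimpleSVM for that purpose.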

3.3 Optimality conditions

In a convex constrained optimization algorithm such as the one we are considering, we have the opportunity to check for proper optimality conditions such as the KKT conditions or the duality gap (the difference between primal and dual objective values), which should be zero at the optimum. From the primal and dual objectives provided respectively in (2) and (8), the MKL duality gap is

    DualGap = J(d^\star) - \sum_i \alpha_i^\star + \frac{1}{2}\max_m \sum_{i,j} \alpha_i^\star \alpha_j^\star y_i y_j K_m(x_i, x_j),

where d^\star and \{\alpha_i^\star\} are optimal primal and dual variables, and J(d^\star) depends implicitly on the optimal primal variables \{f_m^\star\}, b^\star and \{\xi_i^\star\}. If J(d^\star) has been obtained through the dual problem (11), then this MKL duality gap can also be computed from the duality gap DG_{SVM} of the single kernel SVM algorithm. Indeed, equation (12) holds only when the single kernel SVM algorithm returns an exact solution with DG_{SVM} = 0. Otherwise, we have

    DG_{SVM} = J(d^\star) + \frac{1}{2}\sum_{i,j} \alpha_i^\star \alpha_j^\star y_i y_j \sum_m d_m^\star K_m(x_i, x_j) - \sum_i \alpha_i^\star,

and the MKL duality gap becomes

    DualGap = DG_{SVM} - \frac{1}{2}\sum_{i,j} \alpha_i^\star \alpha_j^\star y_i y_j \sum_m d_m^\star K_m(x_i, x_j) + \frac{1}{2}\max_m \sum_{i,j} \alpha_i^\star \alpha_j^\star y_i y_j K_m(x_i, x_j).

Hence, it can be obtained at a small additional computational cost compared to the SVM duality gap. In iterative procedures, it is common to stop the algorithm when the optimality conditions are respected up to a tolerance threshold \varepsilon. Obviously, SimpleMKL has no impact on DG_{SVM}; hence, one may assume, as we did here, that DG_{SVM} need not be monitored. Consequently, we terminate the algorithm when

    \frac{1}{2}\max_m \sum_{i,j} \alpha_i^\star \alpha_j^\star y_i y_j K_m(x_i, x_j) - \frac{1}{2}\sum_{i,j} \alpha_i^\star \alpha_j^\star y_i y_j \sum_m d_m K_m(x_i, x_j) \leq \varepsilon.        (15)
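With the per-kernel quadratic terms at hand, the stopping test (15) takes a couple of lines. A sketch, assuming as before that a holds the signed coefficients y_i alpha_i* on the support vector indices sv and Ks the base Gram matrices:

import numpy as np

def mkl_duality_gap_term(Ks, d, sv, a):
    """Left-hand side of the termination criterion (15)."""
    # alpha^T Y K_m Y alpha, one value per base kernel
    quad = np.array([a @ Km[np.ix_(sv, sv)] @ a for Km in Ks])
    return 0.5 * (quad.max() - d @ quad)

# terminate when mkl_duality_gap_term(Ks, d, sv, a) <= eps, e.g. eps = 0.01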

For some of the other MKL algorithms that will be presented in Section 4, the dual function may be more difficult to derive. Hence, it may be easier to rely on approximate KKT conditions as a stopping criterion. For the general MKL problem (9), the first-order optimality conditions are obtained through the KKT conditions:

    \frac{\partial J}{\partial d_m} + \lambda - \eta_m = 0,   \forall m,
    \eta_m d_m = 0,   \forall m,

where \lambda and \{\eta_m\} are respectively the Lagrange multipliers for the equality and inequality constraints of (9). These KKT conditions imply

    \frac{\partial J}{\partial d_m} = -\lambda   if d_m > 0,
    \frac{\partial J}{\partial d_m} \geq -\lambda   if d_m = 0.

However, as Algorithm 1 is not based on the Lagrangian formulation of problem (9), \lambda is not computed. Hence, we derive approximate necessary optimality conditions to be used as a termination criterion. Let us define dJ_{min} and dJ_{max} as

    dJ_{min} = \min_{\{m | d_m > 0\}} \frac{\partial J}{\partial d_m}   and   dJ_{max} = \max_{\{m | d_m > 0\}} \frac{\partial J}{\partial d_m};

then, the necessary optimality conditions are approximated by the following termination conditions:

    |dJ_{min} - dJ_{max}| \leq \varepsilon   and   \frac{\partial J}{\partial d_m} \geq dJ_{max}   if d_m = 0.

In other words, we consider that we are at the optimum when the gradient components for all positive d_m lie in an \varepsilon-tube and all gradient components for vanishing d_m are outside this tube. Note that these approximate necessary optimality conditions are available right away for any differentiable objective function J(d).

3.4 Cutting Planes, Steepest Descent and Computational Complexity

As we stated in the introduction, several algorithms have been proposed for solving the original MKL problem defined by Lanckriet et al. (2004b). All these algorithms are based on equivalent formulations of the same dual problem; they all aim at providing a pair of optimal vectors (d, \alpha). In this subsection, we contrast SimpleMKL with its closest relative, the SILP algorithm of Sonnenburg et al. (2005, 2006). From an implementation point of view, the two algorithms are alike, since both wrap a standard single kernel SVM algorithm. This feature makes both algorithms very easy to implement. They differ, however, in computational efficiency, because the kernel weights d_m are optimized in quite different ways, as detailed below.

Let us first recall that our differentiable function J(d) is defined as

    J(d) = \max_{\alpha}  -\frac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j \sum_m d_m K_m(x_i, x_j) + \sum_i \alpha_i
           s.t.  \sum_i \alpha_i y_i = 0,   0 \leq \alpha_i \leq C,   \forall i,

and both algorithms aim at minimizing this differentiable function. However, using a SILP approach in this case does not take advantage of the smoothness of the objective function. The SILP algorithm of Sonnenburg et al. (2006) is a cutting plane method to minimize J with respect to d. For each value of d, the best \alpha^\star is found and leads to an affine lower bound on J(d). The number of lower-bounding affine functions increases as more (d, \alpha) pairs are computed, and the next candidate vector d is the minimizer of the current lower bound on J(d), that is, of the maximum over all the affine functions. Cutting planes methods do converge, but they are known for their instability, notably when the number of lower-bounding affine functions is small: the approximation of the objective function is then loose and the iterates may oscillate (Bonnans et al., 2003). Our steepest descent approach, with the proposed line search, does not suffer from instability since we have a differentiable function to minimize. Figure 1 illustrates the behaviour of both algorithms in a simple case, with oscillations for cutting planes and direct convergence for gradient descent. Section 5 evaluates how these oscillations impact the computational time of the SILP algorithm on several examples. These experiments show that our algorithm needs fewer costly gradient computations. Conversely, the line search in the gradient-based approach requires more SVM retrainings in the process of querying the objective function.

However, the computation time per SVM training is considerably reduced, since the gradient-based approach produces estimates of d on a smooth trajectory, so that the previous SVM solution provides a good guess for the current SVM training. In SILP, with the oscillatory subsequent approximations of d, the benefit of warm-start training severely decreases.

Figure 1: Illustration of three iterations of the SILP algorithm and of a gradient descent algorithm on a one-dimensional problem. This dimensionality is not representative of the MKL framework, but our aim is to illustrate the typical oscillations of cutting planes around the optimal solution (with iterates d_0 to d_3). Note that computing an affine lower bound at a given d requires a gradient computation. Provided the step size is chosen correctly, gradient descent converges directly towards the optimal solution without overshooting (from d_0 to d^\star).

3.5 Convergence Analysis

In this paragraph, we briefly discuss the convergence of the algorithm we propose. We first suppose that problem (10) is always exactly solved, which means that the duality gap of this problem is 0. Under such conditions, the gradient computation in (13) is exact, and thus our algorithm performs reduced gradient descent on a continuously differentiable function J(·) (remember that we have assumed that the kernel matrices are positive definite) defined on the simplex {d | \sum_m d_m = 1, d_m \geq 0}, which does converge to the global minimum of J (Luenberger, 1984). However, in practice, problem (10) is not solved exactly, since most SVM algorithms stop when the duality gap is smaller than a given \varepsilon. In this case, the convergence of our projected gradient method is no longer guaranteed by standard arguments. Indeed, the output of the approximately solved SVM leads only to an \varepsilon-subgradient (Bonnans et al., 2003; Bach et al., 2004a). This situation is more difficult to analyze, and we plan to address it thoroughly in future work (see for instance d'Aspremont (2008) for an example of such an analysis in a similar context).

4. Extensions

In this section, we discuss how the proposed algorithm can be simply extended to other SVM algorithms such as SVM regression, one-class SVM or pairwise multiclass SVM algorithms. More generally, we will discuss other loss functions that can be used within our MKL algorithms.

4.1 Extensions to other SVM Algorithms

The algorithm described in the previous section focuses on binary classification SVMs, but it is worth noting that our MKL algorithm can be extended to other SVM algorithms with only little change. For SVM regression with the \varepsilon-insensitive loss, or clustering with the one-class soft margin loss, the problem only changes in the definition of the objective function J(d) in (10). For SVM regression (Vapnik et al., 1997; Schölkopf and Smola, 2001), we have

    J(d) = \min_{\{f_m\},b,\xi}  \frac{1}{2}\sum_m \frac{1}{d_m}\|f_m\|_{H_m}^2 + C \sum_i (\xi_i + \xi_i^*)
           s.t.  y_i - \sum_m f_m(x_i) - b \leq \varepsilon + \xi_i,   \forall i
                 \sum_m f_m(x_i) + b - y_i \leq \varepsilon + \xi_i^*,   \forall i        (16)
                 \xi_i \geq 0,   \xi_i^* \geq 0,   \forall i,

and for one-class SVMs (Schölkopf and Smola, 2001), we have

    J(d) = \min_{\{f_m\},b,\xi}  \frac{1}{2}\sum_m \frac{1}{d_m}\|f_m\|_{H_m}^2 + \frac{1}{\nu l}\sum_i \xi_i - b
           s.t.  \sum_m f_m(x_i) \geq b - \xi_i,   \xi_i \geq 0,   \forall i.        (17)

Again, J(d) can be defined according to the dual functions of these two optimization problems, which are respectively

    J(d) = \max_{\alpha,\beta}  \sum_i (\beta_i - \alpha_i) y_i - \varepsilon \sum_i (\beta_i + \alpha_i) - \frac{1}{2}\sum_{i,j} (\beta_i - \alpha_i)(\beta_j - \alpha_j) \sum_m d_m K_m(x_i, x_j)
           s.t.  \sum_i (\beta_i - \alpha_i) = 0,   0 \leq \alpha_i, \beta_i \leq C,   \forall i,        (18)

and

    J(d) = \max_{\alpha}  -\frac{1}{2}\sum_{i,j} \alpha_i \alpha_j \sum_m d_m K_m(x_i, x_j)
           s.t.  0 \leq \alpha_i \leq \frac{1}{\nu l},   \sum_i \alpha_i = 1,        (19)

where \{\alpha_i\} and \{\beta_i\} are Lagrange multipliers.
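As an illustration of the regression case, the objective (18) and its gradient (given as equation (20) below) can be evaluated with an epsilon-SVR solver in the same way as for classification. The sketch below uses scikit-learn's SVR, whose dual_coef_ holds the signed differences beta_i - alpha_i on the support vectors; it relies on the fact that, at the optimum, alpha_i * beta_i = 0, so that beta_i + alpha_i = |beta_i - alpha_i|.

import numpy as np
from sklearn.svm import SVR

def svr_objective_and_grad(Ks, d, y, C, eps):
    """J(d) for the epsilon-SVR formulation (18) and its gradient (equation 20),
    with c_i = beta_i - alpha_i taken from the fitted dual coefficients."""
    K = np.tensordot(d, Ks, axes=1)
    svr = SVR(C=C, epsilon=eps, kernel="precomputed").fit(K, y)
    sv, c = svr.support_, svr.dual_coef_[0]
    # at the optimum, beta_i + alpha_i = |beta_i - alpha_i| by complementarity
    J = c @ y[sv] - eps * np.abs(c).sum() - 0.5 * c @ K[np.ix_(sv, sv)] @ c
    grad = np.array([-0.5 * c @ Km[np.ix_(sv, sv)] @ c for Km in Ks])
    return J, grad

The sign convention of dual_coef_ is irrelevant for the gradient, since only quadratic forms in c appear there.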

Then, as long as J(d) is differentiable, a property strictly related to the strict concavity of its dual function, our descent algorithm can still be applied. The main effort for the extension of our algorithm is the evaluation of J(d) and the computation of its derivatives. As for the binary classification SVM, J(d) can be computed by means of efficient off-the-shelf SVM solvers, and the gradient of J(d) is easily obtained through the dual problems. For SVM regression, we have

    \frac{\partial J}{\partial d_m} = -\frac{1}{2}\sum_{i,j} (\beta_i^\star - \alpha_i^\star)(\beta_j^\star - \alpha_j^\star) K_m(x_i, x_j),        (20)

and for one-class SVMs, we have

    \frac{\partial J}{\partial d_m} = -\frac{1}{2}\sum_{i,j} \alpha_i^\star \alpha_j^\star K_m(x_i, x_j),        (21)

where \alpha^\star and \beta^\star are the optimal values of the Lagrange multipliers. These examples illustrate that extending SimpleMKL to other SVM problems is rather straightforward. This observation also holds for other SVM algorithms (based for instance on the \nu parameter, a squared hinge loss or a squared-\varepsilon tube) that we do not detail here. Again, our algorithm can be used provided J(d) is differentiable, by plugging into the algorithm the function that evaluates the objective value J(d) and its gradient. Of course, the duality gap may be considered as a stopping criterion if it can be computed.

4.2 Multiclass Multiple Kernel Learning

With SVMs, multiclass problems are customarily solved by combining several binary classifiers. The well-known one-against-all and one-against-one approaches are the two most common ways of building a multiclass decision function based on pairwise decision functions. Multiclass SVMs may also be defined right away as the solution of a global optimization problem (Weston and Watkins, 1999; Crammer and Singer, 2001), which may also be addressed with structured-output SVMs (Tsochantaridis et al., 2005). Very recently, an MKL algorithm based on structured-output SVMs has been proposed by Zien and Ong (2007). This work extends the work of Sonnenburg et al. (2006) to multiclass problems, with an MKL implementation still based on a QCQP or SILP approach. Several works have compared the performance of multiclass SVM algorithms (Duan and Keerthi, 2005; Hsu and Lin, 2002; Rifkin and Klautau, 2004). In this subsection, we do not deal with this aspect; we explain how SimpleMKL can be extended to pairwise SVM multiclass implementations. The problem of applying our algorithm to structured-output SVMs will be briefly discussed later.

Suppose we have a multiclass problem with P classes. For a one-against-all multiclass SVM, we need to train P binary SVM classifiers, where the p-th classifier is trained by considering all examples of class p as positive examples while all other examples are considered negative. For a one-against-one multiclass problem, P(P-1)/2 binary SVM classifiers are built from all pairs of distinct classes. Our multiclass MKL extension of SimpleMKL differs from the binary version only in the definition of a new cost function J(d).

As we now look for the combination of kernels that jointly optimizes all the pairwise decision functions, the objective function we want to optimize with respect to the kernel weights {d_m} is

    J(d) = \sum_{p \in P} J_p(d),

where P is the set of all pairs to be considered, and J_p(d) is the binary SVM objective value for the classification problem pertaining to pair p. Once this new objective function is defined, the lines of Algorithm 1 still apply. The gradient of J(d) is still very simple to obtain since, owing to linearity, we have

    \frac{\partial J}{\partial d_m} = -\frac{1}{2}\sum_{p \in P}\sum_{i,j} \alpha_{i,p}^\star \alpha_{j,p}^\star y_i y_j K_m(x_i, x_j),        (22)

where \alpha_{j,p}^\star is the Lagrange multiplier of the j-th example involved in the p-th decision function. Note that those Lagrange multipliers can be obtained independently for each pair.

The approach described above aims at finding the combination of kernels that jointly optimizes all binary classification problems: this single set of features should maximize the sum of margins. Another possible and straightforward approach consists in running SimpleMKL independently for each classification task. However, this choice is likely to result in as many combinations of kernels as there are binary classifiers.
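A sketch of the pairwise (one-against-one) objective and of the gradient in equation (22): each pair contributes its own binary SVM solved on the same combined kernel, and the per-kernel gradients simply add up. This again uses scikit-learn's SVC as a stand-in for the SVM solver; y is assumed to hold integer class labels.

import numpy as np
from itertools import combinations
from sklearn.svm import SVC

def multiclass_objective_and_grad(Ks, d, y, C):
    """Sum of the pairwise binary SVM objectives J_p(d) and the gradient of equation (22)."""
    K = np.tensordot(d, Ks, axes=1)
    J, grad = 0.0, np.zeros(len(Ks))
    for p, q in combinations(np.unique(y), 2):          # one-against-one pairs
        idx = np.where((y == p) | (y == q))[0]
        yb = np.where(y[idx] == p, 1, -1)
        svm = SVC(C=C, kernel="precomputed").fit(K[np.ix_(idx, idx)], yb)
        sv = idx[svm.support_]                          # back to global example indices
        a = svm.dual_coef_[0]                           # y_i * alpha_i for this pair
        J += np.abs(a).sum() - 0.5 * a @ K[np.ix_(sv, sv)] @ a
        grad += np.array([-0.5 * a @ Km[np.ix_(sv, sv)] @ a for Km in Ks])
    return J, grad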

4.3 Other loss functions

Multiple kernel learning has attracted a great deal of interest, and since the seminal work of Lanckriet et al. (2004b), several works on this topic have flourished. For instance, multiple kernel learning has been transposed to least-squares fitting and logistic regression (Bach et al., 2004b). Independently, several authors have applied mixed-norm regularization, such as the additive spline regression model of Grandvalet and Canu (1999). This type of regularization, which is now known as the group lasso, may be seen as a linear version of multiple kernel learning (Bach, 2008). Several algorithms have been proposed for solving the group lasso problem. Some of them are based on projected gradient or on coordinate descent algorithms. However, they all consider the non-smooth version of the problem.

We previously mentioned that Zien and Ong (2007) have proposed an MKL algorithm based on structured-output SVMs. For such a problem, the loss function, which differs from the usual SVM hinge loss, leads to an algorithm based on cutting planes instead of the usual QP approach. Provided the gradient of the objective value can be obtained, our algorithm can be applied to the group lasso and to structured-output SVMs. The key point is whether the theorem of Bonnans et al. (2003) can be applied or not. Although we have not deeply investigated this point, we think that many problems comply with this requirement, but we leave these developments for future work.

4.4 Approximate regularization path

SimpleMKL requires setting the usual SVM hyperparameter C, which usually needs to be tuned for the problem at hand. A practical and useful technique for doing so is to compute the so-called regularization path, which describes the set of solutions as C varies from 0 to infinity. Exact path following techniques have been derived for some specific problems like SVMs or the lasso (Hastie et al., 2004; Efron et al., 2004). Besides, regularization paths can be sampled by predictor-corrector methods (Rosset, 2004; Bach et al., 2004b).

For model selection purposes, an approximation of the regularization path may be sufficient. This approach has been applied for instance by Koh et al. (2007) in regularized logistic regression. Here, we compute an approximate regularization path based on a warm-start technique. Suppose that, for a given value of C, we have computed the optimal pair (d^\star, \alpha^\star); the idea of a warm start is to use this solution to initialize another MKL problem with a different value of C. In our case, we iteratively compute the solutions for decreasing values of C (note that \alpha^\star has to be modified to be a feasible initialization of the more constrained SVM problem).
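A sketch of the warm-started approximate path, assuming a simple_mkl routine like the one sketched after Algorithm 1 that accepts an initial weight vector (d_init). Only the kernel weights are warm-started here; reusing the previous alpha additionally requires an SVM solver that accepts an initial dual point.

import numpy as np

def approximate_regularization_path(Ks, y, C_values):
    """Solve MKL for decreasing C values, warm-starting each problem at the previous weights."""
    d, path = None, []
    for C in sorted(C_values, reverse=True):            # from the largest C downwards
        d, _ = simple_mkl(Ks, y, C=C, d_init=d)
        path.append((C, d.copy()))
    return path

# e.g. C_values sampled on a logarithmic grid (an illustrative range, not the one used in the paper):
# C_values = np.logspace(3, -2, 20)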

5. Numerical experiments

In this experimental section, we essentially aim at illustrating three points. The first point is to show that our gradient descent algorithm is efficient. This is achieved through binary classification experiments, where SimpleMKL is compared to the SILP approach of Sonnenburg et al. (2006). Then, we illustrate the usefulness of a multiple kernel learning approach in the context of regression. The examples we use are based on wavelet-based regression, in which the multiple kernel learning framework fits naturally. The final experiment aims at evaluating the multiple kernel approach in a model selection problem for some multiclass problems.

5.1 Computation time

The aim of this first set of experiments is to assess the running times of SimpleMKL. (All the experiments have been run on a Pentium D 3 GHz with 3 GB of RAM.) First, we compare with SILP regarding the time required for computing a single solution of MKL with a given C hyperparameter. Then, we compute an approximate regularization path by varying the C values. We finally provide hints on the expected complexity of SimpleMKL, by measuring the growth of running time as the number of examples or kernels increases.

5.1.1 Time needed for reaching a single solution

In this first benchmark, we put SimpleMKL and SILP side by side, for a fixed value of the hyperparameter C (C = 100). This procedure, which does not involve a proper model selection step, is not representative of the typical use of SVMs. It is, however, relevant for the purpose of comparing algorithmic issues. The evaluation is made on five datasets from the UCI repository: Liver, Wpbc, Ionosphere, Pima, Sonar (Blake and Merz, 1998). The candidate kernels are: Gaussian kernels with 10 different bandwidths \sigma, on all variables and on each single variable; and polynomial kernels of degree 1 to 3, again on all variables and on each single variable. All kernel matrices have been normalized to unit trace, and are precomputed prior to running the algorithms.
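For concreteness, a candidate kernel bank of the kind just described could be assembled as follows. The bandwidth and degree grids and the inhomogeneous polynomial form are illustrative placeholders, not the exact settings used in the experiments; each Gram matrix is normalized to unit trace before being handed to the MKL solver.

import numpy as np

def gaussian_gram(X, sigma):
    """Gaussian Gram matrix on the given columns of the training data."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

def candidate_kernels(X, sigmas=(0.5, 1, 2, 5, 10, 15, 20), degrees=(1, 2, 3)):
    """Gaussian kernels (several bandwidths) and polynomial kernels (degrees 1 to 3),
    computed on all variables and on each single variable, all unit-trace normalized."""
    n, p = X.shape
    subsets = [list(range(p))] + [[j] for j in range(p)]
    Ks = []
    for s in subsets:
        Xs = X[:, s]
        for sigma in sigmas:
            Ks.append(gaussian_gram(Xs, sigma))
        for deg in degrees:
            Ks.append((Xs @ Xs.T + 1.0) ** deg)
    return np.array([K / np.trace(K) for K in Ks])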

Both SimpleMKL and SILP wrap an SVM dual solver based on SimpleSVM, an active constraints method written in Matlab (Canu et al., 2003). The descent procedure of SimpleMKL is also implemented in Matlab, whereas the linear programming involved in SILP is implemented with the publicly available toolbox LPSOLVE (Berkelaar et al., 2004). For a fair comparison, we use the same stopping criterion for both algorithms: they halt when either the duality gap is lower than 0.01, or the number of iterations exceeds 2000. Quantitatively, the displayed results differ from the preliminary version of this work, where the stopping criterion was based on the stabilization of the weights, but they are qualitatively similar (Rakotomamonjy et al., 2007).

For each dataset, the algorithms were run 20 times with different train and test sets (70% of the examples for training and 30% for testing). Training examples were normalized to zero mean and unit variance. In Table 1, we report different performance measures: accuracy, number of selected kernels and running time. As the latter is mainly spent in querying the SVM solver and in computing the gradient of J with respect to d, the number of calls to these two routines is also reported.

Both algorithms are nearly identical in prediction accuracy. Their numbers of selected kernels are of the same magnitude, although SimpleMKL tends to select 10 to 20% more kernels. As both algorithms address the same convex optimization problem, with convergent methods starting from the same initialization, the observed differences are only due to the inaccuracy of the solution when the stopping criterion is met. Hence, the trajectories followed by each algorithm to reach the solution, detailed in Section 3.4, explain the differences in the number of selected kernels. The updates of d based on the descent algorithm of SimpleMKL are rather conservative (small steps departing from 1/M for all d_m), whereas the oscillations of cutting planes are likely to favor extreme solutions, hitting the edges of the simplex. This explanation is corroborated by Figure 2, which compares the behavior of the d_m coefficients through time. The instability of SILP is clearly visible, with very high oscillations in the first iterations and a noticeable residual noise in the long run. In comparison, the trajectories for SimpleMKL are much smoother.

If we now look at the overall difference in computation time reported in Table 1, clearly, on all data sets, SimpleMKL is faster than SILP, with an average gain factor of about 5. Furthermore, the larger the number of kernels, the larger the speed gain we achieve. Looking at the last column of Table 1, we see that the main reason for the improvement is that SimpleMKL converges in fewer iterations (that is, gradient computations). It may seem surprising that this gain is not counterbalanced by the fact that SimpleMKL requires many more calls to the SVM solver (on average, about 4 times more). As we stated in Section 3.4, when the number of kernels is large, computing the gradient may be expensive compared to SVM retraining with warm-start techniques.

Table 1: Average performance measures for the two MKL algorithms and a plain gradient descent algorithm, on Liver (l = 241, M = 91), Pima (l = 538, M = 117), Ionosphere (l = 246, M = 442), Wpbc (l = 136, M = 442) and Sonar (l = 146, M = 793). For each dataset, rows SILP, SimpleMKL and Grad. Desc. report the number of selected kernels, accuracy, time (s), number of SVM evaluations and number of gradient evaluations (mean ± standard deviation over the 20 runs).

Figure 2: Evolution of the five largest weights d_m for SimpleMKL and SILP; left: Pima; right: Ionosphere.

To understand why, with this large number of calls to the SVM solver, SimpleMKL is still much faster than SILP, we have to look back at Figure 2. On the one hand, the large variations in subsequent values of d for SILP entail that subsequent SVM problems are not likely to have similar solutions: a warm-start call to the SVM solver does not help much. On the other hand, with the smooth trajectories of d in SimpleMKL, the previous SVM solution is often a good guess for the current problem: a warm-start call to the SVM solver results in much less computation than a call from scratch.

Table 1 also shows the results obtained when replacing the update scheme described in Algorithm 1 by a usual reduced gradient update, which, at each iteration, modifies d by computing the optimal step size along the descent direction D of (14). The training of this variant is considerably slower than SimpleMKL and only slightly better than SILP. We see that the gradient descent updates require many more calls to the SVM solver and a number of gradient computations comparable with SILP. Note that, compared to SILP, the numerous additional calls to the SVM solver do not have a drastic effect on running time: the gradient updates are stable, so that they can benefit from warm starts, contrary to SILP.

To end this first series of experiments, Figure 3 depicts the evolution of the objective function for the data sets that were used in Figure 2. Besides the fact that SILP needs more iterations to achieve a good approximation of the final solution, it is worth noting that the objective values rapidly reach their steady state while still being far from convergence, when the d_m values are far from being settled. Thus, monitoring objective values is not suitable to assess convergence.

Figure 3: Evolution of the objective values for SimpleMKL and SILP; left: Pima; right: Ionosphere.

5.1.2 Time needed for getting an approximate regularization path

In practice, the optimal value of C is unknown, and one has to solve several SVM problems, spanning a wide range of C values, before choosing a solution according to some model selection criterion like the cross-validation error. Here, we further pursue the comparison of the running times of SimpleMKL and SILP, in a series of experiments that include the search for a sensible value of C.

In this new benchmark, we use the same data sets as in the previous experiments, with the same kernel settings. The task is only changed in that we now evaluate the running times needed by both algorithms to compute an approximate regularization path. For both algorithms, we use a simple warm-start technique, which consists in using the optimal solutions {d_m^\star} and {\alpha_i^\star} obtained for a given C to initialize a new MKL problem with C + \Delta C (DeCoste and Wagstaff, 2000). As described in Section 4.4, we start from the largest C and then approximate the regularization path by decreasing its value. The set of C values is obtained by evenly sampling this range on a logarithmic scale.

Figure 4 shows the variations of the number of selected kernels and of the values of d_m along the regularization path for the Pima and Wpbc datasets. The number of kernels is not a monotone function of C: for small values of C, the number of kernels is somewhat constant, then it rises rapidly. There is a small overshoot before reaching a plateau corresponding to very high values of C. This trend is similar for the number of leading terms in the kernel weight vector d. Both phenomena were observed consistently over the datasets we used.

Table 2 displays the average computation time (over 10 runs) required for building the approximate regularization path. As previously, SimpleMKL is more efficient than SILP, with a gain factor increasing with the number of kernels in the combination. The range of gain factors, from 5.9 to 23, is even more impressive than in the previous benchmark. SimpleMKL benefits from the continuity of solutions along the regularization path, whereas SILP does not take advantage of warm starts. Even provided with a good initialization, it needs many cutting planes to stabilize.

Figure 4: Regularization paths of the d_m and of the number of selected kernels versus C; left: Pima; right: Wpbc.

Table 2: Average computation time (in seconds) for getting an approximate regularization path, for the Liver, Pima, Ionosphere, Wpbc and Sonar datasets (columns: SimpleMKL, SILP, Ratio). For the Sonar data set, SILP was extremely slow, so that the regularization path was computed only once.

5.1.3 More on SimpleMKL running times

Here, we provide an empirical assessment of the expected complexity of SimpleMKL on different data sets from the UCI repository. We first look at the situation where kernel matrices can be pre-computed and stored in memory, before reporting experiments where the memory requirements are too high, leading to repeated kernel evaluations.

In a first set of experiments, we use Gaussian kernels, computed on random subsets of variables and with random widths. These kernels are precomputed and stored in memory, and we report the average CPU running times obtained from 20 runs differing in the random draw of training examples. The stopping criterion is the same as in the previous section: a relative duality gap of less than \varepsilon = 0.01.

The first two rows of Figure 5 depict the growth of computation time as the number of kernels increases. We observe a nearly linear trend for the four learning problems. This growth rate could be expected considering the linear convergence property of gradient techniques, but the absence of overhead is valuable. The last row of Figure 5 depicts the growth of computation time as the number of examples increases; here, the number of kernels is kept fixed. In these plots, the observed trend is clearly superlinear. Again, this trend could be expected, considering that SVM expected training times are superlinear in the number of training examples. As we already mentioned, the complexity of SimpleMKL is tightly linked to that of SVM training (for some examples of single kernel SVM running times, one can refer to the work of Loosli and Canu (2007)).

When all the kernels used for MKL cannot be stored in memory, one can resort to a decomposition method. Table 3 reports the average computation times, over 10 runs, in this more difficult situation. The large-scale SVM scheme of Joachims (1999) has been implemented, with basis kernels recomputed whenever needed. This approach is computationally expensive but comes with no memory limit. For these experiments, the stopping criterion is based on the variation of the weights d_m: as shown in Figure 2, the kernel weights rapidly reach a steady state, and many iterations are spent fine-tuning the weights to reach the duality gap tolerance. Here, we trade the optimality guarantees provided by the duality gap for substantial computational time savings; the algorithm terminates when the variation of the kernel weights is lower than 0.01. The results reported in Table 3 simply aim at showing that medium and large-scale situations can be handled by SimpleMKL. Note that Sonnenburg et al. (2006) have run a modified version of their SILP algorithm on larger scale datasets. However, for those experiments, they took advantage of some specific feature map properties; as they stated, for general cases where kernel matrices are dense, they have to rely on the SILP algorithm we used in this section for the efficiency comparison.

5.2 Multiple kernel regression examples

Several research papers have already claimed that using multiple kernel learning can lead to better generalization performance in some classification problems (Lanckriet et al., 2004a; Zien and Ong, 2007; Harchaoui and Bach, 2007). This next experiment aims at illustrating this point, but in the context of regression. The problem we deal with is a classical univariate regression problem where the design points are irregular (D'Amato et al., 2003).

Figure 5: SimpleMKL average computation times for different datasets (Credit, Yeast, Spamdata, Optdigits); top two rows: number of training examples fixed, number of kernels varying; bottom row (Spamdata, Optdigits): number of training examples varying, number of kernels fixed.

Table 3: Average computation time needed by SimpleMKL using decomposition methods, for the Yeast and Spamdata datasets (columns: number of examples, number of kernels, accuracy (%), time (s)).


More information

C4B Machine Learning Answers II. = σ(z) (1 σ(z)) 1 1 e z. e z = σ(1 σ) (1 + e z )

C4B Machine Learning Answers II. = σ(z) (1 σ(z)) 1 1 e z. e z = σ(1 σ) (1 + e z ) C4B Machne Learnng Answers II.(a) Show that for the logstc sgmod functon dσ(z) dz = σ(z) ( σ(z)) A. Zsserman, Hlary Term 20 Start from the defnton of σ(z) Note that Then σ(z) = σ = dσ(z) dz = + e z e z

More information

Chapter 12 Lyes KADEM [Thermodynamics II] 2007

Chapter 12 Lyes KADEM [Thermodynamics II] 2007 Chapter 2 Lyes KDEM [Therodynacs II] 2007 Gas Mxtures In ths chapter we wll develop ethods for deternng therodynac propertes of a xture n order to apply the frst law to systes nvolvng xtures. Ths wll be

More information

Chapter 5. Solution of System of Linear Equations. Module No. 6. Solution of Inconsistent and Ill Conditioned Systems

Chapter 5. Solution of System of Linear Equations. Module No. 6. Solution of Inconsistent and Ill Conditioned Systems Numercal Analyss by Dr. Anta Pal Assstant Professor Department of Mathematcs Natonal Insttute of Technology Durgapur Durgapur-713209 emal: anta.bue@gmal.com 1 . Chapter 5 Soluton of System of Lnear Equatons

More information

Scattering by a perfectly conducting infinite cylinder

Scattering by a perfectly conducting infinite cylinder Scatterng by a perfectly conductng nfnte cylnder Reeber that ths s the full soluton everywhere. We are actually nterested n the scatterng n the far feld lt. We agan use the asyptotc relatonshp exp exp

More information

PROBABILITY AND STATISTICS Vol. III - Analysis of Variance and Analysis of Covariance - V. Nollau ANALYSIS OF VARIANCE AND ANALYSIS OF COVARIANCE

PROBABILITY AND STATISTICS Vol. III - Analysis of Variance and Analysis of Covariance - V. Nollau ANALYSIS OF VARIANCE AND ANALYSIS OF COVARIANCE ANALYSIS OF VARIANCE AND ANALYSIS OF COVARIANCE V. Nollau Insttute of Matheatcal Stochastcs, Techncal Unversty of Dresden, Gerany Keywords: Analyss of varance, least squares ethod, odels wth fxed effects,

More information

Fall 2012 Analysis of Experimental Measurements B. Eisenstein/rev. S. Errede. ) with a symmetric Pcovariance matrix of the y( x ) measurements V

Fall 2012 Analysis of Experimental Measurements B. Eisenstein/rev. S. Errede. ) with a symmetric Pcovariance matrix of the y( x ) measurements V Fall Analyss o Experental Measureents B Esensten/rev S Errede General Least Squares wth General Constrants: Suppose we have easureents y( x ( y( x, y( x,, y( x wth a syetrc covarance atrx o the y( x easureents

More information

1 Convex Optimization

1 Convex Optimization Convex Optmzaton We wll consder convex optmzaton problems. Namely, mnmzaton problems where the objectve s convex (we assume no constrants for now). Such problems often arse n machne learnng. For example,

More information

CSC 411 / CSC D11 / CSC C11

CSC 411 / CSC D11 / CSC C11 18 Boostng s a general strategy for learnng classfers by combnng smpler ones. The dea of boostng s to take a weak classfer that s, any classfer that wll do at least slghtly better than chance and use t

More information

Support Vector Machines. Vibhav Gogate The University of Texas at dallas

Support Vector Machines. Vibhav Gogate The University of Texas at dallas Support Vector Machnes Vbhav Gogate he Unversty of exas at dallas What We have Learned So Far? 1. Decson rees. Naïve Bayes 3. Lnear Regresson 4. Logstc Regresson 5. Perceptron 6. Neural networks 7. K-Nearest

More information

Which Separator? Spring 1

Which Separator? Spring 1 Whch Separator? 6.034 - Sprng 1 Whch Separator? Mamze the margn to closest ponts 6.034 - Sprng Whch Separator? Mamze the margn to closest ponts 6.034 - Sprng 3 Margn of a pont " # y (w $ + b) proportonal

More information

An Optimal Bound for Sum of Square Roots of Special Type of Integers

An Optimal Bound for Sum of Square Roots of Special Type of Integers The Sxth Internatonal Syposu on Operatons Research and Its Applcatons ISORA 06 Xnang, Chna, August 8 12, 2006 Copyrght 2006 ORSC & APORC pp. 206 211 An Optal Bound for Su of Square Roots of Specal Type

More information

Denote the function derivatives f(x) in given points. x a b. Using relationships (1.2), polynomials (1.1) are written in the form

Denote the function derivatives f(x) in given points. x a b. Using relationships (1.2), polynomials (1.1) are written in the form SET OF METHODS FO SOUTION THE AUHY POBEM FO STIFF SYSTEMS OF ODINAY DIFFEENTIA EUATIONS AF atypov and YuV Nulchev Insttute of Theoretcal and Appled Mechancs SB AS 639 Novosbrs ussa Introducton A constructon

More information

princeton univ. F 17 cos 521: Advanced Algorithm Design Lecture 7: LP Duality Lecturer: Matt Weinberg

princeton univ. F 17 cos 521: Advanced Algorithm Design Lecture 7: LP Duality Lecturer: Matt Weinberg prnceton unv. F 17 cos 521: Advanced Algorthm Desgn Lecture 7: LP Dualty Lecturer: Matt Wenberg Scrbe: LP Dualty s an extremely useful tool for analyzng structural propertes of lnear programs. Whle there

More information

Problem Set 9 Solutions

Problem Set 9 Solutions Desgn and Analyss of Algorthms May 4, 2015 Massachusetts Insttute of Technology 6.046J/18.410J Profs. Erk Demane, Srn Devadas, and Nancy Lynch Problem Set 9 Solutons Problem Set 9 Solutons Ths problem

More information

Online Classification: Perceptron and Winnow

Online Classification: Perceptron and Winnow E0 370 Statstcal Learnng Theory Lecture 18 Nov 8, 011 Onlne Classfcaton: Perceptron and Wnnow Lecturer: Shvan Agarwal Scrbe: Shvan Agarwal 1 Introducton In ths lecture we wll start to study the onlne learnng

More information

Structure and Drive Paul A. Jensen Copyright July 20, 2003

Structure and Drive Paul A. Jensen Copyright July 20, 2003 Structure and Drve Paul A. Jensen Copyrght July 20, 2003 A system s made up of several operatons wth flow passng between them. The structure of the system descrbes the flow paths from nputs to outputs.

More information

Difference Equations

Difference Equations Dfference Equatons c Jan Vrbk 1 Bascs Suppose a sequence of numbers, say a 0,a 1,a,a 3,... s defned by a certan general relatonshp between, say, three consecutve values of the sequence, e.g. a + +3a +1

More information

Solutions to exam in SF1811 Optimization, Jan 14, 2015

Solutions to exam in SF1811 Optimization, Jan 14, 2015 Solutons to exam n SF8 Optmzaton, Jan 4, 25 3 3 O------O -4 \ / \ / The network: \/ where all lnks go from left to rght. /\ / \ / \ 6 O------O -5 2 4.(a) Let x = ( x 3, x 4, x 23, x 24 ) T, where the varable

More information

Lecture 20: November 7

Lecture 20: November 7 0-725/36-725: Convex Optmzaton Fall 205 Lecturer: Ryan Tbshran Lecture 20: November 7 Scrbes: Varsha Chnnaobreddy, Joon Sk Km, Lngyao Zhang Note: LaTeX template courtesy of UC Berkeley EECS dept. Dsclamer:

More information

Assortment Optimization under MNL

Assortment Optimization under MNL Assortment Optmzaton under MNL Haotan Song Aprl 30, 2017 1 Introducton The assortment optmzaton problem ams to fnd the revenue-maxmzng assortment of products to offer when the prces of products are fxed.

More information

ADVANCED MACHINE LEARNING ADVANCED MACHINE LEARNING

ADVANCED MACHINE LEARNING ADVANCED MACHINE LEARNING 1 ADVANCED ACHINE LEARNING ADVANCED ACHINE LEARNING Non-lnear regresson technques 2 ADVANCED ACHINE LEARNING Regresson: Prncple N ap N-dm. nput x to a contnuous output y. Learn a functon of the type: N

More information

Homework Assignment 3 Due in class, Thursday October 15

Homework Assignment 3 Due in class, Thursday October 15 Homework Assgnment 3 Due n class, Thursday October 15 SDS 383C Statstcal Modelng I 1 Rdge regresson and Lasso 1. Get the Prostrate cancer data from http://statweb.stanford.edu/~tbs/elemstatlearn/ datasets/prostate.data.

More information

arxiv: v2 [math.co] 3 Sep 2017

arxiv: v2 [math.co] 3 Sep 2017 On the Approxate Asyptotc Statstcal Independence of the Peranents of 0- Matrces arxv:705.0868v2 ath.co 3 Sep 207 Paul Federbush Departent of Matheatcs Unversty of Mchgan Ann Arbor, MI, 4809-043 Septeber

More information

AN ANALYSIS OF A FRACTAL KINETICS CURVE OF SAVAGEAU

AN ANALYSIS OF A FRACTAL KINETICS CURVE OF SAVAGEAU AN ANALYI OF A FRACTAL KINETIC CURE OF AAGEAU by John Maloney and Jack Hedel Departent of Matheatcs Unversty of Nebraska at Oaha Oaha, Nebraska 688 Eal addresses: aloney@unoaha.edu, jhedel@unoaha.edu Runnng

More information

LINEAR REGRESSION ANALYSIS. MODULE IX Lecture Multicollinearity

LINEAR REGRESSION ANALYSIS. MODULE IX Lecture Multicollinearity LINEAR REGRESSION ANALYSIS MODULE IX Lecture - 31 Multcollnearty Dr. Shalabh Department of Mathematcs and Statstcs Indan Insttute of Technology Kanpur 6. Rdge regresson The OLSE s the best lnear unbased

More information

Machine Learning. Support Vector Machines. Eric Xing , Fall Lecture 9, October 6, 2015

Machine Learning. Support Vector Machines. Eric Xing , Fall Lecture 9, October 6, 2015 Machne Learnng 0-70 Fall 205 Support Vector Machnes Erc Xng Lecture 9 Octoer 6 205 Readng: Chap. 6&7 C.B ook and lsted papers Erc Xng @ CMU 2006-205 What s a good Decson Boundar? Consder a nar classfcaton

More information

On the number of regions in an m-dimensional space cut by n hyperplanes

On the number of regions in an m-dimensional space cut by n hyperplanes 6 On the nuber of regons n an -densonal space cut by n hyperplanes Chungwu Ho and Seth Zeran Abstract In ths note we provde a unfor approach for the nuber of bounded regons cut by n hyperplanes n general

More information

ITERATIVE ESTIMATION PROCEDURE FOR GEOSTATISTICAL REGRESSION AND GEOSTATISTICAL KRIGING

ITERATIVE ESTIMATION PROCEDURE FOR GEOSTATISTICAL REGRESSION AND GEOSTATISTICAL KRIGING ESE 5 ITERATIVE ESTIMATION PROCEDURE FOR GEOSTATISTICAL REGRESSION AND GEOSTATISTICAL KRIGING Gven a geostatstcal regresson odel: k Y () s x () s () s x () s () s, s R wth () unknown () E[ ( s)], s R ()

More information

Module 3 LOSSY IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur

Module 3 LOSSY IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur Module 3 LOSSY IMAGE COMPRESSION SYSTEMS Verson ECE IIT, Kharagpur Lesson 6 Theory of Quantzaton Verson ECE IIT, Kharagpur Instructonal Objectves At the end of ths lesson, the students should be able to:

More information

Additional Codes using Finite Difference Method. 1 HJB Equation for Consumption-Saving Problem Without Uncertainty

Additional Codes using Finite Difference Method. 1 HJB Equation for Consumption-Saving Problem Without Uncertainty Addtonal Codes usng Fnte Dfference Method Benamn Moll 1 HJB Equaton for Consumpton-Savng Problem Wthout Uncertanty Before consderng the case wth stochastc ncome n http://www.prnceton.edu/~moll/ HACTproect/HACT_Numercal_Appendx.pdf,

More information

10-701/ Machine Learning, Fall 2005 Homework 3

10-701/ Machine Learning, Fall 2005 Homework 3 10-701/15-781 Machne Learnng, Fall 2005 Homework 3 Out: 10/20/05 Due: begnnng of the class 11/01/05 Instructons Contact questons-10701@autonlaborg for queston Problem 1 Regresson and Cross-valdaton [40

More information

3.1 Expectation of Functions of Several Random Variables. )' be a k-dimensional discrete or continuous random vector, with joint PMF p (, E X E X1 E X

3.1 Expectation of Functions of Several Random Variables. )' be a k-dimensional discrete or continuous random vector, with joint PMF p (, E X E X1 E X Statstcs 1: Probablty Theory II 37 3 EPECTATION OF SEVERAL RANDOM VARIABLES As n Probablty Theory I, the nterest n most stuatons les not on the actual dstrbuton of a random vector, but rather on a number

More information

Lectures - Week 4 Matrix norms, Conditioning, Vector Spaces, Linear Independence, Spanning sets and Basis, Null space and Range of a Matrix

Lectures - Week 4 Matrix norms, Conditioning, Vector Spaces, Linear Independence, Spanning sets and Basis, Null space and Range of a Matrix Lectures - Week 4 Matrx norms, Condtonng, Vector Spaces, Lnear Independence, Spannng sets and Bass, Null space and Range of a Matrx Matrx Norms Now we turn to assocatng a number to each matrx. We could

More information

Errors for Linear Systems

Errors for Linear Systems Errors for Lnear Systems When we solve a lnear system Ax b we often do not know A and b exactly, but have only approxmatons  and ˆb avalable. Then the best thng we can do s to solve ˆx ˆb exactly whch

More information

Designing Fuzzy Time Series Model Using Generalized Wang s Method and Its application to Forecasting Interest Rate of Bank Indonesia Certificate

Designing Fuzzy Time Series Model Using Generalized Wang s Method and Its application to Forecasting Interest Rate of Bank Indonesia Certificate The Frst Internatonal Senar on Scence and Technology, Islac Unversty of Indonesa, 4-5 January 009. Desgnng Fuzzy Te Seres odel Usng Generalzed Wang s ethod and Its applcaton to Forecastng Interest Rate

More information

The Geometry of Logit and Probit

The Geometry of Logit and Probit The Geometry of Logt and Probt Ths short note s meant as a supplement to Chapters and 3 of Spatal Models of Parlamentary Votng and the notaton and reference to fgures n the text below s to those two chapters.

More information

Global Sensitivity. Tuesday 20 th February, 2018

Global Sensitivity. Tuesday 20 th February, 2018 Global Senstvty Tuesday 2 th February, 28 ) Local Senstvty Most senstvty analyses [] are based on local estmates of senstvty, typcally by expandng the response n a Taylor seres about some specfc values

More information

Two Conjectures About Recency Rank Encoding

Two Conjectures About Recency Rank Encoding Internatonal Journal of Matheatcs and Coputer Scence, 0(205, no. 2, 75 84 M CS Two Conjectures About Recency Rank Encodng Chrs Buhse, Peter Johnson, Wlla Lnz 2, Matthew Spson 3 Departent of Matheatcs and

More information

Lecture 4. Instructor: Haipeng Luo

Lecture 4. Instructor: Haipeng Luo Lecture 4 Instructor: Hapeng Luo In the followng lectures, we focus on the expert problem and study more adaptve algorthms. Although Hedge s proven to be worst-case optmal, one may wonder how well t would

More information

Singular Value Decomposition: Theory and Applications

Singular Value Decomposition: Theory and Applications Sngular Value Decomposton: Theory and Applcatons Danel Khashab Sprng 2015 Last Update: March 2, 2015 1 Introducton A = UDV where columns of U and V are orthonormal and matrx D s dagonal wth postve real

More information

2E Pattern Recognition Solutions to Introduction to Pattern Recognition, Chapter 2: Bayesian pattern classification

2E Pattern Recognition Solutions to Introduction to Pattern Recognition, Chapter 2: Bayesian pattern classification E395 - Pattern Recognton Solutons to Introducton to Pattern Recognton, Chapter : Bayesan pattern classfcaton Preface Ths document s a soluton manual for selected exercses from Introducton to Pattern Recognton

More information

We present the algorithm first, then derive it later. Assume access to a dataset {(x i, y i )} n i=1, where x i R d and y i { 1, 1}.

We present the algorithm first, then derive it later. Assume access to a dataset {(x i, y i )} n i=1, where x i R d and y i { 1, 1}. CS 189 Introducton to Machne Learnng Sprng 2018 Note 26 1 Boostng We have seen that n the case of random forests, combnng many mperfect models can produce a snglodel that works very well. Ths s the dea

More information

CIS526: Machine Learning Lecture 3 (Sept 16, 2003) Linear Regression. Preparation help: Xiaoying Huang. x 1 θ 1 output... θ M x M

CIS526: Machine Learning Lecture 3 (Sept 16, 2003) Linear Regression. Preparation help: Xiaoying Huang. x 1 θ 1 output... θ M x M CIS56: achne Learnng Lecture 3 (Sept 6, 003) Preparaton help: Xaoyng Huang Lnear Regresson Lnear regresson can be represented by a functonal form: f(; θ) = θ 0 0 +θ + + θ = θ = 0 ote: 0 s a dummy attrbute

More information

Support Vector Machines

Support Vector Machines Support Vector Machnes Konstantn Tretyakov (kt@ut.ee) MTAT.03.227 Machne Learnng So far Supervsed machne learnng Lnear models Least squares regresson Fsher s dscrmnant, Perceptron, Logstc model Non-lnear

More information

The Multiple Classical Linear Regression Model (CLRM): Specification and Assumptions. 1. Introduction

The Multiple Classical Linear Regression Model (CLRM): Specification and Assumptions. 1. Introduction ECONOMICS 5* -- NOTE (Summary) ECON 5* -- NOTE The Multple Classcal Lnear Regresson Model (CLRM): Specfcaton and Assumptons. Introducton CLRM stands for the Classcal Lnear Regresson Model. The CLRM s also

More information

On the Calderón-Zygmund lemma for Sobolev functions

On the Calderón-Zygmund lemma for Sobolev functions arxv:0810.5029v1 [ath.ca] 28 Oct 2008 On the Calderón-Zygund lea for Sobolev functons Pascal Auscher october 16, 2008 Abstract We correct an naccuracy n the proof of a result n [Aus1]. 2000 MSC: 42B20,

More information

Multipoint Analysis for Sibling Pairs. Biostatistics 666 Lecture 18

Multipoint Analysis for Sibling Pairs. Biostatistics 666 Lecture 18 Multpont Analyss for Sblng ars Bostatstcs 666 Lecture 8 revously Lnkage analyss wth pars of ndvduals Non-paraetrc BS Methods Maxu Lkelhood BD Based Method ossble Trangle Constrant AS Methods Covered So

More information

Boostrapaggregating (Bagging)

Boostrapaggregating (Bagging) Boostrapaggregatng (Baggng) An ensemble meta-algorthm desgned to mprove the stablty and accuracy of machne learnng algorthms Can be used n both regresson and classfcaton Reduces varance and helps to avod

More information

Logistic Regression. CAP 5610: Machine Learning Instructor: Guo-Jun QI

Logistic Regression. CAP 5610: Machine Learning Instructor: Guo-Jun QI Logstc Regresson CAP 561: achne Learnng Instructor: Guo-Jun QI Bayes Classfer: A Generatve model odel the posteror dstrbuton P(Y X) Estmate class-condtonal dstrbuton P(X Y) for each Y Estmate pror dstrbuton

More information

Inner Product. Euclidean Space. Orthonormal Basis. Orthogonal

Inner Product. Euclidean Space. Orthonormal Basis. Orthogonal Inner Product Defnton 1 () A Eucldean space s a fnte-dmensonal vector space over the reals R, wth an nner product,. Defnton 2 (Inner Product) An nner product, on a real vector space X s a symmetrc, blnear,

More information

Sequential Minimal Optimization for SVM with Pinball Loss

Sequential Minimal Optimization for SVM with Pinball Loss Sequental Mnal Optzaton for SVM wth Pnball Loss Xaoln Huang a,, Le Sh b, Johan A.K. Suykens a a KU Leuven, Departent of Electrcal Engneerng (ESAT-STADIUS), B-300 Leuven, Belgu b School of Matheatcal Scences,

More information

Generalized Linear Methods

Generalized Linear Methods Generalzed Lnear Methods 1 Introducton In the Ensemble Methods the general dea s that usng a combnaton of several weak learner one could make a better learner. More formally, assume that we have a set

More information

Column Generation. Teo Chung-Piaw (NUS) 25 th February 2003, Singapore

Column Generation. Teo Chung-Piaw (NUS) 25 th February 2003, Singapore Colun Generaton Teo Chung-Paw (NUS) 25 th February 2003, Sngapore 1 Lecture 1.1 Outlne Cuttng Stoc Proble Slde 1 Classcal Integer Prograng Forulaton Set Coverng Forulaton Colun Generaton Approach Connecton

More information

Gradient Descent Learning and Backpropagation

Gradient Descent Learning and Backpropagation Artfcal Neural Networks (art 2) Chrstan Jacob Gradent Descent Learnng and Backpropagaton CSC 533 Wnter 200 Learnng by Gradent Descent Defnton of the Learnng roble Let us start wth the sple case of lnear

More information

Three Algorithms for Flexible Flow-shop Scheduling

Three Algorithms for Flexible Flow-shop Scheduling Aercan Journal of Appled Scences 4 (): 887-895 2007 ISSN 546-9239 2007 Scence Publcatons Three Algorths for Flexble Flow-shop Schedulng Tzung-Pe Hong, 2 Pe-Yng Huang, 3 Gwoboa Horng and 3 Chan-Lon Wang

More information

Solving Fuzzy Linear Programming Problem With Fuzzy Relational Equation Constraint

Solving Fuzzy Linear Programming Problem With Fuzzy Relational Equation Constraint Intern. J. Fuzz Maeatcal Archve Vol., 0, -0 ISSN: 0 (P, 0 0 (onlne Publshed on 0 Septeber 0 www.researchasc.org Internatonal Journal of Solvng Fuzz Lnear Prograng Proble W Fuzz Relatonal Equaton Constrant

More information

,..., k N. , k 2. ,..., k i. The derivative with respect to temperature T is calculated by using the chain rule: & ( (5) dj j dt = "J j. k i.

,..., k N. , k 2. ,..., k i. The derivative with respect to temperature T is calculated by using the chain rule: & ( (5) dj j dt = J j. k i. Suppleentary Materal Dervaton of Eq. 1a. Assue j s a functon of the rate constants for the N coponent reactons: j j (k 1,,..., k,..., k N ( The dervatve wth respect to teperature T s calculated by usng

More information

NP-Completeness : Proofs

NP-Completeness : Proofs NP-Completeness : Proofs Proof Methods A method to show a decson problem Π NP-complete s as follows. (1) Show Π NP. (2) Choose an NP-complete problem Π. (3) Show Π Π. A method to show an optmzaton problem

More information

Week 5: Neural Networks

Week 5: Neural Networks Week 5: Neural Networks Instructor: Sergey Levne Neural Networks Summary In the prevous lecture, we saw how we can construct neural networks by extendng logstc regresson. Neural networks consst of multple

More information

For now, let us focus on a specific model of neurons. These are simplified from reality but can achieve remarkable results.

For now, let us focus on a specific model of neurons. These are simplified from reality but can achieve remarkable results. Neural Networks : Dervaton compled by Alvn Wan from Professor Jtendra Malk s lecture Ths type of computaton s called deep learnng and s the most popular method for many problems, such as computer vson

More information

CSci 6974 and ECSE 6966 Math. Tech. for Vision, Graphics and Robotics Lecture 21, April 17, 2006 Estimating A Plane Homography

CSci 6974 and ECSE 6966 Math. Tech. for Vision, Graphics and Robotics Lecture 21, April 17, 2006 Estimating A Plane Homography CSc 6974 and ECSE 6966 Math. Tech. for Vson, Graphcs and Robotcs Lecture 21, Aprl 17, 2006 Estmatng A Plane Homography Overvew We contnue wth a dscusson of the major ssues, usng estmaton of plane projectve

More information

Natural Language Processing and Information Retrieval

Natural Language Processing and Information Retrieval Natural Language Processng and Informaton Retreval Support Vector Machnes Alessandro Moschtt Department of nformaton and communcaton technology Unversty of Trento Emal: moschtt@ds.untn.t Summary Support

More information

Chapter 8 Indicator Variables

Chapter 8 Indicator Variables Chapter 8 Indcator Varables In general, e explanatory varables n any regresson analyss are assumed to be quanttatve n nature. For example, e varables lke temperature, dstance, age etc. are quanttatve n

More information

Support Vector Machines

Support Vector Machines Support Vector Machnes Konstantn Tretyakov (kt@ut.ee) MTAT.03.227 Machne Learnng So far So far Supervsed machne learnng Lnear models Non-lnear models Unsupervsed machne learnng Generc scaffoldng So far

More information

Towards strong security in embedded and pervasive systems: energy and area optimized serial polynomial multipliers in GF(2 k )

Towards strong security in embedded and pervasive systems: energy and area optimized serial polynomial multipliers in GF(2 k ) Towards strong securty n ebedded and pervasve systes: energy and area optzed seral polynoal ultplers n GF( k ) Zoya Dyka, Peter Langendoerfer, Frank Vater and Steffen Peter IHP, I Technologepark 5, D-53

More information

On a direct solver for linear least squares problems

On a direct solver for linear least squares problems ISSN 2066-6594 Ann. Acad. Rom. Sc. Ser. Math. Appl. Vol. 8, No. 2/2016 On a drect solver for lnear least squares problems Constantn Popa Abstract The Null Space (NS) algorthm s a drect solver for lnear

More information

Study of Classification Methods Based on Three Learning Criteria and Two Basis Functions

Study of Classification Methods Based on Three Learning Criteria and Two Basis Functions Study of Classfcaton Methods Based on hree Learnng Crtera and wo Bass Functons Jae Kyu Suhr Abstract - hs paper nvestgates several classfcaton ethods based on the three learnng crtera and two bass functons.

More information