
The University of Auckland, School of Engineering

SCHOOL OF ENGINEERING REPORT 616

SUPPORT VECTOR MACHINES BASICS

written by Vojislav Kecman
School of Engineering, The University of Auckland
April, 2004

Copyright, Vojislav Kecman

Contents

Abstract
1 Basics of learning from data
2 Support Vector Machines in Classification and Regression
  2.1 Linear Maximal Margin Classifier for Linearly Separable Data
  2.2 Linear Soft Margin Classifier for Overlapping Classes
  2.3 The Nonlinear Classifier
  2.4 Regression by Support Vector Machines
3 Implementation Issues
Appendix: L2 Support Vector Machines Models Derivation (L2 Soft Margin Classifier, L2 Soft Regressor)
References


Support Vector Machines Basics: An Introduction Only

Vojislav Kecman
The University of Auckland, Auckland, New Zealand

Abstract. This is a booklet about (machine) learning from empirical data (i.e., examples, samples, measurements, records, patterns or observations) by applying support vector machines (SVMs), a.k.a. kernel machines. The basic aim of this report is to give, as far as possible, a condensed (but systematic) presentation of a novel learning paradigm embodied in SVMs. Our focus will be on the constructive learning algorithms for both the classification (pattern recognition) and regression (function approximation) problems. Consequently, we will not go into all the subtleties and details of the statistical learning theory (SLT) and structural risk minimization (SRM), which are the theoretical foundations for the learning algorithms presented below. This seems more appropriate for application-oriented readers. The theoretically minded and interested reader may find an extensive presentation of both the SLT and SRM in (Vapnik, 1995, 1998; Cherkassky and Mulier, 1998; Cristianini and Shawe-Taylor, 2000; Kecman, 2001; Schölkopf and Smola, 2002). Instead of diving into the theory, a quadratic programming based learning leading to parsimonious SVMs will be presented in a gentle way - starting with linearly separable problems, through the classification tasks having overlapped classes but still a linear separation boundary, beyond the linearity assumptions to the nonlinear separation boundary, and finally to the linear and nonlinear regression problems. Here, the adjective 'parsimonious' denotes an SVM with a small number of support vectors ('hidden layer neurons'). The scarcity of the model results from a sophisticated, QP based, learning that matches the model capacity to the data complexity, ensuring a good generalization, i.e., a good performance of the SVM on future data unseen during training. Like neural networks, SVMs possess the well-known ability of being universal approximators of any multivariate function to any desired degree of accuracy. Consequently, they are of particular interest for modeling unknown, or partially known, highly nonlinear, complex systems, plants or processes. Also, at the very beginning, and just to be sure what the whole booklet is about, we should state clearly when there is no need for an application of SVMs' model-building techniques. In short, whenever there exists an analytic closed-form model (or it is possible to devise one) there is no need to resort to learning from empirical data by SVMs (or by any other type of learning machine).

1 Basics of learning from data

SVMs have been developed in the reverse order to the development of neural networks (NNs). SVMs evolved from the sound theory to the implementation and experiments, while the NNs followed a more heuristic path, from applications and extensive experimentation to the theory. It is interesting to note that the very strong theoretical background of SVMs did not make them widely appreciated at the beginning. The publication of the first papers by Vapnik, Chervonenkis (Vapnik and Chervonenkis, 1965) and co-workers went largely unnoticed till 1992. This was due to a widespread belief in the statistical and/or machine learning community that, despite being theoretically appealing, SVMs are neither suitable nor relevant for practical applications. They were taken seriously only when excellent results on practical learning benchmarks were achieved (in numeral recognition, computer vision and text categorization). Today, SVMs show better results than (or comparable outcomes to) NNs and other statistical models on the most popular benchmark problems.

The learning problem setting for SVMs is as follows: there is some unknown nonlinear dependency (mapping, function) y = f(x) between some high-dimensional input vector x and a scalar output y (or a vector output y, as in the case of multiclass SVMs). There is no information about the underlying joint probability functions here. Thus, one must perform distribution-free learning. The only information available is a training data set D = {(x_i, y_i) ∈ X × Y}, i = 1, ..., l, where l stands for the number of the training data pairs and is therefore equal to the size of the training data set D. Often, y_i is denoted as d_i, where d stands for a desired (target) value. Hence, SVMs belong to the supervised learning techniques.

Note that this problem is similar to classic statistical inference. However, there are several very important differences between the approaches and assumptions in training SVMs and those in classic statistics and/or NNs modeling. Classic statistical inference is based on the following three fundamental assumptions:

1. Data can be modeled by a set of functions linear in parameters; this is a foundation of the parametric paradigm in learning from experimental data.
2. In most real-life problems, the stochastic component of the data follows the normal probability distribution law, that is, the underlying joint probability distribution is Gaussian.
3. Because of the second assumption, the induction paradigm for parameter estimation is the maximum likelihood method, which is reduced to the minimization of the sum-of-errors-squares cost function in most engineering applications.

All three assumptions on which the classic statistical paradigm relied turned out to be inappropriate for many contemporary real-life problems (Vapnik, 1998) because of the following facts:

1. Modern problems are high-dimensional, and if the underlying mapping is not very smooth, the linear paradigm needs an exponentially increasing number of terms with an increasing dimensionality of the input space X (an increasing number of independent variables). This is known as 'the curse of dimensionality'.
2. The underlying real-life data generation laws may typically be very far from the normal distribution, and a model-builder must consider this difference in order to construct an effective learning algorithm.
3. From the first two points it follows that the maximum likelihood estimator (and consequently the sum-of-error-squares cost function) should be replaced by a new induction paradigm that is uniformly better, in order to model non-Gaussian distributions.

In addition to the three basic facts above, the novel SVMs' problem setting and inductive principle have been developed for standard contemporary data sets which are typically high-dimensional and sparse (meaning, the data sets contain a small number of training data pairs).

SVMs are so-called nonparametric models. 'Nonparametric' does not mean that the SVM models have no parameters at all. On the contrary, their learning (selection, identification, estimation, training or tuning) is the crucial issue here. However, unlike in classic statistical inference, the parameters are not predefined and their number depends on the training data used. In other words, the parameters that define the capacity of the model are data-driven in such a way as to match the model capacity to the data complexity. This is the basic paradigm of the structural risk minimization (SRM), introduced by Vapnik and Chervonenkis and their coworkers, that led to the new learning algorithm. Namely, there are two basic constructive approaches possible in designing a model that will have a good generalization property (Vapnik, 1995 and 1998):

1. choose an appropriate structure of the model (order of polynomials, number of HL neurons, number of rules in the fuzzy logic model) and, keeping the estimation error (a.k.a. confidence interval, a.k.a. variance of the model) fixed in this way, minimize the training error (i.e., empirical risk), or
2. keep the value of the training error (a.k.a. approximation error, a.k.a. empirical risk) fixed (equal to zero or to some acceptable level) and minimize the confidence interval.

Classic NNs implement the first approach (or some of its sophisticated variants) and SVMs implement the second strategy. In both cases the resulting model should resolve the trade-off between under-fitting and over-fitting the training data. The final model structure (its order) should ideally match the learning machine's capacity with the training data complexity. This important difference between the two learning approaches comes from the minimization of different cost (error, loss) functionals. Table 1 tabulates the basic risk functionals applied in developing the three contemporary statistical models.

Table 1: Basic Models and Their Error (Risk) Functionals

  Multilayer perceptron (NN):              R = Σ_{i=1}^{l} (d_i − f(x_i, w))²                 [closeness to data]
  Regularization Network
  (Radial Basis Functions Network):        R = Σ_{i=1}^{l} (d_i − f(x_i, w))² + λ ||Pf||²     [closeness to data + smoothness]
  Support Vector Machine:                  R = Σ_{i=1}^{l} L_ε + Ω(l, h)                      [closeness to data + capacity of a machine]

Closeness to data = training error, a.k.a. empirical risk. Here d stands for desired values, w is the weight vector subject to training, λ is a regularization parameter, P is a smoothness operator, L_ε is an SVMs' loss function, h is a VC dimension and Ω is a function bounding the capacity of the learning machine. In classification problems L_ε is typically the 0-1 loss function, and in regression problems L_ε is the so-called Vapnik's ε-insensitivity loss (error) function

  L_ε = |y − f(x, w)|_ε = { 0,                    if |y − f(x, w)| ≤ ε
                          { |y − f(x, w)| − ε,    otherwise,                    (1)

where ε is the radius of a tube within which the regression function must lie after the successful learning. (Note that for ε = 0, an interpolation of the training data will be performed.)
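
Equation (1) translates directly into code. The following minimal sketch (assuming only NumPy is available; it is an illustration added here, not part of the original report) computes Vapnik's ε-insensitivity loss for a batch of targets and predictions:

    import numpy as np

    def epsilon_insensitive_loss(y, f, eps):
        # Vapnik's loss of equation (1): zero inside the eps-tube
        # around the prediction, linear outside of it.
        return np.maximum(0.0, np.abs(y - f) - eps)

    y = np.array([1.0, 2.0, 3.0])      # desired values d
    f = np.array([1.05, 2.5, 2.0])     # model outputs f(x, w)
    print(epsilon_insensitive_loss(y, f, eps=0.1))   # [0.  0.4  0.9]
    # For eps = 0 the loss reduces to the absolute error |y - f|,
    # i.e., the learning is pushed toward interpolating the data.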

It is interesting to note that Girosi (1997) has shown that under some constraints the SV machine can also be derived from the framework of regularization theory rather than SLT and SRM. Thus, unlike the classic adaptation algorithms (which work in the L2 norm), SV machines represent novel learning techniques which perform SRM. In this way, the SV machine creates a model with a minimized VC dimension, and when the VC dimension of the model is low, the expected probability of error is low as well. This means good performance on previously unseen data, i.e., a good generalization. This property is of particular interest because a good model is one that generalizes well, not one that merely performs well on the training data pairs. Too good a performance on training data is also known as an extremely undesirable overfitting.

As will be shown below, in the simplest pattern recognition tasks, support vector machines use a linear separating hyperplane to create a classifier with a maximal margin. In order to do that, the learning problem for the SV machine will be cast as a constrained nonlinear optimization problem. In this setting the cost function will be quadratic and the constraints linear (i.e., one will have to solve a classic quadratic programming problem).

In cases when the given classes cannot be linearly separated in the original input space, the SV machine first (non-linearly) transforms the original input space into a higher dimensional feature space. This transformation can be achieved by using various nonlinear mappings: polynomial, sigmoidal as in multilayer perceptrons, RBF mappings having as the basis functions radially symmetric functions such as Gaussians, multiquadrics, or different spline functions. After this nonlinear transformation step, the task of an SV machine in finding the linear optimal separating hyperplane in this feature space is relatively trivial. Namely, the optimization problem to solve in a feature space will be of the same kind as the calculation of a maximal margin separating hyperplane in the original input space for linearly separable classes. How, after the specific nonlinear transformation, nonlinearly separable problems in input space can become linearly separable problems in a feature space will be shown later.

In a probabilistic setting, there are three basic components in learning from data tasks: a generator of random inputs x, a system whose training responses y (i.e., d) are used for training the learning machine, and a learning machine which, by using inputs x and the system's responses y, should learn (estimate, model) the unknown dependency between these two sets of variables, defined by the weight vector w (Fig 1).

Figure 1 A model of a learning machine w = w(x, y) that during the training phase (by observing inputs x to, and outputs y from, the system, i.e., plant) estimates (learns, adjusts, trains, tunes) its parameters (weights) w, and in this way learns the mapping y = f(x, w) performed by the system. The connection between the system and the learning machine is present only during the learning phase. The notation o = f_a(x, w) ~ y denotes that we will rarely try to interpolate training data pairs; we would rather seek an approximating function that can generalize well. After the training, at the generalization or test phase, the output from the machine o = f_a(x, w) is expected to be a good estimate of the system's true response y.

The figure shows the most common learning setting that some readers may have already seen in various other fields, notably in statistics, NNs, control system identification and/or

in signal processing. During the (successful) training phase a learning machine should be able to find the relationship between an input space X and an output space Y, by using data D in regression tasks (or to find a function that separates data within the input space, in classification ones). The result of a learning process is an approximating function f_a(x, w), which in statistical literature is also known as a hypothesis f_a(x, w). This function approximates the underlying (or true) dependency between the input and output in the case of regression, and the decision boundary, i.e., separation function, in a classification. The chosen hypothesis f_a(x, w) belongs to a hypothesis space of functions H (f_a ∈ H), and it is a function that minimizes some risk functional R(w).

It may be practical to remind the reader that under the general name 'approximating function' we understand any mathematical structure that maps inputs x into outputs y. Hence, an 'approximating function' may be: a multilayer perceptron NN, an RBF network, an SV machine, a fuzzy model, a Fourier truncated series or a polynomial approximating function. Here we discuss SVMs. A set of parameters w is the very subject of learning, and generally these parameters are called weights. These parameters may have different geometrical and/or physical meanings. Depending upon the hypothesis space of functions H we are working with, the parameters w are usually:

- the hidden and the output layer weights in multilayer perceptrons,
- the rules and the parameters (for the positions and shapes) of fuzzy subsets,
- the coefficients of a polynomial or Fourier series,
- the centers and (co)variances of Gaussian basis functions as well as the output layer weights of this RBF network,
- the support vector weights in SVMs.

There is another important class of functions in learning from examples tasks. A learning machine tries to capture an unknown target function f_o(x) that is believed to belong to some target space T, or to a class T, that is also called a concept class. Note that we rarely know the target space T and that our learning machine generally does not belong to the same class of functions as the unknown target function f_o(x). Typical examples of target spaces are continuous functions with s continuous derivatives in n variables; Sobolev spaces (comprising square integrable functions in n variables with s square integrable derivatives); band-limited functions; functions with integrable Fourier transforms; Boolean functions, etc. In the following, we will assume that the target space T is a space of differentiable functions. The basic problem we are facing stems from the fact that we know very little about the possible underlying function between the input and the output variables. All we have at our disposal is a training data set of labeled examples drawn by independently sampling an (X × Y) space according to some unknown probability distribution.

The learning-from-data problem is ill-posed. (This will be shown in Figs 2 and 3 for regression and classification examples respectively.) The basic source of the ill-posedness of the problem is the infinite number of possible solutions to the learning problem.

At this point, just for the sake of illustration, it is useful to remember that all functions that interpolate the data points will result in a zero value for the training error (empirical risk), as shown (in the case of regression) in Fig 2. The figure shows a simple example of three out of infinitely many different interpolating functions of training data pairs sampled from a noiseless function y = sin(x).

Figure 2 Three out of infinitely many interpolating functions resulting in a training error equal to 0. However, the thick solid, dashed and dotted lines are bad models of the true function y = sin(x) (thin dashed line).

In Fig 2, each interpolant results in a training error equal to zero, but at the same time, each one is a very bad model of the true underlying dependency between x and y, because all three functions perform very poorly outside the training inputs. In other words, none of these three particular interpolants can generalize well. However, not only interpolating functions can mislead. There are many other approximating functions (learning machines) that will minimize the empirical risk (approximation or training error) but not necessarily the generalization error (true, expected or guaranteed risk). This follows from the fact that a learning machine is trained by using some particular sample of the true underlying function, and consequently it always produces biased approximating functions. These approximants depend necessarily on the specific training data pairs (i.e., the training sample) used.
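
The effect shown in Fig 2 is easy to reproduce numerically. In the sketch below (an illustration under assumed sample positions, not the figure's actual data), a polynomial that interpolates noise-free samples of sin(x) attains zero empirical risk, yet its error away from the training inputs is large:

    import numpy as np

    x_train = np.linspace(0, 2 * np.pi, 8)   # 8 noise-free samples of sin(x)
    y_train = np.sin(x_train)

    # A degree-7 polynomial through 8 points is an interpolant:
    # the training error (empirical risk) is numerically zero ...
    coeffs = np.polyfit(x_train, y_train, deg=7)
    print(np.abs(np.polyval(coeffs, x_train) - y_train).max())

    # ... but outside the training inputs it is a bad model of sin(x).
    x_test = np.linspace(-1.0, 2 * np.pi + 1.0, 200)
    print(np.abs(np.polyval(coeffs, x_test) - np.sin(x_test)).max())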

Fig 3 shows an extremely simple classification example where the classes (represented by the empty training circles and squares) are linearly separable. However, in addition to a linear separation (dashed line), the learning was also performed by using a model of a high capacity (say, one with Gaussian basis functions, or one created by a high order polynomial, over the 2-dimensional input space) that produced a perfect separation boundary (empirical risk equals zero) too. However, such a model is overfitting the data and it will definitely perform very badly on test examples unseen during training. Filled circles and squares in the right-hand graph are all wrongly classified by the nonlinear model. Note that a simple linear separation boundary correctly classifies both the training and the test data.

Figure 3 Overfitting in the case of a linearly separable classification problem. Left: The perfect classification of the training data (empty circles and squares) by both a low order linear model (dashed line) and a high order nonlinear one (solid wiggly curve). Right: Wrong classification of all the test data shown (filled circles and squares) by the high capacity model, but a correct one by the simple linear separation boundary.

A solution to this problem proposed in the framework of the SLT is restricting the hypothesis space H of approximating functions to a set smaller than that of the target function T, while simultaneously controlling the flexibility (complexity) of these approximating functions. This is ensured by the introduction of a novel induction principle, the SRM, and its algorithmic realization through the SV machine. The structural risk minimization principle (Vapnik, 1979) tries to minimize an expected risk (the cost function) R comprising two terms, as given in Table 1 for the SVMs, R = Σ_{i=1}^{l} L_ε + Ω(l, h) = R_emp + Ω(l, h), and it is based on the fact that for the classification learning problem, with a probability of at least 1 − η, the bound

  R(w_n) ≤ Ω(l, h, η) + R_emp(w_n)    (2a)

holds. The first term on the right hand side is named a VC confidence (confidence term or confidence interval), defined as

  Ω(l, h, η) = sqrt{ [ h (ln(2l/h) + 1) − ln(η/4) ] / l }.    (2b)
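
The behavior of the confidence term is easy to tabulate. A minimal sketch of (2b), assuming NumPy (the numbers produced are illustrative only):

    import numpy as np

    def vc_confidence(l, h, eta=0.1):
        # VC confidence term of equation (2b) for l training pairs,
        # VC dimension h and confidence level 1 - eta.
        return np.sqrt((h * (np.log(2.0 * l / h) + 1.0) - np.log(eta / 4.0)) / l)

    # Omega shrinks as the number of data l grows, and grows with the
    # capacity h, which are exactly the trends displayed in Fig 4.
    for l in (100, 1000, 10000):
        print(l, vc_confidence(l, h=10), vc_confidence(l, h=50))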

The parameter h is called the VC (Vapnik-Chervonenkis) dimension of a set of functions. It describes the capacity of a set of functions implemented in a learning machine. For a binary classification, h is the maximal number of points which can be separated (shattered) into two classes in all possible 2^h ways by using the functions of the learning machine.

An SV (learning) machine can be thought of as
  o a set of functions implemented in an SVM,
  o an induction principle and,
  o an algorithmic procedure for implementing the induction principle on the given set of functions.

The notation for risks given above, using R(w_n), denotes that an expected risk is calculated over a set of functions f_an(x, w_n) of increasing complexity. Different bounds can also be formulated in terms of other concepts, such as the growth function or the annealed VC entropy. Bounds also differ for regression tasks. More details can be found in (Vapnik, 1995, as well as in Cherkassky and Mulier, 1998). However, the general characteristics of the dependence of the confidence interval on the number of training data l and on the VC dimension h are similar and given in Fig 4.

Figure 4 The dependency of the VC confidence interval Ω(h, l, η) (i.e., the estimation error bound) on the number of training data l and the VC dimension h (h < l), for a fixed confidence level 1 − η = 1 − 0.1 = 0.9.

Equations (2) show that when the number of training data increases, i.e., for l → ∞ (with other parameters fixed), an expected (true) risk R(w_n) is very close to the empirical risk R_emp(w_n), because Ω → 0. On the other hand, when the probability 1 − η (also called a confidence level, which should not be confused with the confidence term Ω) approaches 1, the generalization bound grows large, because in the case when η → 0 (meaning that the confidence level 1 − η → 1), the value of Ω → ∞. This has an obvious intuitive interpretation (Cherkassky and Mulier, 1998) in that any learning machine (model, estimates) obtained from a finite number of training data cannot have an arbitrarily high confidence level.

There is always a trade-off between the accuracy provided by bounds and the degree of confidence (in these bounds). Fig 4 also shows that the VC confidence interval increases with an increase of the VC dimension h for a fixed number of training data pairs l.

The SRM is a novel inductive principle for learning from finite training data sets. It proved to be very useful when dealing with small samples. The basic idea of the SRM is to choose, from a large number of candidate learning machines, a model of the right capacity to describe the given training data pairs. As mentioned, this can be done by restricting the hypothesis space H of approximating functions and simultaneously by controlling their flexibility (complexity). Thus, learning machines will be those parameterized models that, by increasing the number of parameters (typically called weights w_i here), form a nested structure in the following sense

  H_1 ⊂ H_2 ⊂ H_3 ⊂ ... ⊂ H_{n−1} ⊂ H_n ⊂ ... ⊂ H    (3)

In such a nested set of functions, every function always contains the previous, less complex, function. Typically, H_n may be: a set of polynomials in one variable of degree n; a fuzzy logic model having n rules; multilayer perceptrons or an RBF network having n HL neurons; an SVM structured over n support vectors. The goal of learning is one of a subset selection that matches training data complexity with approximating model capacity. In other words, a learning algorithm chooses an optimal polynomial degree or an optimal number of HL neurons or an optimal number of FL model rules, for a polynomial model or NN or FL model respectively. For learning machines linear in parameters, this complexity (expressed by the VC dimension) is given by the number of weights, i.e., by the number of free parameters. For approximating models nonlinear in parameters, the calculation of the VC dimension is often not an easy task. Nevertheless, even for these networks, by using simulation experiments, one can find a model of appropriate complexity.

2 Support Vector Machines in Classification and Regression

Below, we focus on the algorithm for implementing the SRM induction principle on the given set of functions. It implements the strategy mentioned previously: it keeps the training error fixed and minimizes the confidence interval. We first consider a simple example of linear decision rules (i.e., the separating functions will be hyperplanes) for binary classification (dichotomization) of linearly separable data. In such a problem, we are able to perfectly classify the data pairs, meaning that the empirical risk can be set to zero. It is the easiest classification problem, and yet an excellent introduction to all the relevant and important ideas underlying the SLT, SRM and SVM.

Our presentation will gradually increase in complexity. It will begin with a Linear Maximal Margin Classifier for Linearly Separable Data, where there is no sample overlapping. Afterwards, we will allow some degree of overlapping of training data pairs. However, we will still try to separate classes by using linear hyperplanes. This will lead to the Linear Soft Margin Classifier for Overlapping Classes. In problems where linear decision hyperplanes are no longer feasible, the mapping of an input space into the so-called feature space (which corresponds to the HL in NN models) will take place, resulting in the Nonlinear Classifier. Finally, in the subsection on Regression by SV Machines we introduce the same approaches and techniques for solving regression (i.e., function approximation) problems.

2.1 Linear Maximal Margin Classifier for Linearly Separable Data

Consider the problem of binary classification, or dichotomization. Training data are given as

  (x_1, y_1), (x_2, y_2), ..., (x_l, y_l),   x ∈ ℝ^n,  y ∈ {+1, −1}.    (4)

For reasons of visualization only, we will consider the case of a two-dimensional input space, i.e., x ∈ ℝ². The data are linearly separable and there are many different hyperplanes that can perform the separation (Fig 5). (Actually, for x ∈ ℝ², the separation is performed by planes w_1x_1 + w_2x_2 + b = o. In other words, the decision boundary, i.e., the separation line in input space, is defined by the equation w_1x_1 + w_2x_2 + b = 0.) How do we find the best one? The difficult part is that all we have at our disposal are sparse training data. Thus, we want to find the optimal separating function without knowing the underlying probability distribution P(x, y). There are many functions that can solve given pattern recognition (or functional approximation) tasks. In such a problem setting, the SLT (developed in the early 1960s by Vapnik and Chervonenkis) shows that it is crucial to restrict the class of functions implemented by a learning machine to one with a complexity that is suitable for the amount of available training data.

In the case of a classification of linearly separable data, this idea is transformed into the following approach: among all the hyperplanes that minimize the training error (i.e., empirical risk), find the one with the largest margin. This is an intuitively acceptable approach. Just by looking at Fig 5 we will find that the dashed separation line shown in the right graph seems to promise a probably good classification when facing previously unseen data (meaning, in the generalization, i.e., test, phase). Or, at least, it seems probably better in generalization than the dashed decision boundary having the smaller margin shown in the left graph. This can also be expressed as: a classifier with a smaller margin will have a higher expected risk.

By using the given training examples, during the learning stage, our machine finds parameters w = [w_1 w_2 ... w_n]^T and b of a discriminant or decision function d(x, w, b) given as

  d(x, w, b) = w^T x + b = Σ_{i=1}^{n} w_i x_i + b,    (5)

where x, w ∈ ℝ^n, and the scalar b is called a bias. (Note that the dashed separation lines in Fig 5 represent the lines that follow from d(x, w, b) = 0.) After the successful training stage, by using the weights obtained, the learning machine, given a previously unseen pattern x_p, produces output o according to an indicator function given as

  i_F = o = sign(d(x_p, w, b)),    (6)

where o is the standard notation for the output from the learning machine. In other words, the decision rule is: if d(x_p, w, b) > 0, the pattern x_p belongs to class 1 (i.e., o = y_1 = +1), and if d(x_p, w, b) < 0 the pattern x_p belongs to class 2 (i.e., o = y_2 = −1).

Figure 5 Two out of many separating lines: a good one with a large margin M (right) and a less acceptable separating line with a small margin M (left). The solid line in each graph is the separation line, i.e., the decision boundary, between class 1 and class 2.
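
In code, (5) and (6) amount to one dot product and one sign. A minimal sketch, assuming NumPy and hypothetical parameter values (none of the numbers below come from the report):

    import numpy as np

    def d(x, w, b):
        # decision (discriminant) function of equation (5)
        return np.dot(w, x) + b

    def i_F(x, w, b):
        # indicator function (6): the machine's output o = sign(d)
        return np.sign(d(x, w, b))

    w, b = np.array([1.0, -2.0]), 0.5       # hypothetical weights and bias
    x_p = np.array([2.0, 0.5])              # previously unseen pattern
    print(d(x_p, w, b), i_F(x_p, w, b))     # 1.5  1.0, i.e., class 1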

The indicator function i_F given by (6) is a step-wise (i.e., a stairs-wise) function (see Figs 6 and 7). At the same time, the decision (or discriminant) function d(x, w, b) is a hyperplane. Note also that both the decision hyperplane d and the indicator function i_F live in an (n + 1)-dimensional space, or they lie 'over' the training patterns' n-dimensional input space. There is one more mathematical object in classification problems, called a separation boundary, that lives in the same n-dimensional space of input vectors x. The separation boundary separates vectors x into two classes. Here, in cases of linearly separable data, the boundary is also a (separating) hyperplane, but of a lower order than d(x, w, b). The decision (separation) boundary is the intersection of the decision function d(x, w, b) and the space of input features. It is given by

  d(x, w, b) = 0.    (7)

All these functions and relationships can be followed, for two-dimensional inputs x, in Fig 6. In this particular case, the decision boundary, i.e., the separating (hyper)plane, is actually a separating line in an x_1-x_2 plane, and the decision function d(x, w, b) is a plane over the 2-dimensional space of features, i.e., over the x_1-x_2 plane.

Figure 6 The definition of a decision (discriminant) function or hyperplane d(x, w, b), of a decision (separating) boundary d(x, w, b) = 0, and of an indicator function i_F = sign(d(x, w, b)) whose value represents the learning, or SV, machine's output o. The vertical axis shows the desired value y and d(x, w, b); the horizontal axes are the inputs x_1 and x_2. The separation boundary is the intersection of d(x, w, b) with the input plane (x_1, x_2); thus it is w^T x + b = 0. Support vectors are the starred data. The decision function (optimal canonical separating hyperplane) d(x, w, b), with margin M, is the argument of the indicator function.

In the case of 1-dimensional training patterns x (i.e., for 1-dimensional inputs x to the learning machine), the decision function d(x, w, b) is a straight line in an x-y plane. The intersection of this line with the x-axis defines the point that is the separation boundary between the two classes. This can be followed in Fig 7.

Before attempting to find an optimal separating hyperplane having the largest margin, we introduce the concept of the canonical hyperplane. We depict this concept with the help of the 1-dimensional example shown in Fig 7. Not quite incidentally, the decision plane d(x, w, b) shown in Fig 6 is also a canonical plane. Namely, the values of d and of i_F are the same, and both are equal to ±1, for the support vectors depicted by stars. At the same time, for all other training patterns |d| > |i_F|. In order to present the notion of this new concept of the canonical plane, first note that there are many hyperplanes that can correctly separate the data. In Fig 7 three different decision functions d(x, w, b) are shown. There are infinitely many more. In fact, given d(x, w, b), all functions d(x, kw, kb), where k is a positive scalar, are correct decision functions too. Because parameters (w, b) describe the same separation hyperplane as parameters (kw, kb), there is a need to introduce the notion of a canonical hyperplane:

A hyperplane is in the canonical form with respect to training data x ∈ X if

  min_{x_i ∈ X} |w^T x_i + b| = 1.    (8)

The solid line d(x, w, b) = −2x + 5 in Fig 7 fulfills (8) because its minimal absolute value for the given training patterns belonging to the two classes is 1. It achieves this value for two patterns, chosen as support vectors, namely for x_3 = 2 and x_4 = 3. For all other patterns, |d| > 1.

Figure 7 SV classification for 1-dimensional inputs by the linear decision function. Graphical presentation of a canonical hyperplane. For 1-dimensional inputs (feature x, target y, i.e., d), it is actually a canonical straight line (depicted as a thick straight solid line) that passes through the points (+2, +1) and (+3, −1), defined as the support vectors (stars). The two dashed lines, d(x, k_1w, k_1b) and d(x, k_2w, k_2b), represent decision functions that are not canonical hyperplanes; however, they do have the same separation boundary as the canonical hyperplane here. For a 1-dimensional input the decision boundary is a point, or a zero-order hyperplane. The indicator function i_F = sign(d(x, w, b)) is a step-wise function, and it is the SV machine's output o. The training input patterns {x_1 = 0.5, x_2 = 1, x_3 = 2} ∈ Class 1 have a desired or target value (label) y_i = +1. The inputs {x_4 = 3, x_5 = 4, x_6 = 4.5, x_7 = 5} ∈ Class 2 have the label y_i = −1.
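
The canonical form condition (8) can be verified directly for the data of Fig 7. A minimal sketch, assuming NumPy (data and the line d(x) = −2x + 5 are taken from the figure):

    import numpy as np

    x = np.array([0.5, 1.0, 2.0, 3.0, 4.0, 4.5, 5.0])   # training inputs
    y = np.array([1, 1, 1, -1, -1, -1, -1])             # labels

    w, b = -2.0, 5.0                  # the canonical line d(x) = -2x + 5
    d = w * x + b
    print(np.abs(d).min())            # 1.0, so condition (8) is fulfilled
    print(x[np.abs(d) == 1.0])        # [2. 3.], the two support vectors
    print((np.sign(d) == y).all())    # True: all patterns separated

    # A scaled line d(x, kw, kb) has the same separation boundary
    # but is no longer canonical:
    k = 2.5
    print(np.abs(k * w * x + k * b).min())    # 2.5, not 1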

Note an interesting detail regarding the notion of a canonical hyperplane that is easily checked. There are many different hyperplanes (planes and straight lines for the 2-D and 1-D problems in Figs 6 and 7 respectively) that have the same separation boundary (the solid line, and a dot, in Figs 6 (right) and 7 respectively). At the same time there are far fewer hyperplanes that can be defined as canonical ones fulfilling (8). In Fig 7, i.e., for a 1-dimensional input vector x, the canonical hyperplane is unique. This is not the case for training patterns of higher dimension. Depending upon the configuration of the class elements, various canonical hyperplanes are possible.

Therefore, there is a need to define an optimal canonical separating hyperplane (OCSH) as a canonical hyperplane having a maximal margin. This search for a separating, maximal margin, canonical hyperplane is the ultimate learning goal in the statistical learning theory underlying SV machines. Carefully note the adjectives used in the previous sentence. The hyperplane obtained from limited training data must have a maximal margin because it will probably better classify new data. It must be in canonical form because this will ease the quest for significant patterns, here called support vectors. The canonical form of the hyperplane will also simplify the calculations. Finally, the resulting hyperplane must ultimately separate the training patterns.

For the sake of brevity, we omit the derivation of the expression for the calculation of the distance (margin M) between the closest members of the two classes. The curious reader can derive the expression for M as given below, or can find it in (Kecman, 2001) or other books. The margin M can be derived by both geometric and algebraic arguments, and is given as

  M = 2 / ||w||.    (9)

This important result will have a great consequence for the constructive (i.e., learning) algorithm in the design of a maximal margin classifier. It will lead to solving a quadratic programming (QP) problem, which will be shown shortly. Hence, the 'good old' gradient learning in NNs will be replaced here by a solution of the QP problem. This is the next important difference between the NNs and SVMs, and it follows from the implementation of SRM in designing SVMs, instead of a minimization of the sum of error squares, which is the standard cost function for NNs.

Equation (9) is a very interesting result showing that the minimization of the norm of the hyperplane normal weight vector ||w|| = sqrt(w^T w) = sqrt(w_1² + w_2² + ... + w_n²) leads to a maximization of the margin M. Because a minimization of sqrt(f) is equivalent to the minimization of f, the minimization of the norm ||w|| equals the minimization of w^T w = Σ_{i=1}^{n} w_i² = w_1² + w_2² + ... + w_n², and this leads to a maximization of the margin M. Hence, the learning problem is

  minimize (1/2) w^T w,    (10a)

subject to the constraints introduced and given in (10b) below.

Figure 8 The optimal canonical separating hyperplane (OCSH) with the largest margin M intersects halfway between the two classes (class 1, y = +1, and class 2, y = −1). The points closest to it (satisfying y_j|w^T x_j + b| = 1, j = 1, ..., N_SV) are support vectors, and the OCSH satisfies y_i(w^T x_i + b) ≥ 1, i = 1, ..., l (where l denotes the number of training data and N_SV stands for the number of SVs). Three support vectors (x_1 and x_2 from class 1, and x_3 from class 2) are the textured training data. (The multiplication of w^T w by 0.5 is for numerical convenience only, and it doesn't change the solution.)

Note that in the case of linearly separable classes the empirical error equals zero (R_emp = 0 in (2a)) and the minimization of w^T w corresponds to the minimization of the confidence term Ω. The OCSH, i.e., the separating hyperplane with the largest margin defined by M = 2/||w||, specifies support vectors, i.e., training data points closest to it, which satisfy y_j[w^T x_j + b] = 1, j = 1, ..., N_SV. For all the other (non-SV) data points the OCSH satisfies the inequalities y_i[w^T x_i + b] > 1. In other words, for all the data, the OCSH should satisfy the following constraints

  y_i[w^T x_i + b] ≥ 1,   i = 1, ..., l,    (10b)

where l denotes the number of training data points, and N_SV stands for the number of SVs. The last equation can easily be checked visually in Figs 6 and 7 for the 2-dimensional and 1-dimensional input vectors x respectively. Thus, in order to find the OCSH having a maximal margin, a learning machine should minimize ||w||² subject to the inequality constraints (10b). This is a classic quadratic optimization problem with inequality constraints. Such

an optimization problem is solved by the saddle point of the Lagrange functional (Lagrangian)

  L(w, b, α) = (1/2) w^T w − Σ_{i=1}^{l} α_i { y_i[w^T x_i + b] − 1 },    (11)

where the α_i are Lagrange multipliers. (In forming the Lagrangian, for constraints of the form f_i > 0, the inequality constraint equations are multiplied by nonnegative Lagrange multipliers (i.e., α_i ≥ 0) and subtracted from the objective function.) The search for an optimal saddle point (w_o, b_o, α_o) is necessary because the Lagrangian L must be minimized with respect to w and b, and maximized with respect to the nonnegative α_i (i.e., α_i ≥ 0 should be found). This problem can be solved either in a primal space (which is the space of parameters w and b) or in a dual space (which is the space of Lagrange multipliers α_i). The second approach gives insightful results and we will consider the solution in a dual space below. In order to do that, we use the Karush-Kuhn-Tucker (KKT) conditions for the optimum of a constrained function. In our case, both the objective function (11) and the constraints (10b) are convex, and the KKT conditions are necessary and sufficient conditions for a maximum of (11). These conditions are: at the saddle point (w_o, b_o, α_o), the derivatives of the Lagrangian L with respect to the primal variables should vanish, which leads to

  ∂L/∂w_o = 0,  i.e.,  w_o = Σ_{i=1}^{l} α_i y_i x_i,    (12)

  ∂L/∂b_o = 0,  i.e.,  Σ_{i=1}^{l} α_i y_i = 0,    (13)

and the KKT complementarity conditions below (stating that at the solution point the products between the dual variables and the constraints equal zero) must also be satisfied,

  α_i { y_i[w^T x_i + b] − 1 } = 0,   i = 1, ..., l.    (14)

Substituting (12) and (13) into the primal variables Lagrangian L(w, b, α) (11), we obtain the dual variables Lagrangian L_d(α)

  L_d(α) = Σ_{i=1}^{l} α_i − (1/2) Σ_{i,j=1}^{l} y_i y_j α_i α_j x_i^T x_j.    (15)

In order to find the optimal hyperplane, the dual Lagrangian L_d(α) has to be maximized with respect to nonnegative α_i (i.e., α must be in the nonnegative quadrant) and with respect to the equality constraint as follows

  α_i ≥ 0,   i = 1, ..., l,    (16a)

  Σ_{i=1}^{l} α_i y_i = 0.    (16b)

Note that the dual Lagrangian L_d(α) is expressed in terms of training data and depends only on the scalar products of input patterns (x_i^T x_j). The dependency of L_d(α) on a scalar product of inputs will be very handy later when analyzing nonlinear decision boundaries and for general nonlinear regression. Note also that the number of unknown variables equals the number of training data l. After learning, the number of free parameters is equal to the number of SVs, but it does not depend on the dimensionality of the input space. Such a standard quadratic optimization problem can be expressed in matrix notation and formulated as follows:

  maximize    L_d(α) = −0.5 α^T H α + f^T α,    (17a)
  subject to  y^T α = 0,    (17b)
              α_i ≥ 0,   i = 1, ..., l,    (17c)

where α = [α_1, α_2, ..., α_l]^T, H denotes the Hessian matrix (H_ij = y_i y_j (x_i · x_j) = y_i y_j x_i^T x_j) of this problem, and f is an (l, 1) unit vector f = 1 = [1 1 ... 1]^T. (Note that the maximization of (17a) equals the minimization of L_d(α) = 0.5 α^T H α − f^T α, subject to the same constraints.) The solutions α_o of the dual optimization problem above determine the parameters w_o and b_o of the optimal hyperplane according to (12) and (14) as follows

  w_o = Σ_{i=1}^{l} α_oi y_i x_i,    (18a)

  b_o = (1/N_SV) Σ_{s=1}^{N_SV} ( (1/y_s) − x_s^T w_o ) = (1/N_SV) Σ_{s=1}^{N_SV} ( y_s − x_s^T w_o ),   s = 1, ..., N_SV.    (18b)

In deriving (18b) we used the fact that y can be either +1 or −1, and 1/y = y. N_SV denotes the number of support vectors. There are two important observations about the calculation of w_o. First, the optimal weight vector w_o is obtained in (18a) as a linear combination of the training data points, and second, w_o (same as the bias term b_o) is calculated by using only the selected data points called support vectors (SVs). The fact that the summation in (18a) goes over all training data patterns (i.e., from 1 to l) is irrelevant, because the Lagrange multipliers for all non-support vectors equal zero (α_oi = 0, i = N_SV + 1, ..., l). Finally, having calculated w_o and b_o we obtain a decision hyperplane d(x) and an indicator function i_F = o = sign(d(x)) as given below

  d(x) = w_o^T x + b_o = Σ_{i=1}^{l} y_i α_i x_i^T x + b_o,    i_F = o = sign(d(x)).    (19)
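
The dual problem (17) can be handed to any QP solver. The sketch below is an illustration (not the report's code): it uses SciPy's general-purpose SLSQP routine on a hypothetical separable toy set, whereas a production implementation would use a dedicated QP or SMO solver:

    import numpy as np
    from scipy.optimize import minimize

    def hard_margin_svm(X, y):
        # Maximize (17a) s.t. (17b), (17c), i.e. minimize -L_d(alpha).
        l = len(y)
        H = (y[:, None] * y[None, :]) * (X @ X.T)   # H_ij = y_i y_j x_i^T x_j
        fun = lambda a: 0.5 * a @ H @ a - a.sum()
        jac = lambda a: H @ a - np.ones(l)
        res = minimize(fun, np.zeros(l), jac=jac, method='SLSQP',
                       bounds=[(0, None)] * l,
                       constraints=[{'type': 'eq', 'fun': lambda a: a @ y}])
        a = res.x
        w = (a * y) @ X                              # (18a)
        sv = a > 1e-6                                # nonzero alphas mark SVs
        b = np.mean(y[sv] - X[sv] @ w)               # (18b), over SVs only
        return a, w, b

    X = np.array([[0.0, 1.0], [1.0, 2.0], [2.0, 0.0], [3.0, 1.0]])
    y = np.array([1.0, 1.0, -1.0, -1.0])
    alpha, w, b = hard_margin_svm(X, y)
    print(alpha.round(3), w, b)
    print('margin M =', 2 / np.linalg.norm(w))       # equation (9)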

Training data patterns having non-zero Lagrange multipliers are called support vectors. For linearly separable training data, all support vectors lie on the margin and they are generally just a small portion of all training data (typically, N_SV << l). Figs 6, 7 and 8 show the geometry of standard results for non-overlapping classes.

Before presenting applications of the OCSH for both overlapping classes and classes having nonlinear decision boundaries, we will comment only on whether and how SV-based linear classifiers actually implement the SRM principle. A more detailed presentation of this important property can be found in (Kecman, 2001; Schölkopf and Smola, 2002). First, it can be shown that an increase in margin reduces the number of points that can be shattered, i.e., the increase in margin reduces the VC dimension, and this leads to a decrease of the SVM capacity. In short, by minimizing ||w|| (i.e., maximizing the margin) the SV machine training actually minimizes the VC dimension and consequently the generalization error (expected risk) at the same time. This is achieved by imposing a structure on the set of canonical hyperplanes and then, during the training, by choosing the one with a minimal VC dimension. The structure on the set of canonical hyperplanes is introduced by considering various hyperplanes having different ||w||. In other words, we analyze the sets S_A such that ||w|| ≤ A. Then, if A_1 ≤ A_2 ≤ A_3 ≤ ... ≤ A_n, we introduce a nested set S_A1 ⊂ S_A2 ⊂ S_A3 ⊂ ... ⊂ S_An. Thus, if we impose the constraint ||w|| ≤ A, then the canonical hyperplane cannot be closer than 1/A to any of the training points x_i. Vapnik (1995) states that the VC dimension h of a set of canonical hyperplanes in ℝ^n such that ||w|| ≤ A is

  h ≤ min[R²A², n] + 1,    (20)

where all the training data points (vectors) are enclosed by a sphere of the smallest radius R. Therefore, a small ||w|| results in a small h, and the minimization of ||w|| is an implementation of the SRM principle. In other words, a minimization of the canonical hyperplane weight norm ||w|| minimizes the VC dimension according to (20). See also Fig 4, which shows how the estimation error, meaning the expected risk (because the empirical risk, due to the linear separability, equals zero), decreases with a decrease of the VC dimension.

Finally, there is an interesting, simple and powerful result (Vapnik, 1995) connecting the generalization ability of learning machines and the number of support vectors. Once the support vectors have been found, we can calculate the bound on the expected probability of committing an error on a test example as follows

  E_l[P(error)] ≤ E[number of support vectors] / l,    (21)

where E_l denotes expectation over all training data sets of size l. Note how easy it is to estimate this bound, which is independent of the dimensionality of the input space. Therefore, an SV machine having a small number of support vectors will have good generalization ability, even in a very high-dimensional space.

The example below shows the SVM's learning of the weights for a simple separable data problem, in both the primal and the dual domain. The small number and low dimensionality of the data pairs are used in order to show the optimization steps analytically and graphically. The same reasoning holds in the case of high-dimensional and large training data sets, but for those one has to rely on computers, and the insight into the solution steps is necessarily lost.

Example: Consider the design of an SVM classifier for the 3 data shown in Fig 9 below: the inputs {x_1 = −2, x_2 = −1} ∈ Class 1 with the label y = +1, and {x_3 = 0} ∈ Class 2 with the label y = −1.

Figure 9 Left: Solving the SVM classifier for the 3 data shown (target y, i.e., d, versus feature x); SVs are the starred data. Right: The solution space (w, b), with the three lines (a), (b) and (c) corresponding to the constraint equalities below; the textured area is the feasible domain.

First we solve the problem in the primal domain. From the constraints (10b) it follows

  −2w + b ≥ 1,    (a)
  −w + b ≥ 1,    (b)
  −b ≥ 1.    (c)

The three straight lines corresponding to the equalities above are shown in Fig 9 right. The textured area is the feasible domain for the weight w and the bias b. Note that this area is not bounded by the inequality (a), thus pointing to the fact that the point x_1 = −2 is not a support vector. The points −1 and 0 define the textured area, and they will be the supporting data for our decision function. The task is to minimize (10a), and this will be achieved by taking the value w = −2. Then, from (b), it follows that b = −1. Note that (a) must not be used for the calculation of the bias term b.

Because both the cost function (10a) and the constraints (10b) are convex, the primal and the dual solution must produce the same w and b. The dual solution follows from maximizing (15) subject to (16) as follows

  maximize  L_d(α) = α_1 + α_2 + α_3 − (1/2) α^T H α,   with  H = [ 4  2  0
                                                                    2  1  0
                                                                    0  0  0 ],
  s.t.  α_1 + α_2 − α_3 = 0,   α_1 ≥ 0,  α_2 ≥ 0,  α_3 ≥ 0.

The dual Lagrangian is obtained in terms of α_1 and α_2 after expressing α_3 from the equality constraint, and it is given as L_d(α) = 2α_1 + 2α_2 − 0.5(4α_1² + 4α_1α_2 + α_2²). L_d will have a maximum for α_1 = 0, and it follows that we have to find the maximum of L_d(α_2) = 2α_2 − 0.5α_2², which is attained at α_2 = 2. Note that the Hessian matrix is extremely badly conditioned, and if the QP problem is to be solved by computer, H should be regularized first. From the equality constraint it follows that α_3 = 2 too. Now, we can calculate the weight vector w and the bias b from (18a) and (18b) as follows

  w = Σ_{i=1}^{3} α_i y_i x_i = 0(1)(−2) + 2(1)(−1) + 2(−1)(0) = −2.

The bias can be calculated by using the SVs only, meaning from either the point −1 or the point 0. Both result in the same value, as shown below

  b = 1 − (−2)(−1) = −1,   or   b = −1 − (−2)(0) = −1.
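
The analytical solution of the example is easy to verify numerically. A minimal sketch (assuming NumPy) checking equations (13), (18a), (18b), the constraints (10b) and the KKT conditions (14) for the solution found above:

    import numpy as np

    x = np.array([-2.0, -1.0, 0.0])
    y = np.array([1.0, 1.0, -1.0])
    alpha = np.array([0.0, 2.0, 2.0])      # dual solution obtained above

    print(alpha @ y)                        # equality constraint (13): 0
    w = np.sum(alpha * y * x)               # (18a): w = -2
    sv = alpha > 0                          # SVs are the points -1 and 0
    b = np.mean(1 / y[sv] - w * x[sv])      # (18b): b = -1
    d = w * x + b                           # decision values [3, 1, -1]
    print(w, b, d)
    print(y * d >= 1)                       # constraints (10b) all hold
    print(alpha * (y * d - 1))              # KKT conditions (14): all zero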

2.2 Linear Soft Margin Classifier for Overlapping Classes

The learning procedure presented above is valid for linearly separable data, meaning for training data sets without overlapping. Such problems are rare in practice. At the same time, there are many instances when linear separating hyperplanes can be good solutions even when the data are overlapped (e.g., normally distributed classes having the same covariance matrices have a linear separation boundary). However, quadratic programming solutions as given above cannot be used in the case of overlapping, because the constraints y_i[w^T x_i + b] ≥ 1, i = 1, ..., l given by (10b) cannot be satisfied. In the case of an overlapping (see Fig 10), the overlapped data points cannot be correctly classified, and for any misclassified training data point x_i, the corresponding α_i will tend to infinity. This particular data point (by increasing the corresponding α_i value) attempts to exert a stronger influence on the decision boundary in order to be classified correctly. When the α_i value reaches the maximal bound, it can no longer increase its effect, and the corresponding point will stay misclassified. In such a situation, the algorithm introduced above chooses (almost) all training data points as support vectors. To find a classifier with a maximal margin, the algorithm presented in section 2.1 above must be changed, allowing some data to be unclassified. Better to say, we must leave some data on the 'wrong' side of the decision boundary. In practice, we allow a soft margin, and all data inside this margin (whether on the correct side of the separating line or on the wrong one) are neglected. The width of the soft margin can be controlled by a corresponding penalty parameter C (introduced below) that determines the trade-off between the training error and the VC dimension of the model.

The question now is how to measure the degree of misclassification and how to incorporate such a measure into the hard margin learning algorithm given by equations (10). The simplest method would be to form the following learning problem

  minimize (1/2) w^T w + C (number of misclassified data),    (22)

where C is a penalty parameter, trading off the margin size (defined by ||w||, i.e., by w^T w) for the number of misclassified data points. A large C leads to a small number of misclassifications, a bigger w^T w and consequently a smaller margin, and vice versa. Obviously, taking C = ∞ requires that the number of misclassified data be zero and, in the case of an overlapping, this is not possible. Hence, the problem may be feasible only for some value C < ∞.

Figure 10 The soft decision boundary for a dichotomization problem with data overlapping. The separation line d(x) = 0 (solid), the margins d(x) = −1 and d(x) = +1 (dashed), and the support vectors (textured training data points) are shown. For a misclassified positive class point (class 1, y = +1) the slack is ξ = 1 − d(x) > 1, and for a misclassified negative class point (class 2, y = −1) it is ξ = 1 + d(x) > 1. There are 4 SVs in the positive class (circles) and 3 SVs in the negative class (squares), with 2 misclassifications for the positive class and 1 misclassification for the negative class.

However, the serious problem with (22) is that this error counting cannot be accommodated within the handy (meaning reliable, well understood and well developed) quadratic programming approach. Also, mere counting cannot distinguish between huge (or disastrous) errors and close misses! A possible solution is to measure the distances ξ_i of the points crossing the margin from the corresponding margin, and trade their sum for the margin size, as given below

  minimize (1/2) w^T w + C (sum of distances of the wrong side points).    (23)

In fact, this is exactly how the problem of the data overlapping was solved in (Cortes, 1995; Cortes and Vapnik, 1995) - by generalizing the optimal 'hard' margin algorithm. They introduced the nonnegative slack variables ξ_i (i = 1, ..., l) into the statement of the optimization problem for the overlapped data points. Now, instead of fulfilling (10a) and (10b), the separating hyperplane must satisfy

  minimize (1/2) w^T w + C Σ_{i=1}^{l} ξ_i,    (24a)

subject to

  y_i[w^T x_i + b] ≥ 1 − ξ_i,   i = 1, ..., l,   ξ_i ≥ 0,    (24b)

i.e., subject to

  w^T x_i + b ≥ +1 − ξ_i,   for y_i = +1,   ξ_i ≥ 0,    (24c)

  w^T x_i + b ≤ −1 + ξ_i,   for y_i = −1,   ξ_i ≥ 0.    (24d)

Hence, for such a generalized optimal separating hyperplane, the functional to be minimized comprises an extra term accounting for the cost of overlapping errors. In fact, the cost function (24a) can be even more general, as given below

  minimize (1/2) w^T w + C Σ_{i=1}^{l} ξ_i^k,    (24e)

subject to the same constraints. This is a convex programming problem that is usually solved only for k = 1 or k = 2, and such soft margin SVMs are dubbed L1 and L2 SVMs respectively. By choosing the exponent k = 1, neither the slack variables ξ_i nor their Lagrange multipliers β_i appear in the dual Lagrangian L_d. As for the linearly separable problem presented previously, for L1 SVMs (k = 1) here, the solution to the quadratic programming problem (24) is given by the saddle point of the primal Lagrangian L_p(w, b, ξ, α, β) shown below

  L_p(w, b, ξ, α, β) = (1/2) w^T w + C Σ_{i=1}^{l} ξ_i − Σ_{i=1}^{l} α_i { y_i[w^T x_i + b] − 1 + ξ_i } − Σ_{i=1}^{l} β_i ξ_i,   for the L1 SVM,    (25)

where α_i and β_i are the Lagrange multipliers. Again, we should find an optimal saddle point (w_o, b_o, ξ_o, α_o, β_o), because the Lagrangian L_p has to be minimized with respect to w, b and ξ, and maximized with respect to the nonnegative α_i and β_i. As before, this problem can be solved in either a primal space or a dual space (which is the space of Lagrange multipliers α_i and β_i). Again, we consider a solution in a dual space as given below, by using the standard conditions for an optimum of a constrained function

  ∂L/∂w_o = 0,  i.e.,  w_o = Σ_{i=1}^{l} α_i y_i x_i,    (26)

  ∂L/∂b_o = 0,  i.e.,  Σ_{i=1}^{l} α_i y_i = 0,    (27)

  ∂L/∂ξ_io = 0,  i.e.,  α_i + β_i = C,    (28)

and the KKT complementarity conditions below,

  α_i { y_i[w^T x_i + b] − 1 + ξ_i } = 0,   i = 1, ..., l,    (29a)

  β_i ξ_i = (C − α_i) ξ_i = 0,   i = 1, ..., l.    (29b)

At the optimal solution, due to the KKT conditions (29), the last two terms in the primal Lagrangian L_p given by (25) vanish, and the dual variables Lagrangian L_d(α), for the L1 SVM, is not a function of β. In fact, it is the same as the hard margin classifier's L_d given before, and it is repeated here for the soft margin one,

  L_d(α) = Σ_{i=1}^{l} α_i − (1/2) Σ_{i,j=1}^{l} y_i y_j α_i α_j x_i^T x_j.    (30)

In order to find the optimal hyperplane, the dual Lagrangian L_d(α) has to be maximized with respect to nonnegative α_i which are (unlike before) smaller than or equal to C. In other words, with

  C ≥ α_i ≥ 0,   i = 1, ..., l,    (31a)

and under the constraint (27), i.e., under

  Σ_{i=1}^{l} α_i y_i = 0.    (31b)

Thus, the final quadratic optimization problem is practically the same as for the separable case, the only difference being in the modified bounds of the Lagrange multipliers α_i. The penalty parameter C, which is now the upper bound on α_i, is determined by the user. The selection of a 'good or proper' C is always done experimentally by using some cross-validation technique. Note that in the previous linearly separable case, without data overlapping, this upper bound was C = ∞. We can also readily change to the matrix notation of the problem above, as in equations (17). Most important of all is that the learning problem is expressed only in terms of the unknown Lagrange multipliers α_i and the known inputs and outputs. Furthermore, the optimization does not solely depend upon the inputs x_i, which can be of a very high (even infinite) dimension; it depends upon a scalar product of the input vectors x_i. It is this property that we will use in the next section, where we design SV machines that can create nonlinear separation boundaries. Finally, the expressions for both the decision function d(x) and the indicator function i_F = sign(d(x)) for a soft margin classifier are the same as for linearly separable classes and are also given by (19).

From (29) it follows that there are only three possible solutions for α_i (see Fig 10, and the sketch after this list):

1. α_i = 0, ξ_i = 0: the data point x_i is correctly classified.
2. C > α_i > 0: then, the two complementarity conditions must result in y_i[w^T x_i + b] − 1 + ξ_i = 0 and ξ_i = 0. Thus, y_i[w^T x_i + b] = 1 and x_i is a support vector. The support vectors with C > α_i > 0 are called unbounded or free support vectors. They lie on the two margins.
3. α_i = C: then, y_i[w^T x_i + b] − 1 + ξ_i = 0 and ξ_i ≥ 0, and x_i is a support vector. The support vectors with α_i = C are called bounded support vectors. They lie on the 'wrong' side of the margin. For 1 > ξ_i ≥ 0, x_i is still correctly classified, and if ξ_i ≥ 1, x_i is misclassified.
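
These three cases can be observed on any overlapping data set. A minimal sketch, assuming scikit-learn is available and using randomly generated (hypothetical) overlapping classes:

    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.RandomState(0)
    X = np.vstack([rng.randn(20, 2) + [1, 1], rng.randn(20, 2) - [1, 1]])
    y = np.hstack([np.ones(20), -np.ones(20)])

    C = 1.0
    clf = SVC(kernel='linear', C=C).fit(X, y)

    # dual_coef_ stores y_i * alpha_i for the SVs; points with alpha_i = 0
    # (case 1) are simply absent from the support vector set.
    alpha = np.abs(clf.dual_coef_).ravel()
    print('free SVs, 0 < alpha < C: ', np.sum(alpha < C - 1e-8))   # case 2
    print('bounded SVs, alpha = C:  ', np.sum(alpha >= C - 1e-8))  # case 3
    print('margin M = 2/||w|| =', 2 / np.linalg.norm(clf.coef_))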

For the L2 SVM the second term in the cost function (24e) is quadratic, i.e., C Σ_{i=1}^{l} ξ_i², and this leads to changes in the dual optimization problem, which is now

  L_d(α) = Σ_{i=1}^{l} α_i − (1/2) Σ_{i,j=1}^{l} y_i y_j α_i α_j ( x_i^T x_j + δ_ij / C ),    (32)

subject to

  α_i ≥ 0,   i = 1, ..., l,    (33a)

  Σ_{i=1}^{l} α_i y_i = 0,    (33b)

where δ_ij = 1 for i = j, and is zero otherwise. Note the change in the Hessian matrix elements, given by the second terms in (32), as well as that there is no upper bound on α_i. A detailed analysis and comparison of the L1 and L2 SVMs is presented in (Abe, 2004). The derivation of (32) and (33) is given in the Appendix. We use the most popular L1 SVMs here, because they usually produce more sparse solutions, i.e., they create a decision function by using fewer SVs than the L2 SVMs.

2.3 The Nonlinear Classifier

The linear classifiers presented in the two previous sections are very limited. Mostly, classes are not only overlapped, but the genuine separation functions are nonlinear hypersurfaces. A nice and strong characteristic of the approach presented above is that it can be easily (and in a relatively straightforward manner) extended to create nonlinear decision boundaries. The motivation for such an extension is that an SV machine that can create a nonlinear decision hypersurface will be able to classify nonlinearly separable data. This will be achieved by considering a linear classifier in the so-called feature space, which will be introduced shortly. A very simple example of the need for designing nonlinear models is given in Fig 11, where the true separation boundary is quadratic. It is obvious that no errorless linear separating hyperplane can be found now. The best linear separation function, shown as a dashed straight line, would make six misclassifications (textured data points: 4 in the negative class and 2 in the positive one). Yet, if we use the nonlinear separation boundary, we are able to separate the two classes without any error.

Generally, for n-dimensional input patterns, instead of a nonlinear curve, an SV machine will create a nonlinear separating hypersurface.

Figure 11 A nonlinear SVM without data overlapping. The true separation is a quadratic curve. The nonlinear separation line (solid), the linear one (dashed) and the data points misclassified by the linear separation line (the textured training data points) are shown. There are 4 misclassified negative data and 2 misclassified positive ones. SVs are not shown.

The basic idea in designing nonlinear SV machines is to map input vectors x ∈ ℝ^n into vectors Φ(x) of a higher dimensional feature space F (where Φ represents a mapping ℝ^n → ℝ^f), and to solve a linear classification problem in this feature space

  x ∈ ℝ^n → Φ(x) = [φ_1(x) φ_2(x), ..., φ_f(x)]^T ∈ ℝ^f.    (34)

A mapping Φ(x) is chosen in advance, i.e., it is a fixed function. Note that an input space (x-space) is spanned by the components x_i of an input vector x, and a feature space F (Φ-space) is spanned by the components φ_i(x) of the vector Φ(x). By performing such a mapping, we hope that in a Φ-space our learning algorithm will be able to linearly separate the images of x by applying the linear SVM formulation presented above. (In fact, it can be shown that for a whole class of mappings the linear separation in a feature space is always possible. Such mappings will correspond to the positive definite kernels that will be shown shortly.) We also expect this approach to again lead to solving a quadratic optimization problem with similar constraints in a Φ-space. The solution for an indicator function i_F(x) = sign(w^T Φ(x) + b) = sign( Σ_{i=1}^{l} y_i α_i Φ^T(x_i)Φ(x) + b ), which is a linear classifier in a feature space, will create a nonlinear separating hypersurface in the original input space, given by (35) below.

(Compare this solution with (19) and note the appearance of scalar products in both the original X-space and in the feature space F.) The equation for i_F(x) just given above can be rewritten in a 'neural networks' form as follows

  i_F(x) = sign( Σ_{i=1}^{l} y_i α_i Φ^T(x_i)Φ(x) + b ) = sign( Σ_{i=1}^{l} y_i α_i k(x_i, x) + b ) = sign( Σ_{i=1}^{l} v_i k(x_i, x) + b ),    (35)

where v_i corresponds to the output layer weights of the 'SVM's network' and k(x_i, x) denotes the value of the kernel function that will be introduced shortly. (v_i equals y_i α_i in the classification case presented above, and it is equal to (α_i − α_i*) in the regression problems.) Note the difference between the weight vector w, whose norm should be minimized and which is a vector of the same dimension as the feature space vector Φ(x), and the weightings v_i = α_i y_i, which are scalar values composing the weight vector v, whose dimension equals the number of training data points l. The (l − N_SVs) components of v are equal to zero, and only the N_SVs entries of v are nonzero elements.

A simple example below (Fig 12) should exemplify the idea of a nonlinear mapping to a (usually) higher dimensional space, and how it happens that the data become linearly separable in the F-space.

Figure 12 A nonlinear 1-dimensional classification problem (x_1 = −1, x_2 = 0, x_3 = 1). One possible solution is given by the decision function d(x) (solid curve), i.e., by the corresponding indicator function defined as i_F = sign(d(x)) (dashed stepwise function).

Consider solving the simplest 1-dimensional classification problem given the inputs and the outputs (desired values) as follows: x = [−1 0 1]^T and d = y = [−1 1 −1]^T. Here we choose the following mapping to the feature space: Φ(x) = [φ_1(x) φ_2(x) φ_3(x)]^T = [x² √2x 1]^T. The mapping produces the following three points in the feature space (shown as the rows of the matrix F, F standing for features)

Consider solving the simplest 1-D classification problem given the input and the output (desired) values as follows: x = [-1 0 1]^T and d = y = [-1 1 -1]^T. Here we choose the following mapping to the feature space: Φ(x) = [φ_1(x) φ_2(x) φ_3(x)]^T = [x² √2x 1]^T. The mapping produces the following three points in the feature space (shown as the rows of the matrix F, F standing for features):

   F = [ 1  -√2  1
         0   0   1
         1   √2  1 ]

These three points are linearly separable, by the plane φ_3(x) = 2φ_1(x), in the feature space, as shown in Fig 13. It is easy to show that the mapping obtained by Φ(x) = [x² √2x 1]^T is a scalar product implementation of the quadratic kernel function (x_i x_j + 1)² = k(x_i, x_j). In other words, Φ^T(x_i)Φ(x_j) = k(x_i, x_j). This equality will be introduced shortly.

Figure 13  The three data points of the problem in Fig 12, namely [1 -√2 1]^T, [0 0 1]^T and [1 √2 1]^T, are linearly separable in the feature space (obtained by the mapping Φ(x) = [φ_1(x) φ_2(x) φ_3(x)]^T = [x² √2x 1]^T). The separation boundary is given as the plane φ_3(x) = 2φ_1(x) shown in the figure.
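A few lines (ours) verify the toy example numerically: the scalar products of the mapped points reproduce the quadratic kernel, and the plane φ_3 = 2φ_1 outputs exactly the class labels.

    import numpy as np

    x = np.array([-1.0, 0.0, 1.0])                            # training inputs
    F = np.column_stack([x**2, np.sqrt(2) * x, np.ones(3)])   # rows are Phi(x_i)

    # scalar products in feature space equal the kernel (x_i x_j + 1)^2
    assert np.allclose(F @ F.T, (np.outer(x, x) + 1.0)**2)

    # the plane phi_3(x) = 2 phi_1(x), i.e., w = [-2, 0, 1], separates the points
    print(F @ np.array([-2.0, 0.0, 1.0]))                     # [-1.  1. -1.]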

There are two basic problems when mapping an input x-space into a higher order F-space: (i) the choice of a mapping Φ(x) that should result in a rich class of decision hypersurfaces, and (ii) the calculation of the scalar product Φ^T(x)Φ(x), which can be computationally very discouraging if the number of features f (i.e., the dimensionality f of a feature space) is very large. The second problem is connected with a phenomenon called the 'curse of dimensionality'. For example, to construct a decision surface corresponding to a polynomial of degree two in an n-dimensional input space, a feature space of dimensionality f = n(n + 3)/2 is needed. In other words, a feature space is spanned by f coordinates of the form

   z_1 = x_1, ..., z_n = x_n   (n coordinates),
   z_{n+1} = (x_1)², ..., z_{2n} = (x_n)²   (next n coordinates),
   z_{2n+1} = x_1 x_2, ..., z_f = x_{n-1} x_n   (n(n-1)/2 coordinates),

and the separating hyperplane created in this space is a second-degree polynomial in the input space (Vapnik, 1998). Thus, constructing a polynomial of degree two only, in a 256-dimensional input space, leads to a feature space of dimensionality f = 33,152. Performing a scalar product operation with vectors of such, or higher, dimensions is not a cheap computational task. The problems become serious (and fortunately only seemingly unsolvable) if we want to construct a polynomial of degree 4 or 5 in the same 256-dimensional space, leading to the construction of a decision hyperplane in a billion-dimensional feature space.

This explosion in dimensionality can be avoided by noticing that in the quadratic optimization problems for both the hard and the soft margin classifier above, as well as in the final expression for a classifier, the training data only appear in the form of scalar products x_i^T x_j. These products are replaced by scalar products Φ^T(x_i)Φ(x_j) = [φ_1(x_i), φ_2(x_i), ..., φ_f(x_i)][φ_1(x_j), φ_2(x_j), ..., φ_f(x_j)]^T in a feature space F, and the latter can be and will be expressed by using the kernel function K(x_i, x_j) = Φ^T(x_i)Φ(x_j).

Note that a kernel function K(x_i, x_j) is a function in input space. Thus, the basic advantage of using a kernel function K(x_i, x_j) is that it avoids having to perform the mapping Φ(x) at all. Instead, the required scalar products Φ^T(x_i)Φ(x_j) in a feature space are calculated directly by computing the kernels K(x_i, x_j) for the given training data vectors in an input space. In this way, we bypass the possibly extremely high dimensionality of a feature space F. Moreover, by using a suitably chosen kernel K(x_i, x_j), we can construct an SVM that operates in an infinite dimensional feature space (one such kernel is the Gaussian kernel given in the table below). In addition, as will be shown below, by applying kernels we do not even have to know what the actual mapping Φ(x) is. A kernel is a function K such that

   K(x_i, x_j) = Φ^T(x_i)Φ(x_j).        (36)

There are many possible kernels, and the most popular ones are given in Table 2. All of them should fulfill the so-called Mercer's conditions. The Mercer kernels belong to the set of reproducing kernels. For further details see (Mercer, 1909; Aizerman et al., 1964; Smola and Schölkopf, 1997; Vapnik, 1998; Kecman, 2001). The simplest is the linear kernel, defined as K(x_i, x_j) = x_i^T x_j. Below we show a few more kernels.

POLYNOMIAL KERNELS: Let x ∈ R², i.e., x = [x_1 x_2]^T, and choose Φ(x) = [x_1² √2x_1x_2 x_2²]^T (i.e., there is an R² → R³ mapping). Then the dot product is

   Φ^T(x_i)Φ(x_j) = [x_i1² √2x_i1x_i2 x_i2²][x_j1² √2x_j1x_j2 x_j2²]^T
                  = x_i1²x_j1² + 2x_i1x_i2x_j1x_j2 + x_i2²x_j2²
                  = (x_i^T x_j)² = K(x_i, x_j), or

   K(x_i, x_j) = (x_i^T x_j)² = Φ^T(x_i)Φ(x_j).
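A short sketch (ours) of both points above: the f = n(n + 3)/2 count for n = 256, and the kernel identity, checked numerically for the R² → R³ mapping just shown.

    import numpy as np

    # dimensionality of a degree-2 feature space: f = n(n + 3)/2
    n = 256
    print(n * (n + 3) // 2)             # 33152, as stated in the text

    # the explicit mapping and the kernel give the same scalar product
    phi = lambda v: np.array([v[0]**2, np.sqrt(2) * v[0] * v[1], v[1]**2])
    a, b = np.array([1.0, 2.0]), np.array([3.0, -1.0])
    print(phi(a) @ phi(b), (a @ b)**2)  # both 1.0: Phi'(a)Phi(b) = (a'b)^2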

Note that in order to calculate the scalar product in a feature space, Φ^T(x_i)Φ(x_j), we do not need to perform the mapping Φ(x) = [x_1² √2x_1x_2 x_2²]^T at all. Instead, we calculate this product directly in the input space by computing (x_i^T x_j)². This is very well known under the popular name of the 'kernel trick'. Interestingly, note also that other mappings, such as the R² → R³ mapping given by Φ(x) = (1/√2)[x_1² - x_2²  2x_1x_2  x_1² + x_2²], or the R² → R⁴ mapping given by Φ(x) = [x_1²  x_1x_2  x_1x_2  x_2²], also implement the same kernel (x_i^T x_j)².

Now, assume the following mapping, Φ(x) = [√2x_1 √2x_2 √2x_1x_2 x_1² x_2² 1], i.e., an R² → R⁵ mapping plus a bias term as the constant 6th dimension's value. Then the dot product in a feature space F is given as

   Φ^T(x_i)Φ(x_j) = 2x_i1x_j1 + 2x_i2x_j2 + 2x_i1x_i2x_j1x_j2 + x_i1²x_j1² + x_i2²x_j2² + 1
                  = 1 + 2(x_i^T x_j) + (x_i^T x_j)² = (x_i^T x_j + 1)² = K(x_i, x_j), or

   K(x_i, x_j) = (x_i^T x_j + 1)² = Φ^T(x_i)Φ(x_j).

Thus, this last mapping leads to the complete second order polynomial kernel.

Many candidate functions can be applied as a convolution of an inner product (i.e., as kernel functions) K(x, x_i) in an SV machine. Each of these functions constructs a different nonlinear decision hypersurface in an input space. In its first three rows, the table below shows the three most popular kernels in SVMs in use today, and the inverse multiquadrics one as an interesting and powerful kernel yet to prove itself.

Table 2. Popular Admissible Kernels

   Kernel function                                      Type of classifier
   K(x, x_i) = (x^T x_i)                                Linear, dot product kernel; CPD
   K(x, x_i) = [(x^T x_i) + 1]^d                        Complete polynomial of degree d; PD
   K(x, x_i) = exp(-[(x - x_i)^T Σ^{-1} (x - x_i)]/2)   Gaussian RBF; PD
   K(x, x_i) = tanh[(x^T x_i) + b]*                     Multilayer perceptron; CPD
   K(x, x_i) = 1/√(||x - x_i||² + β)                    Inverse multiquadric function; PD

   * only for certain values of b; (C)PD = (conditionally) positive definite
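The kernels of Table 2 are one-liners in code. The sketch below (ours) writes them as plain functions; σ, d, b and β stand for the user-chosen parameters, and the Gaussian is written with the common isotropic simplification Σ = σ²I.

    import numpy as np

    linear = lambda x, xi: x @ xi                                     # CPD
    poly   = lambda x, xi, d=2: (x @ xi + 1.0)**d                     # PD
    rbf    = lambda x, xi, sigma=1.0: np.exp(-np.sum((x - xi)**2)
                                             / (2.0 * sigma**2))      # PD
    mlp    = lambda x, xi, b=-1.0: np.tanh(x @ xi + b)                # CPD, some b only
    inv_mq = lambda x, xi, beta=1.0: 1.0 / np.sqrt(np.sum((x - xi)**2) + beta)  # PD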

The positive definite (PD) kernels are the kernels whose Gram matrix G (a.k.a. the Grammian), calculated by using all the training data points, is positive definite (meaning that all its eigenvalues are strictly positive, i.e., λ_i > 0, i = 1, ..., l):

   G = K(x_i, x_j) = [ k(x_1, x_1)  k(x_1, x_2)  ...  k(x_1, x_l)
                       k(x_2, x_1)  k(x_2, x_2)  ...  k(x_2, x_l)
                          ...          ...       ...     ...
                       k(x_l, x_1)  k(x_l, x_2)  ...  k(x_l, x_l) ]        (37)

The kernel matrix G is symmetric. Even more, any symmetric positive definite matrix can be regarded as a kernel matrix, that is, as an inner product matrix in some space.

Finally, we arrive at the point of presenting the learning in nonlinear classifiers (in which we are ultimately interested here). The learning algorithm for a nonlinear SV machine (classifier) follows from the design of an optimal separating hyperplane in a feature space. This is the same procedure as the construction of the hard and the soft margin classifiers in an x-space previously. In a Φ(x)-space, the dual Lagrangian given previously is now

   L_d(α) = Σ_{i=1}^{l} α_i - (1/2) Σ_{i,j=1}^{l} α_i α_j y_i y_j Φ^T(x_i)Φ(x_j),        (38)

and, according to (36), by using chosen kernels we should maximize the following dual Lagrangian

   L_d(α) = Σ_{i=1}^{l} α_i - (1/2) Σ_{i,j=1}^{l} α_i α_j y_i y_j K(x_i, x_j)        (39)

subject to

   α_i ≥ 0, i = 1, ..., l   and   Σ_{i=1}^{l} α_i y_i = 0.        (39a)

In a more general case, because of noise or due to generic class features, there will be an overlapping of training data points. Then nothing but the constraints on α_i changes: the nonlinear 'soft' margin classifier will be the solution of the quadratic optimization problem given by (39) subject to the constraints

   C ≥ α_i ≥ 0, i = 1, ..., l   and   Σ_{i=1}^{l} α_i y_i = 0.        (39b)

Again, the only difference from the separable nonlinear classifier is the upper bound C on the Lagrange multipliers α_i. In this way, we limit the influence of the training data points that will remain on the 'wrong' side of the separating nonlinear hypersurface.
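A quick numerical check (ours) of the PD property: build the Gram matrix (37) for the complete second order polynomial kernel on four 2-D points and inspect its eigenvalues.

    import numpy as np

    def gram_matrix(X, kernel):
        """Gram matrix of (37): G_ij = k(x_i, x_j)."""
        return np.array([[kernel(xi, xj) for xj in X] for xi in X])

    X = np.array([[1., 1.], [1., 0.], [0., 1.], [0., 0.]])
    G = gram_matrix(X, lambda a, b: (a @ b + 1.0)**2)
    print(np.linalg.eigvalsh(G))        # all strictly positive -> G is PD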

After the dual variables are calculated, the decision hypersurface d(x) is determined by

   d(x) = Σ_{i=1}^{l} y_i α_i K(x, x_i) + b = Σ_{i=1}^{l} v_i K(x, x_i) + b,        (40)

and the indicator function is

   i_F(x) = sign[d(x)] = sign[ Σ_{i=1}^{l} v_i K(x, x_i) + b ].

Note that the summation is not actually performed over all training data, but rather over the support vectors, because only for them do the Lagrange multipliers differ from zero. The existence and calculation of a bias b is now not a direct procedure as it is for a linear hyperplane. Depending upon the applied kernel, the bias b can be an implicit part of the kernel function. If, for example, a Gaussian RBF is chosen as the kernel, it can use a bias term as the (f + 1)st feature in F-space with constant output = +1, but this is not necessary. In short, PD kernels do not necessarily need an explicit bias term b, but b can be used. (More on this can be found in (Kecman, Huang, and Vogt, 2004) as well as in (Vogt and Kecman, 2004).) Same as for the linear SVM, (39) can be written in matrix notation as

   maximize     L_d(α) = -0.5 α^T H α + f^T α,        (41a)
   subject to   y^T α = 0,        (41b)
                C ≥ α_i ≥ 0, i = 1, ..., l,        (41c)

where α = [α_1, α_2, ..., α_l]^T, H denotes the Hessian matrix of this problem (H_ij = y_i y_j K(x_i, x_j)), and f is an (l, 1) unit vector, f = 1 = [1 1 ... 1]^T. Note that if K(x_i, x_j) is a positive definite matrix, then so is the matrix y_i y_j K(x_i, x_j).

The following 1-D example (chosen just for the sake of graphical presentation) will show the creation of a linear decision function in a feature space and the corresponding nonlinear (quadratic) decision function in an input space. Suppose we have four 1-D data points given as x_1 = 1, x_2 = 2, x_3 = 5, x_4 = 6, with the data at 1, 2, and 6 as class 1 and the data point at 5 as class 2, i.e., y_1 = -1, y_2 = -1, y_3 = 1, y_4 = -1. We use the polynomial kernel of degree 2, K(x, y) = (xy + 1)². C is set to 50, which is of lesser importance here because the box constraints will not become active in this example: the maximal value of the dual variables α turns out to be smaller than C = 50.

Case 1: Working with a bias term b as given in (40). We first find α_i (i = 1, ..., 4) by solving the dual problem (41), which has the Hessian matrix

   H = [   4     9   -36    49
           9    25  -121   169
         -36  -121   676  -961
          49   169  -961  1369 ]
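Solving (41) for this Hessian can be sketched in a few lines (ours; a general-purpose SciPy solver stands in for a dedicated QP routine). The α values it returns are the ones quoted in the text below.

    import numpy as np
    from scipy.optimize import minimize

    x = np.array([1., 2., 5., 6.])
    y = np.array([-1., -1., 1., -1.])
    C = 50.0
    H = np.outer(y, y) * (np.outer(x, x) + 1.0)**2   # H_ij = y_i y_j (x_i x_j + 1)^2

    # maximizing -0.5 a'Ha + 1'a  ==  minimizing 0.5 a'Ha - 1'a
    res = minimize(lambda a: 0.5 * a @ H @ a - a.sum(),
                   np.zeros(4),
                   jac=lambda a: H @ a - 1.0,
                   bounds=[(0.0, C)] * 4,
                   constraints={'type': 'eq', 'fun': lambda a: a @ y},  # (41b)
                   method='SLSQP')
    print(res.x.round(6))    # approx. [0, 2.5, 7.333333, 4.833333]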

The resulting alphas are α_1 = 0, α_2 = 2.499999, α_3 = 7.333333, α_4 = 4.833333, and the bias b is found from the requirement that the values of the decision function at the support vectors must equal the given y_i (i.e., from the KKT complementarity conditions). The model (decision function) is given by

   d(x) = Σ_{i=1}^{4} y_i α_i K(x_i, x) + b = Σ_{i=1}^{4} v_i (x_i x + 1)² + b, or by

   d(x) = 2.499999(-1)(2x + 1)² + 7.333333(1)(5x + 1)² + 4.833333(-1)(6x + 1)² + b
        = -0.666667x² + 5.333333x + b.

The bias b is determined from the requirement that at the SV points 2, 5 and 6 the outputs must be -1, 1 and -1 respectively. Hence, b = -9, resulting in the decision function

   d(x) = -0.666667x² + 5.333333x - 9.

The nonlinear (quadratic) decision function and the indicator one are shown in Fig 14. Note that in the calculations above six decimal places have been used for the alpha values. The calculation is numerically very sensitive, and working with fewer decimals can give very approximate or plainly wrong results.

The complete polynomial kernel as used in Case 1 is positive definite, so there is no need to use an explicit bias term b as presented above. Thus, one can use the same second order polynomial model without the bias term b. Note that in this case there is no equality constraint equation, which originates from setting the derivative of the primal Lagrangian with respect to the bias term b to zero. Hence, we do not use (41b) when using a positive definite kernel without a bias, as will be shown below in Case 2.

Figure 14  NL SV classification of 1-D inputs using a quadratic polynomial kernel. The nonlinear decision function (solid) and the indicator function (dashed) are shown. By using the complete second order polynomial kernel, the models with and without a bias term b are the same.
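A numerical double check (ours) of Case 1: with the exact alphas (2.5, 22/3, 29/6), forcing d(5) = +1 yields b = -9, and the outputs at the support vectors 2, 5 and 6 come out as -1, 1 and -1.

    import numpy as np

    x = np.array([1., 2., 5., 6.])
    y = np.array([-1., -1., 1., -1.])
    alpha = np.array([0.0, 2.5, 22.0/3.0, 29.0/6.0])     # exact Case 1 alphas

    d0 = lambda t: np.sum(alpha * y * (x * t + 1.0)**2)  # d(x) without bias
    b = 1.0 - d0(5.0)                                    # enforce d(5) = +1
    print(b)                                             # -9.0
    print([d0(t) + b for t in (2.0, 5.0, 6.0)])          # [-1.0, 1.0, -1.0]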

Case 2: Working without a bias term b. Because we use the same second order polynomial kernel, the Hessian matrix H is the same as in Case 1. The solution without the equality constraint for the alphas is α_1 = 0, α_2 = 24.999999, α_3 = 43.333333, α_4 = 27.333333. The model (decision function) is given by

   d(x) = Σ_{i=1}^{4} y_i α_i K(x_i, x) = Σ_{i=1}^{4} v_i (x_i x + 1)², or by

   d(x) = 24.999999(-1)(2x + 1)² + 43.333333(1)(5x + 1)² + 27.333333(-1)(6x + 1)²
        = -0.666667x² + 5.333333x - 9.

Thus, the nonlinear (quadratic) decision functions, and consequently the indicator functions, are equal in the two particular cases.

XOR Example: In the next example, shown in Figs 15 and 16, we present all the important mathematical objects of a nonlinear SV classifier by using the classic XOR (exclusive-or) problem. The graphs show all the mathematical functions (objects) involved in a nonlinear classification, namely, the nonlinear decision function d(x), the NL indicator function i_F(x), the training data (x_i), the support vectors (x_SV) and the separation boundaries. The same objects will be created in cases where the input vector x is of a dimensionality n > 2, but visualization is then not possible. In such cases one talks about the decision hyperfunction (hypersurface) d(x), the indicator hyperfunction (hypersurface) i_F(x), the training data (x_i), the support vectors (x_SV) and the separation hyperboundaries (hypersurfaces).

Note the different character of d(x), i_F(x) and the separation boundaries in the two graphs given below. However, in both graphs all the data are correctly classified. The analytic solution for Fig 16, for the second order polynomial kernel (i.e., for (x_i^T x_j + 1)² = Φ^T(x_i)Φ(x_j), where Φ(x) = [√2x_1 √2x_2 √2x_1x_2 x_1² x_2² 1], no explicit bias and C = ∞), goes as follows. The inputs and desired outputs are

   X = [ 1  1          y = d = [  1
         1  0                    -1
         0  1                    -1
         0  0 ],                  1 ].

The dual Lagrangian (39) has the Hessian matrix

   H = [  9  -4  -4   1
         -4   4   1  -1
         -4   1   4  -1
          1  -1  -1   1 ]
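Two lines (ours) reproduce this Hessian from the data, which is a useful sanity check when setting the problem up:

    import numpy as np

    X = np.array([[1., 1.], [1., 0.], [0., 1.], [0., 0.]])
    d = np.array([1., -1., -1., 1.])
    H = np.outer(d, d) * (X @ X.T + 1.0)**2    # H_ij = d_i d_j (x_i' x_j + 1)^2
    print(H.astype(int))                       # matches the matrix above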

Figure 15  XOR problem. The nonlinear decision and indicator function of a NL SVM over the input plane (x_1, x_2) and the separation boundaries are shown; the kernel functions (2-D Gaussians) are not shown. All four data are chosen as support vectors.

Figure 16  XOR problem. The kernel function is a 2-D polynomial. The nonlinear decision function, the nonlinear indicator function and the (hyperbolic) separation boundaries over the input plane are shown. All four data are support vectors.
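Since all four points are support vectors and no box constraint is active, the no-bias KKT conditions reduce to the linear system Hα = 1. A short check (ours) confirms the figure captions: all α_i come out strictly positive, and every training point sits exactly on the margin.

    import numpy as np

    X = np.array([[1., 1.], [1., 0.], [0., 1.], [0., 0.]])
    d = np.array([1., -1., -1., 1.])
    H = np.outer(d, d) * (X @ X.T + 1.0)**2

    alpha = np.linalg.solve(H, np.ones(4))      # active margins: H alpha = 1
    print(alpha)                                # [2. 2.6667 2.6667 4.3333] > 0

    dec = lambda t: np.sum(alpha * d * (X @ t + 1.0)**2)   # decision fn (40), b = 0
    print([round(dec(xi), 6) for xi in X])      # [1.0, -1.0, -1.0, 1.0] = d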
