Part II. Support Vector Machines



Chapter 5 Linear Classification

5.1 Linear Classifiers on Linearly Separable Data

As a first step in understanding and constructing Support Vector Machines we study the case of linearly separable data which is simply classified into two classes, the positive and the negative one, also known as binary classification. To give a link to an example important nowadays, imagine the classification problem of email into spam or not-spam. A calculated example and examples on linearly non-separable data can be found in Appendix B. This classification is frequently performed by using a real-valued function f : X ⊆ R^n → R in the following way: the input x = (x_1, ..., x_n) is assigned to the positive class if f(x) ≥ 0, and otherwise to the negative one. The vector x is built up by the relevant features which are used for classification. In our spam example above we need to extract relevant features (certain words) from the text and build a feature vector for the corresponding document. Often such feature vectors consist of the counted numbers of predefined words, as in figure 5.1. If you would like to learn more about text classification / categorization you can have a look at [Joa98], where the feature vectors have dimensions in the range of about 9000. In this diploma thesis we assume that the features are already available. We consider the case where f is a linear function of x ∈ X, so it can be written as

    f(x) = ⟨w·x⟩ + b    (5.1)

where w ∈ R^n and b ∈ R are the parameters.
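To make the thresholding concrete, here is a minimal sketch of such a linear decision rule (not from the thesis), assuming NumPy is available; the weight and bias values are purely hypothetical:

```python
import numpy as np

def predict(x, w, b):
    """Linear decision function f(x) = <w, x> + b, thresholded at zero.
    Returns +1 for the positive class, -1 for the negative class."""
    return 1 if np.dot(w, x) + b >= 0 else -1

# Hypothetical spam example: features are counts of two trigger words.
w = np.array([1.0, 2.0])   # illustrative weight vector
b = -3.0                   # illustrative bias
print(predict(np.array([4, 1]), w, b))   # +1 -> "spam"
print(predict(np.array([0, 1]), w, b))   # -1 -> "not spam"
```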

Figure 5.1: Vector representation of the sentence "Take Viagra before watching a video, or leave Viagra be to play in our online casino."

These parameters are often referred to as weight vector w and bias b, terms borrowed from the neural network literature. As stated in Part I, the goal is to learn these parameters from the given and already classified data (done by the supervisor/teacher), the training set. This way of learning is therefore called supervised learning. So the decision function for classification of an input x = (x_1, ..., x_n) is given by sign(f(x)):

    sign(f(x)) = +1 if f(x) ≥ 0 (positive class), −1 else (negative class)

Geometrically we can interpret this behaviour as follows (see figure 5.2): one can see that the input space X is split into two parts by the so-called hyperplane, defined by the equation ⟨w·x⟩ + b = 0. This means every input vector solving this equation is directly part of the hyperplane. A hyperplane is an affine subspace⁷ of dimension n−1 which divides the space into two half spaces which correspond to the inputs of the two distinct classes.

⁷ A translation of a linear subspace of R^n is called an affine subspace. For example, any line or plane in R³ is an affine subspace.

In the example of figure 5.2, n = 2 is a two dimensional input space, so the hyperplane is simply a line here. The vector w therefore defines a direction perpendicular to the hyperplane, so the direction of the hyperplane is unique, while varying the value of b moves the hyperplane parallel to itself. Whereby negative values of b move the hyperplane running through the origin into the positive direction.

Figure 5.2: A separating hyperplane (w, b) for a two dimensional training set. The smaller dotted lines represent the class of hyperplanes with the same w and different values of b.

In fact it is clear to see that if one wants to represent all possible hyperplanes in the space R^n, the representation is only possible by involving n + 1 free parameters: n ones given by w and one by b. But the question that arises here is which hyperplane to choose, because there are many possible ways in which it can separate the data. So we need a criterion for choosing the best one, the optimal separating hyperplane. The goal behind supervised learning from examples for classification can be restricted to consideration of the two-class problem without loss of generality. In this problem the goal is to separate the two classes by a function which is induced from available examples. The overall goal is to produce a classifier by finding parameters w and b that work well on unseen examples, i.e. that generalizes well.

So if the distance between the separating hyperplane and the training points becomes too small, even test examples near to the given training points would be misclassified. Figure 5.3 illustrates this behaviour. Therefore it seems that the classification of unseen data is much more successful in setting B than in setting A. This observation leads to the concept of the maximal margin hyperplanes, or the optimal separating hyperplane.

Figure 5.3: Which separation to choose? Almost zero margin (A) or large margin (B)?

In appendix B.1 we have a closer look at an example with a simple iterative algorithm separating points from two classes by means of a hyperplane, the so-called Perceptron. It is only applicable on linearly separable data. There we also find some important issues, also stressed in the following chapters, which have a large impact on the algorithms used in the Support Vector Machines.

5.2 The Optimal Separating Hyperplane for Linearly Separable Data

Definition 5.1 (Margin) Consider the separating hyperplane H defined by ⟨w·x⟩ + b = 0, with both w and b normalised by ‖w‖: w → w/‖w‖ and b → b/‖w‖.

The functional margin γ_i(w, b) of an example x_i with respect to H is defined as the distance between x_i and H:

    γ_i(w, b) = y_i(⟨w·x_i⟩ + b)

The margin γ_S(w, b) of a set of vectors A = {x_1, ..., x_n} is defined as the minimum distance from H to the vectors in A:

    γ_S(w, b) = min_{x_i ∈ A} γ_i(w, b)

For clarification see figures 5.4 and 5.5.

Figure 5.4: The functional margin of two points with respect to a hyperplane

In figure 5.5 we have introduced two new identifiers, d₊ and d₋: let them be the shortest distance from the separating hyperplane H to the closest positive (negative) example, i.e. the smallest functional margin from each class. Then the geometric margin is defined as d₊ + d₋.

Figure 5.5: The geometric margin of a training set

The training set is therefore said to be optimally separated by the hyperplane if it is separated without any error and the distance between the closest vectors and the hyperplane is maximal (maximal margin) [Vap98]. So the goal is to maximize the margin.

As Vapnik showed in his work [Vap98], we can assume canonical hyperplanes in the upcoming discussion without loss of generality. This is necessary because there exists the following problem: for any scaling parameter c ≠ 0,

    ⟨w·x⟩ + b = 0  ⟺  ⟨cw·x⟩ + cb = 0

E.g. a possible solution is w = (1, 1), b = 1. With a parameter c of value 5 we get

w = (5, 5), b = 5, which can also be solved by the same x. So (cw, cb) describe the same hyperplane as (w, b) do. This means the hyperplane is not described uniquely! For uniqueness, (w, b) always need to be scaled by a factor c relative to the training set. The following constraint is chosen to do this:

    min_i |⟨w·x_i⟩ + b| = 1

This constraint scales the hyperplane in a way such that the training points nearest to it get some important property: now they solve ⟨w·x_i⟩ + b = +1 for x_i of class y_i = +1, and on the other side ⟨w·x_i⟩ + b = −1 for x_i of class y_i = −1. A hyperplane scaled in such a way is called a canonical hyperplane. Reformulated this means, implying correct classification:

    y_i(⟨w·x_i⟩ + b) ≥ 1, i = 1, ..., ℓ    (5.2)

This can be transformed into the following constraints:

    ⟨w·x_i⟩ + b ≥ +1 for y_i = +1
    ⟨w·x_i⟩ + b ≤ −1 for y_i = −1    (5.3)

Therefore it is clear to see that the hyperplanes H1 and H2 in figure 5.5 are solving ⟨w·x⟩ + b = +1 and ⟨w·x⟩ + b = −1. They are called margin hyperplanes. Note that H1 and H2 are parallel (they have the same normal w, as H does too) and that no other training points fall between them, i.e. in the margin! They solve min_i |⟨w·x_i⟩ + b| = 1.

Definition 5.2 (Distance) The Euclidean distance d(w, b; x_i) of a point x_i belonging to a class y_i ∈ {−1, +1} from the hyperplane (w, b) that is defined by ⟨w·x⟩ + b = 0 is

    d(w, b; x_i) = y_i(⟨w·x_i⟩ + b) / ‖w‖    (5.4)

As stated above, training points x₊ and x₋ that are nearest to the so scaled hyperplane (respectively, they lie on H1 and H2) have the distances d₊ and d₋ from it (see figure 5.5). Reformulated with equation 5.4 and constraints 5.3 this means:

    ⟨w·x₊⟩ + b = +1 and ⟨w·x₋⟩ + b = −1
    d₊ = 1/‖w‖ and d₋ = 1/‖w‖

So overall, as seen in figure 5.5, the geometric margin of a separating canonical hyperplane is d₊ + d₋ = 2/‖w‖. As stated, the goal is to maximize this margin. That is achieved by minimising ‖w‖. The transformation to a quadratic function of the form Φ(w) = ½‖w‖² does not change the result but eases later calculation. This is because we now solve the problem with the help of the Lagrangian method. There are two reasons for doing so. First, the constraints of 5.2 will be replaced by constraints on the Lagrange multipliers themselves,

which will be much easier to handle (they are equalities then). Second, the training data will only appear in the form of dot products between vectors, which will be a crucial concept later in generalizing the method to the nonlinearly separable case and the use of kernels. And so the problem is reformulated into a convex one which is overall easier to handle by the Lagrangian method with its differentiations. Summarizing, we have the following optimization problem to solve: given a linearly separable training set S = ((x_1, y_1), ..., (x_ℓ, y_ℓ)),

    Minimize ½‖w‖²
    subject to ⟨w·x_i⟩ + b ≥ +1 for y_i = +1
               ⟨w·x_i⟩ + b ≤ −1 for y_i = −1    (5.5)

The constraints are necessary to ensure uniqueness of the hyperplane, as mentioned above!

Note: ½‖w‖² = ½⟨w·w⟩, because ‖w‖ = √(w_1² + ... + w_n²) and so ‖w‖² = w_1² + ... + w_n² = ⟨w·w⟩.

Also the optimization problem is independent of the bias b, because provided equation 5.2 is satisfied (i.e. it is a separating hyperplane), changing the value of b only moves it in the normal direction to itself. Accordingly the margin remains unchanged, but the hyperplane would no longer be optimal. The problem of 5.5 is known as a convex quadratic optimization⁸ problem with linear constraints and can be efficiently solved by using the method of the Lagrange multipliers and the duality theory (see chapter 4).

⁸ Convexity will be proved later (see chapter 5.5.2).

The primal Lagrangian for 5.5 and the given linearly separable training set S is

    L_P(w, b, α) = ½⟨w·w⟩ − Σ_{i=1}^{ℓ} α_i [y_i(⟨w·x_i⟩ + b) − 1]    (5.6)

where the α_i ≥ 0 are the Lagrange multipliers. This Lagrangian L_P has to be minimized with respect to the primal variables w and b. As seen in chapter 4, at the saddle point the two derivatives with respect to w and b must vanish (stationarity),

    ∂L_P/∂w = 0 and ∂L_P/∂b = 0,

obtaining the following relations:

    w = Σ_{i=1}^{ℓ} α_i y_i x_i  and  Σ_{i=1}^{ℓ} α_i y_i = 0    (5.7)

By substituting the relations 5.7 back into L_P one arrives at the so-called Wolfe dual of the optimization problem (now only dependent on α, no more on w and b!):

    L_D(α) = Σ_{i=1}^{ℓ} α_i − ½ Σ_{i,j=1}^{ℓ} α_i α_j y_i y_j ⟨x_i·x_j⟩    (5.8)

So the dual problem for 5.6 can be formulated: given a linearly separable training set S,

    Maximize W(α) = Σ_{i=1}^{ℓ} α_i − ½ Σ_{i,j=1}^{ℓ} α_i α_j y_i y_j ⟨x_i·x_j⟩
    subject to Σ_{i=1}^{ℓ} α_i y_i = 0, α_i ≥ 0, i = 1, ..., ℓ    (5.9)
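As a minimal sketch (not part of the thesis), the dual 5.9 can be handed to an off-the-shelf QP solver; this assumes the cvxopt package and only terminates for separable data, since the hard margin dual is otherwise unbounded:

```python
import numpy as np
from cvxopt import matrix, solvers  # assumption: cvxopt is installed

def hard_margin_dual(X, y):
    """Solve the Wolfe dual (5.9) as a QP: minimize (1/2) a'Qa - e'a
    subject to alpha_i >= 0 and y'alpha = 0, with Q_ij = y_i y_j <x_i, x_j>.
    X is an (l, n) float array of points, y an (l,) array of +/-1 labels."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    l = X.shape[0]
    Yx = y[:, None] * X                    # each row x_i scaled by its label
    Q = Yx @ Yx.T                          # Q_ij = y_i y_j <x_i, x_j>
    P, q = matrix(Q), matrix(-np.ones(l))  # maximize W  <=>  minimize -W
    G, h = matrix(-np.eye(l)), matrix(np.zeros(l))   # -alpha_i <= 0
    A, b = matrix(y.reshape(1, l)), matrix(0.0)      # sum_i alpha_i y_i = 0
    sol = solvers.qp(P, q, G, h, A, b)
    return np.ravel(sol['x'])              # the optimal multipliers alpha*
```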

Note: The matrix with entries ⟨x_i·x_j⟩ is known as the Gram matrix G. So the goal is to find parameters α_i which solve this optimization problem. As a solution to construct the optimal separating hyperplane with maximal margin we obtain the optimal weight vector:

    w* = Σ_{i=1}^{ℓ} α_i* y_i x_i    (5.10)

Remark: One can think that up to now the problem will be able to be solved easily, as the one in appendix C, with the use of Lagrangian theory and the primal & dual objective functions. This could be right when having input vectors of small dimension, e.g. 2. But in the real-world case the number of variables will be over some thousand. Here solving the system with standard techniques will not be practicable in terms of time and memory usage of the corresponding vectors and matrices. But this issue will be discussed in the implementation chapter later.

5.2.1 Support Vectors

Stating the Kuhn-Tucker (KT) conditions for the primal problem L_P above (5.6), as seen in chapter 4, we get:

    ∂L_P/∂w = w − Σ α_i y_i x_i = 0
    ∂L_P/∂b = Σ α_i y_i = 0
    y_i(⟨w·x_i⟩ + b) − 1 ≥ 0, i = 1, ..., ℓ
    α_i ≥ 0, i = 1, ..., ℓ
    α_i [y_i(⟨w·x_i⟩ + b) − 1] = 0, i = 1, ..., ℓ    (5.11)

As mentioned, the optimization problem for SVMs is a convex one (a convex function with constraints giving a convex feasible region). And for convex problems the KT conditions are necessary and sufficient for w, b and α to be a solution. Thus solving the primal/dual problem of the SVMs

is equivalent to finding a solution to the KT conditions for the primal⁹ (see chapter 3 too). The fifth relation in 5.11 is known as the KT complementarity condition. In the third chapter on optimization theory an intuition was given on how it works. In the SVM's problem it has a good graphical meaning. It states that for a given training point x_i either the corresponding Lagrange multiplier equals zero, or, if not zero, x_i lies on one of the margin hyperplanes (see figure 5.4 and following text) H1 or H2:

    H1: ⟨w·x_i⟩ + b = +1
    H2: ⟨w·x_i⟩ + b = −1

On them are the training points with minimal distance to the optimal separating hyperplane (OSH) with maximal margin. The vectors lying on H1 or H2 implying α_i > 0 are called Support Vectors (SV).

Definition 5.3 (Support Vectors) A training point x_i is called support vector if its corresponding Lagrange multiplier α_i > 0.

All other training points have α_i = 0 and either lie on one of the two margin hyperplanes (equality in 5.2) or on the outer side of H1 or H2 (inequality in 5.2). A training point with α_i = 0 can be on one of the two margin hyperplanes because the complementarity condition in 5.11 only states that all SVs are on the margin hyperplanes, but not that the SVs are the only ones on them. So there may be the case where both α_i = 0 and y_i(⟨w·x_i⟩ + b) = 1. Then the point lies on one of the two margin hyperplanes without being a SV. Therefore SVs are the only points involved in determining the optimal weight vector in equation 5.10. So the crucial concept here is that the optimal separating hyperplane is uniquely defined by the SVs of a training set. That means repeating the training with all other points removed (or moved around without crossing H1 or H2) will lead to the same weight vector and therefore to the same optimal separating hyperplane.

⁹ Only they will be needed, because the primal/dual problem is an equivalent one: we maximize the dual (it is only dependent on α!) and as a criterion take the KT conditions of the primal.

In other words, a compression has taken place. So for repeating the training later, the same result can be achieved by only using the determined SVs.

Figure 5.6: The optimal separating hyperplane (OSH) with maximal margin is determined by the support vectors (SV, marked) lying on the margin hyperplanes H1 and H2.

Note that in the dual representation the value of b does not appear, and so the optimal value b* has to be found making use of the primal constraints:

    y_i(⟨w*·x_i⟩ + b) ≥ 1, i = 1, ..., ℓ

So only the optimal value of w* is explicitly determined by the training procedure. This implies we have optimal values for α. Therefore it is possible to pick any support vector and, after substitution into the above inequality, the constraint becomes an equality (because a support vector always is part of a margin hyperplane), from which b* can be computed. Numerically it is safer to compute b* for all α_i > 0 and take the mean value, or to take another approach as in the book [Ne]:

    b* = −½ (max_{y_i=−1} ⟨w*·x_i⟩ + min_{y_i=+1} ⟨w*·x_i⟩)    (5.12)

Note: This approach to compute the bias has been shown to be problematic with regard to the implementation of the SMO algorithm, as shown by [Ker01]. This issue will be discussed in the implementation chapter later.

5.2.2 Classification of unseen data

After the hyperplane's parameters w* and b* have been learned with the training set, we can classify unseen/unlabelled data points z. In the binary case (2 classes) discussed up to now, the found hyperplane divides R^n into two regions: one where ⟨w*·z⟩ + b* > 0 and the other one where ⟨w*·z⟩ + b* < 0. The idea behind the maximal margin classifier is to determine on which of the two sides the test pattern lies and to assign the corresponding label −1 or +1 (as all classifiers do), and also to maximize the margin between the two sets. Hence the used decision function can be expressed with the optimal parameters w* and b*, and therefore by the found/used support vectors, their corresponding α_i* > 0, and b*. So overall the decision function of the trained maximal margin classifier for some data point z can be formulated:

    f(z, w*, b*) = sign(⟨w*·z⟩ + b*) = sign(Σ_{i=1}^{ℓ} α_i* y_i ⟨x_i·z⟩ + b*) = sign(Σ_{i∈SV} α_i* y_i ⟨x_i·z⟩ + b*)    (5.13)

Whereby the last reformulation only sums over the elements (training point x_i, corresponding label y_i, associated α_i*) and the bias b* which are associated with a support vector (i ∈ SV), because only they have α_i* > 0 and therefore an impact on the sum. All in all, the optimal separating hyperplane we get by solving the margin optimization problem is a very simple special case of a Support Vector Machine, because it computes directly on the input data. But it is a good starting point for understanding the forthcoming concepts.
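Putting 5.10, the averaged bias, and 5.13 together, a minimal sketch (an assumption-laden illustration, not the thesis' implementation) could look as follows, given the multipliers from a solver like the one above:

```python
import numpy as np

def recover_hyperplane(X, y, alpha, tol=1e-8):
    """From the optimal multipliers, recover w* via (5.10) and b* by
    averaging y_i - <w*, x_i> over all support vectors (alpha_i > tol),
    since y_i(<w*, x_i> + b*) = 1 holds exactly on them."""
    sv = alpha > tol
    w = ((alpha * y)[:, None] * X).sum(axis=0)
    b = float(np.mean(y[sv] - X[sv] @ w))
    return w, b, sv

def classify(z, w, b):
    """Decision function (5.13) for an unseen point z."""
    return np.sign(w @ z + b)
```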

In the next chapters the concept will be generalized to nonlinear classifiers, and therefore the concept of kernel mapping will be introduced. But first the adaption of the separating hyperplane to linearly non-separable data will be done.

5.3 The Optimal Separating Hyperplane for Linearly Non-Separable Data

The algorithm above for the maximal margin classifier cannot be used in many real-world applications. In general, noisy data renders linear separation impossible, but the hugest problem in practice will still be the used features, leading to overlapping classes. The main problem with the maximal margin classifier is the fact that it allows no classification errors during training. Either the training is perfect without any errors, or there is no solution at all. Hence it is intuitive that we need a way to relax the constraints of 5.3. But each violation of the constraints needs to be punished by a misclassification penalty, i.e. an increase in the primal objective function L_P. This can be realized by introducing the so-called positive slack variables ξ_i in the constraints first and, as shown later, introducing an error weight C too:

    ⟨w·x_i⟩ + b ≥ +1 − ξ_i for y_i = +1
    ⟨w·x_i⟩ + b ≤ −1 + ξ_i for y_i = −1
    ξ_i ≥ 0

As above, these two constraints can be rewritten into one:

    y_i(⟨w·x_i⟩ + b) ≥ 1 − ξ_i, i = 1, ..., ℓ    (5.14)

So the ξ_i's can be interpreted as a value that measures how much a point fails to have a margin distance of 1/‖w‖ to the OSH. It indicates where a point lies compared to the separating hyperplane (see figure 5.7):

    ξ_i > 1: misclassification of x_i
    0 < ξ_i ≤ 1: x_i is classified correctly but lies inside the margin

    ξ_i = 0: x_i is classified correctly and lies outside the margin or on the margin boundary

So a classification error is marked by the corresponding ξ_i exceeding unity. Therefore Σ_i ξ_i is an upper bound on the number of training errors. Overall, with the introduction of these slack variables, the goal is to maximize the margin and simultaneously minimize misclassifications. To define a penalty on training errors, the error weight C is introduced by the term C Σ_{i=1}^{ℓ} ξ_i. This parameter has to be chosen by the user. In practice C is varied through a wide range of values and the optimal performance is assessed using a separate validation set or a technique called cross-validation (for verifying performance just using the training set).

Figure 5.7: Values of slack variables: misclassification of x_i if ξ_i is larger than the margin (ξ_i > 1); correct classification of x_i lying in the margin with 0 < ξ_i ≤ 1; correct classification of x_i outside the margin or on it with ξ_i = 0

So the optimization problem can be extended to:

    Minimize ½‖w‖² + C Σ_{i=1}^{ℓ} ξ_i^k
    subject to y_i(⟨w·x_i⟩ + b) ≥ 1 − ξ_i, ξ_i ≥ 0, i = 1, ..., ℓ    (5.15)

The problem is again a convex one for any positive integer k. This approach is called the Soft Margin generalization, while the original concept above is known as Hard Margin, because it allows no errors. The Soft Margin case is widely used with the values k = 1 (1-Norm Soft Margin) and k = 2 (2-Norm Soft Margin).

5.3.1 1-Norm Soft Margin - or the Box Constraint

For k = 1 as above, the primal Lagrangian can be formulated as

    L_P(w, b, ξ, α, β) = ½⟨w·w⟩ + C Σ ξ_i − Σ α_i [y_i(⟨w·x_i⟩ + b) − 1 + ξ_i] − Σ β_i ξ_i

with α_i ≥ 0, β_i ≥ 0.

Note: As described in chapter 4, we need another multiplier β here because of the new inequality constraint ξ_i ≥ 0. As before, the corresponding dual representation is found by differentiating L_P with respect to w, ξ and b:

    ∂L_P/∂w = w − Σ α_i y_i x_i = 0
    ∂L_P/∂b = Σ α_i y_i = 0
    ∂L_P/∂ξ_i = C − α_i − β_i = 0

By resubstituting these relations back into the primal we obtain the dual formulation L_D:

Given a training set S,

    Maximize L_D(w, b, ξ, α, β) = W(α) = Σ α_i − ½ ΣΣ α_i α_j y_i y_j ⟨x_i·x_j⟩
    subject to Σ α_i y_i = 0, 0 ≤ α_i ≤ C    (5.16)

This problem is curiously identical to that of the maximal (hard) margin one in 5.9. The only difference is that C − α_i − β_i = 0 together with β_i ≥ 0 enforces α_i ≤ C. So in the soft margin case the Lagrange multipliers are upper bounded by C. The Kuhn-Tucker complementarity conditions for the primal above are:

    α_i [y_i(⟨w·x_i⟩ + b) − 1 + ξ_i] = 0, i = 1, ..., ℓ
    ξ_i (α_i − C) = 0, i = 1, ..., ℓ

Another consequence of the KT conditions is that non-zero slack variables ξ_i can only occur when β_i = 0 and therefore α_i = C. The corresponding point x_i has a distance of less than 1/‖w‖ from the hyperplane and therefore lies inside the margin. This can be seen from the constraints (only shown for the y_i = +1 case, the other case is analogous):

    ⟨w·x_i⟩ + b = +1 for points on the margin hyperplane
    ⟨w·x_i⟩ + b ≥ 1 − ξ_i ⟹ ⟨w·x_i⟩ + b < 1 for ξ_i > 0

And therefore points with non-zero slack variables have a distance of less than 1/‖w‖. Points for which 0 < α_i < C lie exactly at the target distance of 1/‖w‖ and therefore on one of the margin hyperplanes (ξ_i = 0). This also shows that the hard margin hyperplane can be attained in the soft margin case by setting C to infinity.
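In a QP-solver formulation like the cvxopt sketch above, the only change 5.16 requires is the upper bound on α; a minimal sketch of the stacked inequality block (again an illustration under the cvxopt assumption, not the thesis' code):

```python
import numpy as np
from cvxopt import matrix

def box_constraints(l, C):
    """Stack 0 <= alpha_i <= C from (5.16) as G*alpha <= h; this block
    replaces the plain alpha_i >= 0 block in the hard margin QP above."""
    G = matrix(np.vstack([-np.eye(l), np.eye(l)]))
    h = matrix(np.hstack([np.zeros(l), np.full(l, float(C))]))
    return G, h
```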

The fact that the Lagrange multipliers are upper bounded by the value of C gives the name to this technique: box constraint, because the vector α is constrained to lie inside the box with side length C in the positive orthant. This approach is also known as 'SVM with linear loss function'.

5.3.2 2-Norm Soft Margin - or Weighting the Diagonal

This is the case for k = 2. But before stating the primal Lagrangian, and for ease of the upcoming calculation, note that for ξ_i < 0 the first constraint of 5.15 still holds if ξ_i = 0. Hence we still obtain the optimal solution when the positivity constraint on ξ_i is removed. So this leads to the following primal Lagrangian:

    L_P(w, b, ξ, α) = ½⟨w·w⟩ + (C/2) Σ ξ_i² − Σ α_i [y_i(⟨w·x_i⟩ + b) − 1 + ξ_i]

with the Lagrange multipliers α_i ≥ 0 again. As before, the corresponding dual is found by differentiating with respect to w, ξ and b, imposing stationarity, i.e. setting to zero:

    ∂L_P/∂w = w − Σ α_i y_i x_i = 0
    ∂L_P/∂b = Σ α_i y_i = 0
    ∂L_P/∂ξ_i = C ξ_i − α_i = 0

and again resubstituting the relations back into the primal to obtain the dual formulation L_D:

    L_D = Σ α_i − ½ ΣΣ α_i α_j y_i y_j ⟨x_i·x_j⟩ − (1/2C) Σ α_i²

Using the equation Σ_i α_i² = Σ_i Σ_j α_i α_j δ_ij, where δ_ij is the Kronecker delta (defined to be 1 if i = j and 0 otherwise): inserting y_i y_j on the right side of the above equation changes nothing in the result, because together with δ_ij it is the same as writing y_i², which is always +1, so we simply multiply by an extra +1, but can simplify L_D to get the final problem to be solved. Given a training set S,

    Maximize L_D(w, b, ξ, α) = W(α) = Σ α_i − ½ ΣΣ α_i α_j y_i y_j (⟨x_i·x_j⟩ + δ_ij/C)
    subject to Σ α_i y_i = 0, α_i ≥ 0, i = 1, ..., ℓ    (5.17)

The complementarity KT conditions for the primal problem above are

    α_i [y_i(⟨w·x_i⟩ + b) − 1 + ξ_i] = 0, i = 1, ..., ℓ

This whole problem can be solved with the same methods used for the maximal margin classifier. The only difference is the addition of 1/C to the diagonal of the Gram matrix G (only on the diagonal, because of the Kronecker delta). This approach is also known as 'SVM with quadratic loss function'. Summarizing this subchapter, it can be said that the soft margin optimization is a compromise between little empirical risk and maximal margin. For an example look at figure 5.8. The value of C can be interpreted as representing the trade-off between minimizing the training set error and maximizing the margin. So, all in all, by using C as an upper bound on the Lagrange multipliers, the role of outliers is reduced by preventing a point from having too large Lagrange multipliers.
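Since the 2-norm case only modifies the Gram matrix, the change amounts to a one-line helper; a sketch (assuming NumPy, with K the precomputed Gram matrix):

```python
import numpy as np

def weight_diagonal(K, C):
    """2-norm soft margin (5.17): reuse the hard margin machinery,
    only adding 1/C to the diagonal of the Gram matrix K."""
    return K + np.eye(K.shape[0]) / C
```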


Figure 5.8: Decision boundaries arising when using a Gaussian kernel with fixed value of σ in the three different machines: (a) the maximal margin SVM, (b) the 1-norm soft margin SVM and (c) the 2-norm soft margin SVM. The data is an artificially created two dimensional set, the blue dots being positive examples and the red ones negative examples.

5.4 The Duality of Linear Machines

This section is intended to stress a fact that was used and remarked several times before: the linear machines introduced above can be formulated in a dual description. This reformulation will turn out to be crucial in the construction of the more powerful generalized Support Vector Machines below. But what does duality of classifiers mean? As seen in the former chapter, the normal vector w can be represented as a linear combination of the training points:

    w = Σ_{i=1}^{ℓ} α_i y_i x_i

with S the given training set, already classified by the supervisor. The α_i were introduced in the used Lagrangian way to find a

solution to the margin maximization problem. They were called the dual variables of the problem and are therefore the fundamental unknowns. On the way to the solution we then obtain W(α) and the reformulated decision function for unseen data z of 5.13:

    f(z, α*, b*) = sign(⟨w*·z⟩ + b*) = sign(Σ α_i* y_i ⟨x_i·z⟩ + b*)

The crucial observation here is that the training and test points never act through their individual attributes. These points only appear as entries ⟨x_i·x_j⟩ in the Gram matrix G in the training phase, and later in the test phase they only appear in an inner product ⟨x_i·z⟩ with the training points.

5.5 Vector/Matrix Representation of the Optimization Problem and Summary

5.5.1 Vector/Matrix Representation

To give a first impression on how the above problems can be solved using a computer, the problems will be formulated in the equivalent notation with vectors and matrices. This notation is more practically understandable and is used in many implementations. As described above, the convex quadratic optimization problem which arises for the hard (C = ∞), 1-norm (C < ∞) and 2-norm (change the Gram matrix by means of adding 1/C to the diagonal) margin is the following:

    Maximize L_D(w, b, ξ, α, β) = W(α) = Σ α_i − ½ ΣΣ α_i α_j y_i y_j ⟨x_i·x_j⟩
    subject to Σ α_i y_i = 0, 0 ≤ α_i ≤ C

This problem can be expressed as:

    Maximize e^T α − ½ α^T Q α
    subject to y^T α = 0, 0 ≤ α_i ≤ C, i = 1, ..., ℓ    (5.18)

where e is the vector of all ones, C > 0 the upper bound, and Q is an ℓ × ℓ positive semidefinite matrix with entries Q_ij = y_i y_j ⟨x_i·x_j⟩. With a correct training set S of length ℓ, 5.18 would look like:

    Maximize (1, ..., 1)(α_1, ..., α_ℓ)^T − ½ (α_1, ..., α_ℓ) ( Q_11 ... Q_1ℓ ; ... ; Q_ℓ1 ... Q_ℓℓ ) (α_1, ..., α_ℓ)^T

5.5.2 Summary

As seen in chapter 4, quadratic problems with a so-called positive semi-definite matrix are convex functions. This allows the crucial concepts of solutions to convex functions to be adapted (see chapter 4): convexity and the KT conditions. Semidefinite means: x^T Q x ≥ 0 for each x, i.e. Q has non-negative eigenvalues (also see the next page for an explanation).

In former chapters the convexity of the objective function has been assumed without proof. So let M be any (possibly non-square) matrix and set A = M^T M. Then A is a positive semi-definite matrix, since we can write

    x^T A x = x^T M^T M x = (M x)^T (M x) = ‖M x‖² ≥ 0    (5.19)

for any vector x. If we take M to be the matrix whose columns are the vectors x_i, then A is the Gram matrix of the set S, showing that Gram matrices are always positive semi-definite. And therefore the above matrix Q also is positive semi-definite. Summarized, the problem to be solved up to now can be stated as

    Maximize L_D(w, b, ξ, α, β) = W(α) = Σ α_i − ½ ΣΣ α_i α_j y_i y_j ⟨x_i·x_j⟩
    subject to Σ α_i y_i = 0, 0 ≤ α_i ≤ C    (5.20)

with the particularly simple primal KT conditions as criterion for a solution to the 1-norm optimization problem:

    α_i = 0 ⟹ y_i(⟨w·x_i⟩ + b) ≥ 1
    0 < α_i < C ⟹ y_i(⟨w·x_i⟩ + b) = 1
    α_i = C ⟹ y_i(⟨w·x_i⟩ + b) ≤ 1    (5.21)

Notice that the slack variables ξ_i do not need to be computed for this case because, as seen in chapter 5.3.1, they will only be non-zero if α_i = C and β_i = 0. So recall the primal of this chapter, stated as

    L_P(w, b, ξ, α, β) = ½⟨w·w⟩ + C Σ ξ_i − Σ α_i [y_i(⟨w·x_i⟩ + b) − 1 + ξ_i] − Σ β_i ξ_i

Then set β_i = C − α_i = 0 for the points with non-zero slack, so the third sum is zero, and from the second

sum we get Σ α_i ξ_i = C Σ ξ_i, which cancels the penalty term C Σ ξ_i, and so it can be deleted and no slack variable is there anymore. For the maximal margin case the conditions will be:

    α_i = 0 ⟹ y_i(⟨w·x_i⟩ + b) ≥ 1
    α_i > 0 ⟹ y_i(⟨w·x_i⟩ + b) = 1    (5.22)

And last but not least, for the 2-norm case:

    α_i = 0 ⟹ y_i(⟨w·x_i⟩ + b) ≥ 1
    α_i > 0 ⟹ y_i(⟨w·x_i⟩ + b) = 1 − α_i/C    (5.23)

The last condition is reformulated by means of implicitly defining ξ_i with the help of the primal KT condition ∂L_P/∂ξ_i = Cξ_i − α_i = 0 of chapter 5.3.2, therefore ξ_i = α_i/C. And with the complementarity KT condition α_i[y_i(⟨w·x_i⟩ + b) − 1 + ξ_i] = 0 the last condition above is gained. As seen in the soft margin chapters, points for which the second equation of 5.21 holds are support vectors lying on one of the margin hyperplanes, and points for which the third one holds lie inside the margin and are therefore called margin errors. These KT conditions will be used later and prove to be important when implementing algorithms for computationally/numerically solving the problem of 5.20, because a point is an optimum of 5.20 if and only if the KT conditions are fulfilled and Q is positive semi-definite. The second requirement is proven above. After the training process (the solving of the quadratic optimization problem), having as a solution the vector α* and therefore the bias b*, the classification of unseen data z is performed by

    f(z, α*, b*) = sign(⟨w*·z⟩ + b*) = sign(Σ_{i=1}^{ℓ} α_i* y_i ⟨x_i·z⟩ + b*) = sign(Σ_{i∈SV} α_i* y_i ⟨x_i·z⟩ + b*)    (5.13)

where the x_i are the training points with their corresponding α_i* greater than zero (and upper bounded by C) and therefore support vectors. As one can think now, the question arising here is: why always classify new data by the use of the α_i*, and why not simply save the resulting weight vector w*? Sure, up to now it would be possible to do that, with no further need of storing the training points and their labels. But as seen above there will normally be very few support vectors, and only they, with their corresponding α_i* and y_i, are necessary to reconstruct w*. The main reason, however, will be given in chapter 6, where we will see that we must use the α_i* and cannot simply store w*. To give a short link to the implementation issue discussed later, it can be said that in most cases the 1-norm is used, because in real-world applications you will normally not have noise-free linearly separable data, and therefore the maximal margin approach will not lead to satisfactory results. But the main problem in practice is still the selection of the used feature data. The 2-norm is used in fewer cases because it is not easy to integrate into the SMO algorithm discussed in the implementation chapter.
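Since the KT conditions 5.21 serve as the optimality criterion in implementations, a pointwise check can be sketched as follows (a minimal illustration assuming NumPy; the tolerance handling is a common practical choice, not prescribed by the thesis):

```python
import numpy as np

def kt_violated(alpha, y, f, C, tol=1e-3):
    """Check the 1-norm KT conditions (5.21) pointwise, where f holds the
    current decision values <w, x_i> + b:
      alpha_i = 0      ->  y_i f_i >= 1
      0 < alpha_i < C  ->  y_i f_i  = 1
      alpha_i = C      ->  y_i f_i <= 1
    Returns a boolean mask marking the points that violate them."""
    m = y * f
    at_zero = (alpha <= tol) & (m < 1 - tol)
    at_C = (alpha >= C - tol) & (m > 1 + tol)
    inside = (alpha > tol) & (alpha < C - tol) & (np.abs(m - 1) > tol)
    return at_zero | at_C | inside
```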

Chapter 6 Nonlinear Classifiers

The last chapter showed how the linear classifiers can easily be computed by means of standard optimization techniques. But linear learning machines are restricted because of their limited computational power, as highlighted in the 1960s by Minsky and Papert. Summarized, it can be stated that real-world applications require more expressive hypothesis spaces than linear functions. Or in other words, the target concept may be too complex to be expressed as a simple linear combination of the given attributes (that is what linear machines do); equivalently: the decision function is not a linear function of the data. This problem can be overcome by the use of the so-called kernel technique. The general idea is to map the input data nonlinearly to a (nearly always higher dimensional) space and then separate it there by linear classifiers. This results in a nonlinear classifier in input space (see figure 6.1). Another solution to this problem has been proposed in the neural network theory: multiple layers of thresholded linear functions, which led to the development of multi-layer neural networks.

Figure 6.1: Simpler classification task by a feature map Φ. 2-dimensional input space on the left, 2-dimensional feature space on the right, where we are able to separate by a linear classifier, which leads to the nonlinear classifier in input space.

6.1 Explicit Mappings

Now the representation of training examples will be changed by mapping the data to a (possibly infinite dimensional) Hilbert space F. Usually the space F will have a much higher dimension than the input space X. The mapping Φ : X → F is applied to each labelled example before training, and then the optimal separating hyperplane is constructed in the space F:

    Φ : X ⊆ R^n → F, x = (x_1, ..., x_n) ↦ Φ(x) = (Φ_1(x), Φ_2(x), ..., Φ_N(x))    (6.1)

This is equivalent to mapping the whole input space X into F. The components of Φ(x) are called features, while the original quantities are sometimes referred to as the attributes. F is called the feature space. The task of choosing the most suitable representation of the data is known as feature selection. This can be a very difficult task, and there are different existing approaches to it. Frequently one seeks to identify the smallest set of features that still conveys the essential information contained in the original attributes. This is known as dimensionality reduction:

    x = (x_1, ..., x_n) ↦ Φ(x) = (Φ_1(x), ..., Φ_d(x)), d < n    (6.2)

and can be very beneficial, as both computational and generalization performance can degrade as the number of features grows, a phenomenon known as the curse of dimensionality. The difficulty one is facing with high dimensional feature spaces is that the larger the set of (probably redundant) features is, the more likely it is that the function to be learned could be represented using a standardised learning machine. Another approach to feature selection is the detection of irrelevant features and their elimination. As an example consider the gravitation law, which only uses information about the masses and the positions of two bodies. An irrelevant feature would be the colour or the temperature of the two bodies. As a last word on feature selection: it should be considered well as a part of the learning process. But it is also naturally a somewhat arbitrary step which needs some prior knowledge about the underlying target function. Therefore recent research has been done on techniques for feature reduction.

Footnote: A Hilbert space is a vector space with some more restrictions. A space H is separable if there exists a countable subset D ⊆ H such that every element of H is the limit of a sequence of elements of D. A Hilbert space is a complete separable inner product space. Finite dimensional vector spaces like R^n are Hilbert spaces. This space will be described in more detail a little further in this chapter; for further readings see [Ne].

However, in the rest of this diploma thesis we do not talk about the feature selection techniques because, as Cristianini and Shawe-Taylor proved in their book [Ne], we can afford to use infinite dimensional feature spaces and avoid computational problems by means of the implicit mapping described in the next chapter. So the curse of dimensionality can be said to be irrelevant by implicitly mapping the data, also known as the kernel trick. Before illustrating the mapping with an example, first notice that the only way in which data appears in the training problem is in the form of dot products ⟨x_i·x_j⟩. Now suppose this data is first mapped to some other (possibly infinite dimensional) space F using the mapping of 6.1: Φ : R^n → F. Then of course, as seen in 6.1 and 6.2, the training algorithm would only depend on the data through dot products in F, i.e. on functions of the form ⟨Φ(x_i)·Φ(x_j)⟩ (all other variables are scalars). Second, there is no vector w mapped to F via Φ, but we can write w in the form w = Σ α_i y_i Φ(x_i), and the whole hypothesis/decision function will be of the type f(x) = sign(⟨w·Φ(x)⟩ + b), or reformulated f(x) = sign(Σ α_i y_i ⟨Φ(x_i)·Φ(x)⟩ + b). So a support vector machine is constructed which lives in the new higher dimensional space F, but all the considerations of the former chapters still hold, since we are still doing a linear separation, but in a different space. But now a simple example with explicit mapping. Consider a given training set S of points in R¹ with class labels +1 and −1: S = {(−1, +1), (0, −1), (+1, +1)}. Trivially, these three points are not separable by a hyperplane, here a point in R¹ (see figure 6.2). So first the data is nonlinearly mapped to the R³ by applying

Footnote: Input dimension is n = 1, therefore the hyperplane is of dimension n − 1 = 0, and a zero-dimensional affine subspace is a single point.

    Φ : R¹ → R³, x ↦ Φ(x) = (x², √2·x, 1)

Figure 6.2: A non-separable example in the input space R¹. The hyperplane would be a single point, but it cannot separate the data points.

This step results in a training set consisting of the vectors Φ(x_i) with the corresponding labels +1, −1, +1. As illustrated in figure 6.3, the solution in the new space R³ can easily be seen geometrically in the (Φ₁, Φ₃)-plane (see figure 6.4). It is w = (1, 0, 0), which is almost normalised yet, meaning it has a length of 1, and the bias becomes b = −0.5 (a negative b means moving the hyperplane running through the origin into the positive direction). So it can be seen that the learning task can be easily solved in the R³ by linear separation. But how does the decision function look in the original space R¹, where we need it?

Remember that w can be written in the form w = Σ_i α_i y_i Φ(x_i).

Figure 6.3: Creation of a separating hyperplane, i.e. a plane, in the new space R³.

Figure 6.4: Looking at the (Φ₁, Φ₃)-plane, the solution to w and b can be easily given by geometric interpretation of the picture.

And in our particular example it can be written as w = Σ_{i=1}^{3} α_i y_i Φ(x_i). Worked out with α₁ = α₃ = ½ and α₂ = 1:

    w = ½·Φ(−1) − 1·Φ(0) + ½·Φ(+1) = (1, 0, 0)

The solving vector w is then (1, 0, 0). With the equation

    ⟨Φ(x)·Φ(z)⟩ = (⟨x·z⟩ + 1)²    (6.3)

the hyperplane then becomes, expressed with the original training points x_i in R¹:

    0 = ⟨w·Φ(z)⟩ + b = Σ_{i=1}^{3} α_i y_i ⟨Φ(x_i)·Φ(z)⟩ + b = Σ_{i=1}^{3} α_i y_i (⟨x_i·z⟩ + 1)² + b = z² − ½

This leads to the nonlinear "hyperplane" in R¹ consisting of two points: z₁ = −1/√2 and z₂ = +1/√2. As seen in equation 6.3, the inner product in the feature space has an equivalent function in the input space. Now we introduce an abbreviation for the dot product in feature space:

    K(x, z) := ⟨Φ(x)·Φ(z)⟩    (6.4)

Clearly, if the feature space is very high-dimensional (or even infinite dimensional), the right-hand side of 6.4 will be very expensive to compute.

The observation in 6.3, together with the problem described above, motivates a search for ways to evaluate inner products in feature space without making direct use of the feature space nor the mapping Φ. This approach leads to the terms kernel and kernel trick.

6.2 Implicit Mappings and the Kernel Trick

Definition 6.1 (Kernel Function) Given a mapping Φ : X → F from input space X to an inner product feature space¹³ F, we call the function K : X × X → R a kernel function if for all x, z ∈ X:

    K(x, z) = ⟨Φ(x)·Φ(z)⟩    (6.5)

The kernel function then behaves like an inner product in feature space, but can be evaluated as a function in input space. For example take the polynomial kernel K(x, z) = ⟨x·z⟩^d. Now assume we have got d = 2 and R² as original input space, so we get:

¹³ Inner product space: A vector space X is called an inner product space if there exists a bilinear map (linear in each argument) that for each two elements x, y ∈ X gives a real number denoted by ⟨x·y⟩, satisfying symmetry and positivity. E.g.: let X = R^n and let λ_1, ..., λ_n be fixed positive numbers. Then the following defines a valid inner product: ⟨x·y⟩ = Σ_i λ_i x_i y_i = x^T A y, where A is the n × n diagonal matrix with non-zero entries A_ii = λ_i.

    K(x, z) = ⟨x·z⟩² = (x₁z₁ + x₂z₂)² = x₁²z₁² + 2x₁z₁x₂z₂ + x₂²z₂²
            = ⟨(x₁², √2·x₁x₂, x₂²)·(z₁², √2·z₁z₂, z₂²)⟩ = ⟨Φ(x)·Φ(z)⟩    (6.6)

with the feature map Φ(x) = (x₁², √2·x₁x₂, x₂²). So the data is mapped to the R³. But the second line can be left out by implicitly calculating ⟨Φ(x)·Φ(z)⟩ with the vectors in input space: K(x, z) = ⟨x·z⟩², which gives the same as in the above calculation first mapping the input vectors to the feature space and then calculating the dot product ⟨Φ(x)·Φ(z)⟩. So by implicitly mapping the input vectors to the feature space we are able to calculate the dot product there without even knowing the underlying mapping Φ! Summarized, such a non-linear mapping to a higher dimensional space can be performed implicitly and without increasing the number of parameters, because the kernel function computes the inner product in feature space only by use of the two inputs in input space.
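The identity 6.6 can be verified numerically; a tiny sketch (not from the thesis, assuming NumPy), where both routes give the same number:

```python
import numpy as np

def phi(x):
    """Explicit degree-2 feature map from (6.6) for 2-D inputs."""
    return np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

x, z = np.array([1.0, 2.0]), np.array([3.0, 0.5])
explicit = phi(x) @ phi(z)   # map first, then take the inner product in R^3
implicit = (x @ z) ** 2      # kernel trick: stay in input space
print(explicit, implicit)    # both print 16.0
```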

To generalize: a polynomial kernel K(x, z) = ⟨x·z⟩^d with degree d and attributes in an input space of dimension n maps the data to a feature space of dimension (n+d−1 choose d), where (n choose k) = n!/(k!(n−k)!) is called the binomial coefficient. In the example of 6.6 this means, with n = 2 and d = 2:

    (n+d−1 choose d) = (3 choose 2) = 3!/(2!·1!) = 3

And as can be seen above, the data is really mapped from the R² to the R³. In figure 6.5 the whole new procedure for classification of an unknown point z is shown, after training of the kernel-based SVM and therefore having the optimal weight vector (defined by the α_i*, the corresponding training points x_i and their labels y_i) and the bias b*.

Figure 6.5: The whole procedure for classification of a test vector z (in this example the test and training vectors are simple digits).

To stress the important facts: in contrast to the example in chapter 6.1, the chain of arguments is inverted. There we started by explicitly defining a mapping Φ before applying the learning algorithm. But now the starting point is choosing a kernel function K, which implicitly defines the mapping Φ, and therefore avoids the feature space in the computation of inner products as well as in the whole design of the learning machine itself. As seen above, both the learning and the test step only depend on the value of inner products in feature space. Hence, as shown, they can be formulated in terms of kernel functions. So once such a kernel function has been chosen, the decision function for unseen data z (5.13) becomes:

    f(z) = sign(Σ_{i∈SV} α_i* y_i ⟨Φ(x_i)·Φ(z)⟩ + b*) = sign(Σ_{i∈SV} α_i* y_i K(x_i, z) + b*)    (6.7)

And as said before, as a consequence we do not need to know the underlying feature map to be able to solve the learning task in feature space!

Remark: As remarked in chapter 5, the consequence of using kernels is that now the direct storing of the resulting weight vector is not practicable, because, as seen in 6.7 above, we would then have to know the mapping and could not use the advantage arising from the usage of kernels.

But which functions can be chosen as kernels?

6.2.1 Requirements for Kernels - Mercer's Condition

As a first requirement for a function to be chosen as a kernel, definition 6.5 gives two conditions, because the mapping has to go to an inner product feature space. So it can be easily seen that K has to be a symmetric function:

    K(x, z) = ⟨Φ(x)·Φ(z)⟩ = ⟨Φ(z)·Φ(x)⟩ = K(z, x)    (6.8)

And another condition for an inner product space is the Schwarz inequality:

    K(x, z)² = ⟨Φ(x)·Φ(z)⟩² ≤ ‖Φ(x)‖²·‖Φ(z)‖² = ⟨Φ(x)·Φ(x)⟩·⟨Φ(z)·Φ(z)⟩ = K(x, x)·K(z, z)    (6.9)

However, these conditions are not sufficient to guarantee the existence of a feature space. Here Mercer's theorem gives sufficient conditions (Vapnik 1995; Courant and Hilbert 1953). The following formulation of Mercer's theorem is given without proof, as it is stated in the paper of [Bur98].

Theorem 6.2 (Mercer's Theorem) There exist a mapping Φ and an expansion

    K(x, y) = Σ_i Φ_i(x)·Φ_i(y)

if and only if, for any g(x) such that

    ∫ g(x)² dx is finite,    (6.10)

it holds that

    ∫∫ K(x, y) g(x) g(y) dx dy ≥ 0    (6.11)

Note: 6.11 has to hold for every g satisfying 6.10. This theorem is also sufficient for the infinite case. Another, simplified condition for K to be a kernel in the finite case can be obtained when describing K with its eigenvectors and eigenvalues. The proof is given in [Ne].

Proposition 6.3 Let X be a finite input space with K(x, z) a symmetric function on X. Then K(x, z) is a kernel function if and only if the matrix K = (K(x_i, x_j))_{i,j} is positive semi-definite.

Therefore Mercer's theorem is an extension of this proposition, based on the study of integral operator theory.

6.2.2 Making Kernels from Kernels

Theorem 6.2 is the basic tool for verifying that a function is a kernel. The remarked proposition 6.3 gives the requirement for a finite set of points. Now this criterion for a finite set is applied to confirm that a number of new kernels can be created. The next proposition of Cristianini and John Shawe-Taylor [Ne] allows creating more complicated kernels from simple building blocks:

Proposition 6.4 Let K₁ and K₂ be kernels over X × X, X ⊆ R^n, a ∈ R⁺, f(·) a real-valued function on X, Φ : X → R^m with K₃ a kernel over R^m × R^m, p(x) a polynomial with positive coefficients, and B a symmetric positive semi-definite n × n matrix. Then the following functions are kernels too:

    K(x, z) = K₁(x, z) + K₂(x, z)
    K(x, z) = a·K₁(x, z)
    K(x, z) = K₁(x, z)·K₂(x, z)
    K(x, z) = f(x)·f(z)
    K(x, z) = K₃(Φ(x), Φ(z))    (6.12)
    K(x, z) = x^T B z
    K(x, z) = p(K₁(x, z))
    K(x, z) = exp(K₁(x, z))

6.2.3 Some well-known Kernels

The selection of a kernel function is an important problem in applications, although there is no theory telling which kernel to use when. Moreover, it can be very difficult to check that some particular kernel satisfies Mercer's conditions, since they must hold for every g satisfying 6.10. In the following, some well known and widely used kernels are presented. Selection of the kernel (perhaps from among the presented ones) is usually based on experience and knowledge about the classification problem at hand, and also on theoretical considerations. The problem of choosing a kernel and its parameters on the basis of theoretical considerations will be discussed in chapter 7. Each kernel will be explained below.

    Polynomial: K(x, z) = (⟨x·z⟩ + c)^p    (6.13)
    Sigmoid: K(x, z) = tanh(κ⟨x·z⟩ + δ)    (6.14)
    Radial Basis Function (Gaussian kernel): K(x, z) = exp(−‖x − z‖² / (2σ²))    (6.15)

Polynomial Kernel: Here p gives the degree of the polynomial and c is some non-negative constant, usually c = 1. Usage of another, generalized inner product instead of the standard inner product above was proposed in many other works on SVMs, because the Hessian matrix can become zero in numerical calculations (this means no solution for the optimization problem). Then the kernel becomes:

    K(x, z) = (⟨x·z/σ⟩ + c)^p

where the vector σ is chosen such that the function satisfies Mercer's condition.
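For reference, the three kernels 6.13-6.15 in code form (a minimal sketch assuming NumPy; parameter names follow the text, default values are illustrative only):

```python
import numpy as np

def poly_kernel(x, z, p=2, c=1.0):
    """Polynomial kernel (6.13)."""
    return (x @ z + c) ** p

def sigmoid_kernel(x, z, kappa=1.0, delta=-1.0):
    """Sigmoid kernel (6.14); Mercer's condition only holds for
    certain values of kappa and delta."""
    return np.tanh(kappa * (x @ z) + delta)

def rbf_kernel(x, z, sigma=1.0):
    """Gaussian / RBF kernel (6.15) with window width sigma."""
    return np.exp(-np.linalg.norm(x - z) ** 2 / (2 * sigma ** 2))
```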

Figure 6.6: A polynomial kernel of degree 2 used for the classification of the XOR data set, which is non-separable in input space by a linear classifier. Each colour represents one class and the dashed lines mark the margins. The level of shading indicates the functional margin, or in other words: the darker the shading of one colour representing a specific class, the more confident the classifier is that a point in that region belongs to that class.

Sigmoid function: The sigmoid kernel stated above usually satisfies Mercer's condition only for certain values of the parameters κ and δ. This was noticed experimentally by Vapnik. Currently there are no theoretical results on the parameter values that satisfy Mercer's conditions. As stated in [Pan], the usage of the sigmoid kernel with the SVM can be regarded as a two-layer neural network. In such networks the input vector z is mapped by the first layer into the vector F = (F_1, ..., F_N), where F_i = tanh(κ⟨x_i·z⟩ + δ), i = 1, ..., N, and the dimension N of F is called the number of hidden units. In the second layer the sign of the weighted sum of the elements of F is calculated by using weights γ_i. Figure 6.7 illustrates that. The main difference to notice between SVMs and two-layer neural networks is the different optimization criterion: in the SVM case the goal is to find the optimal separating hyperplane which maximizes the margin (in the feature space), while in a two-layer neural network the criterion is usually to minimize the empirical risk associated with some loss function, typically the mean squared error.

Figure 6.7: A 2-layer neural network with N hidden units. The outputs of the first layer are of the form F_i = tanh(κ⟨x_i·z⟩ + δ), i = 1, ..., N, while the output of the whole network then becomes ŷ = sign(Σ_{i=1}^{N} γ_i F_i + b).

Another important notice should be given here: in neural networks the optimal network architecture is quite often unknown and mostly found only by experiments and/or prior knowledge, while in the SVM case such problems are avoided. Here the number of hidden units is the same as the number of support vectors, and the weights in the output layer γ are all determined automatically in the linearly separable case (in feature space).

Radial Basis Function (Gaussian): The Gaussian kernel is also known as the radial basis function. In the above function 6.15, σ² (the variance) defines a so-called window width (the width of the Gaussian). Sure, it is possible to have different window widths for different vectors, meaning to use a vector σ (see [Cha]). As some works show [Lin03], the RBF kernel will be a good starting point for a first try if one knows nearly nothing about the data to classify. The main reasons will be stated in the upcoming chapter 7, where the parameter selection will also be discussed.

Figure 6.8: An SVM with a Gaussian kernel (a fixed value of sigma σ) and with the application of the maximal margin case (C = inf) on an artificially generated training set.

Another remark to mention here is that up to now the algorithm, and so the above introduced classifiers, are only intended for the binary case. But as we will see in chapter 8, this can easily be extended to the multiclass case.

6.3 Summary

Kernels are a very powerful tool when dealing with nonlinearly separable datasets. The usage of the kernel trick has long been known and was therefore studied in detail. By its usage the problem to solve still stays the same as in the previous chapters, but the dot product in the formulas is rewritten using the implicit kernel mapping. So the problem can be stated as:

    Maximize L_D(w, b, ξ, α, β) = W(α) = Σ α_i − ½ ΣΣ α_i α_j y_i y_j K(x_i; x_j)
    subject to Σ α_i y_i = 0, 0 ≤ α_i ≤ C

with the same KT conditions as in the summary under 5.5.2. The overall decision function for some unseen data z then becomes:

    f(z) = sign(⟨w*·Φ(z)⟩ + b*) = sign(Σ_{i∈SV} α_i* y_i K(x_i; z) + b*)    (6.7)

Note: This kernel representation will be used from now on. To give the link to the linear case of chapter 5, where K(x_i; x_j) is replaced by ⟨x_i·x_j⟩, this kernel will be called the linear kernel.

Chapter 7 Model Selection

Though, as introduced in the last chapter, without building an own kernel based on the knowledge about the problem at hand, as a first try it is intuitive to use the four common and well known kernels. This approach is mainly used, as the examples in appendix A show. But as a first step there is the choice which kernel to use for the beginning. Afterwards the penalty parameter C and the kernel parameters have to be chosen too.

7.1 The RBF Kernel

As suggested in [Lin03], the RBF kernel is in general a reasonable first choice. Although, if the problem at hand is nearly the same as some already formidably solved ones (hand digit recognition, face recognition) which are documented in detail, a first try should be given to the kernels used there. But the parameters mostly have to be chosen in other ranges, applicable to the actual problem. Some examples of such already solved problems and links to further readings about them will be given in appendix A. As shown in the last chapter, the RBF kernel (as others) maps samples into a higher dimensional space, so it is able (in contrast to the linear kernel) to handle the case where the relation between class labels and attributes is nonlinear. Furthermore, the linear kernel is a special case of the RBF one, as [Ke03] shows that the linear kernel with a penalty parameter C has the same performance as the RBF kernel with some parameters (C, γ)¹⁵. In addition, the sigmoid kernel behaves like RBF for certain parameters [Li03]. Another reason is the number of hyperparameters, which influence the complexity of model selection. The polynomial kernel has more of them than the RBF kernel.

¹⁵ γ = 1/(2σ²)

Finally, the RBF kernel has less numerical difficulties. One key point is 0 < K(x, z) ≤ 1, in contrast to polynomial kernels, whose kernel values may go towards infinity. Moreover, as said in the last chapter, the sigmoid kernel is not valid (i.e. not the inner product of two vectors) under some parameters.

7.2 Cross-Validation

In the case of RBF kernels there are two tuning parameters: C and σ. It is not known beforehand which values for them are the best for the problem at hand. So some parameter search must be done to identify the optimal ones. "Optimal ones" means finding C and σ so that the classifier can accurately predict unknown data (i.e. testing data) after training. In this respect it will not be useful to achieve high training accuracy at the cost of generalization ability. Therefore a common way is to separate the training data into two parts, of which one is considered unknown when training the classifier. Then the prediction accuracy on this set can more precisely reflect the performance on classifying unknown data. An improved version of this technique is known as cross-validation. In the so-called k-fold cross-validation the training set is divided into k subsets of equal size. Sequentially, one subset is tested using the classifier trained on the remaining k−1 subsets. Thus each instance of the whole training set is predicted once, so the cross-validation accuracy is the percentage of data which are correctly classified. The main disadvantage of this procedure is its computational intensity, because the model has to be trained k times. A simpler technique can be extracted from this model by choosing k-fold cross-validation with k = ℓ. This means sequentially removing the i-th training point and training with the remaining ones. This procedure is known as Leave-One-Out (loo). Another technique is known as Grid Search. This approach has been chosen by [Lin03]. The main idea behind it is to basically try pairs of (C, γ), and the one with the best cross-validation accuracy is picked. Mentioned in this paper is the observation that trying exponentially growing sequences of C and γ is a practical way to find good parameters, e.g. C = 2⁻⁵, 2⁻³, ..., 2¹⁵; γ = 2⁻¹⁵, 2⁻¹³, ..., 2³. Sure, this search method is straightforward and stupid in some way. But as said in the paper above, there certainly are advanced techniques for grid-searching, but they all do an exhaustive parameter search by approximation or heuristics. Another reason is that it has been shown that the computational time to find good parameters by the original grid search is not much more than that of advanced methods, since there are still the same two parameters to be optimized.
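A minimal grid-search sketch over such exponential grids (assuming the scikit-learn library, which the thesis itself does not use; note it parametrizes the RBF width as gamma = 1/(2σ²), and the data here is a random placeholder):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

X = np.random.randn(100, 2)                     # placeholder data
y = np.where(X[:, 0] * X[:, 1] > 0, 1, -1)      # placeholder +/-1 labels

param_grid = {
    'C':     [2.0**k for k in range(-5, 16, 2)],   # 2^-5, 2^-3, ..., 2^15
    'gamma': [2.0**k for k in range(-15, 4, 2)],   # 2^-15, ..., 2^3
}
search = GridSearchCV(SVC(kernel='rbf'), param_grid, cv=5)  # 5-fold CV
search.fit(X, y)
print(search.best_params_, search.best_score_)
```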

Chapter 8 Multiclass Classification

Up to now the study has been limited to the two-class case (called the binary case), where only two classes of data have to be separated. However, in real-world problems there are in general n > 2 classes to deal with. The training set still consists of pairs (x_i, y_i), where the x_i are input vectors as before, but now y_i ∈ {1, ..., n}, with i = 1, ..., ℓ. The first straightforward idea will be to reduce the multiclass problem to many two-class problems, so each resulting class is separated from the remaining ones.

8.1 One-Versus-Rest (OVR)

So, as mentioned above, the first idea for a procedure to construct a multiclass classifier is the construction of n two-class classifiers with the following decision functions:

    f_k(z) = sign(⟨w_k·z⟩ + b_k); k = 1, ..., n    (8.1)

This means that the classifier for class k separates this class from all other classes:

    f_k(z) = +1 if z belongs to class k, −1 otherwise

So the step-by-step procedure starts with class one: construct the first binary classifier for class 1 (positive) versus all others (negative), then class 2 versus all others, ..., class k = n versus all others. The resulting combined OVR decision function chooses the class for a sample that corresponds to the maximum value of the n binary decision functions, i.e. the furthest "positive" hyperplane. For clarification see figure 8.1 and table 8.1.

This whole first approach to gain a multiclass classifier is computationally very expensive, because there is need of solving n quadratic programming (QP) optimization problems of size ℓ (the training set size) now. As an example consider a three-class problem with linear kernel, introduced in figure 8.1. The OVR method yields a decision surface divided by three separating hyperplanes (the dashed lines). The shaded regions in the figure correspond to tie situations, where two or none of the classifiers are active (i.e. vote "positive") at the same time (also see table 8.1).

Figure 8.1: OVR applied to a three-class (A, B, C) example with linear kernel

Now consider the classification of a new unseen sample (hexagonal in figure 8.1) in the ambiguous region 3. This sample receives positive votes from both the A-class and the C-class binary classifiers. However, the distance of the sample from the A-class-vs.-all hyperplane is larger than from the C-class-vs.-all one. Hence the sample is classified to belong to the A class. In the same way the ambiguous region 7, with no votes, is handled. So the final combined OVR decision function results in the decision surface separated by the solid line in figure 8.1. Notice however that this final decision function significantly differs from the original one which corresponded to the solution of the k (here 3) QP optimization problems. The major drawback here is therefore that only three points (black dots in figure 8.1) of the resulting borderlines coincide with the original ones calculated by the n Support Vector Machines.
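The combined OVR rule described above can be sketched in a few lines (an illustration under the assumption that the n binary machines have already been trained; not the thesis' implementation):

```python
import numpy as np

def ovr_predict(z, hyperplanes):
    """Combined OVR decision: evaluate every real-valued output
    <w_k, z> + b_k from (8.1) and pick the class whose 'positive'
    hyperplane is furthest, i.e. the maximum value; this also resolves
    the tie regions of table 8.1. hyperplanes is a list of (w_k, b_k)."""
    scores = [w @ z + b for (w, b) in hyperplanes]
    return int(np.argmax(scores))
```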

So it seems that the benefits of maximal margin hyperplanes are lost. Summarized, it can be said that this is the simplest multiclass SVM method [Krs99 and Stat].

    Region | A vs. B and C | B vs. A and C | C vs. A and B | Resulting class
    1      | −             | B             | C             | ?
    2      | −             | −             | C             | C
    3      | A             | −             | C             | ?
    4      | A             | −             | −             | A
    5      | A             | B             | −             | ?
    6      | −             | B             | −             | B
    7      | −             | −             | −             | ?

Table 8.1: Three binary OVR classifiers applied to the corresponding example (figure 8.1). The column "Resulting class" contains the resulting classification of each region. Cells with "?" correspond to tie situations, where two or none of the classifiers are active at the same time. See the text for how ties are resolved.

8.2 One-Versus-One (OVO)

The idea behind this approach is to construct a decision function f_km : R^n → {−1, +1} for each pair of classes (k, m), k ≠ m, k, m = 1, ..., n:

    f_km(z) = +1 if z belongs to class k, −1 if z belongs to class m

So in total there are n(n−1)/2 pairs, because this technique involves the construction of the standard binary classifier for all pairs of classes. In other words, for every pair of classes a binary SVM is solved (with the underlying optimization problem of maximizing the margin). The decision function therefore assigns an instance to the class which has the largest number of votes after the sample has been tested against all n(n−1)/2 decision functions.

So the classification now involves n(n−1)/2 comparisons, and in each one the class to which the sample belongs in that binary case gets a +1 added to its number of votes ("Max Wins" strategy). Sure, there can still be tie situations. In such a case the sample will be assigned based on the classification provided by the furthest hyperplane, as in the OVR case [Krs99 and Stat]. As some researchers have proposed, this can be simplified by choosing the class with the lowest index when a tie occurs, because even then the results are still mostly accurate and approximated well enough [Lin03], without additional computation of distances. But this has to be verified for the problem at hand. The main benefit of this approach is that for every pair of classes the optimization problem to deal with is much smaller, i.e. in total there is only need of solving n(n−1)/2 QP problems of size smaller than the training set size, because only two classes are involved (and not the whole training set) in each problem, as in the OVR approach. Again consider the three-class example from the previous chapter. Using the OVO technique with a linear kernel, a decision surface is divided by three separate hyperplanes (dashed lines) obtained by the binary SVMs (see figure 8.2). The application of the "Max Wins" strategy (see table 8.2) results in the division of the decision surface into three regions separated by the thicker dashed lines and the small shaded ambiguous region in the middle. After the tie-breaking strategy from above (furthest hyperplane) is applied to the ambiguous region 7 in the middle, the final decision function becomes the solid black lines and the thicker dashed ones together. Notice here that the final decision function does not differ significantly from the original one corresponding to the solution of the n(n−1)/2 optimization problems. So the main advantage here, in contrast to the OVR technique, is the fact that the final borderlines are parts of the calculated pairwise decision functions, which was not the case in the OVR approach.
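The "Max Wins" voting, with the simplified lowest-index tie-break mentioned above, can be sketched as follows (an illustration assuming the pairwise machines are already trained; the data structure is hypothetical):

```python
import numpy as np

def ovo_predict(z, pair_classifiers, n_classes):
    """'Max Wins' voting over all n(n-1)/2 pairwise classifiers.
    pair_classifiers maps a pair (k, m), k < m, to a binary decision
    function returning +1 for class k and -1 for class m. Ties are
    broken by the lowest class index, the simplification noted above."""
    votes = np.zeros(n_classes, dtype=int)
    for (k, m), f in pair_classifiers.items():
        votes[k if f(z) >= 0 else m] += 1
    return int(np.argmax(votes))   # argmax returns the lowest index on ties
```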


Application of support vector machine in health monitoring of plate structures Appcaton of support vector machne n heath montorng of pate structures *Satsh Satpa 1), Yogesh Khandare ), Sauvk Banerjee 3) and Anrban Guha 4) 1), ), 4) Department of Mechanca Engneerng, Indan Insttute

More information

The University of Auckland, School of Engineering SCHOOL OF ENGINEERING REPORT 616 SUPPORT VECTOR MACHINES BASICS. written by.

The University of Auckland, School of Engineering SCHOOL OF ENGINEERING REPORT 616 SUPPORT VECTOR MACHINES BASICS. written by. The Unversty of Auckand, Schoo of Engneerng SCHOOL OF ENGINEERING REPORT 66 SUPPORT VECTOR MACHINES BASICS wrtten by Vojsav Kecman Schoo of Engneerng The Unversty of Auckand Apr, 004 Vojsav Kecman Copyrght,

More information

Boundary Value Problems. Lecture Objectives. Ch. 27

Boundary Value Problems. Lecture Objectives. Ch. 27 Boundar Vaue Probes Ch. 7 Lecture Obectves o understand the dfference between an nta vaue and boundar vaue ODE o be abe to understand when and how to app the shootng ethod and FD ethod. o understand what

More information

Module 3 LOSSY IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur

Module 3 LOSSY IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur Module 3 LOSSY IMAGE COMPRESSION SYSTEMS Verson ECE IIT, Kharagpur Lesson 6 Theory of Quantzaton Verson ECE IIT, Kharagpur Instructonal Objectves At the end of ths lesson, the students should be able to:

More information

CSE 252C: Computer Vision III

CSE 252C: Computer Vision III CSE 252C: Computer Vson III Lecturer: Serge Belonge Scrbe: Catherne Wah LECTURE 15 Kernel Machnes 15.1. Kernels We wll study two methods based on a specal knd of functon k(x, y) called a kernel: Kernel

More information

A Tutorial on Data Reduction. Linear Discriminant Analysis (LDA) Shireen Elhabian and Aly A. Farag. University of Louisville, CVIP Lab September 2009

A Tutorial on Data Reduction. Linear Discriminant Analysis (LDA) Shireen Elhabian and Aly A. Farag. University of Louisville, CVIP Lab September 2009 A utoral on Data Reducton Lnear Dscrmnant Analss (LDA) hreen Elhaban and Al A Farag Unverst of Lousvlle, CVIP Lab eptember 009 Outlne LDA objectve Recall PCA No LDA LDA o Classes Counter eample LDA C Classes

More information

Recap: the SVM problem

Recap: the SVM problem Machne Learnng 0-70/5-78 78 Fall 0 Advanced topcs n Ma-Margn Margn Learnng Erc Xng Lecture 0 Noveber 0 Erc Xng @ CMU 006-00 Recap: the SVM proble We solve the follong constraned opt proble: a s.t. J 0

More information

Discriminative classifier: Logistic Regression. CS534-Machine Learning

Discriminative classifier: Logistic Regression. CS534-Machine Learning Dscrmnatve classfer: Logstc Regresson CS534-Machne Learnng robablstc Classfer Gven an nstance, hat does a probablstc classfer do dfferentl compared to, sa, perceptron? It does not drectl predct Instead,

More information

Multigradient for Neural Networks for Equalizers 1

Multigradient for Neural Networks for Equalizers 1 Multgradent for Neural Netorks for Equalzers 1 Chulhee ee, Jnook Go and Heeyoung Km Department of Electrcal and Electronc Engneerng Yonse Unversty 134 Shnchon-Dong, Seodaemun-Ku, Seoul 1-749, Korea ABSTRACT

More information

[WAVES] 1. Waves and wave forces. Definition of waves

[WAVES] 1. Waves and wave forces. Definition of waves 1. Waves and forces Defnton of s In the smuatons on ong-crested s are consdered. The drecton of these s (μ) s defned as sketched beow n the goba co-ordnate sstem: North West East South The eevaton can

More information

2E Pattern Recognition Solutions to Introduction to Pattern Recognition, Chapter 2: Bayesian pattern classification

2E Pattern Recognition Solutions to Introduction to Pattern Recognition, Chapter 2: Bayesian pattern classification E395 - Pattern Recognton Solutons to Introducton to Pattern Recognton, Chapter : Bayesan pattern classfcaton Preface Ths document s a soluton manual for selected exercses from Introducton to Pattern Recognton

More information

Solutions to Homework 7, Mathematics 1. 1 x. (arccos x) (arccos x) 1

Solutions to Homework 7, Mathematics 1. 1 x. (arccos x) (arccos x) 1 Solutons to Homework 7, Mathematcs 1 Problem 1: a Prove that arccos 1 1 for 1, 1. b* Startng from the defnton of the dervatve, prove that arccos + 1, arccos 1. Hnt: For arccos arccos π + 1, the defnton

More information

Lectures - Week 4 Matrix norms, Conditioning, Vector Spaces, Linear Independence, Spanning sets and Basis, Null space and Range of a Matrix

Lectures - Week 4 Matrix norms, Conditioning, Vector Spaces, Linear Independence, Spanning sets and Basis, Null space and Range of a Matrix Lectures - Week 4 Matrx norms, Condtonng, Vector Spaces, Lnear Independence, Spannng sets and Bass, Null space and Range of a Matrx Matrx Norms Now we turn to assocatng a number to each matrx. We could

More information

LECTURE 21 Mohr s Method for Calculation of General Displacements. 1 The Reciprocal Theorem

LECTURE 21 Mohr s Method for Calculation of General Displacements. 1 The Reciprocal Theorem V. DEMENKO MECHANICS OF MATERIALS 05 LECTURE Mohr s Method for Cacuaton of Genera Dspacements The Recproca Theorem The recproca theorem s one of the genera theorems of strength of materas. It foows drect

More information

Hopfield Training Rules 1 N

Hopfield Training Rules 1 N Hopfeld Tranng Rules To memorse a sngle pattern Suppose e set the eghts thus - = p p here, s the eght beteen nodes & s the number of nodes n the netor p s the value requred for the -th node What ll the

More information

VQ widely used in coding speech, image, and video

VQ widely used in coding speech, image, and video at Scalar quantzers are specal cases of vector quantzers (VQ): they are constraned to look at one sample at a tme (memoryless) VQ does not have such constrant better RD perfomance expected Source codng

More information

Linear Approximation with Regularization and Moving Least Squares

Linear Approximation with Regularization and Moving Least Squares Lnear Approxmaton wth Regularzaton and Movng Least Squares Igor Grešovn May 007 Revson 4.6 (Revson : March 004). 5 4 3 0.5 3 3.5 4 Contents: Lnear Fttng...4. Weghted Least Squares n Functon Approxmaton...

More information

COS 521: Advanced Algorithms Game Theory and Linear Programming

COS 521: Advanced Algorithms Game Theory and Linear Programming COS 521: Advanced Algorthms Game Theory and Lnear Programmng Moses Charkar February 27, 2013 In these notes, we ntroduce some basc concepts n game theory and lnear programmng (LP). We show a connecton

More information

CHAPTER-5 INFORMATION MEASURE OF FUZZY MATRIX AND FUZZY BINARY RELATION

CHAPTER-5 INFORMATION MEASURE OF FUZZY MATRIX AND FUZZY BINARY RELATION CAPTER- INFORMATION MEASURE OF FUZZY MATRI AN FUZZY BINARY RELATION Introducton The basc concept of the fuzz matr theor s ver smple and can be appled to socal and natural stuatons A branch of fuzz matr

More information

princeton univ. F 17 cos 521: Advanced Algorithm Design Lecture 7: LP Duality Lecturer: Matt Weinberg

princeton univ. F 17 cos 521: Advanced Algorithm Design Lecture 7: LP Duality Lecturer: Matt Weinberg prnceton unv. F 17 cos 521: Advanced Algorthm Desgn Lecture 7: LP Dualty Lecturer: Matt Wenberg Scrbe: LP Dualty s an extremely useful tool for analyzng structural propertes of lnear programs. Whle there

More information

APPENDIX A Some Linear Algebra

APPENDIX A Some Linear Algebra APPENDIX A Some Lnear Algebra The collecton of m, n matrces A.1 Matrces a 1,1,..., a 1,n A = a m,1,..., a m,n wth real elements a,j s denoted by R m,n. If n = 1 then A s called a column vector. Smlarly,

More information

Discriminative classifier: Logistic Regression. CS534-Machine Learning

Discriminative classifier: Logistic Regression. CS534-Machine Learning Dscrmnatve classfer: Logstc Regresson CS534-Machne Learnng 2 Logstc Regresson Gven tranng set D stc regresson learns the condtonal dstrbuton We ll assume onl to classes and a parametrc form for here s

More information

Module 9. Lecture 6. Duality in Assignment Problems

Module 9. Lecture 6. Duality in Assignment Problems Module 9 1 Lecture 6 Dualty n Assgnment Problems In ths lecture we attempt to answer few other mportant questons posed n earler lecture for (AP) and see how some of them can be explaned through the concept

More information

ADVANCED MACHINE LEARNING ADVANCED MACHINE LEARNING

ADVANCED MACHINE LEARNING ADVANCED MACHINE LEARNING 1 ADVANCED ACHINE LEARNING ADVANCED ACHINE LEARNING Non-lnear regresson technques 2 ADVANCED ACHINE LEARNING Regresson: Prncple N ap N-dm. nput x to a contnuous output y. Learn a functon of the type: N

More information

A finite difference method for heat equation in the unbounded domain

A finite difference method for heat equation in the unbounded domain Internatona Conerence on Advanced ectronc Scence and Technoogy (AST 6) A nte derence method or heat equaton n the unbounded doman a Quan Zheng and Xn Zhao Coege o Scence North Chna nversty o Technoogy

More information

Advanced Introduction to Machine Learning

Advanced Introduction to Machine Learning Advanced Introducton to Machne Learnng 10715, Fall 2014 The Kernel Trck, Reproducng Kernel Hlbert Space, and the Representer Theorem Erc Xng Lecture 6, September 24, 2014 Readng: Erc Xng @ CMU, 2014 1

More information

MMA and GCMMA two methods for nonlinear optimization

MMA and GCMMA two methods for nonlinear optimization MMA and GCMMA two methods for nonlnear optmzaton Krster Svanberg Optmzaton and Systems Theory, KTH, Stockholm, Sweden. krlle@math.kth.se Ths note descrbes the algorthms used n the author s 2007 mplementatons

More information

Strain Energy in Linear Elastic Solids

Strain Energy in Linear Elastic Solids Duke Unverst Department of Cv and Envronmenta Engneerng CEE 41L. Matr Structura Anass Fa, Henr P. Gavn Stran Energ n Lnear Eastc Sods Consder a force, F, apped gradua to a structure. Let D be the resutng

More information

Homework Assignment 3 Due in class, Thursday October 15

Homework Assignment 3 Due in class, Thursday October 15 Homework Assgnment 3 Due n class, Thursday October 15 SDS 383C Statstcal Modelng I 1 Rdge regresson and Lasso 1. Get the Prostrate cancer data from http://statweb.stanford.edu/~tbs/elemstatlearn/ datasets/prostate.data.

More information

n α j x j = 0 j=1 has a nontrivial solution. Here A is the n k matrix whose jth column is the vector for all t j=0

n α j x j = 0 j=1 has a nontrivial solution. Here A is the n k matrix whose jth column is the vector for all t j=0 MODULE 2 Topcs: Lnear ndependence, bass and dmenson We have seen that f n a set of vectors one vector s a lnear combnaton of the remanng vectors n the set then the span of the set s unchanged f that vector

More information

More metrics on cartesian products

More metrics on cartesian products More metrcs on cartesan products If (X, d ) are metrc spaces for 1 n, then n Secton II4 of the lecture notes we defned three metrcs on X whose underlyng topologes are the product topology The purpose of

More information

Chapter 5. Solution of System of Linear Equations. Module No. 6. Solution of Inconsistent and Ill Conditioned Systems

Chapter 5. Solution of System of Linear Equations. Module No. 6. Solution of Inconsistent and Ill Conditioned Systems Numercal Analyss by Dr. Anta Pal Assstant Professor Department of Mathematcs Natonal Insttute of Technology Durgapur Durgapur-713209 emal: anta.bue@gmal.com 1 . Chapter 5 Soluton of System of Lnear Equatons

More information

Lecture 3: Dual problems and Kernels

Lecture 3: Dual problems and Kernels Lecture 3: Dual problems and Kernels C4B Machne Learnng Hlary 211 A. Zsserman Prmal and dual forms Lnear separablty revsted Feature mappng Kernels for SVMs Kernel trck requrements radal bass functons SVM

More information

Support Vector Machines CS434

Support Vector Machines CS434 Support Vector Machnes CS434 Lnear Separators Many lnear separators exst that perfectly classfy all tranng examples Whch of the lnear separators s the best? + + + + + + + + + Intuton of Margn Consder ponts

More information

Chapter 6. Rotations and Tensors

Chapter 6. Rotations and Tensors Vector Spaces n Physcs 8/6/5 Chapter 6. Rotatons and ensors here s a speca knd of near transformaton whch s used to transforms coordnates from one set of axes to another set of axes (wth the same orgn).

More information

Machine Learning. What is a good Decision Boundary? Support Vector Machines

Machine Learning. What is a good Decision Boundary? Support Vector Machines Machne Learnng 0-70/5 70/5-78 78 Sprng 200 Support Vector Machnes Erc Xng Lecture 7 March 5 200 Readng: Chap. 6&7 C.B book and lsted papers Erc Xng @ CMU 2006-200 What s a good Decson Boundar? Consder

More information

College of Computer & Information Science Fall 2009 Northeastern University 20 October 2009

College of Computer & Information Science Fall 2009 Northeastern University 20 October 2009 College of Computer & Informaton Scence Fall 2009 Northeastern Unversty 20 October 2009 CS7880: Algorthmc Power Tools Scrbe: Jan Wen and Laura Poplawsk Lecture Outlne: Prmal-dual schema Network Desgn:

More information

Intro to Visual Recognition

Intro to Visual Recognition CS 2770: Computer Vson Intro to Vsual Recognton Prof. Adrana Kovashka Unversty of Pttsburgh February 13, 2018 Plan for today What s recognton? a.k.a. classfcaton, categorzaton Support vector machnes Separable

More information

Resource Allocation and Decision Analysis (ECON 8010) Spring 2014 Foundations of Regression Analysis

Resource Allocation and Decision Analysis (ECON 8010) Spring 2014 Foundations of Regression Analysis Resource Allocaton and Decson Analss (ECON 800) Sprng 04 Foundatons of Regresson Analss Readng: Regresson Analss (ECON 800 Coursepak, Page 3) Defntons and Concepts: Regresson Analss statstcal technques

More information

Multispectral Remote Sensing Image Classification Algorithm Based on Rough Set Theory

Multispectral Remote Sensing Image Classification Algorithm Based on Rough Set Theory Proceedngs of the 2009 IEEE Internatona Conference on Systems Man and Cybernetcs San Antono TX USA - October 2009 Mutspectra Remote Sensng Image Cassfcaton Agorthm Based on Rough Set Theory Yng Wang Xaoyun

More information

10-701/ Machine Learning, Fall 2005 Homework 3

10-701/ Machine Learning, Fall 2005 Homework 3 10-701/15-781 Machne Learnng, Fall 2005 Homework 3 Out: 10/20/05 Due: begnnng of the class 11/01/05 Instructons Contact questons-10701@autonlaborg for queston Problem 1 Regresson and Cross-valdaton [40

More information

On the Power Function of the Likelihood Ratio Test for MANOVA

On the Power Function of the Likelihood Ratio Test for MANOVA Journa of Mutvarate Anayss 8, 416 41 (00) do:10.1006/jmva.001.036 On the Power Functon of the Lkehood Rato Test for MANOVA Dua Kumar Bhaumk Unversty of South Aabama and Unversty of Inos at Chcago and Sanat

More information

Optimization. Nuno Vasconcelos ECE Department, UCSD

Optimization. Nuno Vasconcelos ECE Department, UCSD Optmzaton Nuno Vasconcelos ECE Department, UCSD Optmzaton many engneerng problems bol on to optmzaton goal: n mamum or mnmum o a uncton Denton: gven unctons, g,,...,k an h,,...m ene on some oman Ω R n

More information

Natural Language Processing and Information Retrieval

Natural Language Processing and Information Retrieval Natural Language Processng and Informaton Retreval Support Vector Machnes Alessandro Moschtt Department of nformaton and communcaton technology Unversty of Trento Emal: moschtt@ds.untn.t Summary Support

More information

Inner Product. Euclidean Space. Orthonormal Basis. Orthogonal

Inner Product. Euclidean Space. Orthonormal Basis. Orthogonal Inner Product Defnton 1 () A Eucldean space s a fnte-dmensonal vector space over the reals R, wth an nner product,. Defnton 2 (Inner Product) An nner product, on a real vector space X s a symmetrc, blnear,

More information

Linear Regression Analysis: Terminology and Notation

Linear Regression Analysis: Terminology and Notation ECON 35* -- Secton : Basc Concepts of Regresson Analyss (Page ) Lnear Regresson Analyss: Termnology and Notaton Consder the generc verson of the smple (two-varable) lnear regresson model. It s represented

More information

Supporting Information

Supporting Information Supportng Informaton The neural network f n Eq. 1 s gven by: f x l = ReLU W atom x l + b atom, 2 where ReLU s the element-wse rectfed lnear unt, 21.e., ReLUx = max0, x, W atom R d d s the weght matrx to

More information

2.3 Nilpotent endomorphisms

2.3 Nilpotent endomorphisms s a block dagonal matrx, wth A Mat dm U (C) In fact, we can assume that B = B 1 B k, wth B an ordered bass of U, and that A = [f U ] B, where f U : U U s the restrcton of f to U 40 23 Nlpotent endomorphsms

More information

Lecture Notes on Linear Regression

Lecture Notes on Linear Regression Lecture Notes on Lnear Regresson Feng L fl@sdueducn Shandong Unversty, Chna Lnear Regresson Problem In regresson problem, we am at predct a contnuous target value gven an nput feature vector We assume

More information

SVMs for regression Non-parametric/instance based classification method

SVMs for regression Non-parametric/instance based classification method S 75 Mchne ernng ecture Mos Huskrecht mos@cs.ptt.edu 539 Sennott Squre SVMs for regresson Non-prmetrc/nstnce sed cssfcton method S 75 Mchne ernng Soft-mrgn SVM Aos some fet on crossng the seprtng hperpne

More information

Duality in linear programming

Duality in linear programming MPRA Munch Personal RePEc Archve Dualty n lnear programmng Mhaela Albc and Dela Teselos and Raluca Prundeanu and Ionela Popa Unversty Constantn Brancoveanu Ramncu Valcea 7 January 00 Onlne at http://mpraubun-muenchende/986/

More information

Lower Bounding Procedures for the Single Allocation Hub Location Problem

Lower Bounding Procedures for the Single Allocation Hub Location Problem Lower Boundng Procedures for the Snge Aocaton Hub Locaton Probem Borzou Rostam 1,2 Chrstoph Buchhem 1,4 Fautät für Mathemat, TU Dortmund, Germany J. Faban Meer 1,3 Uwe Causen 1 Insttute of Transport Logstcs,

More information

Support Vector Machines

Support Vector Machines Separatng boundary, defned by w Support Vector Machnes CISC 5800 Professor Danel Leeds Separatng hyperplane splts class 0 and class 1 Plane s defned by lne w perpendcular to plan Is data pont x n class

More information

Affine transformations and convexity

Affine transformations and convexity Affne transformatons and convexty The purpose of ths document s to prove some basc propertes of affne transformatons nvolvng convex sets. Here are a few onlne references for background nformaton: http://math.ucr.edu/

More information

Adaptive and Iterative Least Squares Support Vector Regression Based on Quadratic Renyi Entropy

Adaptive and Iterative Least Squares Support Vector Regression Based on Quadratic Renyi Entropy daptve and Iteratve Least Squares Support Vector Regresson Based on Quadratc Ren Entrop Jngqng Jang, Chu Song, Haan Zhao, Chunguo u,3 and Yanchun Lang Coege of Mathematcs and Computer Scence, Inner Mongoa

More information

NP-Completeness : Proofs

NP-Completeness : Proofs NP-Completeness : Proofs Proof Methods A method to show a decson problem Π NP-complete s as follows. (1) Show Π NP. (2) Choose an NP-complete problem Π. (3) Show Π Π. A method to show an optmzaton problem

More information

U.C. Berkeley CS294: Beyond Worst-Case Analysis Luca Trevisan September 5, 2017

U.C. Berkeley CS294: Beyond Worst-Case Analysis Luca Trevisan September 5, 2017 U.C. Berkeley CS94: Beyond Worst-Case Analyss Handout 4s Luca Trevsan September 5, 07 Summary of Lecture 4 In whch we ntroduce semdefnte programmng and apply t to Max Cut. Semdefnte Programmng Recall that

More information

The Entire Solution Path for Support Vector Machine in Positive and Unlabeled Classification 1

The Entire Solution Path for Support Vector Machine in Positive and Unlabeled Classification 1 Abstract The Entre Souton Path for Support Vector Machne n Postve and Unabeed Cassfcaton 1 Yao Lmn, Tang Je, and L Juanz Department of Computer Scence, Tsnghua Unversty 1-308, FIT, Tsnghua Unversty, Bejng,

More information

Correlation and Regression. Correlation 9.1. Correlation. Chapter 9

Correlation and Regression. Correlation 9.1. Correlation. Chapter 9 Chapter 9 Correlaton and Regresson 9. Correlaton Correlaton A correlaton s a relatonshp between two varables. The data can be represented b the ordered pars (, ) where s the ndependent (or eplanator) varable,

More information

CIS526: Machine Learning Lecture 3 (Sept 16, 2003) Linear Regression. Preparation help: Xiaoying Huang. x 1 θ 1 output... θ M x M

CIS526: Machine Learning Lecture 3 (Sept 16, 2003) Linear Regression. Preparation help: Xiaoying Huang. x 1 θ 1 output... θ M x M CIS56: achne Learnng Lecture 3 (Sept 6, 003) Preparaton help: Xaoyng Huang Lnear Regresson Lnear regresson can be represented by a functonal form: f(; θ) = θ 0 0 +θ + + θ = θ = 0 ote: 0 s a dummy attrbute

More information

n-step cycle inequalities: facets for continuous n-mixing set and strong cuts for multi-module capacitated lot-sizing problem

n-step cycle inequalities: facets for continuous n-mixing set and strong cuts for multi-module capacitated lot-sizing problem n-step cyce nequates: facets for contnuous n-mxng set and strong cuts for mut-modue capactated ot-szng probem Mansh Bansa and Kavash Kanfar Department of Industra and Systems Engneerng, Texas A&M Unversty,

More information

Classification learning II

Classification learning II Lecture 8 Classfcaton learnng II Mlos Hauskrecht mlos@cs.ptt.edu 539 Sennott Square Logstc regresson model Defnes a lnear decson boundar Dscrmnant functons: g g g g here g z / e z f, g g - s a logstc functon

More information

Week3, Chapter 4. Position and Displacement. Motion in Two Dimensions. Instantaneous Velocity. Average Velocity

Week3, Chapter 4. Position and Displacement. Motion in Two Dimensions. Instantaneous Velocity. Average Velocity Week3, Chapter 4 Moton n Two Dmensons Lecture Quz A partcle confned to moton along the x axs moves wth constant acceleraton from x =.0 m to x = 8.0 m durng a 1-s tme nterval. The velocty of the partcle

More information

COMPLEX NUMBERS AND QUADRATIC EQUATIONS

COMPLEX NUMBERS AND QUADRATIC EQUATIONS COMPLEX NUMBERS AND QUADRATIC EQUATIONS INTRODUCTION We know that x 0 for all x R e the square of a real number (whether postve, negatve or ero) s non-negatve Hence the equatons x, x, x + 7 0 etc are not

More information

LINEAR REGRESSION ANALYSIS. MODULE IX Lecture Multicollinearity

LINEAR REGRESSION ANALYSIS. MODULE IX Lecture Multicollinearity LINEAR REGRESSION ANALYSIS MODULE IX Lecture - 31 Multcollnearty Dr. Shalabh Department of Mathematcs and Statstcs Indan Insttute of Technology Kanpur 6. Rdge regresson The OLSE s the best lnear unbased

More information

Composite Hypotheses testing

Composite Hypotheses testing Composte ypotheses testng In many hypothess testng problems there are many possble dstrbutons that can occur under each of the hypotheses. The output of the source s a set of parameters (ponts n a parameter

More information

Lecture 12: Discrete Laplacian

Lecture 12: Discrete Laplacian Lecture 12: Dscrete Laplacan Scrbe: Tanye Lu Our goal s to come up wth a dscrete verson of Laplacan operator for trangulated surfaces, so that we can use t n practce to solve related problems We are mostly

More information

Generalized Linear Methods

Generalized Linear Methods Generalzed Lnear Methods 1 Introducton In the Ensemble Methods the general dea s that usng a combnaton of several weak learner one could make a better learner. More formally, assume that we have a set

More information

x yi In chapter 14, we want to perform inference (i.e. calculate confidence intervals and perform tests of significance) in this setting.

x yi In chapter 14, we want to perform inference (i.e. calculate confidence intervals and perform tests of significance) in this setting. The Practce of Statstcs, nd ed. Chapter 14 Inference for Regresson Introducton In chapter 3 we used a least-squares regresson lne (LSRL) to represent a lnear relatonshp etween two quanttatve explanator

More information

Structure and Drive Paul A. Jensen Copyright July 20, 2003

Structure and Drive Paul A. Jensen Copyright July 20, 2003 Structure and Drve Paul A. Jensen Copyrght July 20, 2003 A system s made up of several operatons wth flow passng between them. The structure of the system descrbes the flow paths from nputs to outputs.

More information

Generative classification models

Generative classification models CS 675 Intro to Machne Learnng Lecture Generatve classfcaton models Mlos Hauskrecht mlos@cs.ptt.edu 539 Sennott Square Data: D { d, d,.., dn} d, Classfcaton represents a dscrete class value Goal: learn

More information