Recap: the SVM problem

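Machine Learning 10-701/[...], Fall [...]
Advanced Topics in Max-Margin Learning
Eric Xing, CMU
Lecture [...], November [...]

Recap: the SVM problem

We solve the following constrained optimization problem:

$$\max_{\alpha} \; J(\alpha) = \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{m} \alpha_i \alpha_j y_i y_j \, x_i^T x_j$$

$$\text{s.t.} \quad \alpha_i \ge 0, \; i = 1, \ldots, m, \qquad \sum_{i=1}^{m} \alpha_i y_i = 0.$$

This is a quadratic programming problem. A global maximum of $J(\alpha)$ can always be found.

The solution: $w = \sum_{i=1}^{m} \alpha_i y_i x_i$

How to predict: $y^*(z) = \mathrm{sign}(w^T z + b)$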
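As a small sanity check of the recap, the sketch below recovers $w$ and $b$ from a dual solution and predicts with $\mathrm{sign}(w^T z + b)$. The two-point dataset and the value $\alpha = (1/4, 1/4)$ are illustrative assumptions worked out by hand for this toy problem; they are not taken from the lecture.

```python
import numpy as np

# Toy data (an assumption for illustration): one point per class.
X = np.array([[1.0, 1.0], [-1.0, -1.0]])
y = np.array([1.0, -1.0])

# With alpha_1 = alpha_2 = a (forced by sum_i alpha_i y_i = 0), the dual
# objective for this dataset reduces to J(a) = 2a - 4a^2, maximized at a = 1/4.
alpha = np.array([0.25, 0.25])

def recover_primal(alpha, X, y, eps=1e-8):
    """Return (w, b) from a hard-margin dual solution."""
    w = (alpha * y) @ X                  # w = sum_i alpha_i y_i x_i
    sv = alpha > eps                     # support vectors have alpha_i > 0
    b = np.mean(y[sv] - X[sv] @ w)       # y_i (w^T x_i + b) = 1 on support vectors
    return w, b

def predict(w, b, Z):
    """Predict sign(w^T z + b) for each row z of Z."""
    return np.sign(Z @ w + b)

w, b = recover_primal(alpha, X, y)
print(w, b)
print(predict(w, b, np.array([[2.0, 0.5], [-3.0, 1.0]])))
```

Both training points sit exactly on the margin here, so both are support vectors and either one could be used to solve for $b$.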

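Non-linearly Separable Problems

[Figure: two classes, Class 1 and Class 2.]

We allow an "error" $\xi_i$ in classification; it is based on the output of the discriminant function $w^T x + b$.
$\sum_i \xi_i$ approximates the number of misclassified samples.

Soft Margin Hyperplane

Now we have a slightly different optimization problem:

$$\min_{w, b} \; \frac{1}{2} w^T w + C \sum_{i=1}^{m} \xi_i$$

$$\text{s.t.} \quad y_i (w^T x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad \forall i.$$

The $\xi_i$ are "slack variables" in the optimization.
Note that $\xi_i = 0$ if there is no error for $x_i$.
$\sum_i \xi_i$ is an upper bound on the number of errors.
$C$: tradeoff parameter between error and margin.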
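To make the role of the slack variables concrete, here is a minimal sketch that computes the smallest feasible $\xi_i$ for a fixed hyperplane and compares $\sum_i \xi_i$ with the number of misclassified points. The data and the hand-picked $(w, b)$ are hypothetical and not optimized.

```python
import numpy as np

def slacks(w, b, X, y):
    """Smallest feasible slacks: xi_i = max(0, 1 - y_i (w^T x_i + b))."""
    return np.maximum(0.0, 1.0 - y * (X @ w + b))

def soft_margin_objective(w, b, X, y, C):
    """Primal objective 0.5 * w^T w + C * sum_i xi_i at the smallest feasible slacks."""
    return 0.5 * (w @ w) + C * slacks(w, b, X, y).sum()

# Hypothetical 2-D data and a hand-picked (not optimized) hyperplane.
X = np.array([[2.0, 0.0], [0.5, 0.0], [-1.0, 0.0], [1.5, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b = np.array([1.0, 0.0]), 0.0

xi = slacks(w, b, X, y)
errors = int(np.sum(y * (X @ w + b) <= 0))   # points on the wrong side
print(xi, xi.sum(), errors)
print(soft_margin_objective(w, b, X, y, C=1.0))
```

The second point is classified correctly but lies inside the margin, so it still pays a slack of 0.5; the last point is misclassified and pays more than 1. That is why the sum of slacks bounds the error count from above.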

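Lagrangian Duality, cont.

Recall the primal problem:

$$\min_{w} \; \max_{\alpha, \beta : \alpha_i \ge 0} L(w, \alpha, \beta)$$

The dual problem:

$$\max_{\alpha, \beta : \alpha_i \ge 0} \; \min_{w} L(w, \alpha, \beta)$$

Theorem (weak duality):

$$d^* = \max_{\alpha, \beta : \alpha_i \ge 0} \min_{w} L(w, \alpha, \beta) \;\le\; \min_{w} \max_{\alpha, \beta : \alpha_i \ge 0} L(w, \alpha, \beta) = p^*$$

Theorem (strong duality): iff there exists a saddle point of $L(w, \alpha, \beta)$, we have $d^* = p^*$.

A sketch of strong and weak duality

Now, ignoring $h$ (the equality constraints) for simplicity, let's look at what's happening graphically in the duality theorems.

$$d^* = \max_{\alpha \ge 0} \min_{w} \left[ f(w) + \alpha^T g(w) \right] \;\le\; \min_{w} \max_{\alpha \ge 0} \left[ f(w) + \alpha^T g(w) \right] = p^*$$

[Figure: plot with axes $g(w)$ and $f(w)$.]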
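The max-min versus min-max ordering in the duality theorems can be checked numerically on a one-dimensional problem. The sketch below uses an assumed toy problem, minimize $f(w) = w^2$ subject to $g(w) = 1 - w \le 0$, and evaluates the Lagrangian on finite grids; the grid ranges are arbitrary choices, so the values are approximations of $d^*$ and $p^*$.

```python
import numpy as np

# Toy problem (an assumption): minimize f(w) = w^2 subject to g(w) = 1 - w <= 0.
def f(w):
    return w ** 2

def g(w):
    return 1.0 - w

w_grid = np.linspace(-3.0, 3.0, 601)   # candidate primal values
a_grid = np.linspace(0.0, 5.0, 501)    # candidate multipliers, alpha >= 0

# L[i, j] = f(w_i) + alpha_j * g(w_i)
L = f(w_grid)[:, None] + a_grid[None, :] * g(w_grid)[:, None]

d_star = L.min(axis=0).max()           # max over alpha of (min over w)
p_star = L.max(axis=1).min()           # min over w of (max over alpha)
print(d_star, p_star)
```

For any table of values the max of the column minima cannot exceed the min of the row maxima, which is weak duality. This particular problem is convex with a saddle point at $w = 1$, $\alpha = 2$, so the two grid values also agree (strong duality).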

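The KKT conditions

If there exists some saddle point of $L$, then the saddle point satisfies the following "Karush-Kuhn-Tucker" (KKT) conditions:

$$\frac{\partial}{\partial w_i} L(w, \alpha, \beta) = 0, \quad i = 1, \ldots, k$$

$$\frac{\partial}{\partial \beta_i} L(w, \alpha, \beta) = 0, \quad i = 1, \ldots, l$$

$$\alpha_i \, g_i(w) = 0, \qquad g_i(w) \le 0, \qquad \alpha_i \ge 0, \quad \text{for every inequality constraint } i$$

Theorem: if $w^*$, $\alpha^*$ and $\beta^*$ satisfy the KKT conditions, then they are also a solution to the primal and the dual problems.

The Optimization Problem

The dual of this new constrained optimization problem is

$$\max_{\alpha} \; J(\alpha) = \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{m} \alpha_i \alpha_j y_i y_j \, x_i^T x_j$$

$$\text{s.t.} \quad 0 \le \alpha_i \le C, \; i = 1, \ldots, m, \qquad \sum_{i=1}^{m} \alpha_i y_i = 0.$$

This is very similar to the optimization problem in the linearly separable case, except that there is now an upper bound $C$ on $\alpha_i$.
Once again, a QP solver can be used to find $\alpha_i$.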
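The KKT conditions can be checked mechanically at a candidate point. The sketch below does this for an assumed one-dimensional toy problem, minimize $w^2$ subject to $g(w) = 1 - w \le 0$, whose Lagrangian is $L(w, \alpha) = w^2 + \alpha (1 - w)$; the function names are hypothetical.

```python
def kkt_residuals(w, alpha):
    """KKT quantities for: minimize w^2 subject to g(w) = 1 - w <= 0."""
    g = 1.0 - w
    return {
        "stationarity": 2.0 * w - alpha,   # dL/dw, must be 0
        "comp_slackness": alpha * g,       # alpha * g(w), must be 0
        "primal_feasible": g <= 1e-12,     # g(w) <= 0
        "dual_feasible": alpha >= 0.0,     # alpha >= 0
    }

def satisfies_kkt(w, alpha, tol=1e-9):
    """True when all four conditions hold up to the tolerance."""
    r = kkt_residuals(w, alpha)
    return bool(abs(r["stationarity"]) < tol
                and abs(r["comp_slackness"]) < tol
                and r["primal_feasible"]
                and r["dual_feasible"])

print(satisfies_kkt(1.0, 2.0))   # saddle point: all conditions hold
print(satisfies_kkt(2.0, 0.0))   # feasible, but dL/dw = 4 is not zero
print(satisfies_kkt(0.5, 1.0))   # stationary, but violates g(w) <= 0
```

Only the first candidate passes, and it is also the solution of the primal problem ($w = 1$), in line with the theorem.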

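The SMO algorithm

Consider solving the unconstrained optimization problem:

$$\max_{\alpha} \; W(\alpha_1, \alpha_2, \ldots, \alpha_m)$$

We've already seen three optimization algorithms:
Coordinate ascent
Gradient ascent
Newton-Raphson

Coordinate ascent: repeat until convergence, for $i = 1, \ldots, m$:

$$\alpha_i := \arg\max_{\hat{\alpha}_i} W(\alpha_1, \ldots, \alpha_{i-1}, \hat{\alpha}_i, \alpha_{i+1}, \ldots, \alpha_m)$$

[Figure: coordinate ascent.]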
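A minimal sketch of coordinate ascent on an unconstrained problem, assuming a concave quadratic objective $W(a) = b^T a - \frac{1}{2} a^T A a$ with symmetric positive definite $A$. The matrix, vector, and function name are illustrative choices; each inner step maximizes $W$ exactly over one coordinate while the others are held fixed.

```python
import numpy as np

def coordinate_ascent(A, b, n_sweeps=100):
    """Maximize W(a) = b^T a - 0.5 * a^T A a, one coordinate at a time."""
    a = np.zeros_like(b, dtype=float)
    for _ in range(n_sweeps):
        for i in range(len(b)):
            # Setting dW/da_i = b_i - sum_j A_ij a_j to zero and solving for a_i.
            rest = A[i] @ a - A[i, i] * a[i]   # sum over j != i of A_ij a_j
            a[i] = (b[i] - rest) / A[i, i]
    return a

A = np.array([[2.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, 1.0])
a_hat = coordinate_ascent(A, b)
print(a_hat, np.linalg.solve(A, b))    # the maximizer satisfies A a = b
```

For a quadratic objective the one-dimensional maximization has a closed form, which is the same property SMO relies on for its two-variable subproblem.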

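Sequential minimal optimization

Constrained optimization:

$$\max_{\alpha} \; J(\alpha) = \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{m} \alpha_i \alpha_j y_i y_j \, x_i^T x_j$$

$$\text{s.t.} \quad 0 \le \alpha_i \le C, \; i = 1, \ldots, m, \qquad \sum_{i=1}^{m} \alpha_i y_i = 0.$$

Question: can we do coordinate ascent along one direction at a time (i.e., hold all $\alpha_{[-i]}$ fixed and update $\alpha_i$)?

The SMO algorithm

Repeat till convergence:
1. Select some pair $\alpha_i$ and $\alpha_j$ to update next (using a heuristic that tries to pick the two that will allow us to make the biggest progress towards the global maximum).
2. Re-optimize $J(\alpha)$ with respect to $\alpha_i$ and $\alpha_j$, while holding all the other $\alpha_k$'s ($k \ne i, j$) fixed.

Will this procedure converge?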
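The sketch below illustrates why SMO moves two multipliers at a time rather than one. Under the equality constraint $\sum_k \alpha_k y_k = 0$, fixing all but one $\alpha_i$ leaves that $\alpha_i$ no freedom, whereas a pair can move along a line without leaving the feasible set. The labels, the starting point, and the step size are hypothetical.

```python
import numpy as np

y = np.array([1.0, -1.0, 1.0, -1.0])       # hypothetical labels
alpha = np.array([0.3, 0.5, 0.4, 0.2])     # feasible start: sum_k alpha_k y_k = 0
assert np.isclose(alpha @ y, 0.0)

def implied_alpha(i, alpha, y):
    """Value of alpha_i forced by sum_k alpha_k y_k = 0 when the others are fixed."""
    others = alpha @ y - alpha[i] * y[i]    # sum over k != i of alpha_k y_k
    return -y[i] * others                   # uses y_i^2 = 1

# Single-coordinate updates cannot move: each alpha_i is pinned to its current value.
pinned = np.array([implied_alpha(i, alpha, y) for i in range(len(y))])
print(pinned)

# A pair (i, j) can move while keeping alpha_i y_i + alpha_j y_j constant.
i, j, delta = 0, 1, 0.1
moved = alpha.copy()
moved[j] += delta
moved[i] -= y[i] * y[j] * delta
print(moved, moved @ y)
```

The box constraint $0 \le \alpha_k \le C$ then limits how far the pair may travel along that line.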

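Convergence of SMO

$$\max_{\alpha} \; J(\alpha) = \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{m} \alpha_i \alpha_j y_i y_j \, x_i^T x_j$$

KKT:

$$\text{s.t.} \quad 0 \le \alpha_i \le C, \; i = 1, \ldots, m, \qquad \sum_{i=1}^{m} \alpha_i y_i = 0.$$

Let's hold $\alpha_3, \ldots, \alpha_m$ fixed and re-optimize $J$ w.r.t. $\alpha_1$ and $\alpha_2$.

Convergence of SMO

The constraints:

$$\alpha_1 y_1 + \alpha_2 y_2 = -\sum_{i=3}^{m} \alpha_i y_i = \zeta, \qquad 0 \le \alpha_1 \le C, \qquad 0 \le \alpha_2 \le C$$

The objective:

$$J(\alpha_1, \alpha_2, \ldots, \alpha_m) = J\big((\zeta - \alpha_2 y_2)\, y_1, \alpha_2, \ldots, \alpha_m\big)$$

which is a quadratic function of $\alpha_2$ alone.

Constrained opt: maximize this one-dimensional quadratic, then clip $\alpha_2$ to the interval allowed by the constraints.

[Figure: the feasible line segment inside the box $[0, C] \times [0, C]$.]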
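The pairwise update can be assembled into a small solver. The following is a teaching sketch of a simplified SMO for the soft-margin dual with a linear kernel. It picks the second index at random rather than with the progress heuristic the slides mention, so it is not Platt's full algorithm; the function name, tolerances, and sweep limits are assumptions.

```python
import numpy as np

def smo_simplified(X, y, C=1.0, tol=1e-3, max_passes=5, max_sweeps=1000, seed=0):
    """Simplified SMO (random second index) for the linear-kernel soft-margin dual."""
    rng = np.random.default_rng(seed)
    m = len(y)
    K = X @ X.T                                     # linear-kernel Gram matrix
    alpha, b, quiet = np.zeros(m), 0.0, 0
    for _ in range(max_sweeps):
        changed = 0
        for i in range(m):
            E_i = (alpha * y) @ K[:, i] + b - y[i]  # prediction error on x_i
            violates = ((y[i] * E_i < -tol and alpha[i] < C) or
                        (y[i] * E_i > tol and alpha[i] > 0))
            if not violates:
                continue
            j = int(rng.choice([k for k in range(m) if k != i]))
            E_j = (alpha * y) @ K[:, j] + b - y[j]
            a_i, a_j = alpha[i], alpha[j]
            # Interval for alpha_j on the line alpha_i y_i + alpha_j y_j = const.
            if y[i] != y[j]:
                lo, hi = max(0.0, a_j - a_i), min(C, C + a_j - a_i)
            else:
                lo, hi = max(0.0, a_i + a_j - C), min(C, a_i + a_j)
            eta = 2.0 * K[i, j] - K[i, i] - K[j, j]
            if lo >= hi or eta >= 0:
                continue
            # Maximizer of the 1-D quadratic in alpha_j, clipped to [lo, hi].
            new_j = float(np.clip(a_j - y[j] * (E_i - E_j) / eta, lo, hi))
            if abs(new_j - a_j) < 1e-8:
                continue
            new_i = a_i + y[i] * y[j] * (a_j - new_j)
            # Threshold candidates that zero the error on x_i or x_j.
            b1 = b - E_i - y[i] * (new_i - a_i) * K[i, i] - y[j] * (new_j - a_j) * K[i, j]
            b2 = b - E_j - y[i] * (new_i - a_i) * K[i, j] - y[j] * (new_j - a_j) * K[j, j]
            if 0 < new_i < C:
                b = b1
            elif 0 < new_j < C:
                b = b2
            else:
                b = 0.5 * (b1 + b2)
            alpha[i], alpha[j] = new_i, new_j
            changed += 1
        quiet = quiet + 1 if changed == 0 else 0
        if quiet >= max_passes:                     # several sweeps with no update
            break
    return alpha, b

# Two-point toy problem (an assumption for illustration).
X = np.array([[1.0, 1.0], [-1.0, -1.0]])
y = np.array([1.0, -1.0])
alpha, b = smo_simplified(X, y, C=10.0)
print(alpha, b, (alpha * y) @ X)
```

Each accepted update keeps $\sum_i \alpha_i y_i$ unchanged and keeps both multipliers inside $[0, C]$, which is the clipping step described above.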

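Cross-validation error of SVM

The leave-one-out cross-validation error does not depend on the dimensionality of the feature space but only on the number of support vectors!

$$\text{Leave-one-out CV error} \;\le\; \frac{\#\ \text{support vectors}}{\#\ \text{of training examples}}$$

Removing a training point that is not a support vector leaves the solution unchanged, so only support vectors can be misclassified when held out.

Advanced topics in Max-Margin Learning

$$\max_{\alpha} \; J(\alpha) = \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{m} \alpha_i \alpha_j y_i y_j \, x_i^T x_j$$

Kernel
Point rule or average rule
Can we predict $\vec{y}$?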

Outline

The kernel trick
Maximum entropy discrimination
Structured SVM, a.k.a. Maximum Margin Markov Networks

Non-linear Decision Boundary

So far, we have only considered large-margin classifiers with a linear decision boundary.
How to generalize this to become nonlinear?
Key idea: transform $x_i$ to a higher dimensional space to "make life easier".
Input space: the space where the points $x_i$ are located.
Feature space: the space of $\phi(x_i)$ after transformation.

Why transform?
A linear operation in the feature space is equivalent to a non-linear operation in the input space.
Classification can become easier with a proper transformation. In the XOR problem, for example, adding a new feature $x_1 x_2$ makes the problem linearly separable (homework).
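The XOR remark can be verified directly. In this sketch the inputs use a 0/1 encoding and the hyperplane in the augmented feature space is hand-picked (an assumption, not the max-margin solution): $f(x) = x_1 + x_2 - 2 x_1 x_2 - 0.5$.

```python
import numpy as np

# XOR truth table with 0/1 inputs; the label is +1 when exactly one input is 1.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1.0, 1.0, 1.0, -1.0])

def phi(X):
    """Append the product feature x1 * x2 to each input row."""
    return np.column_stack([X, X[:, 0] * X[:, 1]])

# Hand-picked separating hyperplane in the 3-D feature space.
w, b = np.array([1.0, 1.0, -2.0]), -0.5

scores = phi(X) @ w + b
print(scores, np.sign(scores))
```

In the original two-dimensional input space no $(w, b)$ works: the two positive points require $w_1 + b > 0$ and $w_2 + b > 0$, whose sum contradicts the two negative points' requirements $b < 0$ and $w_1 + w_2 + b < 0$.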

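The Kernel Trick

Is this data linearly separable?
How about a quadratic mapping $\phi(x_i)$?

[Figure: example dataset.]

The Kernel Trick

Recall the SVM optimization problem:

$$\max_{\alpha} \; J(\alpha) = \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{m} \alpha_i \alpha_j y_i y_j \, x_i^T x_j$$

$$\text{s.t.} \quad 0 \le \alpha_i \le C, \; i = 1, \ldots, m, \qquad \sum_{i=1}^{m} \alpha_i y_i = 0.$$

The data points only appear as inner products.
As long as we can calculate the inner product in the feature space, we do not need the mapping explicitly.
Many common geometric operations (angles, distances) can be expressed by inner products.
Define the kernel function $K$ by

$$K(x_i, x_j) = \phi(x_i)^T \phi(x_j)$$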
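To illustrate that distances and angles in feature space need only inner products, the sketch below computes both from kernel values alone. The quadratic kernel $(1 + x^T z)^2$ is used purely as an example, and the helper names are hypothetical.

```python
import numpy as np

def k_quad(x, z):
    """Quadratic kernel K(x, z) = (1 + x^T z)^2, used here only as an example."""
    return (1.0 + x @ z) ** 2

def feature_distance(k, x, z):
    """||phi(x) - phi(z)|| from kernel values: sqrt(K(x,x) - 2 K(x,z) + K(z,z))."""
    return np.sqrt(k(x, x) - 2.0 * k(x, z) + k(z, z))

def feature_cosine(k, x, z):
    """Cosine of the angle between phi(x) and phi(z)."""
    return k(x, z) / np.sqrt(k(x, x) * k(z, z))

def gram_matrix(k, X):
    """All pairwise kernel values K(x_i, x_j); this is all the dual problem needs."""
    return np.array([[k(a, b) for b in X] for a in X])

x, z = np.array([1.0, 0.0]), np.array([0.0, 1.0])
print(feature_distance(k_quad, x, z), feature_cosine(k_quad, x, z))
print(gram_matrix(k_quad, np.stack([x, z])))
```

Note that $x$ and $z$ are orthogonal in the input space, yet their images are not orthogonal in the feature space, because the mapping adds a constant component.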

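II. The Kernel Trick

Computation depends on the feature space.
Bad if its dimension is much larger than that of the input space.

$$\max_{\alpha} \; \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{m} \alpha_i \alpha_j y_i y_j \, K(x_i, x_j)$$

$$\text{s.t.} \quad \alpha_i \ge 0, \; i = 1, \ldots, m, \qquad \sum_{i=1}^{m} \alpha_i y_i = 0,$$

where $K(x_i, x_j) = \phi(x_i)^T \phi(x_j)$.

$$y^*(z) = \mathrm{sign}\left( \sum_{i \in SV} \alpha_i y_i K(x_i, z) + b \right)$$

Transforming the Data

[Figure: points in the input space mapped by $\phi(\cdot)$ to the feature space.]

Note: in practice, the feature space is of higher dimension than the input space.
Computation in the feature space can be costly because it is high dimensional.
The feature space is typically infinite-dimensional!
The kernel trick comes to the rescue.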
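The prediction rule $y^*(z) = \mathrm{sign}(\sum_{i \in SV} \alpha_i y_i K(x_i, z) + b)$ can be exercised on XOR with $\pm 1$ inputs. The multipliers $\alpha_i = 1/8$ and $b = 0$ used below are an assumption worked out by hand for the quadratic kernel $(1 + x^T z)^2$: they give every training point a margin of exactly 1 and satisfy $\sum_i \alpha_i y_i = 0$.

```python
import numpy as np

def k_quad(x, z):
    """Quadratic kernel K(x, z) = (1 + x^T z)^2."""
    return (1.0 + x @ z) ** 2

# XOR with +/-1 inputs: the label is +1 when the two inputs differ.
X = np.array([[-1.0, -1.0], [-1.0, 1.0], [1.0, -1.0], [1.0, 1.0]])
y = np.array([-1.0, 1.0, 1.0, -1.0])
alpha = np.full(4, 0.125)     # hand-derived dual solution for this toy problem
b = 0.0

def decision(z, X, y, alpha, b, kernel, eps=1e-8):
    """f(z) = sum over support vectors of alpha_i y_i K(x_i, z) + b."""
    support = np.flatnonzero(alpha > eps)
    return sum(alpha[i] * y[i] * kernel(X[i], z) for i in support) + b

def predict(z, X, y, alpha, b, kernel):
    """Predicted label sign(f(z))."""
    return np.sign(decision(z, X, y, alpha, b, kernel))

for x_i, y_i in zip(X, y):
    print(x_i, y_i, decision(x_i, X, y, alpha, b, k_quad))
print(predict(np.array([2.0, 2.0]), X, y, alpha, b, k_quad))
```

No feature vector is ever formed; the classifier only evaluates the kernel between the query point and the support vectors.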

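An example of feature mapping and kernels

Consider an input $x = [x_1, x_2]$.
Suppose $\phi(\cdot)$ is given as follows:

$$\phi\left( \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \right) = \left( 1, \; \sqrt{2}\, x_1, \; \sqrt{2}\, x_2, \; x_1^2, \; x_2^2, \; \sqrt{2}\, x_1 x_2 \right)$$

An inner product in the feature space is

$$\langle \phi(x), \phi(x') \rangle = (1 + x^T x')^2$$

So, if we define the kernel function as follows, there is no need to carry out $\phi(\cdot)$ explicitly:

$$K(x, x') = (1 + x^T x')^2$$

More examples of kernel functions

Linear kernel (we've seen it):

$$K(x, x') = x^T x'$$

Polynomial kernel (we just saw an example):

$$K(x, x') = (1 + x^T x')^p$$

where $p = 2, 3, \ldots$ To get the feature vectors we concatenate all $p$th order polynomial terms of the components of $x$ (weighted appropriately).

Radial basis kernel:

$$K(x, x') = \exp\left( -\frac{1}{2} \| x - x' \|^2 \right)$$

In this case the feature space consists of functions and results in a nonparametric classifier.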
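A quick numerical check that the explicit quadratic map and the kernel agree, together with plain implementations of the three kernels listed above. The function names and test vectors are illustrative choices.

```python
import numpy as np

def phi(x):
    """Explicit 6-D feature map whose inner product equals (1 + x^T z)^2 for 2-D x."""
    x1, x2 = x
    r2 = np.sqrt(2.0)
    return np.array([1.0, r2 * x1, r2 * x2, x1 ** 2, x2 ** 2, r2 * x1 * x2])

def k_linear(x, z):
    """Linear kernel x^T z."""
    return x @ z

def k_poly(x, z, p=2):
    """Polynomial kernel (1 + x^T z)^p."""
    return (1.0 + x @ z) ** p

def k_rbf(x, z):
    """Radial basis kernel exp(-0.5 * ||x - z||^2)."""
    return np.exp(-0.5 * np.sum((x - z) ** 2))

x, z = np.array([1.0, 2.0]), np.array([3.0, -1.0])
print(phi(x) @ phi(z), k_poly(x, z))      # two routes to the same inner product
print(k_linear(x, z), k_rbf(x, z))
```

The explicit route costs six multiplications after building two 6-D vectors, while the kernel route needs one 2-D dot product and a square; the gap widens quickly with the input dimension and the degree $p$.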

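The essence of kernel

Feature mapping, but "without paying a cost".
E.g., the polynomial kernel:
How many dimensions have we got in the new space?
How many operations does it take to compute $K(\cdot)$?

Kernel design: any principle?
$K(x, z)$ can be thought of as a similarity function between $x$ and $z$.
This intuition is well reflected in the following "Gaussian" function (similarly, one can easily come up with other $K(\cdot)$ in the same spirit):

$$K(x, z) = \exp\left( -\frac{\| x - z \|^2}{2 \sigma^2} \right)$$

Does this necessarily lead to a "legal" kernel? (In the above particular case, $K(\cdot)$ is a legal one. Do you know how many dimensions $\phi(x)$ has?)

Kernel matrix

Suppose for now that $K$ is indeed a valid kernel corresponding to some feature mapping $\phi$. Then for $x_1, \ldots, x_m$ we can compute an $m \times m$ matrix $K = \{K_{ij}\}$, where $K_{ij} = \phi(x_i)^T \phi(x_j)$.
This is called a kernel matrix!

Now, if a kernel function is indeed a valid kernel, and its elements are dot products in the transformed feature space, it must satisfy:
Symmetry: $K = K^T$ (proof: $K_{ij} = \phi(x_i)^T \phi(x_j) = \phi(x_j)^T \phi(x_i) = K_{ji}$)
Positive semidefiniteness (proof?)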
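The two necessary properties of a kernel matrix can be tested numerically on a sample. The sketch below checks symmetry and positive semidefiniteness for the Gaussian function and, as a contrast, for the negative squared distance, which also looks like a similarity measure but fails the test. The sample points, tolerance, and function names are assumptions.

```python
import numpy as np

def gram(k, X):
    """Kernel matrix with entries K_ij = k(x_i, x_j)."""
    return np.array([[k(a, b) for b in X] for a in X])

def is_valid_gram(K, tol=1e-10):
    """True if K is symmetric and has no eigenvalue below -tol."""
    if not np.allclose(K, K.T):
        return False
    return bool(np.linalg.eigvalsh((K + K.T) / 2.0).min() >= -tol)

def k_gauss(x, z, sigma=1.0):
    """Gaussian function exp(-||x - z||^2 / (2 sigma^2))."""
    return np.exp(-np.sum((x - z) ** 2) / (2.0 * sigma ** 2))

def k_negdist(x, z):
    """Negative squared distance: a similarity score, but not a valid kernel."""
    return -np.sum((x - z) ** 2)

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))                 # arbitrary sample points
print(is_valid_gram(gram(k_gauss, X)))
print(is_valid_gram(gram(k_negdist, X)))
```

The second matrix has a zero diagonal and nonzero entries elsewhere, so its eigenvalues sum to zero and at least one must be negative. Passing the check on one sample does not prove that a function is a valid kernel; failing it on any sample is enough to rule it out.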

Mercer kernel

[Slide body not recovered.]

SVM examples

[Figure: SVM examples.]
