Support Vector Machines

1 CS 2750: Machine Learning. Support Vector Machines. Prof. Adriana Kovashka, University of Pittsburgh. February 17, 2016

2 Announcement. Homework 2 deadline is now 2/29. We'll have covered everything you need today or at the latest on Monday. Project proposal due tonight on CourseWeb. How many of you want me to print handouts for next time?

3 Plan for today: Linear Support Vector Machines; Non-linear SVMs and the kernel trick; Soft-margin SVMs; Example use of SVMs; Advanced topics (very briefly): Structured SVMs, Latent variables; How to solve the SVM problem (next class)

4 Lines in R^2. Let $\mathbf{w} = [a, c]^T$ and $\mathbf{x} = [x, y]^T$. The line is $ax + cy + b = 0$. Kristen Grauman

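5 Lines in R^2. Let $\mathbf{w} = [a, c]^T$ and $\mathbf{x} = [x, y]^T$. The line is $ax + cy + b = 0$, i.e. $\mathbf{w} \cdot \mathbf{x} + b = 0$. Kristen Grauman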

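6 Lines in R^2. Let $\mathbf{w} = [a, c]^T$ and $\mathbf{x} = [x, y]^T$. The line is $ax + cy + b = 0$, i.e. $\mathbf{w} \cdot \mathbf{x} + b = 0$. Consider a point $(x_0, y_0)$. Kristen Grauman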

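7 Lines in R^2. Let $\mathbf{w} = [a, c]^T$ and $\mathbf{x} = [x, y]^T$. The line is $ax + cy + b = 0$, i.e. $\mathbf{w} \cdot \mathbf{x} + b = 0$. Distance $D$ from the point $(x_0, y_0)$ to the line: $D = \frac{|a x_0 + c y_0 + b|}{\sqrt{a^2 + c^2}} = \frac{|\mathbf{w}^T \mathbf{x} + b|}{\|\mathbf{w}\|}$, with $\mathbf{x} = [x_0, y_0]^T$. Kristen Grauman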

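8 Lines in R^2. Let $\mathbf{w} = [a, c]^T$ and $\mathbf{x} = [x, y]^T$. The line is $ax + cy + b = 0$, i.e. $\mathbf{w} \cdot \mathbf{x} + b = 0$. Distance from point to line: $D = \frac{|a x_0 + c y_0 + b|}{\sqrt{a^2 + c^2}} = \frac{|\mathbf{w}^T \mathbf{x} + b|}{\|\mathbf{w}\|}$, with $\mathbf{x} = [x_0, y_0]^T$. Kristen Grauman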
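
The point-to-line distance on this slide can be checked numerically. The following is a minimal sketch in plain Python; the function name `point_line_distance` and the sample numbers are illustrative assumptions, not part of the lecture.

```python
import math

def point_line_distance(a, c, b, x0, y0):
    """Distance from the point (x0, y0) to the line a*x + c*y + b = 0."""
    norm_w = math.hypot(a, c)  # ||w|| for w = [a, c]
    if norm_w == 0.0:
        raise ValueError("w = [a, c] must be non-zero")
    return abs(a * x0 + c * y0 + b) / norm_w

# Hypothetical example: the line 3x + 4y - 10 = 0 and the origin.
d_origin = point_line_distance(3.0, 4.0, -10.0, 0.0, 0.0)
# A point that lies on the line has distance zero.
d_on_line = point_line_distance(3.0, 4.0, -10.0, 2.0, 1.0)
```

Dividing by $\|\mathbf{w}\|$ is what makes the value independent of how the line equation is scaled.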

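9 Linear classifiers. Find linear function to separate positive and negative examples. $\mathbf{x}_i$ positive: $\mathbf{x}_i \cdot \mathbf{w} + b \ge 0$; $\mathbf{x}_i$ negative: $\mathbf{x}_i \cdot \mathbf{w} + b < 0$. Which line is best? C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998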

10 Support vector machines. Discriminative classifier based on optimal separating line (for 2d case). Maximize the margin between the positive and negative training examples. C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998

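11 Support vector machines. Want line that maximizes the margin. $\mathbf{x}_i$ positive ($y_i = 1$): $\mathbf{x}_i \cdot \mathbf{w} + b \ge 1$; $\mathbf{x}_i$ negative ($y_i = -1$): $\mathbf{x}_i \cdot \mathbf{w} + b \le -1$. For support vectors, $\mathbf{x}_i \cdot \mathbf{w} + b = \pm 1$. (Figure labels: support vectors, margin.) C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998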

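12 Support vector machines. Want line that maximizes the margin. $\mathbf{x}_i$ positive ($y_i = 1$): $\mathbf{x}_i \cdot \mathbf{w} + b \ge 1$; $\mathbf{x}_i$ negative ($y_i = -1$): $\mathbf{x}_i \cdot \mathbf{w} + b \le -1$. For support vectors, $\mathbf{x}_i \cdot \mathbf{w} + b = \pm 1$. Distance between point and line: $\frac{|\mathbf{x}_i \cdot \mathbf{w} + b|}{\|\mathbf{w}\|}$. For support vectors: $\frac{\mathbf{w}^T \mathbf{x} + b}{\|\mathbf{w}\|} = \frac{\pm 1}{\|\mathbf{w}\|}$, so $M = \left| \frac{1}{\|\mathbf{w}\|} - \frac{-1}{\|\mathbf{w}\|} \right| = \frac{2}{\|\mathbf{w}\|}$. (Figure labels: support vectors, margin.) C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998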

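13 Support vector machines. Want line that maximizes the margin. $\mathbf{x}_i$ positive ($y_i = 1$): $\mathbf{x}_i \cdot \mathbf{w} + b \ge 1$; $\mathbf{x}_i$ negative ($y_i = -1$): $\mathbf{x}_i \cdot \mathbf{w} + b \le -1$. For support vectors, $\mathbf{x}_i \cdot \mathbf{w} + b = \pm 1$. Distance between point and line: $\frac{|\mathbf{x}_i \cdot \mathbf{w} + b|}{\|\mathbf{w}\|}$. Therefore, the margin is $2 / \|\mathbf{w}\|$. (Figure labels: support vectors, margin.) C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998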

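14 Finding the maximum margin line. 1. Maximize margin $2 / \|\mathbf{w}\|$. 2. Correctly classify all training data points: $\mathbf{x}_i$ positive ($y_i = 1$): $\mathbf{x}_i \cdot \mathbf{w} + b \ge 1$; $\mathbf{x}_i$ negative ($y_i = -1$): $\mathbf{x}_i \cdot \mathbf{w} + b \le -1$. Quadratic optimization problem: minimize $\frac{1}{2} \mathbf{w}^T \mathbf{w}$ subject to $y_i (\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1$. One constraint for each training point. Note sign trick: multiplying by $y_i$ folds both cases into a single inequality. C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998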
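
The constraints and the margin formula above can be checked on a toy problem. This is a sketch under stated assumptions: the four 2-D points and the candidate $(\mathbf{w}, b)$ are made up for illustration, and the helper names are hypothetical. It does not solve the quadratic program; it only tests feasibility and reports the margin width.

```python
import math

def dot(u, v):
    return sum(a * c for a, c in zip(u, v))

def margin_width(w):
    """Margin 2 / ||w|| of a separating line in canonical form."""
    return 2.0 / math.sqrt(dot(w, w))

def satisfies_constraints(w, b, X, y, tol=1e-9):
    """True if y_i * (w . x_i + b) >= 1 for every training point."""
    return all(yi * (dot(w, xi) + b) >= 1.0 - tol for xi, yi in zip(X, y))

# Hypothetical toy data: two positive and two negative points.
X = [(2.0, 2.0), (3.0, 3.0), (0.0, 0.0), (-1.0, 0.0)]
y = [1, 1, -1, -1]

# Candidate line: (2, 2) and (0, 0) sit exactly on the margin boundaries.
w, b = (0.5, 0.5), -1.0
feasible = satisfies_constraints(w, b, X, y)
width = margin_width(w)  # equals the distance between (2, 2) and (0, 0)
```

A smaller $\|\mathbf{w}\|$ gives a wider margin, which is why the problem is stated as minimizing $\frac{1}{2}\mathbf{w}^T\mathbf{w}$.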

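15 Finding the maximum margin line. Solution: $\mathbf{w} = \sum_i \alpha_i y_i \mathbf{x}_i$, where $\alpha_i$ is the learned weight and $\mathbf{x}_i$ is a support vector. C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998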

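16 Finding the maximum margin line. Solution: $\mathbf{w} = \sum_i \alpha_i y_i \mathbf{x}_i$ and $b = y_i - \mathbf{w} \cdot \mathbf{x}_i$ (for any support vector). Classification function: $f(\mathbf{x}) = \mathrm{sign}(\mathbf{w} \cdot \mathbf{x} + b) = \mathrm{sign}\left(\sum_i \alpha_i y_i \, \mathbf{x}_i \cdot \mathbf{x} + b\right)$. If $f(\mathbf{x}) < 0$, classify as negative, otherwise classify as positive. Notice that it relies on an inner product between the test point $\mathbf{x}$ and the support vectors $\mathbf{x}_i$. (Solving the optimization problem also involves computing the inner products $\mathbf{x}_i \cdot \mathbf{x}_j$ between all pairs of training points.) MORE DETAILS NEXT TIME. C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998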

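17 Inner product. $f(\mathbf{x}) = \mathrm{sign}(\mathbf{w} \cdot \mathbf{x} + b) = \mathrm{sign}\left(\sum_i \alpha_i y_i \, \mathbf{x}_i \cdot \mathbf{x} + b\right)$. Adapted from Milos Hauskrecht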
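
The classification function can be evaluated using only inner products with the support vectors. The sketch below assumes the weights $\alpha_i$ are already known; the values used here (two support vectors with $\alpha = 0.25$ each) are hand-picked for a made-up toy problem, not produced by a solver.

```python
def dot(u, v):
    return sum(a * c for a, c in zip(u, v))

def weight_vector(alphas, ys, svs):
    """w = sum_i alpha_i * y_i * x_i over the support vectors."""
    dim = len(svs[0])
    return [sum(a * y * sv[k] for a, y, sv in zip(alphas, ys, svs))
            for k in range(dim)]

def bias(w, sv, y_sv):
    """b = y_i - w . x_i, computed from any one support vector."""
    return y_sv - dot(w, sv)

def classify(x, alphas, ys, svs, b):
    """sign(sum_i alpha_i y_i (x_i . x) + b); a score of 0 counts as positive."""
    score = sum(a * y * dot(sv, x) for a, y, sv in zip(alphas, ys, svs)) + b
    return -1 if score < 0 else 1

# Hypothetical support vectors, labels, and weights.
svs = [(2.0, 2.0), (0.0, 0.0)]
ys = [1, -1]
alphas = [0.25, 0.25]

w = weight_vector(alphas, ys, svs)   # [0.5, 0.5]
b = bias(w, svs[0], ys[0])           # -1.0
label_a = classify((3.0, 3.0), alphas, ys, svs, b)
label_b = classify((-1.0, 0.0), alphas, ys, svs, b)
```

Because `classify` never forms $\mathbf{w}$ explicitly, the same function still works once the inner product is swapped for a kernel.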


19 Nonlinear SVMs. Datasets that are linearly separable work out great. But what if the dataset is just too hard? We can map it to a higher-dimensional space. (Figures: 1-D points along the $x$ axis; the same points plotted as $x^2$ versus $x$.) Andrew Moore

20 Nonlinear SVMs. General idea: the original input space can always be mapped to some higher-dimensional feature space where the training set is separable: $\Phi: \mathbf{x} \to \varphi(\mathbf{x})$. Andrew Moore

21 Nonlinear kernel: Example. Consider the mapping $\varphi(x) = (x, x^2)$. Then $\varphi(x) \cdot \varphi(y) = (x, x^2) \cdot (y, y^2) = xy + x^2 y^2$, so $K(x, y) = xy + x^2 y^2$. Svetlana Lazebnik
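
A quick numerical check of this example: the kernel evaluated directly on scalars should match the dot product of the lifted features. This is a small sketch; the function names `phi` and `kernel` are illustrative.

```python
def phi(x):
    """Explicit lifting of a scalar x to the 2-D feature (x, x^2)."""
    return (x, x * x)

def kernel(x, y):
    """K(x, y) = x*y + x^2 * y^2, computed without building phi."""
    return x * y + (x * x) * (y * y)

def dot(u, v):
    return sum(a * c for a, c in zip(u, v))

explicit = dot(phi(2.0), phi(3.0))   # 2*3 + 4*9
implicit = kernel(2.0, 3.0)
```

Both routes give the same number; the kernel route simply never materializes the higher-dimensional vector.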

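22 The Kernel Trick. The linear classifier relies on dot product between vectors: $K(\mathbf{x}_i, \mathbf{x}_j) = \mathbf{x}_i \cdot \mathbf{x}_j$. If every data point is mapped into high-dimensional space via some transformation $\Phi: \mathbf{x} \to \varphi(\mathbf{x})$, the dot product becomes $K(\mathbf{x}_i, \mathbf{x}_j) = \varphi(\mathbf{x}_i) \cdot \varphi(\mathbf{x}_j)$. A kernel function is a similarity function that corresponds to an inner product in some expanded feature space. The kernel trick: instead of explicitly computing the lifting transformation $\varphi(\mathbf{x})$, define a kernel function $K$ such that $K(\mathbf{x}_i, \mathbf{x}_j) = \varphi(\mathbf{x}_i) \cdot \varphi(\mathbf{x}_j)$. Andrew Moore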

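23 Examples of kernel functions. Linear: $K(\mathbf{x}_i, \mathbf{x}_j) = \mathbf{x}_i^T \mathbf{x}_j$. Polynomials of degree up to $d$: $K(\mathbf{x}_i, \mathbf{x}_j) = (\mathbf{x}_i^T \mathbf{x}_j + 1)^d$. Gaussian RBF: $K(\mathbf{x}_i, \mathbf{x}_j) = \exp\left(-\frac{\|\mathbf{x}_i - \mathbf{x}_j\|^2}{2\sigma^2}\right)$. Histogram intersection: $K(\mathbf{x}_i, \mathbf{x}_j) = \sum_k \min(\mathbf{x}_i(k), \mathbf{x}_j(k))$. Andrew Moore / Carlos Guestrin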
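
The four kernels listed on this slide translate directly into code. A minimal plain-Python sketch follows; the function names and default parameter values (`d=2`, `sigma=1.0`) are assumptions chosen for illustration.

```python
import math

def dot(u, v):
    return sum(a * c for a, c in zip(u, v))

def linear_kernel(xi, xj):
    return dot(xi, xj)

def polynomial_kernel(xi, xj, d=2):
    """(x_i . x_j + 1)^d, covering polynomial terms up to degree d."""
    return (dot(xi, xj) + 1.0) ** d

def rbf_kernel(xi, xj, sigma=1.0):
    """exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    sq_dist = sum((a - c) ** 2 for a, c in zip(xi, xj))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def histogram_intersection_kernel(hi, hj):
    """Sum over bins k of min(h_i(k), h_j(k))."""
    return sum(min(a, c) for a, c in zip(hi, hj))

k_lin = linear_kernel((1.0, 2.0), (3.0, 4.0))
k_poly = polynomial_kernel((1.0, 2.0), (3.0, 4.0), d=2)
k_rbf = rbf_kernel((0.0, 0.0), (1.0, 1.0), sigma=1.0)
k_hist = histogram_intersection_kernel((0.2, 0.8), (0.5, 0.5))
```

The RBF kernel returns 1 for identical inputs and decays toward 0 with distance, with `sigma` controlling how fast.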

24 (C) Dhruv Batra. Slide Credit: Blaschko & Lampert

25 (C) Dhruv Batra. Slide Credit: Blaschko & Lampert


27 Assuming data separable. The $\mathbf{w}$ that minimizes the objective, whose single term is labeled: maximize margin.

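28 Allowing misclassifications. The $\mathbf{w}$ that minimizes the objective, whose two terms are labeled: maximize margin; minimize misclassification. Other labels on the formula: misclassification cost, # data samples, slack variable. BOARD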
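
A minimal sketch of the soft-margin objective, assuming the slide's formula is the standard form $\min_{\mathbf{w}, b, \xi} \frac{1}{2}\|\mathbf{w}\|^2 + C \sum_{i=1}^{n} \xi_i$ subject to $y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1 - \xi_i$ and $\xi_i \ge 0$, where $C$ is the misclassification cost, $n$ the number of data samples, and $\xi_i$ the slack variable. The function name, the toy data, and the value of `C` are illustrative assumptions.

```python
def dot(u, v):
    return sum(a * c for a, c in zip(u, v))

def soft_margin_objective(w, b, X, y, C):
    """Return (objective, slacks) for a fixed candidate (w, b).

    For fixed w and b, the smallest feasible slack for point i is
    max(0, 1 - y_i * (w . x_i + b)).
    """
    slacks = [max(0.0, 1.0 - yi * (dot(w, xi) + b)) for xi, yi in zip(X, y)]
    objective = 0.5 * dot(w, w) + C * sum(slacks)
    return objective, slacks

# Hypothetical data: the third point is a negative example sitting
# on the decision boundary of the candidate line, so it needs slack 1.
X = [(2.0, 2.0), (0.0, 0.0), (1.0, 1.0)]
y = [1, -1, -1]
objective, slacks = soft_margin_objective((0.5, 0.5), -1.0, X, y, C=10.0)
```

A large `C` punishes slack heavily and approaches the hard-margin problem; a small `C` tolerates more violations in exchange for a wider margin.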

29 What about multi-class SVMs? In practice, we obtain a multi-class SVM by combining two-class SVMs. One vs. others. Training: learn an SVM for each class vs. the others. Testing: apply each SVM to the test example, and assign it to the class of the SVM that returns the highest decision value. One vs. one. Training: learn an SVM for each pair of classes. Testing: each learned SVM votes for a class to assign to the test example. There are also natively multi-class formulations: Crammer and Singer, JMLR 2001. Svetlana Lazebnik / Carlos Guestrin
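
The two test-time rules above can be sketched as follows. The binary SVMs are stood in for by hand-written linear scoring functions; those functions, the class names, and the helper names are all hypothetical and only illustrate how the outputs are combined.

```python
from collections import Counter

def predict_one_vs_rest(x, scorers):
    """scorers maps class -> decision function; pick the highest decision value."""
    return max(scorers, key=lambda cls: scorers[cls](x))

def predict_one_vs_one(x, pair_scorers):
    """pair_scorers maps (class_a, class_b) -> decision function.

    A non-negative value is a vote for class_a, a negative one for class_b.
    """
    votes = Counter()
    for (cls_a, cls_b), score in pair_scorers.items():
        votes[cls_a if score(x) >= 0 else cls_b] += 1
    return votes.most_common(1)[0][0]

# Stand-ins for trained two-class SVM decision functions on 2-D inputs.
one_vs_rest = {
    "a": lambda x: x[0],
    "b": lambda x: x[1],
    "c": lambda x: -x[0] - x[1],
}
one_vs_one = {
    ("a", "b"): lambda x: x[0] - x[1],
    ("a", "c"): lambda x: 2 * x[0] + x[1],
    ("b", "c"): lambda x: x[0] + 2 * x[1],
}

ovr_label = predict_one_vs_rest((2.0, 0.5), one_vs_rest)
ovo_label = predict_one_vs_one((2.0, 0.5), one_vs_one)
```

One vs. others trains one SVM per class, while one vs. one trains one per pair of classes, so the second grows quadratically with the number of classes.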

30 SVMs for recognition. 1. Define your representation for each example. 2. Select a kernel function. 3. Compute pairwise kernel values between labeled examples. 4. Use this kernel matrix to solve for SVM support vectors & weights. 5. To classify a new example: compute kernel values between new input and support vectors, apply weights, check sign of output. Kristen Grauman
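
Steps 3 and 5 of this recipe can be sketched in a few lines. Step 4 (solving for the weights) is not implemented here: the `alphas` and `b` below are hand-set placeholders standing in for a solver's output, and the two training points, the RBF kernel choice, and all function names are illustrative assumptions.

```python
import math

def rbf_kernel(u, v, sigma=1.0):
    sq_dist = sum((a - c) ** 2 for a, c in zip(u, v))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def gram_matrix(X, kernel):
    """Step 3: pairwise kernel values between labeled examples."""
    return [[kernel(xi, xj) for xj in X] for xi in X]

def decision_value(x, support_vectors, labels, alphas, b, kernel):
    """Step 5: kernel values against the support vectors, weighted and summed."""
    return sum(a * y * kernel(sv, x)
               for a, y, sv in zip(alphas, labels, support_vectors)) + b

def classify(x, support_vectors, labels, alphas, b, kernel):
    value = decision_value(x, support_vectors, labels, alphas, b, kernel)
    return 1 if value >= 0 else -1

# Steps 1-2: representation = raw 2-D points, kernel = Gaussian RBF.
X = [(0.0, 0.0), (2.0, 2.0)]
y = [-1, 1]
K = gram_matrix(X, rbf_kernel)

# Step 4 placeholder: weights assumed, not solved for.
alphas, b = [1.0, 1.0], 0.0

near_positive = classify((1.9, 2.1), X, y, alphas, b, rbf_kernel)
near_negative = classify((0.1, -0.2), X, y, alphas, b, rbf_kernel)
```

Only the kernel values are needed at both training and test time; the feature representation itself is never accessed outside the kernel function.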

31 Example: learning gender with SVMs. Moghaddam and Yang, Learning Gender with Support Faces, TPAMI 2002. Moghaddam and Yang, Face & Gesture. Kristen Grauman

32 Support Faces. Moghaddam and Yang, Learning Gender with Support Faces, TPAMI 2002.

33 Human vs. Machine. SVMs performed better than any single human test subject, at either resolution. Kristen Grauman


35 Structured SVMs. $y$ is a vector. Tsochantaridis et al., Large Margin Methods for Structured and Interdependent Output Variables, JMLR 2005.

36 Latent Variables. Adapted from S. Nowozin and C. Lampert

39 SVMs: Pros and cons. Pros: Kernel-based framework is very powerful, flexible. Often a sparse set of support vectors, so compact at test time. Work very well in practice, even with very small training sample sizes. Solution can be formulated as a quadratic program (next time). Many publicly available SVM packages: e.g. LIBSVM, LIBLINEAR, SVMLight (or use built-in Matlab version but slower). Cons: Can be tricky to select best kernel function for a problem. Computation, memory: at training time, must compute kernel values for all example pairs; learning can take a very long time for large-scale problems. Adapted from Lana Lazebnik

51 Sequential Minimal Optimization. Convergence: all $\alpha_i$ satisfy the Karush-Kuhn-Tucker (KKT) conditions, which are used to determine if we are at the optimal solution. Repeat until convergence: pick an $\alpha_i$ that violates the conditions; pick another $\alpha_j$; recompute new values for $\alpha_i$ and $\alpha_j$. Proposed by John Platt in 1998: Fast Training of Support Vector Machines using Sequential Minimal Optimization. Further reading:
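
The loop on this slide can be sketched as a simplified SMO variant. This is not Platt's full algorithm: it assumes a soft-margin dual with cost `C`, picks the second index $j$ as the one with the largest error gap, and stops after a few sweeps with no change. The function names, default parameters, and toy data are illustrative assumptions.

```python
def dot(u, v):
    return sum(a * c for a, c in zip(u, v))

def smo_train(X, y, C=10.0, tol=1e-3, max_passes=3, max_sweeps=1000, kernel=dot):
    """Simplified SMO: returns (alphas, b) for the soft-margin dual."""
    n = len(X)
    K = [[kernel(X[i], X[j]) for j in range(n)] for i in range(n)]
    alpha, b = [0.0] * n, 0.0

    def error(i):
        # Current decision value on training point i, minus its label.
        return sum(alpha[k] * y[k] * K[k][i] for k in range(n)) + b - y[i]

    passes = sweeps = 0
    while passes < max_passes and sweeps < max_sweeps:
        sweeps += 1
        changed = 0
        for i in range(n):
            E_i = error(i)
            # KKT violation check for alpha_i (within tolerance tol).
            violates = ((y[i] * E_i < -tol and alpha[i] < C) or
                        (y[i] * E_i > tol and alpha[i] > 0))
            if not violates:
                continue
            # Pick another alpha_j: here, the one with the largest |E_i - E_j|.
            j = max((k for k in range(n) if k != i),
                    key=lambda k: abs(E_i - error(k)))
            E_j = error(j)
            a_i, a_j = alpha[i], alpha[j]
            # Box bounds that keep sum_i alpha_i y_i = 0 and 0 <= alpha <= C.
            if y[i] != y[j]:
                lo, hi = max(0.0, a_j - a_i), min(C, C + a_j - a_i)
            else:
                lo, hi = max(0.0, a_i + a_j - C), min(C, a_i + a_j)
            eta = 2.0 * K[i][j] - K[i][i] - K[j][j]
            if lo == hi or eta >= 0:
                continue
            # Recompute alpha_j (clipped), then alpha_i from the equality constraint.
            new_j = min(hi, max(lo, a_j - y[j] * (E_i - E_j) / eta))
            if abs(new_j - a_j) < 1e-8:
                continue
            new_i = a_i + y[i] * y[j] * (a_j - new_j)
            b1 = b - E_i - y[i] * (new_i - a_i) * K[i][i] - y[j] * (new_j - a_j) * K[i][j]
            b2 = b - E_j - y[i] * (new_i - a_i) * K[i][j] - y[j] * (new_j - a_j) * K[j][j]
            alpha[i], alpha[j] = new_i, new_j
            if 0 < new_i < C:
                b = b1
            elif 0 < new_j < C:
                b = b2
            else:
                b = (b1 + b2) / 2.0
            changed += 1
        passes = passes + 1 if changed == 0 else 0
    return alpha, b

def predict(x, X, y, alpha, b, kernel=dot):
    score = sum(a * yi * kernel(xi, x) for a, yi, xi in zip(alpha, y, X)) + b
    return 1 if score >= 0 else -1

# Hypothetical linearly separable toy data.
X = [(2.0, 2.0), (3.0, 3.0), (0.0, 0.0), (-1.0, 0.0)]
y = [1, 1, -1, -1]
alpha, b = smo_train(X, y)
predictions = [predict(x, X, y, alpha, b) for x in X]
```

On this toy set only the two points closest to the boundary end up with non-zero $\alpha$, which illustrates the sparsity of support vectors; a production solver such as LIBSVM uses more careful working-set selection and caching.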
