Support Vector Machines

1 Support Vector Machines
   Konstantin Tretyakov (kt@ut.ee)
   MTAT.03.227 Machine Learning

2 So far

3 So far
   Supervised machine learning: linear models; non-linear models
   Unsupervised machine learning
   Generic scaffolding

4 So far
   Supervised machine learning
      Linear models: least squares regression; Fisher's discriminant, perceptron, logistic model
      Non-linear models: neural networks, decision trees, association rules
   Unsupervised machine learning: clustering/EM, PCA
   Generic scaffolding: probabilistic modeling, ML/MAP estimation; performance evaluation, statistical learning theory; linear algebra, optimization methods

5 Coming up next
   Supervised machine learning
      Linear models: least squares regression, SVM; Fisher's discriminant, perceptron, logistic regression, SVM
      Non-linear models: neural networks, decision trees, association rules; SVM, Kernel-XXX
   Unsupervised machine learning: clustering/EM, PCA, Kernel-XXX
   Generic scaffolding: probabilistic modeling, ML/MAP estimation; performance evaluation, statistical learning theory; linear algebra, optimization methods; kernels

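6 First things first
   SVM in R, with labels $y \in \{-1, 1\}$:
      library('e1071')
      m = svm(x, factor(y), kernel='linear')  # a factor y selects classification; a numeric y would default to regression
      predict(m, newx)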

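7 Quiz
   This line is called ___
   This vector is ___
   Those lines are ___
   $f(x) = ?$   $x_1 = ?$   $y_1 = ?$
   Functional margin of $x_1$?
   Geometric margin of $x_1$?
   Distance to origin?

8 Quiz
   Separating hyperplane
   Normal $w$
   Isolines (level lines)
   $f(x) = w^T x + b$
   $x_1 = (2, 6)$; $y_1 = 1$
   Functional margin of $x_1$: $y_1 f(x_1) \approx 2$
   Geometric margin of $x_1$: $f(x_1)/\|w\| \approx 3\sqrt{2}$
   Distance to origin: $d = |b|/\|w\|$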
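The quiz quantities can be checked numerically. The sketch below assumes a hypothetical classifier $w = (1/3, 1/3)$, $b = -2/3$, chosen only because it reproduces the values quoted on the slide for $x_1 = (2, 6)$; the original figure is not available, so the actual $w$ and $b$ may differ.

```python
import math

def f(w, b, x):
    """Linear decision function f(x) = w^T x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def norm(w):
    """Euclidean norm ||w||."""
    return math.sqrt(sum(wj * wj for wj in w))

# Assumed classifier (hypothetical, not read off the original figure).
w, b = (1 / 3, 1 / 3), -2 / 3
x1, y1 = (2, 6), 1

functional_margin = y1 * f(w, b, x1)             # y_1 * f(x_1)
geometric_margin = functional_margin / norm(w)   # y_1 * f(x_1) / ||w||
distance_to_origin = abs(b) / norm(w)            # |b| / ||w||

print(functional_margin, geometric_margin, distance_to_origin)
```

The functional margin depends on the scale of $(w, b)$, while the other two values are plain Euclidean distances.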

9 Quiz
   Suppose we scale $w$ and $b$ by some constant. Will it:
   Affect the separating hyperplane? How?
   Affect the functional margins? How?
   Affect the geometric margins? How?

10 Quiz
   Example: $w \to 2w$, $b = 0$

11 Quiz
   Suppose we scale $w$ and $b$ by some constant. Will it:
   Affect the separating hyperplane? How?
      No: $w^T x + b = 0 \iff 2w^T x + 2b = 0$
   Affect the functional margins? How?
      Yes: $(2w^T x + 2b)\,y = 2\,(w^T x + b)\,y$
   Affect the geometric margins? How?
      No: $\frac{2w^T x + 2b}{\|2w\|} = \frac{w^T x + b}{\|w\|}$
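A quick numerical check of the scaling argument, as a minimal sketch. The classifier, the point and the scaling constant are hypothetical values picked for illustration; the constant is assumed positive.

```python
import math

def margins(w, b, x, y):
    """Return (functional margin, geometric margin) of the point (x, y)."""
    fx = sum(wj * xj for wj, xj in zip(w, x)) + b
    functional = y * fx
    geometric = functional / math.sqrt(sum(wj * wj for wj in w))
    return functional, geometric

# Hypothetical classifier, data point and positive scaling constant.
w, b = (1.0, -2.0), 0.5
x, y = (3.0, 1.0), 1
c = 2.0

fm, gm = margins(w, b, x, y)
fm_scaled, gm_scaled = margins(tuple(c * wj for wj in w), c * b, x, y)

print(fm, fm_scaled)  # the functional margin is multiplied by c
print(gm, gm_scaled)  # the geometric margin is unchanged
```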

12 Which classifier is best?

13 Maximal margin classifier

14 Why maximal margin?
   Well-defined, single stable solution
   Noise-tolerant
   Small parameterization
   (Fairly) efficient algorithms exist for finding it

15 Maximal margin: separable case
   Isolines: $f(x) = 1$ and $f(x) = -1$

16 Maximal margin: separable case
   Isolines: $f(x) = 1$ and $f(x) = -1$
   All points satisfy $f(x_i)\,y_i \ge 1$

17 Maximal margin: separable case
   Isolines: $f(x) = 1$ and $f(x) = -1$
   The (geometric) distance to the isoline $f(x) = 1$ is:

18 Maximal margin: separable case
   Isolines: $f(x) = 1$ and $f(x) = -1$
   The (geometric) distance to the isoline $f(x) = 1$ is: $d = \frac{f(x)}{\|w\|} = \frac{1}{\|w\|}$

19 Maximal margin: separable case
   Among all linear classifiers $(w, b)$ which keep all points at a functional margin of 1 or more, we shall look for the one which has the largest distance $d$ to the corresponding isolines, i.e. the largest geometric margin.
   As $d = \frac{1}{\|w\|}$, this is equivalent to finding the classifier with minimal $\|w\|$, which is equivalent to finding the classifier with minimal $\|w\|^2$.
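One way to see where $d = 1/\|w\|$ comes from, as a short derivation: start at any point $x_0$ on the separating hyperplane (so $w^T x_0 + b = 0$) and walk a distance $d$ along the unit normal $w/\|w\|$ until the isoline $f(x) = 1$ is reached. The point $x_0$ is introduced here for illustration only.

```latex
\begin{align*}
f\!\left(x_0 + d\,\frac{w}{\|w\|}\right)
  &= \underbrace{w^T x_0 + b}_{=\,0} + d\,\frac{w^T w}{\|w\|}
   = d\,\|w\|, \\
d\,\|w\| = 1 \;&\Longrightarrow\; d = \frac{1}{\|w\|}.
\end{align*}
```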

24 Compare
   Generic linear classification (separable case): find $(w, b)$ such that all points are classified correctly, i.e. $f(x_i)\,y_i > 0$.
   Maximal margin classification (separable case): find $(w, b)$ such that all points are classified correctly with a fixed functional margin, i.e. $f(x_i)\,y_i \ge 1$, and $\|w\|^2$ is minimal.

25 Remember
   SVM optimization problem (separable case):
   $\min_{w,b} \frac{1}{2}\|w\|^2$ so that $(w^T x_i + b)\,y_i \ge 1$ for all $i$

26 General case ("soft margin")
   The same, but we also penalize all margin violations.
   SVM optimization problem: $\min_{w,b} \frac{1}{2}\|w\|^2 + C \sum_i \xi_i$, where $\xi_i = (1 - f(x_i)\,y_i)_+$

27 General case ("soft margin")
   The same, but we also penalize all margin violations.
   SVM optimization problem: $\min_{w,b} \frac{1}{2}\|w\|^2 + C \sum_i (1 - f(x_i)\,y_i)_+$, with $\xi_i = (1 - f(x_i)\,y_i)_+$

28 General case ("soft margin")
   The same, but we also penalize all margin violations.
   SVM optimization problem: $\min_{w,b} \frac{1}{2}\|w\|^2 + C \sum_i (1 - m_i)_+$, with the margin $m_i = f(x_i)\,y_i$ and $\xi_i = (1 - f(x_i)\,y_i)_+$

29 General case ("soft margin")
   The same, but we also penalize all margin violations.
   SVM optimization problem: $\min_{w,b} \frac{1}{2}\|w\|^2 + C \sum_i \mathrm{hinge}(m_i)$, where $\mathrm{hinge}(m) = (1 - m)_+$ and $\xi_i = (1 - f(x_i)\,y_i)_+$

30 Hinge loss
   $\mathrm{hinge}(m) = (1 - m)_+$
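A minimal sketch of the hinge loss and the soft-margin objective it defines. The toy points, labels, classifier and the value of C are hypothetical and chosen so the arithmetic is easy to follow by hand.

```python
def hinge(m):
    """Hinge loss (1 - m)_+ of a functional margin m."""
    return max(0.0, 1.0 - m)

def svm_objective(w, b, points, labels, C):
    """Soft-margin objective 0.5 * ||w||^2 + C * sum_i hinge(f(x_i) * y_i)."""
    regularizer = 0.5 * sum(wj * wj for wj in w)
    total_loss = 0.0
    for x, y in zip(points, labels):
        margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
        total_loss += hinge(margin)
    return regularizer + C * total_loss

# Hypothetical toy data: the margins are 2, 0.5 and 1 for w = (1, 0), b = 0.
points = [(2.0, 0.0), (0.5, 0.0), (-1.0, 0.0)]
labels = [1, 1, -1]
objective = svm_objective((1.0, 0.0), 0.0, points, labels, C=10.0)
print(objective)
```

Only the second point violates the margin (its slack is 0.5), so the objective is 0.5 + 10 * 0.5.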

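31 Classification loss functions
   Generic classification: $\min_{w,b} \sum_i [m_i < 0]$

32 Classification loss functions
   Perceptron:

33 Classification loss functions
   Perceptron: $\min_{w,b} \sum_i (-m_i)_+$

34 Classification loss functions
   Least squares classification*: $\min_{w,b} \sum_i (m_i - 1)^2$

35 Classification loss functions
   Boosting: $\min_{w,b} \sum_i \exp(-m_i)$

36 Classification loss functions
   Logistic regression: $\min_{w,b} \sum_i \log(1 + e^{-m_i})$

37 Classification loss functions
   Regularized logistic regression: $\min_{w,b} \sum_i \log(1 + e^{-m_i}) + \lambda \frac{1}{2}\|w\|^2$

38 Classification loss functions
   SVM: $\min_{w,b} \sum_i (1 - m_i)_+ + \frac{1}{2C}\|w\|^2$

39 Classification loss functions
   L2-SVM: $\min_{w,b} \sum_i (1 - m_i)_+^2 + \frac{1}{2C}\|w\|^2$

40 Classification loss functions
   L1-regularized L2-SVM: $\min_{w,b} \sum_i (1 - m_i)_+^2 + \frac{1}{2C}\|w\|_1$
   etc.

41 In general
   $\min_{w,b} \sum_i \varphi(m_i) + \lambda\,\Omega(w)$
   First term: model fit. Second term: model complexity.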
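The loss functions $\varphi(m)$ listed on the preceding slides can be compared side by side. The sketch below evaluates each of them on a few hypothetical margin values; the dictionary keys are labels chosen here for illustration.

```python
import math

# Each entry maps a margin m = f(x) * y to a loss value phi(m).
LOSSES = {
    "zero-one": lambda m: 1.0 if m < 0 else 0.0,
    "perceptron": lambda m: max(0.0, -m),
    "least squares": lambda m: (m - 1.0) ** 2,
    "exponential (boosting)": lambda m: math.exp(-m),
    "logistic": lambda m: math.log(1.0 + math.exp(-m)),
    "hinge (SVM)": lambda m: max(0.0, 1.0 - m),
    "squared hinge (L2-SVM)": lambda m: max(0.0, 1.0 - m) ** 2,
}

def loss_table(margins):
    """Evaluate every loss on every margin."""
    return {name: [phi(m) for m in margins] for name, phi in LOSSES.items()}

margins = [-1.0, 0.0, 1.0, 2.0]  # hypothetical margins
table = loss_table(margins)
for name, values in table.items():
    print(f"{name:24s}", " ".join(f"{v:6.3f}" for v in values))
```

Note that the least squares loss is the only one in this list that also penalizes points with a margin larger than 1.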

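42 Compare to MAP estimation
   $\max_{\mathrm{Model}} \sum_i \log P(x_i \mid \mathrm{Model}) + \log P(\mathrm{Model})$
   First term: likelihood. Second term: model prior.

43 Compare to MAP estimation
   $\max_{\mathrm{Model}} \log P(\mathrm{Data} \mid \mathrm{Model}) + \log P(\mathrm{Model})$
   First term: likelihood. Second term: model prior.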

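44 Solving the SVM
   $\min_{w,b} \frac{1}{2}\|w\|^2 + C \sum_i (1 - f(x_i)\,y_i)_+$

45 Solving the SVM
   $\min_{w,b,\xi} \frac{1}{2}\|w\|^2 + C \sum_i \xi_i$
   such that $f(x_i)\,y_i \ge 1 - \xi_i$, $\xi_i \ge 0$

46 Solving the SVM
   $\min_{w,b,\xi} \frac{1}{2}\|w\|^2 + C \sum_i \xi_i$
   such that $f(x_i)\,y_i - 1 + \xi_i \ge 0$, $\xi_i \ge 0$

47 Solving the SVM
   $\min_{w,b,\xi} \frac{1}{2}\|w\|^2 + C \sum_i \xi_i$
   such that $f(x_i)\,y_i - 1 + \xi_i \ge 0$, $\xi_i \ge 0$
   Quadratic function with linear constraints!

48 Solving the SVM
   Minimize $\min_{w,b,\xi} \frac{1}{2}\|w\|^2 + C \sum_i \xi_i$
   subject to: $f(x_i)\,y_i - 1 + \xi_i \ge 0$, $\xi_i \ge 0$
   Quadratic programming: minimize $\frac{1}{2} x^T Q x + c^T x$ subject to $Ax \ge b$, $Cx = d$
   Quadratic function with linear constraints!

49 Solving the SVM
   Minimize $\min_{w,b,\xi} \frac{1}{2}\|w\|^2 + C \sum_i \xi_i$
   subject to: $f(x_i)\,y_i - 1 + \xi_i \ge 0$, $\xi_i \ge 0$
   Quadratic programming: minimize $\frac{1}{2} x^T Q x + c^T x$ subject to $Ax \ge b$, $Cx = d$
   Quadratic function with linear constraints!
      > library(quadprog)
      > solve.QP(Q, -c, A, b, neq)  # neq: number of leading constraints treated as equalities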
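For readers without a QP solver at hand, the same primal objective can also be minimized approximately by plain subgradient descent on the hinge-loss form. This is a different technique from the quadratic programming route shown on the slide, sketched here only to make the objective concrete; the toy data, step size and iteration count are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def train_svm_subgradient(points, labels, C=1.0, epochs=2000, lr=0.01):
    """Minimize 0.5 * ||w||^2 + C * sum_i (1 - f(x_i) y_i)_+ by subgradient descent."""
    dim = len(points[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = list(w)  # gradient of the regularizer 0.5 * ||w||^2
        grad_b = 0.0
        for x, y in zip(points, labels):
            if y * (dot(w, x) + b) < 1:  # margin violation: the hinge term is active
                grad_w = [g - C * y * xj for g, xj in zip(grad_w, x)]
                grad_b -= C * y
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

# Hypothetical, linearly separable toy data.
points = [(2.0, 2.0), (3.0, 3.0), (2.0, 3.0), (0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
labels = [1, 1, 1, -1, -1, -1]

w, b = train_svm_subgradient(points, labels)
predictions = [1 if dot(w, x) + b > 0 else -1 for x in points]
print(w, b, predictions)
```

With a fixed step size the iterates only hover around the optimum, which is one reason exact QP solvers are preferred for small problems.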

50 A popular trick in optimization:
   $\min_x f(x)$, s.t. $g(x) \ge 0$
   is equivalent to:
   $\min_x \max_{\alpha \ge 0} f(x) - \alpha\,g(x)$
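The reason the trick works can be spelled out by evaluating the inner maximum for a fixed $x$. A short derivation:

```latex
\[
\max_{\alpha \ge 0} \bigl[ f(x) - \alpha\, g(x) \bigr] =
\begin{cases}
f(x), & \text{if } g(x) \ge 0 \quad (\text{the best choice is } \alpha = 0 \text{ or } g(x) = 0), \\
+\infty, & \text{if } g(x) < 0 \quad (\text{let } \alpha \to \infty).
\end{cases}
\]
```

The outer minimization therefore never picks an infeasible $x$, and on the feasible set it simply minimizes $f(x)$.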

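51 Solving the SVM: Dual
   $\min_{w,b,\xi} \frac{1}{2}\|w\|^2 + C \sum_i \xi_i$
   such that $f(x_i)\,y_i - 1 + \xi_i \ge 0$, $\xi_i \ge 0$

52 Solving the SVM: Dual
   $\min_{w,b,\xi} \frac{1}{2}\|w\|^2 + C \sum_i \xi_i$
   such that $f(x_i)\,y_i - 1 + \xi_i \ge 0$, $\xi_i \ge 0$
   Is equivalent to:
   $\min_{w,b,\xi} \max_{\alpha \ge 0,\, \beta \ge 0} \frac{1}{2}\|w\|^2 + C \sum_i \xi_i - \sum_i \alpha_i \,(f(x_i)\,y_i - 1 + \xi_i) - \sum_i \beta_i \xi_i$

54 Solving the SVM: Dual
   $\min_{w,b,\xi} \frac{1}{2}\|w\|^2 + C \sum_i \xi_i$
   such that $f(x_i)\,y_i - 1 + \xi_i \ge 0$, $\xi_i \ge 0$
   Is equivalent to:
   $\min_{w,b,\xi} \max_{\alpha \ge 0,\, \beta \ge 0} \frac{1}{2}\|w\|^2 + \sum_i \xi_i \,(C - \alpha_i - \beta_i) - \sum_i \alpha_i \,(f(x_i)\,y_i - 1)$

55 Solving the SVM: Dual
   $\min_{w,b,\xi} \frac{1}{2}\|w\|^2 + C \sum_i \xi_i$
   such that $f(x_i)\,y_i - 1 + \xi_i \ge 0$, $\xi_i \ge 0$
   Is equivalent to:
   $\min_{w,b,\xi} \max_{\alpha \ge 0,\, \beta \ge 0} \frac{1}{2}\|w\|^2 + \sum_i \xi_i \,(C - \alpha_i - \beta_i) - \sum_i \alpha_i \,(f(x_i)\,y_i - 1)$
   $C - \alpha_i - \beta_i = 0$ (otherwise the minimum over $\xi_i$ is unbounded)

56 Solving the SVM: Dual
   $\min_{w,b,\xi} \frac{1}{2}\|w\|^2 + C \sum_i \xi_i$
   such that $f(x_i)\,y_i - 1 + \xi_i \ge 0$, $\xi_i \ge 0$
   Is equivalent to:
   $\min_{w,b,\xi} \max_{\alpha \ge 0,\, \beta \ge 0} \frac{1}{2}\|w\|^2 + \sum_i \xi_i \,(C - \alpha_i - \beta_i) - \sum_i \alpha_i \,(f(x_i)\,y_i - 1)$
   $0 \le \alpha_i \le C$ (since $\beta_i = C - \alpha_i \ge 0$)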

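57 Solving the SVM: Dual
   $\min_{w,b} \max_{\alpha} \frac{1}{2}\|w\|^2 - \sum_i \alpha_i \,(f(x_i)\,y_i - 1)$, $0 \le \alpha_i \le C$

58 Solving the SVM: Dual
   $\min_{w,b} \max_{\alpha} \frac{1}{2}\|w\|^2 - \sum_i \alpha_i \,(f(x_i)\,y_i - 1)$, $0 \le \alpha_i \le C$
   Sparsity: $\alpha_i$ is nonzero only for those points which have $f(x_i)\,y_i - 1 \le 0$

59 Solving the SVM: Dual
   $\min_{w,b} \max_{\alpha} \frac{1}{2}\|w\|^2 - \sum_i \alpha_i \,(f(x_i)\,y_i - 1)$, $0 \le \alpha_i \le C$
   Now swap the min and the max (this can be done, in particular, because everything is nice and convex).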

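60 Solving the SVM: Dual
   $\max_{\alpha} \min_{w,b} \frac{1}{2}\|w\|^2 - \sum_i \alpha_i \,(f(x_i)\,y_i - 1)$, $0 \le \alpha_i \le C$
   Next solve the inner (unconstrained) min as usual.

61 Solving the SVM: Dual
   $\max_{\alpha} \min_{w,b} \frac{1}{2}\|w\|^2 - \sum_i \alpha_i \,(f(x_i)\,y_i - 1)$, $0 \le \alpha_i \le C$
   Next solve the inner (unconstrained) min as usual:
   $\nabla_w = w - \sum_i \alpha_i y_i x_i = 0$
   $\frac{\partial}{\partial b} = -\sum_i \alpha_i y_i = 0$

62 Solving the SVM: Dual
   $\max_{\alpha} \min_{w,b} \frac{1}{2}\|w\|^2 - \sum_i \alpha_i \,(f(x_i)\,y_i - 1)$, $0 \le \alpha_i \le C$
   Express $w$ and substitute:
   $w = \sum_i \alpha_i y_i x_i$
   $\sum_i \alpha_i y_i = 0$

63 Solving the SVM: Dual
   $\max_{\alpha} \min_{w,b} \frac{1}{2}\|w\|^2 - \sum_i \alpha_i \,(f(x_i)\,y_i - 1)$, $0 \le \alpha_i \le C$
   Express $w$ and substitute:
   $w = \sum_i \alpha_i y_i x_i$ (dual representation)
   $\sum_i \alpha_i y_i = 0$

64 Solving the SVM: Dual
   $\max_{\alpha} \min_{w,b} \frac{1}{2}\|w\|^2 - \sum_i \alpha_i \,(f(x_i)\,y_i - 1)$, $0 \le \alpha_i \le C$
   Express $w$ and substitute:
   $w = \sum_i \alpha_i y_i x_i$
   $\sum_i \alpha_i y_i = 0$ (balance)

65 Solving the SVM: Dual
   $\max_{\alpha} \min_{w,b} \frac{1}{2}\|w\|^2 - \sum_i \alpha_i \,(f(x_i)\,y_i - 1)$, $0 \le \alpha_i \le C$
   Express $w$ and substitute:
   $\max_{\alpha} \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j x_i^T x_j$
   $0 \le \alpha_i \le C$, $\sum_i \alpha_i y_i = 0$

66 Solving the SVM: Dual
   $\max_{\alpha} \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j x_i^T x_j$
   $0 \le \alpha_i \le C$, $\sum_i \alpha_i y_i = 0$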
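The substitution step can be written out term by term. The derivation below uses only the two stationarity conditions $w = \sum_i \alpha_i y_i x_i$ and $\sum_i \alpha_i y_i = 0$, together with $f(x_i) = w^T x_i + b$:

```latex
\begin{align*}
\tfrac{1}{2}\|w\|^2
  &= \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j\, x_i^T x_j, \\
\sum_i \alpha_i \bigl( (w^T x_i + b)\, y_i - 1 \bigr)
  &= \underbrace{\sum_{i,j} \alpha_i \alpha_j y_i y_j\, x_i^T x_j}_{w^T \sum_i \alpha_i y_i x_i \,=\, \|w\|^2}
   + b \underbrace{\sum_i \alpha_i y_i}_{=\,0}
   - \sum_i \alpha_i, \\
\tfrac{1}{2}\|w\|^2 - \sum_i \alpha_i \bigl( f(x_i)\, y_i - 1 \bigr)
  &= \sum_i \alpha_i - \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j\, x_i^T x_j .
\end{align*}
```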

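67 Solving the SVM: Dual
   $\max_{\alpha} 1^T \alpha - \frac{1}{2} \alpha^T (K \circ Y)\, \alpha$
   $K_{ij} = x_i^T x_j$, $Y_{ij} = y_i y_j$
   $0 \le \alpha \le C$, $y^T \alpha = 0$

68 Solving the SVM: Dual
   $\min_{\alpha} \frac{1}{2} \alpha^T (K \circ Y)\, \alpha - 1^T \alpha$
   $0 \le \alpha \le C$, $y^T \alpha = 0$
   Then find $b$ from the condition*: $f(x_i)\,y_i = 1$ if $0 < \alpha_i < C$
   *See homework, it is actually not that easy!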

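69 Support vectors

70 Support vectors
   [Figure: training points annotated with their dual coefficients $\alpha_i$, which range from 0 to $C$]

71 Sparsity
   The dual solution is often very sparse, which allows the optimization to be performed efficiently: the "working set" approach.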
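Sparsity of the dual solution can be observed on a toy problem. The sketch below is not the working set method mentioned on the slide; it uses a simpler, swapped-in technique, dual coordinate descent, and it folds the bias into $w$ by appending a constant feature 1 to every point. That simplification removes the constraint $\sum_i \alpha_i y_i = 0$ at the price of also regularizing the bias. The data and the value of C are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def dual_coordinate_descent(points, labels, C=10.0, sweeps=200):
    """Maximize the SVM dual one alpha_i at a time, keeping 0 <= alpha_i <= C."""
    xs = [tuple(x) + (1.0,) for x in points]  # constant feature plays the role of b
    alpha = [0.0] * len(xs)
    w = [0.0] * len(xs[0])                    # maintained as sum_i alpha_i y_i x_i
    for _ in range(sweeps):
        for i, (x, y) in enumerate(zip(xs, labels)):
            grad = y * dot(w, x) - 1.0        # derivative of 0.5 a^T Q a - 1^T a w.r.t. alpha_i
            new_alpha = min(max(alpha[i] - grad / dot(x, x), 0.0), C)
            delta = new_alpha - alpha[i]
            if delta != 0.0:
                w = [wj + delta * y * xj for wj, xj in zip(w, x)]
                alpha[i] = new_alpha
    return alpha, w

# Hypothetical, linearly separable toy data.
points = [(2.0, 2.0), (3.0, 3.0), (2.0, 3.0), (0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
labels = [1, 1, 1, -1, -1, -1]

alpha, w = dual_coordinate_descent(points, labels)
support = [x for x, a in zip(points, alpha) if a > 1e-9]
predictions = [1 if dot(w, tuple(x) + (1.0,)) > 0 else -1 for x in points]
print(alpha)
print(support)
```

Points that lie strictly outside the margin end with $\alpha_i = 0$ and drop out of $w = \sum_i \alpha_i y_i x_i$; only the remaining points act as support vectors.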

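72 Kernels
   $f(x) = w^T x + b$
   $w = \sum_i \alpha_i y_i x_i$
   $f(x) = \sum_i \alpha_i y_i x_i^T x + b$
   $f(x) = \sum_i \alpha_i y_i K(x_i, x) + b$

73 Kernels
   $f(x) = w^T x + b$
   $w = \sum_i \alpha_i y_i x_i$
   $f(x) = \sum_i \alpha_i y_i x_i^T x + b$
   $f(x) = \sum_i \alpha_i y_i K(x_i, x) + b$, where $K$ is the kernel function

74 Kernels
   $f(x) = w^T x + b$
   $f(x) = w_1 x + w_2 x^2 + b$
   $w = \sum_i \alpha_i y_i x_i$
   $f(x) = \sum_i \alpha_i y_i x_i^T x + b$
   $f(x) = \sum_i \alpha_i y_i K(x_i, x) + b$
   $f(x) = \sum_i \alpha_i y_i \exp(-\|x_i - x\|^2) + b$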
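The kernel form of the decision function can be sketched in a few lines. The support vectors, coefficients $\alpha_i$ and bias below are illustrative values assumed for this example (they satisfy $\sum_i \alpha_i y_i = 0$); with the linear kernel the dual form reproduces $w^T x + b$ exactly, and swapping in the Gaussian kernel from the last formula changes nothing else in the code.

```python
import math

def linear_kernel(u, v):
    """K(u, v) = u^T v."""
    return sum(a * b for a, b in zip(u, v))

def rbf_kernel(u, v):
    """K(u, v) = exp(-||u - v||^2)."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(u, v)))

def decision(x, support, alphas, labels, b, kernel):
    """Dual-form decision function f(x) = sum_i alpha_i y_i K(x_i, x) + b."""
    return sum(a * y * kernel(xi, x) for a, y, xi in zip(alphas, labels, support)) + b

# Illustrative (assumed) support vectors, dual coefficients and bias.
support = [(2.0, 2.0), (1.0, 0.0), (0.0, 1.0)]
labels = [1, -1, -1]
alphas = [4 / 9, 2 / 9, 2 / 9]
b = -5 / 3

# Primal weights recovered from the dual representation w = sum_i alpha_i y_i x_i.
w = [sum(a * y * xi[j] for a, y, xi in zip(alphas, labels, support)) for j in range(2)]

x_new = (3.0, 1.0)
f_dual = decision(x_new, support, alphas, labels, b, linear_kernel)
f_primal = linear_kernel(w, x_new) + b
f_rbf = decision(x_new, support, alphas, labels, b, rbf_kernel)
print(f_dual, f_primal, f_rbf)
```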

75 Quiz
   SVM is a ___ linear classifier.
   Margin maximization can be achieved via minimization of ___.
   SVM uses ___ loss and ___ regularization.
   Besides hinge loss I also know ___ loss and ___ loss.
   SVM in both primal and dual form is solved using ___ programming.

76 Quiz
   In primal formulation we solve for parameter vector ___.
   In dual formulation we solve for ___ instead.
   ___ form of SVM is typically sparse.
   Support vectors are those training points for which ___.
   The relation between primal and dual variables is: ___ = ___.
   A kernel is a generalization of ___ product.
