Geometrical intuition behind the dual problem
Based on: K. P. Bennett, E. J. Bredensteiner, "Duality and Geometry in SVM Classifiers", Proceedings of the International Conference on Machine Learning, 2000
Geometrical intuition behind the dual problem
[Figure: convex hulls for two sets of points A and B in the (x_1, x_2) plane]
Geometrical intuition behind the dual problem
The convex hulls for two sets of points A and B are defined by all possible points P_A and P_B:
P_A = \sum_{i \in A} \alpha_i x_i, \quad \alpha_i \ge 0, \quad \sum_{i \in A} \alpha_i = 1
P_B = \sum_{i \in B} \alpha_i x_i, \quad \alpha_i \ge 0, \quad \sum_{i \in B} \alpha_i = 1
Geometrical intuition behind the dual problem
For the plane separating A and B, we choose the bisector of the segment joining the specific P_A and P_B that have minimal distance \|P_A - P_B\|^2.
(In the illustrated example, a single coefficient is non-zero for P_A, and two coefficients are non-zero for P_B.)
P_A = \sum_{i \in A} \alpha_i x_i, \quad P_B = \sum_{i \in B} \alpha_i x_i, \quad \alpha_i \ge 0, \quad \sum_{i \in A} \alpha_i = 1, \quad \sum_{i \in B} \alpha_i = 1
Geometrical intuition behind the dual problem
\min \|P_A - P_B\|^2 = \min \Big\| \sum_{i \in A} \alpha_i x_i - \sum_{j \in B} \alpha_j x_j \Big\|^2
= \min \Big\| \sum_{i \in A} \alpha_i y_i x_i + \sum_{j \in B} \alpha_j y_j x_j \Big\|^2 \qquad (y_i = +1 \text{ on } A, \; y_j = -1 \text{ on } B)
= \min \sum_{i, j} \alpha_i \alpha_j y_i y_j \, (x_i \cdot x_j)
subject to \alpha_i \ge 0, \quad \sum_{i \in A} \alpha_i = 1, \quad \sum_{i \in B} \alpha_i = 1
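This closest-points problem can be checked numerically. A minimal sketch, assuming hypothetical 2-D toy points, that uses scipy.optimize.minimize (SLSQP) to minimize \|P_A - P_B\|^2 over the convex-combination weights:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical toy points; the two hulls are separated.
A = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
B = np.array([[3.0, 3.0], [4.0, 3.0], [3.0, 4.0]])
nA, nB = len(A), len(B)

def objective(c):
    # c = (alpha, beta): convex-combination weights for the two hulls.
    P_A = c[:nA] @ A
    P_B = c[nA:] @ B
    d = P_A - P_B
    return d @ d                                    # ||P_A - P_B||^2

res = minimize(
    objective,
    x0=np.concatenate([np.full(nA, 1 / nA), np.full(nB, 1 / nB)]),
    bounds=[(0.0, 1.0)] * (nA + nB),
    constraints=[
        {"type": "eq", "fun": lambda c: c[:nA].sum() - 1.0},  # sum alpha = 1
        {"type": "eq", "fun": lambda c: c[nA:].sum() - 1.0},  # sum beta = 1
    ],
)
P_A = res.x[:nA] @ A    # closest point in the hull of A
P_B = res.x[nA:] @ B    # closest point in the hull of B
```

The separating plane would then be the perpendicular bisector of the segment P_A–P_B.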
Going from the Primal to the Dual
Constrained optimization problem:
\min_w \; \frac{1}{2} \|w\|^2 \quad \text{s.t.} \quad y_i (w \cdot x_i - b) \ge 1, \quad i = 1, \dots, m \qquad (y_i: \text{label}, \; x_i: \text{input})
A convex optimization problem (objective and constraints).
Unique solution if the datapoints are linearly separable.
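As a sanity check, this primal problem can be solved directly with a general-purpose solver on an assumed toy dataset; a real SVM implementation would use a dedicated QP or SMO solver instead of scipy's SLSQP:

```python
import numpy as np
from scipy.optimize import minimize

# Assumed toy dataset: linearly separable points with labels +/-1.
X = np.array([[2.0, 2.0], [3.0, 1.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

def objective(v):
    w = v[:2]
    return 0.5 * w @ w                       # (1/2) ||w||^2

def margins(v):
    w, b = v[:2], v[2]
    return y * (X @ w - b) - 1.0             # each entry must be >= 0

res = minimize(objective, np.zeros(3),
               constraints=[{"type": "ineq", "fun": margins}])
w, b = res.x[:2], res.x[2]                   # maximal-margin hyperplane
```

At the optimum every margin constraint holds and all training points fall on the correct side of the hyperplane.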
Going from the Primal to the Dual
\min_w \; \frac{1}{2} \|w\|^2 \quad \text{s.t.} \quad y_i (w \cdot x_i - b) \ge 1, \quad i = 1, \dots, m
Lagrangian:
L(w, b, \alpha) = \frac{1}{2} \|w\|^2 - \sum_{i=1}^{m} \alpha_i \, [\, y_i (w \cdot x_i - b) - 1 \,]
Going from the Primal to the Dual
L(w, b, \alpha) = \frac{1}{2} \|w\|^2 - \sum_{i=1}^{m} \alpha_i \, [\, y_i (w \cdot x_i - b) - 1 \,]
KKT conditions:
\nabla_w L(w, b, \alpha) = w - \sum_{i=1}^{m} \alpha_i y_i x_i = 0
\frac{\partial}{\partial b} L(w, b, \alpha) = \sum_{i=1}^{m} \alpha_i y_i = 0
Going from the Primal to the Dual
The KKT stationarity conditions give:
w = \sum_{i=1}^{m} \alpha_i y_i x_i, \qquad \sum_{i=1}^{m} \alpha_i y_i = 0
Plus the complementary-slackness KKT condition:
\alpha_i \, [\, y_i (w \cdot x_i - b) - 1 \,] = 0, \quad i = 1, \dots, m
Going from the Primal to the Dual
Substituting w = \sum_{i=1}^{m} \alpha_i y_i x_i and \sum_{i=1}^{m} \alpha_i y_i = 0 into the Lagrangian:
L(w, b, \alpha) = \frac{1}{2} \Big\| \sum_{i=1}^{m} \alpha_i y_i x_i \Big\|^2 - \sum_{i, j=1}^{m} \alpha_i \alpha_j y_i y_j \, (x_i \cdot x_j) + b \sum_{i=1}^{m} \alpha_i y_i + \sum_{i=1}^{m} \alpha_i
L(\alpha) = \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i, j=1}^{m} \alpha_i \alpha_j y_i y_j \, (x_i \cdot x_j)
Going from the Primal to the Dual
Primal problem:
\min_w \; \frac{1}{2} \|w\|^2 \quad \text{s.t.} \quad y_i (w \cdot x_i - b) \ge 1, \quad i = 1, \dots, m
Equivalent dual problem:
\max_\alpha \; \Big\{ \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i, j=1}^{m} \alpha_i \alpha_j y_i y_j \, (x_i \cdot x_j) \Big\}
\text{s.t.} \quad \alpha_i \ge 0, \; i = 1, \dots, m, \qquad \sum_{i=1}^{m} \alpha_i y_i = 0
Solution to the Dual Problem
The dual problem below admits the following solution:
h(x) = \text{sign} \Big( \sum_{i=1}^{m} \alpha_i y_i \, (x_i \cdot x) + b \Big)
b = y_i - \sum_{j=1}^{m} \alpha_j y_j \, (x_j \cdot x_i) \quad \text{for any } i \text{ with } \alpha_i \neq 0 \text{ (a support vector)}
Equivalent dual problem:
\max_\alpha \; \Big\{ \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i, j=1}^{m} \alpha_i \alpha_j y_i y_j \, (x_i \cdot x_j) \Big\}
\text{s.t.} \quad \alpha_i \ge 0, \; i = 1, \dots, m, \qquad \sum_{i=1}^{m} \alpha_i y_i = 0
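A minimal end-to-end sketch on assumed toy data: solve the dual with a generic solver, then build b and h(x) from the formulas above (a real implementation would use a dedicated QP or SMO solver):

```python
import numpy as np
from scipy.optimize import minimize

# Assumed toy dataset: linearly separable points with labels +/-1.
X = np.array([[2.0, 2.0], [3.0, 1.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
m = len(y)
G = (y[:, None] * X) @ (y[:, None] * X).T    # G_ij = y_i y_j (x_i . x_j)

def neg_dual(a):
    # Negative of the dual objective, so maximization becomes minimization.
    return 0.5 * a @ G @ a - a.sum()

res = minimize(neg_dual, np.zeros(m),
               bounds=[(0.0, None)] * m,                       # alpha_i >= 0
               constraints=[{"type": "eq", "fun": lambda a: a @ y}])
alpha = res.x

i = int(np.argmax(alpha))                    # index of a support vector
b = y[i] - (alpha * y) @ (X @ X[i])          # b = y_i - sum_j a_j y_j (x_j . x_i)

def h(x):
    # Decision function from the dual solution.
    return np.sign((alpha * y) @ (X @ x) + b)
```

All training points should be classified correctly, and the equality constraint \sum_i \alpha_i y_i = 0 holds at the optimum.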
What are Support Vector Machines?
- Linear classifiers
- (Mostly) binary classifiers
- Supervised training
- Good generalization with explicit bounds
Main Ideas Behind Support Vector Machines
- Maximal margin
- Dual space
- Linear classifiers in high-dimensional space using a non-linear mapping
- Kernel trick
Quadratic Programming
Using the Lagrangian
Dual Space
Strong Duality
Dual Form
Non-linear separation of datasets
Linear separation is impossible in most problems.
Illustration from Prof. Mohri's lecture notes.
Non-separable datasets
Solutions:
1) Non-linear classifiers
2) Increase the dimensionality of the dataset and add a non-linear mapping Ф
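A one-dimensional sketch of option 2, with an assumed toy mapping Ф(x) = (x, x²): points that no single threshold can separate on the line become linearly separable in the lifted plane:

```python
import numpy as np

# Assumed 1-D toy data: class +1 far from the origin, class -1 near it.
# No single threshold on x separates these classes.
x = np.array([-3.0, -2.5, -0.5, 0.0, 0.5, 2.5, 3.0])
y = np.array([1.0, 1.0, -1.0, -1.0, -1.0, 1.0, 1.0])

def phi(x):
    # Non-linear mapping to 2-D: (x, x^2).
    return np.stack([x, x ** 2], axis=-1)

Z = phi(x)
# In the lifted space the classes are split by the line z_2 = 4,
# i.e. the linear classifier with w = (0, 1), b = -4.
pred = np.sign(Z @ np.array([0.0, 1.0]) - 4.0)
```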
Kernel Trick
A kernel is a similarity measure between two data samples.
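A few common kernels written out as similarity functions; the parameter values below are illustrative assumptions, not recommendations:

```python
import numpy as np

def linear_kernel(x, z):
    return x @ z

def polynomial_kernel(x, z, degree=3, c=1.0):
    return (x @ z + c) ** degree

def gaussian_kernel(x, z, sigma=1.0):
    d = x - z
    return np.exp(-(d @ d) / (2.0 * sigma ** 2))

a = np.array([1.0, 0.0])
b = np.array([0.9, 0.1])   # close to a
c = np.array([-1.0, 3.0])  # far from a

# The Gaussian kernel acts as a similarity score in (0, 1]:
# identical points score 1, distant points score near 0.
```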
Kernel Trick Illustrated
Curse of Dimensionality Due to the Non-Linear Mapping
Positive Semi-Definite (P.S.D.) Kernels (Mercer Condition)
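The Mercer/P.S.D. condition can be probed empirically: the Gram (kernel) matrix built from any finite set of points must have no negative eigenvalues (up to round-off). A sketch with assumed random data and a Gaussian kernel:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))              # assumed random sample points

# Gaussian-kernel Gram matrix K_ij = exp(-||x_i - x_j||^2 / 2).
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K = np.exp(-sq_dists / 2.0)

eigvals = np.linalg.eigvalsh(K)           # eigenvalues of a symmetric matrix
is_psd = eigvals.min() >= -1e-10          # non-negative up to round-off
```

A kernel that failed this check on some sample could not be an inner product in any feature space.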
Advantages of SVM
- Work very well...
- Error bounds easy to obtain: generalization error small and predictable
- Fool-proof method: (mostly) three kernels to choose from:
  - Gaussian
  - Linear and polynomial
  - Sigmoid
- Very small number of parameters to optimize
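In practice these kernel choices map directly onto library parameters; a sketch assuming scikit-learn is available, on hypothetical blob data:

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical data: two well-separated Gaussian blobs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 0.5, size=(20, 2)),
               rng.normal(+2.0, 0.5, size=(20, 2))])
y = np.array([-1] * 20 + [+1] * 20)

# The usual kernel choices map onto SVC's `kernel` parameter.
models = {k: SVC(kernel=k).fit(X, y)
          for k in ("linear", "poly", "rbf", "sigmoid")}
train_acc = {k: m.score(X, y) for k, m in models.items()}
```

Beyond the kernel itself, only a handful of hyperparameters (e.g. C, gamma, degree) remain to tune.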
Limitations of SVM
- Size limitation: the size of the kernel matrix is quadratic in the number of training vectors
- Speed limitations:
  1) During training: a very large quadratic programming problem is solved numerically
     Solutions:
     - Chunking
     - Sequential Minimal Optimization (SMO): breaks the QP problem into many small QP problems that are solved analytically
     - Hardware implementations
  2) During testing: cost scales with the number of support vectors
     Solution: online SVM
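The quadratic size limitation is easy to quantify; a back-of-the-envelope sketch of the kernel-matrix memory footprint for float64 entries:

```python
# The kernel (Gram) matrix stores one entry per pair of training vectors,
# so its memory footprint grows quadratically with the training-set size.
def kernel_matrix_bytes(n_samples, bytes_per_entry=8):
    # n x n matrix of float64 entries.
    return n_samples * n_samples * bytes_per_entry

sizes_gb = {n: kernel_matrix_bytes(n) / 1e9 for n in (1_000, 10_000, 100_000)}
# 1k samples fit easily; 100k samples already need tens of GB for the
# matrix alone, which is why chunking and SMO avoid materializing it.
```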