Geometrical intuition behind the dual problem
Based on: K. P. Bennett, E. J. Bredensteiner, "Duality and Geometry in SVM Classifiers", Proceedings of the International Conference on Machine Learning, 2000
Geometrical intuition behind the dual problem
[Figure: convex hulls for two sets of points A and B in the (x_1, x_2) plane]
Geometrical intuition behind the dual problem
The convex hulls for two sets of points A and B are defined by all possible points P_A and P_B:
P_A = \sum_{i \in A} \alpha_i x_i, \quad \alpha_i \ge 0, \quad \sum_{i \in A} \alpha_i = 1
P_B = \sum_{i \in B} \alpha_i x_i, \quad \alpha_i \ge 0, \quad \sum_{i \in B} \alpha_i = 1
Geometrical intuition behind the dual problem
For the plane separating A and B, we choose the bisector of the segment joining the specific P_A and P_B that have minimal distance \|P_A - P_B\|^2.
(In the illustrated example, a single coefficient is non-zero for P_A, and two coefficients are non-zero for P_B.)
P_A = \sum_{i \in A} \alpha_i x_i, \quad P_B = \sum_{i \in B} \alpha_i x_i, \quad \alpha_i \ge 0, \quad \sum_{i \in A} \alpha_i = 1, \quad \sum_{i \in B} \alpha_i = 1
Geometrical intuition behind the dual problem
\min \|P_A - P_B\|^2 = \min \Big\| \sum_{i \in A} \alpha_i x_i - \sum_{j \in B} \alpha_j x_j \Big\|^2
= \min \Big\| \sum_{i \in A} \alpha_i y_i x_i + \sum_{j \in B} \alpha_j y_j x_j \Big\|^2 \qquad (y_i = +1 \text{ on } A, \; y_j = -1 \text{ on } B)
= \min \sum_{i, j} \alpha_i \alpha_j y_i y_j \, (x_i \cdot x_j)
subject to \alpha_i \ge 0, \quad \sum_{i \in A} \alpha_i = 1, \quad \sum_{i \in B} \alpha_i = 1
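This closest-points problem can be checked numerically. A minimal sketch, assuming hypothetical 2-D toy points, that uses scipy.optimize.minimize (SLSQP) to minimize \|P_A - P_B\|^2 over the convex-combination weights:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical toy points; the two hulls are separated.
A = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
B = np.array([[3.0, 3.0], [4.0, 3.0], [3.0, 4.0]])
nA, nB = len(A), len(B)

def objective(c):
    # c = (alpha, beta): convex-combination weights for the two hulls.
    P_A = c[:nA] @ A
    P_B = c[nA:] @ B
    d = P_A - P_B
    return d @ d                                    # ||P_A - P_B||^2

res = minimize(
    objective,
    x0=np.concatenate([np.full(nA, 1 / nA), np.full(nB, 1 / nB)]),
    bounds=[(0.0, 1.0)] * (nA + nB),
    constraints=[
        {"type": "eq", "fun": lambda c: c[:nA].sum() - 1.0},  # sum alpha = 1
        {"type": "eq", "fun": lambda c: c[nA:].sum() - 1.0},  # sum beta = 1
    ],
)
P_A = res.x[:nA] @ A    # closest point in the hull of A
P_B = res.x[nA:] @ B    # closest point in the hull of B
```

The separating plane would then be the perpendicular bisector of the segment P_A–P_B.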
Going from the Primal to the Dual
Constrained optimization problem:
\min_w \; \frac{1}{2} \|w\|^2 \quad \text{s.t.} \quad y_i (w \cdot x_i - b) \ge 1, \quad i = 1, \dots, m \qquad (y_i: \text{label}, \; x_i: \text{input})
A convex optimization problem (objective and constraints).
Unique solution if the datapoints are linearly separable.
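As a sanity check, this primal problem can be solved directly with a general-purpose solver on an assumed toy dataset; a real SVM implementation would use a dedicated QP or SMO solver instead of scipy's SLSQP:

```python
import numpy as np
from scipy.optimize import minimize

# Assumed toy dataset: linearly separable points with labels +/-1.
X = np.array([[2.0, 2.0], [3.0, 1.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

def objective(v):
    w = v[:2]
    return 0.5 * w @ w                       # (1/2) ||w||^2

def margins(v):
    w, b = v[:2], v[2]
    return y * (X @ w - b) - 1.0             # each entry must be >= 0

res = minimize(objective, np.zeros(3),
               constraints=[{"type": "ineq", "fun": margins}])
w, b = res.x[:2], res.x[2]                   # maximal-margin hyperplane
```

At the optimum every margin constraint holds and all training points fall on the correct side of the hyperplane.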
Going from the Primal to the Dual
\min_w \; \frac{1}{2} \|w\|^2 \quad \text{s.t.} \quad y_i (w \cdot x_i - b) \ge 1, \quad i = 1, \dots, m
Lagrangian:
L(w, b, \alpha) = \frac{1}{2} \|w\|^2 - \sum_{i=1}^{m} \alpha_i \, [\, y_i (w \cdot x_i - b) - 1 \,]
Going from the Primal to the Dual
L(w, b, \alpha) = \frac{1}{2} \|w\|^2 - \sum_{i=1}^{m} \alpha_i \, [\, y_i (w \cdot x_i - b) - 1 \,]
KKT conditions:
\nabla_w L(w, b, \alpha) = w - \sum_{i=1}^{m} \alpha_i y_i x_i = 0
\frac{\partial}{\partial b} L(w, b, \alpha) = \sum_{i=1}^{m} \alpha_i y_i = 0
Going from the Primal to the Dual
The KKT stationarity conditions give:
w = \sum_{i=1}^{m} \alpha_i y_i x_i, \qquad \sum_{i=1}^{m} \alpha_i y_i = 0
Plus the complementary-slackness KKT condition:
\alpha_i \, [\, y_i (w \cdot x_i - b) - 1 \,] = 0, \quad i = 1, \dots, m
Going from the Primal to the Dual
Substituting w = \sum_{i=1}^{m} \alpha_i y_i x_i and \sum_{i=1}^{m} \alpha_i y_i = 0 into the Lagrangian:
L(w, b, \alpha) = \frac{1}{2} \Big\| \sum_{i=1}^{m} \alpha_i y_i x_i \Big\|^2 - \sum_{i, j=1}^{m} \alpha_i \alpha_j y_i y_j \, (x_i \cdot x_j) + b \sum_{i=1}^{m} \alpha_i y_i + \sum_{i=1}^{m} \alpha_i
L(\alpha) = \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i, j=1}^{m} \alpha_i \alpha_j y_i y_j \, (x_i \cdot x_j)
Going from the Primal to the Dual
Primal problem:
\min_w \; \frac{1}{2} \|w\|^2 \quad \text{s.t.} \quad y_i (w \cdot x_i - b) \ge 1, \quad i = 1, \dots, m
Equivalent dual problem:
\max_\alpha \; \Big\{ \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i, j=1}^{m} \alpha_i \alpha_j y_i y_j \, (x_i \cdot x_j) \Big\}
\text{s.t.} \quad \alpha_i \ge 0, \; i = 1, \dots, m, \qquad \sum_{i=1}^{m} \alpha_i y_i = 0
Solution to the Dual Problem
The dual problem below admits the following solution:
h(x) = \text{sign} \Big( \sum_{i=1}^{m} \alpha_i y_i \, (x_i \cdot x) + b \Big)
b = y_i - \sum_{j=1}^{m} \alpha_j y_j \, (x_j \cdot x_i) \quad \text{for any } i \text{ with } \alpha_i \neq 0 \text{ (a support vector)}
Equivalent dual problem:
\max_\alpha \; \Big\{ \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i, j=1}^{m} \alpha_i \alpha_j y_i y_j \, (x_i \cdot x_j) \Big\}
\text{s.t.} \quad \alpha_i \ge 0, \; i = 1, \dots, m, \qquad \sum_{i=1}^{m} \alpha_i y_i = 0
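A minimal end-to-end sketch on assumed toy data: solve the dual with a generic solver, then build b and h(x) from the formulas above (a real implementation would use a dedicated QP or SMO solver):

```python
import numpy as np
from scipy.optimize import minimize

# Assumed toy dataset: linearly separable points with labels +/-1.
X = np.array([[2.0, 2.0], [3.0, 1.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
m = len(y)
G = (y[:, None] * X) @ (y[:, None] * X).T    # G_ij = y_i y_j (x_i . x_j)

def neg_dual(a):
    # Negative of the dual objective, so maximization becomes minimization.
    return 0.5 * a @ G @ a - a.sum()

res = minimize(neg_dual, np.zeros(m),
               bounds=[(0.0, None)] * m,                       # alpha_i >= 0
               constraints=[{"type": "eq", "fun": lambda a: a @ y}])
alpha = res.x

i = int(np.argmax(alpha))                    # index of a support vector
b = y[i] - (alpha * y) @ (X @ X[i])          # b = y_i - sum_j a_j y_j (x_j . x_i)

def h(x):
    # Decision function from the dual solution.
    return np.sign((alpha * y) @ (X @ x) + b)
```

All training points should be classified correctly, and the equality constraint \sum_i \alpha_i y_i = 0 holds at the optimum.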
What are Support Vector Machines?
- Linear classifiers
- (Mostly) binary classifiers
- Supervised training
- Good generalization with explicit bounds
Main Ideas Behind Support Vector Machines
- Maximal margin
- Dual space
- Linear classifiers in high-dimensional space using a non-linear mapping
- Kernel trick
Quadratic Programming
Using the Lagrangian
Dual Space
Strong Duality
Dual Form
Non-linear separation of datasets
Linear separation is impossible in most problems.
Illustration from Prof. Mohri's lecture notes.
Non-separable datasets
Solutions:
1) Non-linear classifiers
2) Increase the dimensionality of the dataset and add a non-linear mapping Ф
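A one-dimensional sketch of option 2, with an assumed toy mapping Ф(x) = (x, x²): points that no single threshold can separate on the line become linearly separable in the lifted plane:

```python
import numpy as np

# Assumed 1-D toy data: class +1 far from the origin, class -1 near it.
# No single threshold on x separates these classes.
x = np.array([-3.0, -2.5, -0.5, 0.0, 0.5, 2.5, 3.0])
y = np.array([1.0, 1.0, -1.0, -1.0, -1.0, 1.0, 1.0])

def phi(x):
    # Non-linear mapping to 2-D: (x, x^2).
    return np.stack([x, x ** 2], axis=-1)

Z = phi(x)
# In the lifted space the classes are split by the line z_2 = 4,
# i.e. the linear classifier with w = (0, 1), b = -4.
pred = np.sign(Z @ np.array([0.0, 1.0]) - 4.0)
```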
Kernel Trick
A kernel is a similarity measure between two data samples.
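A few common kernels written out as similarity functions; the parameter values below are illustrative assumptions, not recommendations:

```python
import numpy as np

def linear_kernel(x, z):
    return x @ z

def polynomial_kernel(x, z, degree=3, c=1.0):
    return (x @ z + c) ** degree

def gaussian_kernel(x, z, sigma=1.0):
    d = x - z
    return np.exp(-(d @ d) / (2.0 * sigma ** 2))

a = np.array([1.0, 0.0])
b = np.array([0.9, 0.1])   # close to a
c = np.array([-1.0, 3.0])  # far from a

# The Gaussian kernel acts as a similarity score in (0, 1]:
# identical points score 1, distant points score near 0.
```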
Kernel Trick Illustrated
Curse of Dimensionality Due to the Non-Linear Mapping
Positive Semi-Definite (P.S.D.) Kernels (Mercer Condition)
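The Mercer/P.S.D. condition can be probed empirically: the Gram (kernel) matrix built from any finite set of points must have no negative eigenvalues (up to round-off). A sketch with assumed random data and a Gaussian kernel:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))              # assumed random sample points

# Gaussian-kernel Gram matrix K_ij = exp(-||x_i - x_j||^2 / 2).
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K = np.exp(-sq_dists / 2.0)

eigvals = np.linalg.eigvalsh(K)           # eigenvalues of a symmetric matrix
is_psd = eigvals.min() >= -1e-10          # non-negative up to round-off
```

A kernel that failed this check on some sample could not be an inner product in any feature space.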
Advantages of SVM
- Work very well...
- Error bounds easy to obtain: generalization error small and predictable
- Fool-proof method: (mostly) three kernels to choose from:
  - Gaussian
  - Linear and polynomial
  - Sigmoid
- Very small number of parameters to optimize
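In practice these kernel choices map directly onto library parameters; a sketch assuming scikit-learn is available, on hypothetical blob data:

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical data: two well-separated Gaussian blobs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 0.5, size=(20, 2)),
               rng.normal(+2.0, 0.5, size=(20, 2))])
y = np.array([-1] * 20 + [+1] * 20)

# The usual kernel choices map onto SVC's `kernel` parameter.
models = {k: SVC(kernel=k).fit(X, y)
          for k in ("linear", "poly", "rbf", "sigmoid")}
train_acc = {k: m.score(X, y) for k, m in models.items()}
```

Beyond the kernel itself, only a handful of hyperparameters (e.g. C, gamma, degree) remain to tune.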
Limitations of SVM
- Size limitation: the size of the kernel matrix is quadratic in the number of training vectors
- Speed limitations:
  1) During training: a very large quadratic programming problem is solved numerically
     Solutions:
     - Chunking
     - Sequential Minimal Optimization (SMO): breaks the QP problem into many small QP problems that are solved analytically
     - Hardware implementations
  2) During testing: cost scales with the number of support vectors
     Solution: online SVM
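The quadratic size limitation is easy to quantify; a back-of-the-envelope sketch of the kernel-matrix memory footprint for float64 entries:

```python
# The kernel (Gram) matrix stores one entry per pair of training vectors,
# so its memory footprint grows quadratically with the training-set size.
def kernel_matrix_bytes(n_samples, bytes_per_entry=8):
    # n x n matrix of float64 entries.
    return n_samples * n_samples * bytes_per_entry

sizes_gb = {n: kernel_matrix_bytes(n) / 1e9 for n in (1_000, 10_000, 100_000)}
# 1k samples fit easily; 100k samples already need tens of GB for the
# matrix alone, which is why chunking and SMO avoid materializing it.
```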