UVA CS 4501: Machine Learning

Size: px

Start display at page:

Download "UVA CS 4501: Machine Learning"

Patience Norton
5 years ago
Views:

1 UVA CS 4501: Machine Learning Lecture 16 Extra: Support Vector Machine Optimization with Dual Dr. Yanjun Qi University of Virginia Department of Computer Science

2 Today Extra q Optimization of SVM ü SVM as QP ü A simple example of constrained optimization ü SVM Optimization with dual form ü KKT condition ü SMO algorithm for fast SVM dual optimization 3/29/18 Dr. Yanjun Qi / UVA 2

3 Optimization with Quadratic programming (QP) Quadratic programming solves optimization problems of the following form: min U u T Ru 2 + d T u + c subject to n inequality constraints: a 11 u 1 + a 12 u b 1!!! Quadratic term a n1 u 1 + a n 2 u b n and k equivalency constraints: a n +1,1 u 1 + a n +1,2 u = b n +1!!! a n +k,1 u 1 + a n +k,2 u = b n +k When a problem can be specified as a QP problem we can use solvers that are better than gradient descent or simulated annealing 3/29/18 Dr. Yanjun Qi / UVA 3

4 SVM as a QP problem R as I matrix, d as zero vector, c as 0 value M = 2 w T w min U u T Ru 2 + d T u + c subject to n inequality constraints: a 11 u 1 + a 12 u b 1!!! Min (w T w)/2 subject to the following inequality constraints: For all x in class + 1 w T x+b >= 1 For all x in class - 1 w T x+b <= -1 } A total of n constraints if we have n input samples a n1 u 1 + a n 2 u b n and k equivalency constraints: a n +1,1 u 1 + a n +1,2 u = b n +1!!! a n +k,1 u 1 + a n +k,2 u = b n +k 3/29/18 Dr. Yanjun Qi / UVA 4

5 Optimization Review: Ingredients Objective function Variables Constraints Find values of the variables that minimize or maximize the objective function while satisfying the constraints 3/29/18 Dr. Yanjun Qi / UVA 5

6 Today Extra q Optimization of SVM ü SVM as QP ü A simple example of constrained optimization and dual ü Optimization with dual form ü KKT condition ü SMO algorithm for fast SVM dual optimization 3/29/18 Dr. Yanjun Qi / UVA 6

7 Optimization Review: Lagrangian Duality The Primal Problem Primal: min w s.t. f 0 (w) f i (w) 0, i = 1,,k The generalized Lagrangian: L(w,α )= f 0 (w)+ i=1 the α's (α ι 0) are called the Lagarangian multipliers Method of Lagrange multipliers convert to a higher-dimensional problem k α i f i (w) Lemma: max L(w,α )= f 0 (w) if w satisfies primal constraints α,αi 0 o/w A re-written Primal: 3/29/18 min w max L(w,α ) Dr. α Yanjun,αi 0 Qi / UVA Eric CMU,

8 Optimization Review: Lagrangian Duality, cont. Recall the Primal Problem: min w max α,αi 0 L(w,α ) The Dual Problem: max α,αi 0 min w L(w,α ) Theorem (weak duality): d * = max min L(w,α ) min max α,αi L(w,α )= p* 0 w w α,α i 0 Theorem (strong duality): Iff there exist a saddle point of 3/29/18 we have L(w,α ) * d = p * Dr. Yanjun Qi / UVA

9 min u u 2 s.t. u >= b 3/29/18 Dr. Yanjun Qi / UVA 9

10 Optimization Review: Constrained Optimization min u u 2 s.t. u >= b Allowed min Case 1: b Global min Allowed min Case 2: b Global min 3/29/18 Dr. Yanjun Qi / UVA 10

11 Optimization Review: Constrained Optimization min u u 2 s.t. u >= b Allowed min Case 1: b Global min Allowed min Case 2: b Global min 3/29/18 Dr. Yanjun Qi / UVA 11

12 Optimization Review: Constrained Optimization min u u 2 s.t. u >= b Allowed min Case 1: b Global min Allowed min Case 2: b Global min 3/29/18 Dr. Yanjun Qi / UVA 12

13 min u u 2 s.t. u >= b 3/29/18 Dr. Yanjun Qi / UVA 13

14 min u u 2 s.t. u >= b 3/29/18 Dr. Yanjun Qi / UVA 14

15 min u u 2 s.t. u >= b 3/29/18 Dr. Yanjun Qi / UVA 15

16 min u u 2 s.t. u >= b Dual Primal 3/29/18 Dr. Yanjun Qi / UVA 16

17 min u u 2 s.t. u >= b Dual Primal 3/29/18 Dr. Yanjun Qi / UVA 17

18 min u u 2 s.t. u >= b Dual Primal 3/29/18 Dr. Yanjun Qi / UVA 18

19 min u u 2 s.t. u >= b Dual Primal 3/29/18 Dr. Yanjun Qi / UVA 19

20 3/29/18 Dr. Yanjun Qi / UVA 20

21 3/29/18 Dr. Yanjun Qi / UVA 21

22 Today Extra q Optimization of SVM ü SVM as QP ü A simple example of constrained optimization ü SVM Optimization with dual form ü KKT condition ü SMO algorithm for fast SVM dual optimization 3/29/18 Dr. Yanjun Qi / UVA 22

23 w T w min w,b max α 2 α i [(wt x i + b)y i 1] i α! i 0 i 3/29/18 Dr. Yanjun Qi / UVA 23

24 w T w min w,b max α 2 α i [(wt x i + b)y i 1] i α! i 0 i 3/29/18 Dr. Yanjun Qi / UVA 24

25 N ( ) L primal = 1! 2 w 2 α i y i (w x i + b) 1 i=1 3/29/18 Dr. Yanjun Qi / UVA 25

26 Optimization Review: Dual Problem Solving dual problem if the dual form is easier than primal form Need to change primal minimization to dual maximization (OR è Need to change primal maximization to dual minimization) Primal Problem Dual Problem, Strong duality Only valid when the original optimization problem is convex/concave (strong duality) 3/29/18 Dr. Yanjun Qi / UVA 26

27 Today Extra q Optimization of SVM ü SVM as QP ü A simple example of constrained optimization ü SVM Optimization with dual form ü KKT condition ü SMO algorithm for fast SVM dual optimization 3/29/18 Dr. Yanjun Qi / UVA 27

28 KKT Condition for Strong Duality Primal Problem Dual Problem, Strong duality 3/29/18 Dr. Yanjun Qi / UVA 28

29 Optimization Review: Lagrangian (even more general standard form) 3/29/18 Dr. Yanjun Qi / UVA 29

30 Optimization Review: Lagrange dual function Inf(.): greatest lower bound 3/29/18 Dr. Yanjun Qi / UVA 30

31 Optimization Review: inf (.): greatest lower bound 3/29/18 Dr. Yanjun Qi / UVA 31

32 Optimization Review: inf (.): greatest lower bound 3/29/18 Dr. Yanjun Qi / UVA 32

33 3/29/18 Dr. Yanjun Qi / UVA 33

34 Optimization Review: Key for SVM Dual 3/29/18 Dr. Yanjun Qi / UVA 34

35 Dual formulation for linearly non separable case (soft SVM) 3/29/18 Dr. Yanjun Qi / UVA 35

36 Dual formulation for linearly non separable case 3/29/18 Dr. Yanjun Qi / UVA 36

37 Today Extra q Optimization of SVM ü SVM as QP ü A simple example of constrained optimization ü SVM Optimization with dual form ü KKT condition ü SMO algorithm for fast SVM dual optimization 3/29/18 Dr. Yanjun Qi / UVA 37

38 Fast SVM Implementations SMO: Sequential Minimal Optimization SVM-Light LibSVM BSVM 3/29/18 Dr. Yanjun Qi / UVA 38

39 SMO: Sequential Minimal Optimization Key idea Divide the large QP problem of SVM into a series of smallest possible QP problems, which can be solved analytically and thus avoids using a timeconsuming numerical QP in the loop (a kind of SQP method). Space complexity: O(n). Since QP is greatly simplified, most time-consuming part of SMO is the evaluation of decision function, therefore it is very fast for linear SVM and sparse data. 3/29/18 Dr. Yanjun Qi / UVA 39

40 SMO At each step, SMO chooses 2 Lagrange multipliers to jointly optimize, find the optimal values for these multipliers and updates the SVM to reflect the new optimal values. Three components An analytic method to solve for the two Lagrange multipliers A heuristic for choosing which (next) two multipliers to optimize A method for computing b at each step, so that the KTT conditions are fulfilled for both the two examples (corresponding to the two multipliers ) 3/29/18 Dr. Yanjun Qi / UVA 40

41 Choosing Which Multipliers to Optimize First multiplier Iterate over the entire training set, and find an example that violates the KTT condition. Second multiplier Maximize the size of step taken during joint optimization. E 1 -E 2, where E i is the error on the i-th example. 3/29/18 Dr. Yanjun Qi / UVA 41

42 References Big thanks to Prof. Ziv Bar-Joseph and Prof. Eric CMU for allowing me to reuse some of his slides Elements of Statistical Learning, by Hastie, Tibshirani and Friedman Prof. Andrew CMU s slides Tutorial slides from Dr. Tie-Yan Liu, MSR Asia A Practical Guide to Support Vector Classification Chih-Wei Hsu, Chih-Chung Chang, and Chih-Jen Lin, Tutorial slides from Stanford Convex Optimization I Boyd & Vandenberghe 3/29/18 Dr. Yanjun Qi / UVA CS 42

UVA CS / Introduc8on to Machine Learning and Data Mining. Lecture 9: Classifica8on with Support Vector Machine (cont.

UVA CS / Introduc8on to Machine Learning and Data Mining. Lecture 9: Classifica8on with Support Vector Machine (cont. UVA CS 4501-001 / 6501 007 Introduc8on to Machine Learning and Data Mining Lecture 9: Classifica8on with Support Vector Machine (cont.) Yanjun Qi / Jane University of Virginia Department of Computer Science