UVA CS 4501: Machine Learning. Lecture 16 Extra: Support Vector Machine Optimization with Dual. Dr. Yanjun Qi, University of Virginia, Department of Computer Science.
Today Extra: Optimization of SVM
- SVM as QP
- A simple example of constrained optimization
- SVM optimization with dual form
- KKT condition
- SMO algorithm for fast SVM dual optimization

3/29/18 Dr. Yanjun Qi / UVA
Optimization with Quadratic Programming (QP)

Quadratic programming solves optimization problems of the form

$\min_u \; \tfrac{1}{2} u^T R u + d^T u + c$

subject to $n$ inequality constraints

$a_{11}u_1 + a_{12}u_2 + \dots \le b_1, \;\; \dots, \;\; a_{n1}u_1 + a_{n2}u_2 + \dots \le b_n$

and $k$ equality constraints

$a_{n+1,1}u_1 + a_{n+1,2}u_2 + \dots = b_{n+1}, \;\; \dots, \;\; a_{n+k,1}u_1 + a_{n+k,2}u_2 + \dots = b_{n+k}.$

Here $\tfrac{1}{2}u^T R u$ is the quadratic term. When a problem can be specified as a QP, we can use solvers that are more efficient than gradient descent or simulated annealing.
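As a small illustration of the QP template above, the equality-constrained case can be solved exactly by one linear solve of the KKT system. This is a minimal sketch with made-up numbers, not a general QP solver:

```python
import numpy as np

# Toy QP: min_u 0.5 * u^T R u + d^T u   subject to  A u = b
# (equality-constrained case of the general template; R must be
#  positive definite for a unique minimizer)
R = np.array([[2.0, 0.0],
              [0.0, 2.0]])   # quadratic term
d = np.array([-2.0, -4.0])   # linear term
A = np.array([[1.0, 1.0]])   # one equality constraint: u1 + u2 = 1
b = np.array([1.0])

# KKT conditions: R u + d + A^T lam = 0  and  A u = b,
# which is a single symmetric linear system in (u, lam).
K = np.block([[R, A.T],
              [A, np.zeros((1, 1))]])
rhs = np.concatenate([-d, b])
sol = np.linalg.solve(K, rhs)
u, lam = sol[:2], sol[2:]
print(u)   # constrained minimizer
```

With inequality constraints (the SVM case) a single linear solve no longer suffices, which is why dedicated QP solvers, or the dual tricks below, are used.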
SVM as a QP problem

Take $R$ as the identity matrix, $d$ as the zero vector, and $c = 0$ in the QP template. The margin is $M = 2/\sqrt{w^T w}$, so maximizing the margin is equivalent to

$\min_{w,b} \; \tfrac{1}{2} w^T w$

subject to the inequality constraints: for all $x_i$ in class $+1$, $w^T x_i + b \ge 1$; for all $x_i$ in class $-1$, $w^T x_i + b \le -1$. This gives a total of $n$ constraints if we have $n$ input samples, and no equality constraints.
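To make the QP view concrete, the primal can be handed to a generic constrained solver. This is an illustrative sketch using `scipy.optimize.minimize` with SLSQP on a tiny made-up 1-D dataset, not how production SVM solvers work:

```python
import numpy as np
from scipy.optimize import minimize

# Tiny linearly separable 1-D training set (illustrative data only)
X = np.array([[1.0], [2.0], [-1.0], [-2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

# Decision variables packed as v = [w_1, ..., w_d, b]
def objective(v):
    w = v[:-1]
    return 0.5 * w @ w          # (w^T w) / 2

# One inequality constraint per sample: y_i (w^T x_i + b) - 1 >= 0
constraints = [{'type': 'ineq',
                'fun': (lambda v, xi=xi, yi=yi:
                        yi * (v[:-1] @ xi + v[-1]) - 1.0)}
               for xi, yi in zip(X, y)]

res = minimize(objective, x0=np.zeros(2), method='SLSQP',
               constraints=constraints)
w, b = res.x[:-1], res.x[-1]
print(w, b)   # analytic optimum for this data: w = [1.], b = 0
```

For this dataset the constraints at $x = 1$ and $x = -1$ force $w \ge 1$, so the max-margin solution is $w = 1$, $b = 0$, which the solver recovers.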
Optimization Review: Ingredients. An optimization problem has an objective function, variables, and constraints: find values of the variables that minimize or maximize the objective function while satisfying the constraints.
Optimization Review: Lagrangian Duality

The primal problem:

$\min_w f_0(w) \quad \text{s.t. } f_i(w) \le 0, \; i = 1,\dots,k$

The generalized Lagrangian:

$L(w,\alpha) = f_0(w) + \sum_{i=1}^{k} \alpha_i f_i(w)$

where the $\alpha_i \ge 0$ are called the Lagrange multipliers. The method of Lagrange multipliers converts the constrained problem into an unconstrained one in a higher-dimensional space.

Lemma: $\max_{\alpha,\,\alpha_i \ge 0} L(w,\alpha) = f_0(w)$ if $w$ satisfies the primal constraints, and $+\infty$ otherwise. This gives a re-written primal: $\min_w \max_{\alpha,\,\alpha_i \ge 0} L(w,\alpha)$. [Eric Xing @ CMU, 2006-2008]
Optimization Review: Lagrangian Duality, cont.

Recall the primal problem: $\min_w \max_{\alpha,\,\alpha_i \ge 0} L(w,\alpha)$. The dual problem: $\max_{\alpha,\,\alpha_i \ge 0} \min_w L(w,\alpha)$.

Theorem (weak duality): $d^* = \max_{\alpha,\,\alpha_i \ge 0} \min_w L(w,\alpha) \;\le\; \min_w \max_{\alpha,\,\alpha_i \ge 0} L(w,\alpha) = p^*$.

Theorem (strong duality): if and only if there exists a saddle point of $L(w,\alpha)$, we have $d^* = p^*$.
A simple example: $\min_u u^2$ s.t. $u \ge b$.
Optimization Review: Constrained Optimization. $\min_u u^2$ s.t. $u \ge b$. [Figure: the parabola $u^2$ with the allowed region $u \ge b$, shown for two cases of $b$. When $b \le 0$, the unconstrained global minimum $u = 0$ lies inside the allowed region and the constraint is inactive; when $b > 0$, the global minimum is disallowed and the constrained minimum moves to the boundary $u = b$.]
$\min_u u^2$ s.t. $u \ge b$. [Figure: the primal objective and the dual function for this example, illustrating that the dual maximum meets the primal minimum.]
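For this running example the dual can be worked out in closed form; a sketch of the derivation:

```latex
L(u,\alpha) = u^2 - \alpha(u - b), \qquad \alpha \ge 0
% inner minimization over u:  dL/du = 2u - \alpha = 0  =>  u^* = \alpha/2
g(\alpha) = \min_u L(u,\alpha) = -\frac{\alpha^2}{4} + \alpha b
% dual: maximize g(\alpha) over \alpha \ge 0
% if b > 0:   \alpha^* = 2b,  d^* = b^2   (constraint active,  u^* = b)
% if b \le 0: \alpha^* = 0,   d^* = 0     (constraint inactive, u^* = 0)
```

In both cases $d^*$ equals the primal optimum $p^*$ ($b^2$ when $b > 0$, $0$ otherwise), so strong duality holds for this convex problem.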
Applying the Lagrangian to the SVM primal:

$\min_{w,b} \max_{\alpha,\,\alpha_i \ge 0} \; \frac{w^T w}{2} - \sum_i \alpha_i \left[ (w^T x_i + b)\,y_i - 1 \right]$
$L_{\text{primal}} = \frac{1}{2}\lVert w\rVert^2 - \sum_{i=1}^{N} \alpha_i \left( y_i (w \cdot x_i + b) - 1 \right)$
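Setting the derivatives of the primal Lagrangian to zero and substituting back yields the standard SVM dual; a sketch of the derivation:

```latex
\frac{\partial L}{\partial w} = w - \sum_{i=1}^{N} \alpha_i y_i x_i = 0
  \;\Rightarrow\; w = \sum_{i=1}^{N} \alpha_i y_i x_i
\frac{\partial L}{\partial b} = -\sum_{i=1}^{N} \alpha_i y_i = 0
% substituting w back into L_primal gives the dual problem:
\max_{\alpha} \; \sum_{i=1}^{N} \alpha_i
  - \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j\, (x_i \cdot x_j)
\quad \text{s.t. } \alpha_i \ge 0, \;\; \sum_{i=1}^{N} \alpha_i y_i = 0
```

Note that the data enter the dual only through inner products $x_i \cdot x_j$, which is what makes the kernel trick possible.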
Optimization Review: Dual Problem

Solve the dual problem when the dual form is easier than the primal form. A primal minimization becomes a dual maximization (and a primal maximization becomes a dual minimization). Equating the two, i.e. strong duality, is only valid when the original optimization problem is convex (for minimization; concave for maximization).
KKT Condition for Strong Duality. The KKT conditions link the primal and dual problems and characterize their common optimum when strong duality holds.
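The formulas on this slide were lost in extraction; for reference, the standard KKT conditions for $\min_w f_0(w)$ s.t. $f_i(w) \le 0$ are:

```latex
% stationarity
\nabla f_0(w^*) + \sum_i \alpha_i^* \nabla f_i(w^*) = 0
% primal feasibility
f_i(w^*) \le 0 \quad \forall i
% dual feasibility
\alpha_i^* \ge 0 \quad \forall i
% complementary slackness
\alpha_i^* \, f_i(w^*) = 0 \quad \forall i
```

For the SVM, complementary slackness is what forces $\alpha_i = 0$ for every point off the margin, so only the support vectors carry nonzero multipliers.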
Optimization Review: Lagrangian (even more general standard form)
Optimization Review: Lagrange dual function. inf(·): greatest lower bound (infimum).
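The formulas on these review slides were lost in extraction; the standard definitions in the more general form, with both inequality and equality constraints (as in Boyd and Vandenberghe), are:

```latex
% general standard form
\min_w f_0(w) \quad \text{s.t. } f_i(w) \le 0 \;(i = 1,\dots,m), \quad h_j(w) = 0 \;(j = 1,\dots,p)
% Lagrangian: multipliers \alpha for inequalities, \nu for equalities
L(w,\alpha,\nu) = f_0(w) + \sum_{i=1}^{m} \alpha_i f_i(w) + \sum_{j=1}^{p} \nu_j h_j(w)
% Lagrange dual function: pointwise infimum over w, hence always concave
g(\alpha,\nu) = \inf_w L(w,\alpha,\nu) \;\le\; p^* \quad \text{for all } \alpha \ge 0
```

The bound $g(\alpha,\nu) \le p^*$ holds for any feasible multipliers, which is exactly the weak duality statement from earlier.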
Optimization Review: Key for SVM Dual
Dual formulation for the linearly non-separable case (soft-margin SVM)
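The soft-margin dual itself was on the lost slide images; the standard result (e.g. in Hastie, Tibshirani and Friedman) changes only the constraint set, capping each multiplier at the slack penalty $C$:

```latex
\max_{\alpha} \; \sum_{i=1}^{N} \alpha_i
  - \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j\, (x_i \cdot x_j)
\quad \text{s.t. } 0 \le \alpha_i \le C, \;\; \sum_{i=1}^{N} \alpha_i y_i = 0
```

$C$ trades margin width against training violations: points inside or beyond the margin get $\alpha_i = C$, margin points get $0 < \alpha_i < C$, and everything else gets $\alpha_i = 0$.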
Fast SVM Implementations
- SMO: Sequential Minimal Optimization
- SVM-Light
- LibSVM
- BSVM
SMO: Sequential Minimal Optimization

Key idea: divide the large QP problem of SVM into a series of smallest-possible QP sub-problems, each of which can be solved analytically, thus avoiding a time-consuming numerical QP solver in the inner loop (a kind of SQP method). Space complexity: O(n). Since each QP sub-problem is greatly simplified, the most time-consuming part of SMO is the evaluation of the decision function; it is therefore very fast for linear SVMs and sparse data.
SMO

At each step, SMO chooses two Lagrange multipliers to jointly optimize, finds the optimal values for these multipliers, and updates the SVM to reflect the new optimal values. Three components:
- An analytic method to solve for the two Lagrange multipliers
- A heuristic for choosing which two multipliers to optimize next
- A method for computing b at each step, so that the KKT conditions are fulfilled for both examples corresponding to the two multipliers
Choosing Which Multipliers to Optimize

First multiplier: iterate over the entire training set and find an example that violates the KKT conditions. Second multiplier: choose the one that maximizes the size of the step taken during the joint optimization, approximated by $|E_1 - E_2|$, where $E_i$ is the error on the i-th example.
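The two-multiplier update can be sketched in code. This is a much-simplified teaching variant (linear kernel, second multiplier chosen at random rather than by the $|E_1 - E_2|$ heuristic, no error cache), not Platt's full algorithm; all names and the toy data are illustrative:

```python
import numpy as np

def smo_train(X, y, C=1.0, tol=1e-4, max_passes=20, seed=0):
    """Simplified SMO for a linear kernel (teaching sketch, not Platt's
    full heuristics: the second multiplier is picked at random)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    K = X @ X.T                         # linear kernel matrix
    alpha, b = np.zeros(n), 0.0

    def f(i):                           # decision value on example i
        return (alpha * y) @ K[:, i] + b

    passes, total = 0, 0
    while passes < max_passes and total < 500:
        total += 1
        changed = 0
        for i in range(n):
            Ei = f(i) - y[i]
            # optimize i only if it violates the KKT conditions
            if (y[i] * Ei < -tol and alpha[i] < C) or \
               (y[i] * Ei > tol and alpha[i] > 0):
                j = int(rng.integers(n - 1))
                j = j if j < i else j + 1        # random j != i
                Ej = f(j) - y[j]
                ai_old, aj_old = alpha[i], alpha[j]
                # box bounds L, H keep both multipliers feasible
                if y[i] != y[j]:
                    L, H = max(0, aj_old - ai_old), min(C, C + aj_old - ai_old)
                else:
                    L, H = max(0, ai_old + aj_old - C), min(C, ai_old + aj_old)
                eta = 2 * K[i, j] - K[i, i] - K[j, j]
                if L == H or eta >= 0:
                    continue
                # analytic update for alpha_j, clipped to [L, H]
                alpha[j] = np.clip(aj_old - y[j] * (Ei - Ej) / eta, L, H)
                if abs(alpha[j] - aj_old) < 1e-5:
                    continue
                alpha[i] = ai_old + y[i] * y[j] * (aj_old - alpha[j])
                # recompute b so the KKT conditions hold for i and j
                b1 = b - Ei - y[i] * (alpha[i] - ai_old) * K[i, i] \
                     - y[j] * (alpha[j] - aj_old) * K[i, j]
                b2 = b - Ej - y[i] * (alpha[i] - ai_old) * K[i, j] \
                     - y[j] * (alpha[j] - aj_old) * K[j, j]
                if 0 < alpha[i] < C:
                    b = b1
                elif 0 < alpha[j] < C:
                    b = b2
                else:
                    b = (b1 + b2) / 2
                changed += 1
        passes = passes + 1 if changed == 0 else 0
    w = (alpha * y) @ X                 # recover w (linear kernel only)
    return w, b

# illustrative separable data
X = np.array([[2.0, 2.0], [1.0, 2.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b = smo_train(X, y)
print(np.sign(X @ w + b))
```

Each update moves $(\alpha_i, \alpha_j)$ along the line $\alpha_i y_i + \alpha_j y_j = \text{const}$, which preserves the dual equality constraint $\sum_i \alpha_i y_i = 0$ throughout training.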
References

Big thanks to Prof. Ziv Bar-Joseph and Prof. Eric Xing @ CMU for allowing me to reuse some of their slides.
- Elements of Statistical Learning, by Hastie, Tibshirani and Friedman
- Prof. Andrew Moore @ CMU's slides
- Tutorial slides from Dr. Tie-Yan Liu, MSR Asia
- "A Practical Guide to Support Vector Classification" by Chih-Wei Hsu, Chih-Chung Chang, and Chih-Jen Lin, 2003-2010
- Tutorial slides from Stanford Convex Optimization I, Boyd & Vandenberghe