UVA CS 4501: Machine Learning
Lecture 16 Extra: Support Vector Machine Optimization with Dual
Dr. Yanjun Qi
University of Virginia, Department of Computer Science

Today (Extra): Optimization of SVM
- SVM as QP
- A simple example of constrained optimization
- SVM optimization with the dual form
- KKT condition
- SMO algorithm for fast SVM dual optimization
3/29/18 Dr. Yanjun Qi / UVA

Optimization with Quadratic Programming (QP)

Quadratic programming solves optimization problems of the following form:

    min_u  (1/2) u^T R u + d^T u + c        (R is the quadratic term)

subject to n inequality constraints:

    a_{1,1} u_1 + a_{1,2} u_2 + ... <= b_1
    ...
    a_{n,1} u_1 + a_{n,2} u_2 + ... <= b_n

and k equality constraints:

    a_{n+1,1} u_1 + a_{n+1,2} u_2 + ... = b_{n+1}
    ...
    a_{n+k,1} u_1 + a_{n+k,2} u_2 + ... = b_{n+k}

When a problem can be specified as a QP, we can use solvers that are much better than gradient descent or simulated annealing.
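As a quick illustration of handing such a problem to a solver rather than running gradient descent, here is a minimal sketch (assuming NumPy and SciPy are available; the matrix R, vector d, and the single constraint are made-up toy values, and a general-purpose solver stands in for a dedicated QP package):

```python
import numpy as np
from scipy.optimize import minimize

# Toy QP data: minimize (1/2) u^T R u + d^T u + c
R = np.array([[2.0, 0.0], [0.0, 2.0]])   # positive-definite quadratic term
d = np.array([-2.0, -4.0])
c = 1.0

def objective(u):
    return 0.5 * u @ R @ u + d @ u + c

# One inequality constraint: u_1 + u_2 <= 1, written as 1 - u_1 - u_2 >= 0
constraints = [{"type": "ineq", "fun": lambda u: 1.0 - u[0] - u[1]}]

res = minimize(objective, x0=np.zeros(2), constraints=constraints, method="SLSQP")
# The unconstrained minimizer is (1, 2); the constraint projects it to (0, 1)
print(res.x)
```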

SVM as a QP problem

In the QP template above, take R = I (identity matrix), d = 0 (zero vector), and c = 0; the margin is M = 2 / sqrt(w^T w).

    min_{w,b}  (1/2) w^T w

subject to the inequality constraints:

    for all x_i in class +1:  w^T x_i + b >= +1
    for all x_i in class -1:  w^T x_i + b <= -1

That is a total of n inequality constraints if we have n input samples, and no equality constraints.
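To make the mapping concrete, this sketch solves the primal QP directly on a two-point toy set (assuming SciPy; the data is made up, and again a general-purpose solver stands in for a QP package):

```python
import numpy as np
from scipy.optimize import minimize

# Tiny linearly separable set: class +1 at x=2, class -1 at x=0 (1-D inputs)
X = np.array([[2.0], [0.0]])
y = np.array([1.0, -1.0])

# Variables packed as v = [w, b]; objective is (1/2) w^T w (b is unpenalized)
def objective(v):
    w = v[:-1]
    return 0.5 * w @ w

# n inequality constraints: y_i (w^T x_i + b) - 1 >= 0
constraints = [
    {"type": "ineq", "fun": lambda v, i=i: y[i] * (v[:-1] @ X[i] + v[-1]) - 1.0}
    for i in range(len(X))
]

res = minimize(objective, x0=np.array([1.0, 0.0]),
               constraints=constraints, method="SLSQP")
w, b = res.x[:-1], res.x[-1]
# Optimal separator: w = 1, b = -1 (decision boundary at x = 1, margin M = 2)
```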

Optimization Review: Ingredients
- Objective function
- Variables
- Constraints
Find values of the variables that minimize or maximize the objective function while satisfying the constraints.


Optimization Review: Lagrangian Duality — The Primal Problem

Primal:

    min_w  f_0(w)
    s.t.   f_i(w) <= 0,  i = 1, ..., k

The generalized Lagrangian:

    L(w, α) = f_0(w) + Σ_{i=1}^{k} α_i f_i(w)

The α_i's (α_i >= 0) are called the Lagrange multipliers. The method of Lagrange multipliers converts the constrained problem into a higher-dimensional unconstrained one.

Lemma:

    max_{α, α_i >= 0} L(w, α) = f_0(w)   if w satisfies the primal constraints,
                              = ∞        otherwise.

A re-written primal:

    min_w max_{α, α_i >= 0} L(w, α)

(Adapted from Eric Xing @ CMU, 2006-2008.)

Optimization Review: Lagrangian Duality, cont.

Recall the primal problem:

    min_w max_{α, α_i >= 0} L(w, α)

The dual problem:

    max_{α, α_i >= 0} min_w L(w, α)

Theorem (weak duality):

    d* = max_{α, α_i >= 0} min_w L(w, α)  <=  min_w max_{α, α_i >= 0} L(w, α) = p*

Theorem (strong duality): d* = p* iff there exists a saddle point of L(w, α).

A simple example of constrained optimization:

    min_u  u^2    s.t.  u >= b

Optimization Review: Constrained Optimization

    min_u  u^2    s.t.  u >= b

Case 1 (b <= 0): the global minimum u* = 0 lies inside the allowed region, so the constraint is inactive.
Case 2 (b > 0): the global minimum u* = 0 is not allowed; the constrained minimum sits on the boundary, u* = b.
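The two cases can be checked numerically; in both, the answer is u* = max(0, b). A sketch (assuming SciPy; the solver choice and test values are illustrative):

```python
import numpy as np
from scipy.optimize import minimize

def constrained_min(b):
    """Solve min_u u^2  s.t.  u >= b numerically."""
    res = minimize(lambda u: u[0] ** 2,
                   x0=np.array([b + 1.0]),                      # feasible start
                   constraints=[{"type": "ineq", "fun": lambda u: u[0] - b}],
                   method="SLSQP")
    return res.x[0]

# Case 1 (b <= 0): constraint inactive, global minimum u* = 0 is allowed
# Case 2 (b > 0): constraint active, minimum pushed to the boundary u* = b
for b in [-2.0, 3.0]:
    print(b, constrained_min(b), max(0.0, b))
```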



Dual vs. primal view of the example (figure):

    Primal:  min_u  u^2    s.t.  u >= b
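The dual pictured on this slide can be derived in a few lines; a sketch of the standard Lagrangian computation:

```latex
% Primal: \min_u u^2 \text{ s.t. } u \ge b, i.e. the constraint b - u \le 0
L(u, \alpha) = u^2 + \alpha (b - u), \qquad \alpha \ge 0
% Minimize over u: \; dL/du = 2u - \alpha = 0 \;\Rightarrow\; u = \alpha/2
g(\alpha) = \min_u L(u, \alpha)
          = \frac{\alpha^2}{4} + \alpha b - \frac{\alpha^2}{2}
          = \alpha b - \frac{\alpha^2}{4}
% Dual: \max_{\alpha \ge 0} g(\alpha) \;\Rightarrow\; \alpha^* = \max(0, 2b)
% b \le 0: \alpha^* = 0,\; d^* = 0 = p^*; \quad b > 0: \alpha^* = 2b,\; d^* = b^2 = p^*
```

In both cases d* = p*, as strong duality promises for this convex problem.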




The SVM primal, rewritten as a min-max problem over the Lagrangian:

    min_{w,b} max_{α, α_i >= 0}  (w^T w)/2 − Σ_i α_i [ (w^T x_i + b) y_i − 1 ]


    L_primal = (1/2) ||w||^2 − Σ_{i=1}^{N} α_i [ y_i (w · x_i + b) − 1 ]
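Minimizing L_primal over w and b is the key step the slides rely on; written out, it is the standard derivation of the SVM dual:

```latex
\frac{\partial L}{\partial w} = w - \sum_{i=1}^{N} \alpha_i y_i x_i = 0
  \;\Rightarrow\; w = \sum_{i=1}^{N} \alpha_i y_i x_i
\frac{\partial L}{\partial b} = -\sum_{i=1}^{N} \alpha_i y_i = 0
% Substituting w back into L_{primal} gives the dual problem:
\max_{\alpha} \; \sum_{i=1}^{N} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N}
    \alpha_i \alpha_j y_i y_j \, x_i^{\top} x_j
\quad \text{s.t.} \quad \alpha_i \ge 0, \qquad \sum_{i=1}^{N} \alpha_i y_i = 0
```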

Optimization Review: Dual Problem

Solve the dual problem when the dual form is easier than the primal form. A primal minimization becomes a dual maximization (and a primal maximization becomes a dual minimization). Equating the dual optimum with the primal optimum is only valid under strong duality, which holds when the original optimization problem is convex (for minimization; concave for maximization).


KKT Condition for Strong Duality
Primal problem ⇔ dual problem (strong duality).
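For reference, the KKT conditions for the problem min_w f_0(w) s.t. f_i(w) <= 0 (standard material; the slide's figure is not reproduced here) are:

```latex
% Stationarity
\nabla_w L(w^*, \alpha^*) = \nabla f_0(w^*) + \sum_i \alpha_i^* \nabla f_i(w^*) = 0
% Primal feasibility
f_i(w^*) \le 0 \quad \forall i
% Dual feasibility
\alpha_i^* \ge 0 \quad \forall i
% Complementary slackness
\alpha_i^* \, f_i(w^*) = 0 \quad \forall i
```

For the SVM, complementary slackness is what forces α_i to be nonzero only for examples on the margin, i.e. the support vectors.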

Optimization Review: Lagrangian (an even more general standard form)

Optimization Review: Lagrange dual function; inf(·) denotes the infimum (greatest lower bound).
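In symbols (standard Boyd & Vandenberghe notation), the general standard form and its Lagrange dual function from the two slides above are:

```latex
% General standard form with m inequality and r equality constraints
\min_x \; f_0(x) \quad \text{s.t.} \quad f_i(x) \le 0,\; i=1,\dots,m;
  \qquad h_j(x) = 0,\; j=1,\dots,r
% Lagrangian L : \mathbb{R}^d \times \mathbb{R}^m \times \mathbb{R}^r \to \mathbb{R}
L(x, \lambda, \nu) = f_0(x) + \sum_{i=1}^{m} \lambda_i f_i(x)
  + \sum_{j=1}^{r} \nu_j h_j(x)
% Lagrange dual function: pointwise infimum (greatest lower bound) over x
g(\lambda, \nu) = \inf_x L(x, \lambda, \nu)
% g is concave, and for any \lambda \ge 0:\; g(\lambda, \nu) \le p^*
```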



Optimization Review: Key for the SVM Dual

Dual formulation for the linearly non-separable case (soft SVM)
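The soft-margin dual shown on this slide differs from the separable case only in the box constraint on α (standard result, stated here for reference since the slide's formula did not survive transcription):

```latex
\max_{\alpha} \; \sum_{i=1}^{N} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N}
    \alpha_i \alpha_j y_i y_j \, x_i^{\top} x_j
\quad \text{s.t.} \quad 0 \le \alpha_i \le C,
  \qquad \sum_{i=1}^{N} \alpha_i y_i = 0
```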



Fast SVM Implementations
- SMO: Sequential Minimal Optimization
- SVM-Light
- LibSVM
- BSVM

SMO: Sequential Minimal Optimization

Key idea: divide the large QP problem of SVM training into a series of smallest-possible QP subproblems, each of which can be solved analytically. This avoids running a time-consuming numerical QP solver in the inner loop (a kind of SQP method). Space complexity is O(n). Since each QP subproblem is solved in closed form, the most time-consuming part of SMO is evaluating the decision function; it is therefore very fast for linear SVMs and sparse data.

SMO: At each step, SMO chooses two Lagrange multipliers to jointly optimize, finds the optimal values for these two multipliers, and updates the SVM to reflect the new optimal values.

Three components:
- An analytic method to solve for the two Lagrange multipliers
- A heuristic for choosing which two multipliers to optimize next
- A method for computing b at each step, so that the KKT conditions are fulfilled for both of the two examples (corresponding to the two multipliers)

Choosing Which Multipliers to Optimize
- First multiplier: iterate over the entire training set and find an example that violates the KKT conditions.
- Second multiplier: maximize the size of the step taken during the joint optimization, i.e. choose the example maximizing |E_1 − E_2|, where E_i is the error on the i-th example.
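The pieces above can be assembled into a toy trainer. This sketch follows the "simplified SMO" variant: the second multiplier is chosen at random rather than by the |E_1 − E_2| heuristic described on this slide, and the function name and toy data are illustrative, not from the lecture:

```python
import numpy as np

def simplified_smo(X, y, C=1.0, tol=1e-3, max_passes=10, seed=0):
    """Toy 'simplified SMO' for a linear kernel (random second multiplier)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    K = X @ X.T                                # linear kernel matrix
    alpha, b = np.zeros(n), 0.0
    f = lambda i: (alpha * y) @ K[:, i] + b    # decision value on example i

    passes, epochs = 0, 0
    while passes < max_passes and epochs < 500:
        epochs += 1
        changed = 0
        for i in range(n):
            E_i = f(i) - y[i]
            # first multiplier: an example violating the KKT conditions
            if (y[i] * E_i < -tol and alpha[i] < C) or (y[i] * E_i > tol and alpha[i] > 0):
                j = int(rng.integers(n - 1))
                j = j if j < i else j + 1      # random j != i
                E_j = f(j) - y[j]
                ai, aj = alpha[i], alpha[j]
                # feasible interval [L, H] for alpha_j (box + equality constraint)
                if y[i] != y[j]:
                    L, H = max(0.0, aj - ai), min(C, C + aj - ai)
                else:
                    L, H = max(0.0, ai + aj - C), min(C, ai + aj)
                eta = 2 * K[i, j] - K[i, i] - K[j, j]
                if L >= H or eta >= 0:
                    continue
                # analytic solution of the 2-variable QP, clipped to [L, H]
                alpha[j] = np.clip(aj - y[j] * (E_i - E_j) / eta, L, H)
                if abs(alpha[j] - aj) < 1e-5:
                    continue
                alpha[i] = ai + y[i] * y[j] * (aj - alpha[j])
                # recompute b so the KKT conditions hold for examples i and j
                b1 = b - E_i - y[i] * (alpha[i] - ai) * K[i, i] \
                     - y[j] * (alpha[j] - aj) * K[i, j]
                b2 = b - E_j - y[i] * (alpha[i] - ai) * K[i, j] \
                     - y[j] * (alpha[j] - aj) * K[j, j]
                if 0 < alpha[i] < C:
                    b = b1
                elif 0 < alpha[j] < C:
                    b = b2
                else:
                    b = (b1 + b2) / 2
                changed += 1
        passes = passes + 1 if changed == 0 else 0
    w = (alpha * y) @ X                        # recover w for the linear kernel
    return w, b

# toy linearly separable data on the real line
X = np.array([[0.0], [1.0], [3.0], [4.0]])
y = np.array([-1.0, -1.0, 1.0, 1.0])
w, b = simplified_smo(X, y, C=10.0)
```

Full SMO replaces the random choice of j with the maximal-|E_i − E_j| heuristic and caches the errors E_i, which is what makes it fast in practice.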

References
- Big thanks to Prof. Ziv Bar-Joseph and Prof. Eric Xing @ CMU for allowing me to reuse some of their slides
- The Elements of Statistical Learning, by Hastie, Tibshirani and Friedman
- Prof. Andrew Moore @ CMU's slides
- Tutorial slides from Dr. Tie-Yan Liu, MSR Asia
- "A Practical Guide to Support Vector Classification," Chih-Wei Hsu, Chih-Chung Chang, and Chih-Jen Lin, 2003-2010
- Tutorial slides from Stanford "Convex Optimization I," Boyd & Vandenberghe