Constrained Optimization and Support Vector Machines

Constrained Optimization and Support Vector Machines
Man-Wai MAK
Dept. of Electronic and Information Engineering, The Hong Kong Polytechnic University
enmwmak@polyu.edu.hk, http://www.eie.polyu.edu.hk/~mwmak
References:
C. Bishop, Pattern Recognition and Machine Learning, Appendix E, Springer, 2006.
S. Y. Kung, M. W. Mak and S. H. Lin, Biometric Authentication: A Machine Learning Approach, Prentice Hall, 2005, Chapter 4.
Lagrange multipliers and constrained optimization, www.khanacademy.org
March 19, 2018

Overview
1 Motivations: Why Study Constrained Optimization and SVM
  Why Study Constrained Optimization
  Why Study SVM
2 Constrained Optimization
  Hard-Constrained Optimization
  Lagrangian Function
  Inequality Constraint
  Multiple Constraints
  Software Tools
3 Support Vector Machines
  Linear SVM: Separable Case
  Linear SVM: Fuzzy Separation (Optional)
  Nonlinear SVM
  SVM for Pattern Classification
  Software Tools

Why Study Constrained Optimization?
Constrained optimization is used in almost every discipline:
Power Electronics: Design of a boost power factor correction converter using optimization techniques, IEEE Transactions on Power Electronics, vol. 19, no. 6, pp. 1388-1396, Nov. 2004.
Wireless Communication: Energy-constrained modulation optimization, IEEE Transactions on Wireless Communications, vol. 4, no. 5, pp. 2349-2360, Sept. 2005.
Photonics: Module Placement Based on Resistive Network Optimization, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 3, no. 3, pp. 218-225, July 1984.
Multimedia: Nonlinear total variation based noise removal algorithms, Physica D: Nonlinear Phenomena, 60.1-4 (1992): 259-268.

Why Study SVM?
SVM is a typical application of constrained optimization.
SVMs are used everywhere:
Power Electronics: Support Vector Machines Used to Estimate the Battery State of Charge, IEEE Transactions on Power Electronics, vol. 28, no. 12, pp. 5919-5926, Dec. 2013.
Wireless Communication: Localization in Wireless Sensor Networks Based on Support Vector Machines, IEEE Transactions on Parallel and Distributed Systems, vol. 19, no. 7, pp. 981-994, July 2008.
Photonics: Development of robust calibration models using support vector machines for spectroscopic monitoring of blood glucose, Analytical Chemistry, 82.23 (2010): 9719-9726.
Multimedia: Support vector machines using GMM supervectors for speaker verification, IEEE Signal Processing Letters, vol. 13, no. 5, pp. 308-311, May 2006.
Bioinformatics: Gene selection for cancer classification using support vector machines, Machine Learning, 46.1-3 (2002): 389-422.

Constrained Optimization
Constrained optimization is the process of optimizing an objective function with respect to some variables in the presence of constraints on those variables.
The objective function is either a cost (or energy) function, which is to be minimized, or a reward (or utility) function, which is to be maximized.
Constraints can be either hard constraints, which set conditions on the variables that must be satisfied, or soft constraints, which penalize the objective function if the conditions on the variables are not satisfied.

Constrained Optimization
A general constrained minimization problem:

$$\min_{x} f(x) \quad \text{subject to} \quad g_i(x) = c_i \ \text{for } i = 1,\dots,n \ \text{(equality constraints)}, \quad h_j(x) \ge d_j \ \text{for } j = 1,\dots,m \ \text{(inequality constraints)} \qquad (1)$$

where $g_i(x) = c_i$ and $h_j(x) \ge d_j$ are called hard constraints.
If the constrained problem has only equality constraints, the method of Lagrange multipliers can be used to convert it into an unconstrained problem whose number of variables is the original number of variables plus the original number of equality constraints.

Constrained Optimization
Example: Maximization of a function of two variables with an equality constraint:

$$\max f(x, y) \quad \text{subject to} \quad g(x, y) = 0 \qquad (2)$$

At the optimal point $(x^*, y^*)$, the gradients of $f(x, y)$ and $g(x, y)$ are anti-parallel, i.e., $\nabla f(x^*, y^*) = -\lambda \nabla g(x^*, y^*)$, where $\lambda$ is called the Lagrange multiplier. (See Tutorial for explanation.)

Constrained Optimization
Example:

$$\max f(x, y) = x^2 y \quad \text{subject to} \quad x^2 + y^2 = 1$$

Note that the constraint curve $x^2 + y^2 = 1$ (shown in red) is a one-dimensional curve in the two-dimensional plane.

Constrained Optimization
Extension to a function of $D$ variables:

$$\max f(x) \quad \text{subject to} \quad g(x) = 0 \qquad (3)$$

where $x \in \mathbb{R}^D$. The optimum occurs when

$$\nabla f(x) + \lambda \nabla g(x) = 0. \qquad (4)$$

Note that the constraint surface $g(x) = 0$ (the red curve) is of dimension $D - 1$.

Constrained Optimization
Left: Gradients of the objective function $f(x, y) = x^2 y$. Right: Gradients of $g(x, y) = x^2 + y^2$.
Note that $\lambda < 0$ in this example, which means that the gradients of $f(x, y)$ and $g(x, y)$ are parallel at the optimal point.

Lagrangian Function
Define the Lagrangian function as

$$L(x, \lambda) \triangleq f(x) + \lambda g(x) \qquad (5)$$

where $\lambda \ne 0$ is the Lagrange multiplier.
The optimality condition (Eq. 4) will be satisfied when $\nabla_x L = 0$. Note that $\partial L / \partial \lambda = 0$ leads to the constraint equation $g(x) = 0$.
The constrained maximization can be written as

$$\max L(x, \lambda) = f(x) + \lambda g(x) \quad \text{subject to} \quad \lambda \ne 0, \ g(x) = 0 \qquad (6)$$

Lagrangian Function: 2D Example
Find the stationary point of the function $f(x_1, x_2)$:

$$\max f(x_1, x_2) = 1 - x_1^2 - x_2^2 \quad \text{subject to} \quad g(x_1, x_2) = x_1 + x_2 - 1 = 0 \qquad (7)$$

Lagrangian function:

$$L(x, \lambda) = 1 - x_1^2 - x_2^2 + \lambda(x_1 + x_2 - 1)$$

Lagrangian Function: 2D Example
Differentiating $L(x, \lambda)$ w.r.t. $x_1$, $x_2$, and $\lambda$ and setting the results to 0, we obtain

$$-2x_1 + \lambda = 0, \qquad -2x_2 + \lambda = 0, \qquad x_1 + x_2 - 1 = 0.$$

The solution is $(x_1^*, x_2^*) = (\tfrac{1}{2}, \tfrac{1}{2})$, and the corresponding $\lambda = 1$.
As $\lambda > 0$, the gradients of $f(x_1, x_2)$ and $g(x_1, x_2)$ are anti-parallel at $(x_1^*, x_2^*)$.
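
The stationary point above can be checked symbolically. The following is a minimal sketch (not part of the original slides) using the sympy package; the symbol names are chosen for illustration.

```python
# Minimal sketch (not from the slides): verify the stationary point of
#   max f(x1, x2) = 1 - x1^2 - x2^2  subject to  x1 + x2 - 1 = 0
# by solving grad L = 0 for the Lagrangian L = f + lambda * g.
import sympy as sp

x1, x2, lam = sp.symbols('x1 x2 lam')
f = 1 - x1**2 - x2**2
g = x1 + x2 - 1
L = f + lam * g

# Set all partial derivatives of L to zero and solve the resulting system.
solution = sp.solve([sp.diff(L, v) for v in (x1, x2, lam)], (x1, x2, lam), dict=True)
print(solution)   # [{x1: 1/2, x2: 1/2, lam: 1}]
```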

Inequality Constraint
Maximization with an inequality constraint:

$$\max f(x) \quad \text{subject to} \quad g(x) \ge 0 \qquad (8)$$

Two possible solutions for the maximum of $L(x, \mu) = f(x) + \mu g(x)$:

Inactive constraint: $g(x) > 0$, $\mu = 0$, $\nabla f(x) = 0$
Active constraint: $g(x) = 0$, $\mu > 0$, $\nabla f(x) = -\mu \nabla g(x)$   (9)

Therefore, the maximization can be rewritten as

$$\max L(x, \mu) = f(x) + \mu g(x) \quad \text{subject to} \quad g(x) \ge 0, \ \mu \ge 0, \ \mu g(x) = 0 \qquad (10)$$

which is known as the Karush-Kuhn-Tucker (KKT) condition.

Inequality Constraint
For minimization,

$$\min f(x) \quad \text{subject to} \quad g(x) \ge 0 \qquad (11)$$

we can also express the minimization as

$$\min L(x, \mu) = f(x) - \mu g(x) \quad \text{subject to} \quad g(x) \ge 0, \ \mu \ge 0, \ \mu g(x) = 0 \qquad (12)$$

Multiple Constraints
Maximization with multiple equality and inequality constraints:

$$\max f(x) \quad \text{subject to} \quad g_j(x) = 0 \ \text{for } j = 1,\dots,J, \quad h_k(x) \ge 0 \ \text{for } k = 1,\dots,K. \qquad (13)$$

This maximization can be written as

$$\max L(x, \{\lambda_j\}, \{\mu_k\}) = f(x) + \sum_{j=1}^{J} \lambda_j g_j(x) + \sum_{k=1}^{K} \mu_k h_k(x)$$
$$\text{subject to} \quad \lambda_j \ne 0, \ g_j(x) = 0 \ \text{for } j = 1,\dots,J \quad \text{and} \quad \mu_k \ge 0, \ h_k(x) \ge 0, \ \mu_k h_k(x) = 0 \ \text{for } k = 1,\dots,K. \qquad (14)$$

Software Tools for Constrained Optimization
Matlab Optimization Toolbox: fmincon finds the minimum of a function subject to nonlinear multivariable constraints.
Python: scipy.optimize.minimize provides a common interface to unconstrained and constrained minimization algorithms for multivariate scalar functions.
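
As an illustration of the Python route, the equality-constrained example of Eq. 7 can be solved numerically with scipy.optimize.minimize. This is a minimal sketch assuming SciPy is available; SLSQP is one of the methods that accepts equality and inequality constraints.

```python
# Minimal sketch: solve  max 1 - x1^2 - x2^2  s.t.  x1 + x2 - 1 = 0
# with scipy.optimize.minimize (the objective is negated to maximize).
import numpy as np
from scipy.optimize import minimize

objective = lambda x: -(1 - x[0]**2 - x[1]**2)              # negate to maximize
constraints = [{'type': 'eq', 'fun': lambda x: x[0] + x[1] - 1}]

result = minimize(objective, x0=np.array([0.0, 0.0]),
                  method='SLSQP', constraints=constraints)
print(result.x)   # approximately [0.5, 0.5]
```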

Linear SVM: Separable Case
Consider a training set $\{x_i, y_i; \ i = 1,\dots,N\}$, where the $x_i \in \mathcal{X} \subset \mathbb{R}^D$ are the input data and $y_i \in \{+1, -1\}$ are the labels.
Figure 1: Linear SVM on a 2-D space, showing the decision plane $w^\top x + b = 0$, the marginal planes $w^\top x + b = \pm 1$, the support vectors $x_1$ and $x_2$, and the margin of separation $d$.

Linear SVM: Separable Case
A linear support vector machine (SVM) aims to find a decision plane (a line in the 2-D case) $x^\top w + b = 0$ that maximizes the margin of separation (see Fig. 1).
Assume that all data points satisfy the constraints:

$$x_i^\top w + b \ge +1 \ \text{for } i \in \{1,\dots,N\} \ \text{where } y_i = +1, \qquad (15)$$
$$x_i^\top w + b \le -1 \ \text{for } i \in \{1,\dots,N\} \ \text{where } y_i = -1. \qquad (16)$$

The data points $x_1$ and $x_2$ on the previous page satisfy the equality constraints:

$$x_1^\top w + b = +1, \qquad x_2^\top w + b = -1. \qquad (17)$$

Linear SVM: Separable Case
Using Eq. 17 and Fig. 1, the distance between the two separating hyperplanes (also called the margin of separation) can be computed:

$$d(w) = \frac{(x_1 - x_2)^\top w}{\|w\|} = \frac{2}{\|w\|}$$

Maximizing $d(w)$ is equivalent to minimizing $\|w\|^2$. So, the constrained optimization problem in SVM is

$$\min \ \frac{1}{2}\|w\|^2 \quad \text{subject to} \quad y_i(x_i^\top w + b) \ge 1, \quad i = 1,\dots,N \qquad (18)$$

Equivalently, we minimize a Lagrangian function:

$$\min \ L(w, b, \{\alpha_i\}) = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{N} \alpha_i \left[ y_i(x_i^\top w + b) - 1 \right]$$
$$\text{subject to} \quad \alpha_i \ge 0, \ y_i(x_i^\top w + b) - 1 \ge 0, \ \alpha_i\left[ y_i(x_i^\top w + b) - 1 \right] = 0, \quad i = 1,\dots,N \qquad (19)$$

Linear SVM: Separable Case
Setting

$$\frac{\partial}{\partial b} L(w, b, \{\alpha_i\}) = 0 \quad \text{and} \quad \nabla_w L(w, b, \{\alpha_i\}) = 0, \qquad (20)$$

subject to the constraint $\alpha_i \ge 0$, results in

$$\sum_{i=1}^{N} \alpha_i y_i = 0 \quad \text{and} \quad w = \sum_{i=1}^{N} \alpha_i y_i x_i. \qquad (21)$$

Substituting these results back into the Lagrangian function:

$$L(w, b, \{\alpha_i\}) = \frac{1}{2}(w^\top w) - \sum_{i=1}^{N} \alpha_i y_i (x_i^\top w) - \sum_{i=1}^{N} \alpha_i y_i b + \sum_{i=1}^{N} \alpha_i$$
$$= \frac{1}{2}\left( \sum_{i=1}^{N} \alpha_i y_i x_i \right)^{\!\top}\! \left( \sum_{j=1}^{N} \alpha_j y_j x_j \right) - \sum_{i=1}^{N} \alpha_i y_i x_i^\top \sum_{j=1}^{N} \alpha_j y_j x_j + \sum_{i=1}^{N} \alpha_i$$
$$= \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j (x_i^\top x_j).$$

Linear SVM: Separable Case
This results in the following Wolfe dual formulation:

$$\max_{\alpha} \ \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j (x_i^\top x_j) \quad \text{subject to} \quad \sum_{i=1}^{N} \alpha_i y_i = 0 \ \text{and} \ \alpha_i \ge 0, \ i = 1,\dots,N. \qquad (22)$$

The solution contains two kinds of Lagrange multipliers:
1 $\alpha_i = 0$: the corresponding $x_i$ are irrelevant.
2 $\alpha_i > 0$: the corresponding $x_i$ are critical.
The $x_k$ for which $\alpha_k > 0$ are called support vectors.

Linear SVM: Separable Case
The SVM output is given by

$$f(x) = w^\top x + b = \sum_{k \in S} \alpha_k y_k x_k^\top x + b$$

where $S$ is the set of indices for which $\alpha_k > 0$.
$b$ can be computed by using the KKT condition, i.e., for any $k$ such that $y_k = 1$ and $\alpha_k > 0$, we have

$$\alpha_k \left[ y_k(x_k^\top w + b) - 1 \right] = 0 \ \Longrightarrow \ b = 1 - x_k^\top w.$$
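
Because the Wolfe dual in Eq. 22 is a quadratic program, a general-purpose QP solver can be used to obtain the multipliers, the support vectors, and $b$. The following is a minimal sketch (not from the slides) using the cvxopt package on made-up, linearly separable toy data.

```python
# Minimal sketch: solve the hard-margin SVM dual (Eq. 22) as a QP with cvxopt,
# then recover w, the support vectors, and b from the KKT conditions.
import numpy as np
from cvxopt import matrix, solvers

# Toy linearly separable data (made up for illustration).
X = np.array([[2.0, 2.0], [3.0, 3.0], [1.0, 1.0],
              [-1.0, -1.0], [-2.0, -1.0], [-1.0, -2.0]])
y = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])
N = len(y)

# QP in cvxopt form: min 1/2 a'Pa + q'a  s.t.  Ga <= h,  Aa = b
P = matrix(np.outer(y, y) * (X @ X.T))
q = matrix(-np.ones(N))
G = matrix(-np.eye(N))          # -alpha_i <= 0, i.e. alpha_i >= 0
h = matrix(np.zeros(N))
A = matrix(y.reshape(1, -1))    # sum_i alpha_i y_i = 0
b = matrix(0.0)

solvers.options['show_progress'] = False
alpha = np.ravel(solvers.qp(P, q, G, h, A, b)['x'])

S = alpha > 1e-6                            # support vector indicator
w = ((alpha * y)[:, None] * X).sum(axis=0)  # w = sum_i alpha_i y_i x_i
k = np.argmax(alpha)                        # a support vector; sign handled below
b_svm = (1.0 - X[k] @ w) if y[k] == 1 else (-1.0 - X[k] @ w)
print(w, b_svm, np.where(S)[0])
```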

Linear SVM: Fuzzy Separation (Optional)
If the data patterns are not separable by a linear hyperplane, a set of slack variables $\xi = \{\xi_1, \dots, \xi_N\}$ is introduced, with $\xi_i \ge 0$, such that the inequality constraints in the SVM become

$$y_i(x_i^\top w + b) \ge 1 - \xi_i, \quad i = 1,\dots,N. \qquad (23)$$

The slack variables $\{\xi_i\}_{i=1}^{N}$ allow some data to violate the constraints in Eq. 18. The value of $\xi_i$ indicates the degree of violation of the constraint.
The minimization problem becomes

$$\min \ \frac{1}{2}\|w\|^2 + C\sum_i \xi_i \quad \text{subject to} \quad y_i(x_i^\top w + b) \ge 1 - \xi_i, \qquad (24)$$

where $C$ is a user-defined penalty parameter to penalize any violation of the safety margin for all training data.

Linear SVM: Fuzzy Separation (Optional)
The new Lagrangian is

$$L(w, b, \alpha) = \frac{1}{2}\|w\|^2 + C\sum_{i} \xi_i - \sum_{i=1}^{N} \alpha_i \left( y_i(x_i^\top w + b) - 1 + \xi_i \right) - \sum_{i=1}^{N} \beta_i \xi_i, \qquad (25)$$

where $\alpha_i \ge 0$ and $\beta_i \ge 0$ are, respectively, the Lagrange multipliers that ensure $y_i(x_i^\top w + b) \ge 1 - \xi_i$ and $\xi_i \ge 0$.
Differentiating $L(w, b, \alpha)$ w.r.t. $w$, $b$, and $\xi_i$, we obtain the Wolfe dual:

$$\max_{\alpha} \ \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j (x_i^\top x_j) \qquad (26)$$
$$\text{subject to} \quad 0 \le \alpha_i \le C, \ i = 1,\dots,N, \quad \sum_{i=1}^{N} \alpha_i y_i = 0.$$
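
In practice the soft-margin SVM is usually trained through a library rather than by solving Eq. 26 by hand. A minimal sketch with scikit-learn, where the C argument plays the role of the penalty parameter in Eq. 24 (the data below are made up for illustration):

```python
# Minimal sketch: soft-margin linear SVM via scikit-learn; the C parameter
# corresponds to the penalty term in Eq. 24.
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = np.vstack([rng.randn(20, 2) + [2, 2],    # class +1 (with some overlap)
               rng.randn(20, 2) - [2, 2]])   # class -1
y = np.hstack([np.ones(20), -np.ones(20)])

clf = SVC(kernel='linear', C=1.0).fit(X, y)
print(clf.support_)      # indices of support vectors (alpha_i > 0)
print(clf.dual_coef_)    # y_i * alpha_i for the support vectors
```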

Linear SVM: Fuzzy Separation (Optional)
Three types of support vectors (see figure).

Nonlinear SVM
Assume that we have a nonlinear function $\phi(x)$ that maps $x$ from the input space to a much higher (possibly infinite) dimensional space called the feature space.
While the data are not linearly separable in the input space, they will become linearly separable in the feature space.
Figure: Use of a kernel function $K(x, x_i)$ to map nonlinearly separable data in the input space to a feature space where the decision boundary becomes linear.

Nonlinear SVM
A 1-D problem requiring two decision boundaries (thresholds). 1-D linear SVMs cannot solve this problem because they can only provide one decision threshold.
Figure: (a) The 1-D problem with two decision boundaries in the input space; (b) after the mapping $\phi(x)$, a single decision boundary $x^2 = c$ separates the classes in the $(x, x^2)$ feature space.

Nonlinear SVM
We may use a nonlinear function $\phi$ to perform the mapping:

$$\phi: x \mapsto [x \ \ x^2]^\top.$$

The decision boundary in panel (b) of the previous figure is a straight line that can perfectly separate the two classes. We may write the decision function as

$$x^2 - c = [0 \ \ 1] \begin{bmatrix} x \\ x^2 \end{bmatrix} - c = 0,$$

or equivalently,

$$w^\top \phi(x) + b = 0, \qquad (27)$$

where $w = [0 \ \ 1]^\top$, $\phi(x) = [x \ \ x^2]^\top$, and $b = -c$.
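
As a small illustration (not from the slides), the mapping and the decision function of Eq. 27 can be evaluated directly; the threshold c below is an arbitrary choice.

```python
# Minimal sketch: the 1-D mapping phi(x) = [x, x^2]^T and the linear decision
# function w^T phi(x) + b with w = [0, 1]^T and b = -c (Eq. 27).
import numpy as np

c = 4.0                                   # arbitrary threshold: boundaries at x = +/-2
w, b = np.array([0.0, 1.0]), -c
phi = lambda x: np.array([x, x**2])

for x in [-3.0, -1.0, 0.0, 1.0, 3.0]:
    print(x, np.sign(w @ phi(x) + b))     # +1 for |x| > 2, -1 for |x| < 2
```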

Nonlinear SVM
Left: A 2-D example in which a linear SVM will not be able to perfectly separate the two classes.
Right: By transforming $x = [x_1 \ \ x_2]^\top$ to

$$\phi: x \mapsto [x_1^2 \ \ \sqrt{2}\,x_1 x_2 \ \ x_2^2]^\top, \qquad (28)$$

we will be able to use a linear SVM to separate the two classes in a three-dimensional space.
Figure: Class 1 and Class 2 in the original 2-D space $(x_1, x_2)$ (left) and after mapping to the 3-D feature space $(z_1, z_2, z_3)$ (right).

Nonlinear SVM
The linear SVM has the form

$$f(x) = \sum_{i \in S} \alpha_i y_i \phi(x_i)^\top \phi(x) + b = w^\top \phi(x) + b,$$

where $S$ is the set of support vector indices and $w = \sum_{i \in S} \alpha_i y_i \phi(x_i)$.
In this simple problem, the dot products $\phi(x_i)^\top \phi(x_j)$ for any $x_i$ and $x_j$ in the input space can be easily evaluated:

$$\phi(x_i)^\top \phi(x_j) = x_{i1}^2 x_{j1}^2 + 2\,x_{i1} x_{i2} x_{j1} x_{j2} + x_{i2}^2 x_{j2}^2 = (x_i^\top x_j)^2. \qquad (29)$$
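
The identity in Eq. 29 is easy to verify numerically. A minimal sketch with made-up vectors:

```python
# Minimal sketch: check that phi(xi)^T phi(xj) = (xi^T xj)^2 for
# phi(x) = [x1^2, sqrt(2)*x1*x2, x2^2]^T (Eqs. 28-29).
import numpy as np

phi = lambda x: np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

xi, xj = np.array([1.0, 2.0]), np.array([3.0, -1.0])
print(phi(xi) @ phi(xj))     # 1.0
print((xi @ xj) ** 2)        # 1.0
```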

Nonlinear SVM
The SVM output becomes

$$f(x) = \sum_{i=1}^{N} \alpha_i y_i \phi(x_i)^\top \phi(x) + b.$$

However, the dimension of $\phi(x)$ is very high and could be infinite in some cases, meaning that this function may not be implementable.
Fortunately, the dot product $\phi(x_i)^\top \phi(x)$ can be replaced by a kernel function:

$$\phi(x_i)^\top \phi(x) = K(x_i, x),$$

which can be efficiently implemented.

Nonlinear SVM
Common kernel functions include:

$$\text{Polynomial kernel:} \quad K(x, x_i) = \left( 1 + \frac{x^\top x_i}{\sigma^2} \right)^{p}, \quad p > 0 \qquad (30)$$
$$\text{RBF kernel:} \quad K(x, x_i) = \exp\left\{ -\frac{\|x - x_i\|^2}{2\sigma^2} \right\} \qquad (31)$$
$$\text{Sigmoidal kernel:} \quad K(x, x_i) = \frac{1}{1 + e^{-(x^\top x_i + b)/\sigma^2}} \qquad (32)$$
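
A minimal sketch of these three kernels written as plain functions (the parameter names sigma, p, and b follow the slide's notation; this is not a library API):

```python
# Minimal sketch: the three kernels of Eqs. 30-32 as plain functions.
import numpy as np

def polynomial_kernel(x, xi, sigma=1.0, p=2):
    return (1.0 + (x @ xi) / sigma**2) ** p

def rbf_kernel(x, xi, sigma=1.0):
    return np.exp(-np.sum((x - xi)**2) / (2.0 * sigma**2))

def sigmoidal_kernel(x, xi, sigma=1.0, b=0.0):
    return 1.0 / (1.0 + np.exp(-((x @ xi) + b) / sigma**2))
```

In scikit-learn, the polynomial and RBF kernels correspond to SVC(kernel='poly') and SVC(kernel='rbf') with appropriate degree and gamma settings.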

Nonlinear SVM
Comparing kernels on the same 2-D data set:
Linear SVM: C = 1000.0, #SV = 7, accuracy = 95.00%, norm of w = 0.94
RBF SVM: $2\sigma^2 = 8.0$, C = 1000.0, #SV = 7, accuracy = 100.00%
Polynomial SVM: degree = 2, C = 10.0, #SV = 7, accuracy = 90.00%
(Figure: decision boundaries produced by the three kernels.)

Nonlinear SVM
Figure: Decision boundaries produced by a 2nd-order polynomial kernel (top), a 3rd-order polynomial kernel (left), and an RBF kernel (right).

SVM for Pattern Classification
SVM is good for binary classification:

$$f(x) > 0 \ \Longrightarrow \ x \in \text{Class 1}; \qquad f(x) \le 0 \ \Longrightarrow \ x \in \text{Class 2}$$

To classify multiple classes, we can use the one-vs-rest approach to convert $K$ binary classifications into a $K$-class classification. For 10-digit classification, for example, SVM $k$ classifies digit $k$ against the rest ($k = 0, \dots, 9$), and the class with the maximum score is picked:

$$f^{(k)}(x) = \sum_{i \in SV_k} y_i^{(k)} \alpha_i^{(k)} x_i^\top x + b^{(k)}, \qquad k^* = \operatorname*{argmax}_{k} f^{(k)}(x).$$
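
A minimal sketch of the one-vs-rest scheme described above, using scikit-learn's built-in handwritten-digit data as a stand-in for the 10-digit example (an assumption; the original slide only illustrates the idea diagrammatically):

```python
# Minimal sketch: one-vs-rest SVMs for 10-class digit classification.
# Ten binary SVMs f^(0)..f^(9) are trained and the class with the largest
# score is picked, as in k* = argmax_k f^(k)(x).
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

ovr = OneVsRestClassifier(SVC(kernel='linear')).fit(X_train, y_train)
print(ovr.score(X_test, y_test))   # test-set accuracy
```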

Software Tools for SVM
Matlab: fitcsvm trains an SVM for two-class classification.
Python: the svm module from the sklearn package provides a set of supervised learning methods for classification, regression, and outlier detection.
C/C++: LibSVM is a library for SVMs. It also has Java, Perl, Python, CUDA, and Matlab interfaces.
Java: SVM-JAVA implements sequential minimal optimization for training SVMs in Java.
Javascript: http://cs.stanford.edu/people/karpathy/svmjs/demo/
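
For example, a minimal sketch of the sklearn interface mentioned above (toy data, expected output shown in the comment):

```python
# Minimal sketch: basic usage of sklearn.svm for two-class classification.
from sklearn import svm

X = [[0, 0], [1, 1], [2, 2], [3, 3]]
y = [0, 0, 1, 1]

clf = svm.SVC(kernel='rbf', C=1.0).fit(X, y)
print(clf.predict([[0.5, 0.5], [2.5, 2.5]]))   # expected: [0 1]
```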