Soft-Margin Support Vector Machine


Soft-Margin Support Vector Machine
Chih-Hao Chang
Institute of Statistics, National University of Kaohsiung @ ISS, Academia Sinica
Aug. 2018

1. Review of Hard-Margin SVM
2. Hard-Margin Versus Soft-Margin
3. Soft-Margin SVM
4. Discussion

Data set: $\{y_n, x_n\}_{n=1}^N$, with $y_n \in \{+1, -1\}$ and $x_n \in \mathbb{R}^d$.

Example for $d = 2$ and $N$ points:

[Figure: scatter plot of the two classes in the $(x_1, x_2)$ plane.]

Original problem for a linear classifier with hyperplane $h(x) = w^\top x + b = 0$:

$$\max\ \text{margin}(h) \quad \text{subject to } h \text{ classifies every } (x_n, y_n) \text{ correctly},$$

where $\text{margin}(h) = \min_{n=1,\dots,N} \text{distance}(x_n, h)$.

Parametrization of the problem:

$$\max\ \text{margin}(h) \quad \text{subject to } y_n(w^\top x_n + b) > 0,\ n = 1,\dots,N,$$

where $\text{margin}(h) = \min_{n=1,\dots,N} \frac{1}{\|w\|}\, y_n(w^\top x_n + b)$.

The modified problem for the classifier $h(x) = w^\top x + b = 0$:

$$\max\ \text{margin}(h) \quad \text{subject to } y_n(w^\top x_n + b) > 0,\ n = 1,\dots,N,$$

where $\text{margin}(h) = \min_{n=1,\dots,N} \frac{1}{\|w\|}\, y_n(w^\top x_n + b)$.

Special scaling, fixing the minimal distance at 1:

$$\max_{b,w}\ \frac{1}{\|w\|} \quad \text{subject to } \min_{n=1,\dots,N} y_n(w^\top x_n + b) = 1.$$

Why can we scale the minimal distance to an arbitrary value?
1. Suppose $\min_{n=1,\dots,N} y_n(w^\top x_n + b) = c$ for some $c > 0$.
2. The margin is then $c / \|w\|$.
3. To maximize $c / \|w\|$ is the same as to maximize $1 / \|\tilde w\|$ for the rescaled pair $(\tilde b, \tilde w) = (b/c, w/c)$.

The optimization problem is

$$\max_{\tilde b, \tilde w}\ \frac{1}{\|\tilde w\|} \quad \text{subject to } \min_{n=1,\dots,N} y_n(\tilde w^\top x_n + \tilde b) = 1,$$

and comparing to the previous one: obviously $\|\tilde w\| = \|w\| / c$, so $1/\|\tilde w\| = c/\|w\|$.
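As a quick numeric check of this argument (a minimal sketch, not from the slides; the data and the candidate $(b, w)$ are made up), rescaling $(b, w)$ by $c = \min_n y_n(w^\top x_n + b)$ leaves the classifier unchanged, normalizes the minimal functional margin to 1, and preserves the geometric margin:

# toy data and an arbitrary separating pair (b, w); values are illustrative
X = rbind(c(2, 2), c(3, 1), c(-2, -2), c(-3, -1))
y = c(1, 1, -1, -1)
w = c(1, 0.5); b = 0.1

fm = y * (X %*% w + b)        # functional margins y_n (w'x_n + b)
c0 = min(fm)                  # the minimal value, 'c' in the argument above
c0 / sqrt(sum(w^2))           # geometric margin c / ||w||

w1 = w / c0; b1 = b / c0      # rescaled pair (b/c, w/c)
min(y * (X %*% w1 + b1))      # = 1: minimal functional margin is now 1
1 / sqrt(sum(w1^2))           # = c / ||w||: geometric margin unchanged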

The scaled problem of the classifier $h(x) = w^\top x + b = 0$:

$$\max_{b,w}\ \frac{1}{\|w\|} \quad \text{subject to } \min_{n=1,\dots,N} y_n(w^\top x_n + b) = 1.$$

Release of the constraints:

$$\min_{b,w}\ \frac{1}{2} w^\top w \quad \text{subject to } y_n(w^\top x_n + b) \ge 1,\ n = 1,\dots,N.$$

Why can we release the constraints?
1. If the optimal $(b, w) \notin \{(b, w) : \min_{n=1,\dots,N} y_n(w^\top x_n + b) = 1\}$, then say $\min_{n=1,\dots,N} y_n(w^\top x_n + b) = 1.26$.
2. As we previously mentioned, there is another equivalent optimal pair: $(b, w)/1.26 = (b/1.26,\ w/1.26)$.
3. The rescaled pair is still feasible and satisfies $\|w/1.26\| = \|w\|/1.26 < \|w\|$, which fails the optimality of $(b, w)$: a contradiction.

Hence the optimal $(b, w) \in \{(b, w) : \min_{n=1,\dots,N} y_n(w^\top x_n + b) = 1\}$.

The released problem:

$$\min_{b,w}\ \frac{1}{2} w^\top w \quad \text{subject to } y_n(w^\top x_n + b) \ge 1,\ n = 1,\dots,N.$$

Solution from the process of quadratic programming (QP):

$$\min_u\ \frac{1}{2} u^\top Q u + p^\top u \quad \text{subject to } a_m^\top u \ge c_m,\ m = 1,\dots,M.$$
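As an aside (not from the slides), the released problem can be handed directly to an off-the-shelf QP solver. A minimal sketch with the quadprog package and made-up separable data; the tiny ridge on the $b$-coordinate is my workaround, since solve.QP requires a strictly positive-definite quadratic term:

library(quadprog)            # install.packages("quadprog")

# linearly separable toy data (illustrative only)
X = rbind(c(2, 2), c(3, 1), c(3, 3), c(-2, -2), c(-3, -1), c(-1, -3))
y = c(1, 1, 1, -1, -1, -1)
N = nrow(X); d = ncol(X)

# variables u = (b, w); objective (1/2) w'w = (1/2) u' D u
D = diag(c(1e-8, rep(1, d))) # 1e-8 ridge keeps D positive definite
p = rep(0, d + 1)
A = t(y * cbind(1, X))       # columns a_n = y_n (1, x_n'): a_n' u >= 1
u = solve.QP(Dmat = D, dvec = p, Amat = A, bvec = rep(1, N))$solution
b = u[1]; w = u[-1]
y * (X %*% w + b)            # all >= 1 up to numerical error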

The linear SVM:

$$\min_{b,w}\ \frac{1}{2} w^\top w \quad \text{subject to } y_n(w^\top x_n + b) \ge 1,\ n = 1,\dots,N.$$

Solution from the method of Lagrange multipliers:

$$L(b, w, \alpha) = \frac{1}{2} w^\top w + \sum_{n=1}^N \alpha_n \big(1 - y_n(w^\top x_n + b)\big),$$

where

$$\text{SVM} = \min_{b,w} \Big( \max_{\alpha_n \ge 0} L(b, w, \alpha) \Big).$$

Why does

$$\text{SVM} = \min_{b,w} \Big( \max_{\alpha_n \ge 0} \Big( \frac{1}{2} w^\top w + \sum_{n=1}^N \alpha_n \big(1 - y_n(w^\top x_n + b)\big) \Big) \Big)?$$

1. Can $y_n(w^\top x_n + b) < 1$ happen? If yes, $1 - y_n(w^\top x_n + b) > 0$, so the inner maximum drives $\alpha_n \to \infty$ and $L \to \infty$; this cannot be a solution of SVM.
2. If $y_n(w^\top x_n + b) > 1$, then $\alpha_n = 0$ at the inner maximum.
3. Hence $\alpha_n > 0$ can happen only when $y_n(w^\top x_n + b) = 1$ (complementary slackness).

For $L(b, w, \alpha) = \frac{1}{2} w^\top w + \sum_{n=1}^N \alpha_n (1 - y_n(w^\top x_n + b))$, the Lagrange dual problem (weak duality) is:

$$\text{SVM} = \min_{b,w} \Big( \max_{\alpha_n \ge 0} L(b, w, \alpha) \Big) \ge \max_{\alpha_n \ge 0} \Big( \min_{b,w} L(b, w, \alpha) \Big).$$

The equality holds under strong duality:
1. convex problem;
2. existence of the solution;
3. linear constraints.

Simplify the dual problem: under the strong duality,

$$\text{SVM} = \max_{\alpha_n \ge 0}\ \min_{b,w} \Big( \frac{1}{2} w^\top w + \sum_{n=1}^N \alpha_n \big(1 - y_n(w^\top x_n + b)\big) \Big)
= \max_{\alpha_n \ge 0,\ \sum_n y_n \alpha_n = 0}\ \min_{w} \Big( \frac{1}{2} w^\top w + \sum_{n=1}^N \alpha_n \big(1 - y_n w^\top x_n\big) \Big),$$

since

$$\frac{\partial L(b, w, \alpha)}{\partial b} = -\sum_{n=1}^N \alpha_n y_n = 0.$$

Further simplify the dual problem: under the strong duality and by

$$\frac{\partial L(b, w, \alpha)}{\partial w_i} = w_i - \sum_{n=1}^N \alpha_n y_n x_{n,i} = 0 \quad \Longrightarrow \quad w = \sum_{n=1}^N \alpha_n y_n x_n,$$

and hence

$$\text{SVM} = \max_{\alpha_n \ge 0,\ \sum y_n\alpha_n = 0}\ \min_{w} \Big( \frac{1}{2} w^\top w + \sum_{n=1}^N \alpha_n - w^\top \sum_{n=1}^N \alpha_n y_n x_n \Big)
= \max_{\alpha_n \ge 0,\ \sum y_n\alpha_n = 0,\ w = \sum \alpha_n y_n x_n} \Big( \frac{1}{2} w^\top w + \sum_n \alpha_n - w^\top w \Big)
= \max_{\alpha_n \ge 0,\ \sum y_n\alpha_n = 0,\ w = \sum \alpha_n y_n x_n} \Big( -\frac{1}{2} \Big\| \sum_{n=1}^N \alpha_n y_n x_n \Big\|^2 + \sum_{n=1}^N \alpha_n \Big).$$

Karush-Kuhn-Tucker (KKT) conditions to solve for $b$ and $w$ from the optimal $\alpha$:
1. Primal feasible: $y_n(w^\top x_n + b) \ge 1$;
2. Dual feasible: $\alpha_n \ge 0$;
3. Dual-inner optimal: $\sum_n y_n \alpha_n = 0$ and $w = \sum_n \alpha_n y_n x_n$;
4. Primal-inner optimal (complementary slackness): $\alpha_n \big(1 - y_n(w^\top x_n + b)\big) = 0$.

The dual problem:

$$\max_{\alpha_n \ge 0,\ \sum y_n\alpha_n = 0,\ w = \sum \alpha_n y_n x_n} \Big( -\frac{1}{2} \Big\| \sum_{n=1}^N \alpha_n y_n x_n \Big\|^2 + \sum_{n=1}^N \alpha_n \Big).$$

The equivalent standard dual SVM problem:

$$\min_{\alpha}\ \frac{1}{2} \sum_{n=1}^N \sum_{m=1}^N \alpha_n \alpha_m y_n y_m x_n^\top x_m - \sum_{n=1}^N \alpha_n
\quad \text{subject to } \sum_{n=1}^N y_n \alpha_n = 0;\ \alpha_n \ge 0,\ n = 1,\dots,N,$$

which has $N$ variables and $N + 1$ constraints, and is a convex quadratic programming problem.

By the KKT conditions, we have $w = \sum_{n=1}^N \alpha_n y_n x_n$, and from the complementary slackness condition

$$\alpha_n \big(1 - y_n(w^\top x_n + b)\big) = 0,\ n = 1,\dots,N,$$

we get $b = y_s - w^\top x_s$ for any $s$ with $\alpha_s > 0$. Or, letting $I = \{s = 1,\dots,N : \alpha_s > 0\}$,

$$b = \frac{1}{|I|} \sum_{s \in I} \big( y_s - w^\top x_s \big),$$

where $I$ is the index set of the support vectors: $y_s(w^\top x_s + b) = 1$ for $s \in I$.
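Continuing the earlier aside, the same solver handles the standard dual; a hedged sketch (made-up data again, plus a small ridge because $Q$ is only positive semi-definite) that solves for $\alpha$, then recovers $w$ and $b$ from the support vectors exactly as above:

library(quadprog)

X = rbind(c(2, 2), c(3, 1), c(3, 3), c(-2, -2), c(-3, -1), c(-1, -3))
y = c(1, 1, 1, -1, -1, -1)
N = nrow(X)

Q = (y %*% t(y)) * (X %*% t(X))           # q_{n,m} = y_n y_m x_n'x_m
sol = solve.QP(Dmat = Q + 1e-8 * diag(N), # ridge for positive definiteness
               dvec = rep(1, N),          # minimizes (1/2) a'Qa - sum(a)
               Amat = cbind(y, diag(N)),  # y'a = 0 (equality), a_n >= 0
               bvec = rep(0, N + 1),
               meq  = 1)
alpha = sol$solution

w = as.vector(t(X) %*% (alpha * y))       # w = sum_n alpha_n y_n x_n
I = which(alpha > 1e-6)                   # support vectors: alpha_s > 0
b = mean(y[I] - X[I, , drop = FALSE] %*% w)
c(b, w)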

Primal linear SVM (suitable for small $d$), with $d + 1$ variables and $N$ constraints:

$$\min_{b,w}\ \frac{1}{2} w^\top w \quad \text{subject to } y_n(w^\top x_n + b) \ge 1,\ n = 1,\dots,N.$$

Dual SVM (suitable for small $N$), with $N$ variables and $N + 1$ constraints:

$$\min_{\alpha}\ \frac{1}{2} \alpha^\top Q \alpha - \mathbf{1}^\top \alpha,\quad Q = \{q_{n,m}\},\ q_{n,m} = y_n y_m x_n^\top x_m,
\quad \text{subject to } y^\top \alpha = 0;\ \alpha_n \ge 0,\ n = 1,\dots,N.$$

Review on linear and kernel SVM

How about a non-linear boundary?

[Figure: scatter plots in the $(x_1, x_2)$ plane where the two classes are not linearly separable.]

1. Let $\Phi : \mathbb{R}^d \to \mathbb{R}^{\tilde d}$ and $z_n = \Phi(x_n)$, $n = 1,\dots,N$.
2. Kernel function: $K_\Phi(x_n, x_m) = \Phi(x_n)^\top \Phi(x_m)$.
3. The dual problem under the transform $\Phi$:

$$\min_{\alpha}\ \frac{1}{2} \alpha^\top Q_\Phi \alpha - \mathbf{1}^\top \alpha
\quad \text{subject to } y^\top \alpha = 0;\ \alpha_n \ge 0,\ n = 1,\dots,N,$$

where $Q_\Phi = \{q_{n,m}\}$, $q_{n,m} = y_n y_m K_\Phi(x_n, x_m)$.

Polynomial kernel function:

$$K_q(x, x') = (\zeta + \gamma\, x^\top x')^q,\quad \gamma > 0,\ \zeta \ge 0,$$

which is commonly applied for polynomial SVM.

Gaussian kernel function:

$$K(x, x') = \exp(-\gamma \|x - x'\|^2),\quad \gamma > 0,$$

which is a kernel of an infinite-dimensional transform. For $d = 1$ (i.e., scalar $x$) and $\gamma = 1$, expanding $\exp(2xx') = \sum_{k \ge 0} (2xx')^k / k!$ gives

$$\Phi(x) = \exp(-x^2) \Big( 1,\ \sqrt{\tfrac{2}{1!}}\, x,\ \sqrt{\tfrac{2^2}{2!}}\, x^2,\ \dots \Big).$$
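Both kernels are one-liners; a sketch in R (the function names are mine, and the default $\gamma$, $\zeta$, $q$ are placeholders), together with the matrix $Q_\Phi$ used by the dual:

# polynomial kernel K_q(x, x') = (zeta + gamma x'x')^q
poly_kernel = function(x1, x2, zeta = 1, gamma = 1, q = 2)
  (zeta + gamma * sum(x1 * x2))^q

# Gaussian kernel K(x, x') = exp(-gamma ||x - x'||^2)
gauss_kernel = function(x1, x2, gamma = 1)
  exp(-gamma * sum((x1 - x2)^2))

# Q_Phi with q_{n,m} = y_n y_m K(x_n, x_m)
kernel_Q = function(X, y, kern = gauss_kernel) {
  N = nrow(X)
  K = outer(seq_len(N), seq_len(N),
            Vectorize(function(n, m) kern(X[n, ], X[m, ])))
  (y %*% t(y)) * K
}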

The classifier of kernel SVM, $g_{\text{svm}}(x) = \operatorname{sign}(w^\top \Phi(x) + b)$:

$$b = y_s - w^\top z_s = y_s - \Big( \sum_{n=1}^N \alpha_n y_n z_n \Big)^\top z_s = y_s - \sum_{n=1}^N \alpha_n y_n K(x_n, x_s),$$

$$w^\top \Phi(x) = \Big( \sum_{n=1}^N \alpha_n y_n z_n \Big)^\top z = \sum_{n=1}^N \alpha_n y_n K(x_n, x).$$

We never need to know $w$ explicitly, and the hyperplane classifier can remain a mystery.
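In code, the "mystery" classifier is just a weighted sum of kernel evaluations; a sketch reusing the assumed gauss_kernel helper above:

# g_svm(x) = sign( sum_n alpha_n y_n K(x_n, x) + b ); w is never formed
g_svm = function(x, alpha, y, X, b, kern = gauss_kernel) {
  k = apply(X, 1, function(xn) kern(xn, x))  # K(x_n, x) for n = 1,...,N
  sign(sum(alpha * y * k) + b)
}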

Hard-Margin SVM

[Figure: three panels of hard-margin kernel SVM decision boundaries in the $(x_1, x_2)$ plane.]

Soft-Margin SVM

[Figure: three panels of soft-margin SVM decision boundaries in the $(x_1, x_2)$ plane.]

Soft-Margin SVM
1. If we always insist on separating the data, SVM tends to overfit the noise.
2. Hard-margin SVM:

$$\min_{b,w}\ \frac{1}{2} w^\top w \quad \text{subject to } y_n(w^\top x_n + b) \ge 1.$$

3. Tolerate the noise by allowing data points to be incorrectly classified.
4. Record the margin violation of each point by $\xi_n$, $n = 1,\dots,N$.
5. Soft-margin SVM:

$$\min_{b,w,\xi}\ \frac{1}{2} w^\top w + C \sum_{n=1}^N \xi_n
\quad \text{subject to } y_n(w^\top x_n + b) \ge 1 - \xi_n,\ \xi_n \ge 0,\ n = 1,\dots,N.$$

Soft-Margin Dual SVM

Quadratic programming with $d + 1 + N$ variables and $2N$ constraints:

$$\min_{b,w,\xi}\ \frac{1}{2} w^\top w + C \sum_{n=1}^N \xi_n
\quad \text{subject to } y_n(w^\top x_n + b) \ge 1 - \xi_n,\ \xi_n \ge 0,\ n = 1,\dots,N.$$

The dual problem: $\text{SVM} = \min_{b,w,\xi} \big( \max_{\alpha_n \ge 0,\ \beta_n \ge 0} L(b, w, \xi, \alpha, \beta) \big)$, where

$$L(b, w, \xi, \alpha, \beta) = \frac{1}{2} w^\top w + C \sum_{n=1}^N \xi_n + \sum_{n=1}^N \beta_n (-\xi_n) + \sum_{n=1}^N \alpha_n \big(1 - \xi_n - y_n(w^\top z_n + b)\big).$$

Soft-Margin Dual SVM

Simplify the soft-margin dual problem: under the strong duality,

$$\text{SVM} = \max_{\alpha_n \ge 0,\ \beta_n \ge 0}\ \min_{b,w,\xi} \Big( \frac{1}{2} w^\top w + C \sum_n \xi_n + \sum_n \beta_n(-\xi_n) + \sum_n \alpha_n \big(1 - \xi_n - y_n(w^\top z_n + b)\big) \Big)
= \max_{0 \le \alpha_n \le C,\ \beta_n = C - \alpha_n}\ \min_{b,w} \Big( \frac{1}{2} w^\top w + \sum_n \alpha_n \big(1 - y_n(w^\top z_n + b)\big) \Big),$$

since

$$\frac{\partial L}{\partial \xi_n} = C - \alpha_n - \beta_n = 0 \quad \Longrightarrow \quad \beta_n = C - \alpha_n.$$

Soft-Margin Dual SVM

Further simplify the soft-margin dual problem: under the strong duality and by

$$\frac{\partial L}{\partial b} = -\sum_{n=1}^N \alpha_n y_n = 0, \qquad
\frac{\partial L}{\partial w_i} = w_i - \sum_{n=1}^N \alpha_n y_n z_{n,i} = 0 \ \Longrightarrow\ w = \sum_{n=1}^N \alpha_n y_n z_n,$$

we have

$$\text{SVM} = \max_{0 \le \alpha_n \le C,\ \beta_n = C - \alpha_n,\ \sum y_n\alpha_n = 0,\ w = \sum \alpha_n y_n z_n} \Big( -\frac{1}{2} \Big\| \sum_{n=1}^N \alpha_n y_n z_n \Big\|^2 + \sum_{n=1}^N \alpha_n \Big).$$

Soft-Margin Dual SVM

The soft-margin dual SVM:

$$\text{SVM} = \max_{0 \le \alpha_n \le C,\ \beta_n = C - \alpha_n,\ \sum y_n\alpha_n = 0,\ w = \sum \alpha_n y_n z_n} \Big( -\frac{1}{2} \Big\| \sum_{n=1}^N \alpha_n y_n z_n \Big\|^2 + \sum_{n=1}^N \alpha_n \Big).$$

Equivalently, a QP with $N$ variables and $2N + 1$ constraints:

$$\min_{\alpha}\ \frac{1}{2} \sum_{n=1}^N \sum_{m=1}^N \alpha_n \alpha_m y_n y_m z_n^\top z_m - \sum_{n=1}^N \alpha_n
\quad \text{subject to } \sum_{n=1}^N y_n \alpha_n = 0,\ 0 \le \alpha_n \le C,\ n = 1,\dots,N,$$

with implicitly $w = \sum_{n=1}^N \alpha_n y_n z_n$ and $\beta_n = C - \alpha_n$, $n = 1,\dots,N$.
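Relative to the hard-margin dual QP sketched earlier, the only change is the box constraint $0 \le \alpha_n \le C$; a hedged sketch (quadprog again, shown for the identity transform $z_n = x_n$; C = 1 is an arbitrary default):

library(quadprog)

soft_margin_dual = function(X, y, C = 1) {
  N = nrow(X)
  Q = (y %*% t(y)) * (X %*% t(X))         # z_n = x_n here
  sol = solve.QP(Dmat = Q + 1e-8 * diag(N),
                 dvec = rep(1, N),
                 # y'a = 0 (equality), a_n >= 0, and -a_n >= -C
                 Amat = cbind(y, diag(N), -diag(N)),
                 bvec = c(0, rep(0, N), rep(-C, N)),
                 meq  = 1)
  alpha = sol$solution
  list(alpha = alpha,
       w     = as.vector(t(X) %*% (alpha * y)),
       beta  = C - alpha)                 # beta_n = C - alpha_n
}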

Soft-Margin Kernel SVM

QP with $N$ variables and $2N + 1$ constraints:

$$\min_{\alpha}\ \frac{1}{2} \sum_{n=1}^N \sum_{m=1}^N \alpha_n \alpha_m y_n y_m K_\Phi(x_n, x_m) - \sum_{n=1}^N \alpha_n
\quad \text{subject to } \sum_{n=1}^N y_n \alpha_n = 0,\ 0 \le \alpha_n \le C,\ n = 1,\dots,N,$$

with implicitly $w = \sum_{n=1}^N \alpha_n y_n z_n$ and $\beta_n = C - \alpha_n$, $n = 1,\dots,N$.

The hypothesis hyperplane of soft-margin kernel SVM:

$$g_{\text{svm}}(x) = \operatorname{sign}\Big( \sum_{n=1}^N \alpha_n y_n K_\Phi(x_n, x) + b \Big),$$

where $b$ can be solved for by the complementary slackness.

Soft-Margin Kernel SVM

The hypothesis hyperplane:

$$g_{\text{svm}}(x) = \operatorname{sign}\Big( \sum_{n=1}^N \alpha_n y_n K_\Phi(x_n, x) + b \Big).$$

The complementary slackness conditions:

$$\alpha_n \big(1 - \xi_n - y_n(w^\top z_n + b)\big) = 0, \qquad (C - \alpha_n)\, \xi_n = 0.$$

For support vectors (with $\alpha_s > 0$): $b = y_s - y_s \xi_s - w^\top z_s$.
For free support vectors (with $0 < \alpha_s < C$): $\xi_s = 0$, so $b = y_s - w^\top z_s$.

Physical Meanings of $\alpha_n$

Complementary slackness:

$$\alpha_n \big(1 - \xi_n - y_n(w^\top z_n + b)\big) = 0, \qquad (C - \alpha_n)\, \xi_n = 0.$$

1. Non-SV ($\alpha_n = 0$): $\xi_n = 0$; the point is correctly classified away from the margin.
2. Free SV ($0 < \alpha_n < C$): $\xi_n = 0$; the point lies exactly on the margin boundary.
3. Bounded SV ($\alpha_n = C$): $\xi_n = 1 - y_n(w^\top z_n + b)$; the point violates the margin.

Hence $\alpha_n$ can be used for data analysis.

[Figure: scatter plot in the $(x_1, x_2)$ plane marking the three types of points.]
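Given the $\alpha$ from the dual and the penalty $C$, this bookkeeping is mechanical; a sketch (the tolerance 1e-6 is an arbitrary numerical cutoff), with $b$ recovered from the free support vectors as on the previous slide:

# label each point by its alpha_n (tol guards against round-off)
categorize_sv = function(alpha, C, tol = 1e-6)
  ifelse(alpha < tol, "non-SV",
         ifelse(alpha > C - tol, "bounded SV", "free SV"))

# b = y_s - w'z_s from any free SV s (linear case: z_n = x_n)
recover_b = function(alpha, y, X, w, C, tol = 1e-6) {
  free = which(alpha > tol & alpha < C - tol)
  mean(y[free] - X[free, , drop = FALSE] %*% w)
}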

Selection of Penalty $C$

[Figure: panels of soft-margin Gaussian-kernel SVM boundaries in the $(x_1, x_2)$ plane for different values of $C$.]

R-code of the toy example

library(kernlab)
set.seed(5)
# data generation (the numeric constants below were lost in extraction and
# are reconstructed placeholders)
n  = 100
x1 = c(rnorm(n, 3, 1), rnorm(n, -3, 1), rnorm(n, 0, 1))
x2 = c(rnorm(n, 3, 1), rnorm(n, -3, 1), rnorm(n, 0, 1))
y  = factor(c(rep(TRUE, n), rep(FALSE, n), rep(c(TRUE, FALSE), n / 2)))
data = data.frame(y = y, x1 = x1, x2 = x2)
# fit the soft-margin Gaussian kernel SVM
model.ksvm = ksvm(y ~ x1 + x2, data = data, kernel = "rbfdot",
                  kpar = list(sigma = 1), C = 1)
plot(model.ksvm, data = data)

References

Special thanks to Prof. Hsuan-Tien Lin.
Handout slides and YouTube videos: https://www.csie.ntu.edu.tw/~htlin/mooc/