Lecture 6: Minimum encoding ball and Support vector data description (SVDD)

Stéphane Canu (stephane.canu@litislab.eu), Sao Paulo, May 12, 2014

Plan

1 Support Vector Data Description (SVDD)
   - SVDD, the smallest enclosing ball problem
   - The minimum enclosing ball problem with errors
   - The minimum enclosing ball problem in a RKHS
   - The two-class Support Vector Data Description (SVDD)

The minimum enclosing ball problem [Tax and Duin, 2004]

Given n points \{x_i, i = 1, \dots, n\} in \mathbb{R}^d, find the center c and the radius R of the smallest ball enclosing them all:

    \min_{R \in \mathbb{R},\, c \in \mathbb{R}^d} R^2 \quad \text{with } \|x_i - c\|^2 \le R^2, \; i = 1, \dots, n.

What is that in the convex programming hierarchy? LP, QP, QCQP, SOCP and SDP
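To make the problem concrete, here is a minimal numerical sketch (not part of the original slides) solving it directly in this QCQP form; the data and the use of the cvxpy modelling library are illustrative assumptions.

```python
# Minimal sketch: the minimum enclosing ball solved as a QCQP (cvxpy assumed).
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 2))          # n = 50 illustrative points in d = 2
n, d = X.shape

c = cp.Variable(d)                        # center of the ball
R2 = cp.Variable(nonneg=True)             # squared radius R^2
constraints = [cp.sum_squares(X[i] - c) <= R2 for i in range(n)]
cp.Problem(cp.Minimize(R2), constraints).solve()

print("center c =", c.value, " radius R =", float(np.sqrt(R2.value)))
```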

The convex programming hierarchy (part of)

    LP:    \min_x f^\top x                            \text{with } Ax \le d \text{ and } 0 \le x
    QP:    \min_x \tfrac{1}{2} x^\top G x + f^\top x  \text{with } Ax \le d
    QCQP:  \min_x \tfrac{1}{2} x^\top G x + f^\top x  \text{with } x^\top B_i x + a_i^\top x \le d_i, \; i = 1, \dots, n
    SOCP:  \min_x f^\top x                            \text{with } \|x - a_i\| \le b_i^\top x + d_i, \; i = 1, \dots, n

Model generality: LP < QP < QCQP < SOCP < SDP. The minimum enclosing ball problem, with its quadratic constraints, is a QCQP.

MEB as a QP in the primal

Theorem (MEB as a QP). The two following problems are equivalent:

    \min_{R \in \mathbb{R},\, c \in \mathbb{R}^d} R^2 \quad \text{with } \|x_i - c\|^2 \le R^2, \; i = 1, \dots, n

    \min_{w, \rho} \tfrac{1}{2}\|w\|^2 - \rho \quad \text{with } w^\top x_i \ge \rho + \tfrac{1}{2}\|x_i\|^2, \; i = 1, \dots, n

with \rho = \tfrac{1}{2}(\|c\|^2 - R^2) and w = c.

Proof: for each constraint,
    \|x_i - c\|^2 \le R^2
    \Leftrightarrow \|x_i\|^2 - 2 x_i^\top c + \|c\|^2 \le R^2
    \Leftrightarrow -2 x_i^\top c \le R^2 - \|x_i\|^2 - \|c\|^2
    \Leftrightarrow 2 x_i^\top c \ge -R^2 + \|x_i\|^2 + \|c\|^2
    \Leftrightarrow x_i^\top c \ge \underbrace{\tfrac{1}{2}(\|c\|^2 - R^2)}_{\rho} + \tfrac{1}{2}\|x_i\|^2,

and since R^2 = \|c\|^2 - 2\rho, minimizing R^2 amounts to minimizing \tfrac{1}{2}\|w\|^2 - \rho with w = c.
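As a sanity check of the theorem, a hedged sketch (same illustrative assumptions as above: cvxpy, synthetic data) that solves the reformulated QP in (w, ρ) and maps the solution back to (c, R) through w = c and R^2 = \|c\|^2 - 2\rho:

```python
# Sketch: the MEB solved through the equivalent QP of the theorem (cvxpy assumed).
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 2))
n, d = X.shape
norms2 = np.sum(X**2, axis=1)                       # ||x_i||^2

w = cp.Variable(d)
rho = cp.Variable()
constraints = [X @ w >= rho + 0.5 * norms2]         # w'x_i >= rho + ||x_i||^2 / 2
cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w) - rho), constraints).solve()

c = w.value                                         # w = c
R = float(np.sqrt(c @ c - 2 * rho.value))           # rho = (||c||^2 - R^2) / 2
print("center c =", c, " radius R =", R)            # should match the QCQP solution
```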

MEB and the one-class SVM

SVDD:
    \min_{w, \rho} \tfrac{1}{2}\|w\|^2 - \rho \quad \text{with } w^\top x_i \ge \rho + \tfrac{1}{2}\|x_i\|^2

SVDD and linear OCSVM (supporting hyperplane): if \|x_i\|^2 = \text{constant} for i = 1, \dots, n, it is the linear one-class SVM (OC-SVM).

The linear one-class SVM [Schölkopf and Smola, 2002]:
    \min_{w, \rho'} \tfrac{1}{2}\|w\|^2 - \rho' \quad \text{with } w^\top x_i \ge \rho'

with \rho' = \rho + \tfrac{1}{2}\|x_i\|^2. The OC-SVM is a particular case of SVDD.

SVDD and OCSVM

When \|x_i\|^2 = 1 for i = 1, \dots, n, the ball constraint \|x_i - c\|^2 \le R^2 becomes the half-space constraint w^\top x_i \ge \rho with \rho = \tfrac{1}{2}(\|c\|^2 - R^2 + 1): "belonging to the ball" is also "being above" a hyperplane.

MEB: KKT

    L(c, R, \alpha) = R^2 + \sum_{i=1}^n \alpha_i \left( \|x_i - c\|^2 - R^2 \right)

KKT conditions:
   - stationarity: 2c \sum_{i=1}^n \alpha_i - 2 \sum_{i=1}^n \alpha_i x_i = 0 (the representer theorem) and 1 - \sum_{i=1}^n \alpha_i = 0
   - primal admissibility: \|x_i - c\|^2 \le R^2, \; i = 1, \dots, n
   - dual admissibility: \alpha_i \ge 0, \; i = 1, \dots, n
   - complementarity: \alpha_i \left( \|x_i - c\|^2 - R^2 \right) = 0, \; i = 1, \dots, n

Complementarity tells us there are two groups of points: the support vectors, for which \|x_i - c\|^2 = R^2, and the insiders, for which \alpha_i = 0.

MEB: Dual

The representer theorem (using \sum_i \alpha_i = 1): c = \sum_{i=1}^n \alpha_i x_i. Plugging it back into the Lagrangian:

    L(\alpha) = \sum_{i=1}^n \alpha_i \|x_i - \sum_{j=1}^n \alpha_j x_j\|^2
              = \sum_{i=1}^n \alpha_i x_i^\top x_i - \sum_{i=1}^n \sum_{j=1}^n \alpha_i \alpha_j x_i^\top x_j
              = \alpha^\top \mathrm{diag}(G) - \alpha^\top G \alpha

with G = XX^\top the Gram matrix, G_{ij} = x_i^\top x_j. Maximizing L(\alpha) gives the dual problem

    \min_{\alpha \in \mathbb{R}^n} \alpha^\top G \alpha - \alpha^\top \mathrm{diag}(G) \quad \text{with } e^\top \alpha = 1 \text{ and } 0 \le \alpha_i, \; i = 1, \dots, n.
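The dual is a QP over the simplex. The sketch below (cvxpy assumed, synthetic data) writes \alpha^\top G \alpha as \|X^\top \alpha\|^2 so the solver sees an explicit convex form, then recovers the center from the representer theorem and the radius from a support vector:

```python
# Sketch: the MEB dual  min a'Ga - a'diag(G),  e'a = 1,  a >= 0  (cvxpy assumed).
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 2))
n = X.shape[0]
diagG = np.sum(X**2, axis=1)                        # diag(G), with G = X X'

alpha = cp.Variable(n, nonneg=True)
objective = cp.sum_squares(X.T @ alpha) - diagG @ alpha    # a'Ga - a'diag(G)
cp.Problem(cp.Minimize(objective), [cp.sum(alpha) == 1]).solve()

a = alpha.value
c = X.T @ a                                         # representer theorem: c = sum_i a_i x_i
sv = np.where(a > 1e-6)[0]                          # support vectors: a_i > 0
R = float(np.sqrt(np.max(np.sum((X[sv] - c)**2, axis=1))))  # ||x_i - c|| = R on the border
print("support vectors:", sv, " radius R =", R)
```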

SVDD primal vs. dual

Primal:
    \min_{R \in \mathbb{R},\, c \in \mathbb{R}^d} R^2 \quad \text{with } \|x_i - c\|^2 \le R^2, \; i = 1, \dots, n
   - d + 1 unknowns, n constraints
   - can be recast as a QP
   - perfect when d \ll n

Dual:
    \min_{\alpha} \alpha^\top G \alpha - \alpha^\top \mathrm{diag}(G) \quad \text{with } e^\top \alpha = 1 \text{ and } 0 \le \alpha_i, \; i = 1, \dots, n
   - n unknowns, with G the pairwise-influence Gram matrix
   - n box constraints, easy to solve
   - to be used when d > n

But where is R^2?

Looking for R^2

The dual problem:
    \min_{\alpha \in \mathbb{R}^n} \alpha^\top G \alpha - \alpha^\top \mathrm{diag}(G) \quad \text{with } e^\top \alpha = 1, \; 0 \le \alpha_i, \; i = 1, \dots, n

Its Lagrangian: L(\alpha, \mu, \beta) = \alpha^\top G \alpha - \alpha^\top \mathrm{diag}(G) + \mu (e^\top \alpha - 1) - \beta^\top \alpha

Stationarity condition: \nabla_\alpha L(\alpha, \mu, \beta) = 2 G \alpha - \mathrm{diag}(G) + \mu e - \beta = 0

The bidual:
    \min_{\alpha} \alpha^\top G \alpha + \mu \quad \text{with } -2 G \alpha + \mathrm{diag}(G) \le \mu e

By identification, R^2 = \mu + \alpha^\top G \alpha = \mu + \|c\|^2, where \mu is the Lagrange multiplier associated with the equality constraint \sum_{i=1}^n \alpha_i = 1.

Also, because of the complementarity condition, if x_i is a support vector (\alpha_i > 0), then \beta_i = 0 and R^2 = \|x_i - c\|^2.
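A quick numerical check of R^2 = \mu + \|c\|^2 in the same illustrative setting as above; note that the sign convention of the multiplier returned by the solver may differ from the one used in the slides, hence the absolute value:

```python
# Sketch: reading R^2 from the multiplier of the equality constraint (cvxpy assumed).
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 2))
n = X.shape[0]
diagG = np.sum(X**2, axis=1)

alpha = cp.Variable(n, nonneg=True)
eq = cp.sum(alpha) == 1
cp.Problem(cp.Minimize(cp.sum_squares(X.T @ alpha) - diagG @ alpha), [eq]).solve()

c = X.T @ alpha.value
mu = abs(eq.dual_value)                             # multiplier of e'alpha = 1 (sign convention may vary)
print("mu + ||c||^2        =", mu + c @ c)          # should equal R^2 ...
print("max_i ||x_i - c||^2 =", np.max(np.sum((X - c)**2, axis=1)))   # ... i.e. this value
```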

Plan

1 Support Vector Data Description (SVDD)
   - SVDD, the smallest enclosing ball problem
   - The minimum enclosing ball problem with errors
   - The minimum enclosing ball problem in a RKHS
   - The two-class Support Vector Data Description (SVDD)

The minimum enclosing ball problem with errors (the slack)

The same road map: initial formulation, reformulation (as a QP), Lagrangian and KKT, dual formulation, bidual.

Initial formulation: for a given C,
    \min_{R, c, \xi} R^2 + C \sum_{i=1}^n \xi_i \quad \text{with } \|x_i - c\|^2 \le R^2 + \xi_i \text{ and } \xi_i \ge 0, \; i = 1, \dots, n

The MEB with slack: QP, KKT, dual and R^2

SVDD as a QP:
    \min_{w, \rho, \xi} \tfrac{1}{2}\|w\|^2 - \rho + \tfrac{C}{2} \sum_{i=1}^n \xi_i \quad \text{with } w^\top x_i \ge \rho + \tfrac{1}{2}\|x_i\|^2 - \tfrac{1}{2}\xi_i \text{ and } \xi_i \ge 0, \; i = 1, \dots, n

again with the OCSVM as a particular case.

Dual SVDD, with G = XX^\top:
    \min_{\alpha} \alpha^\top G \alpha - \alpha^\top \mathrm{diag}(G) \quad \text{with } e^\top \alpha = 1 \text{ and } 0 \le \alpha_i \le C, \; i = 1, \dots, n

for a given C \le 1 (if C is larger than one it is useless: it is the no-slack case).

R^2 = \mu + c^\top c, with \mu denoting the Lagrange multiplier associated with the equality constraint \sum_{i=1}^n \alpha_i = 1.
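The only change in the dual is the box constraint 0 \le \alpha_i \le C. A short sketch (cvxpy assumed, with an artificial outlier) illustrating that the outlier's \alpha_i is expected to saturate at C, so its influence on the center is bounded:

```python
# Sketch: the soft-margin MEB dual with box constraint 0 <= alpha_i <= C (cvxpy assumed).
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(1)
X = np.vstack([rng.standard_normal((50, 2)), [[6.0, 6.0]]])   # last point is an outlier
n = X.shape[0]
diagG = np.sum(X**2, axis=1)
C = 0.1                                             # C <= 1: slack is allowed

alpha = cp.Variable(n, nonneg=True)
obj = cp.Minimize(cp.sum_squares(X.T @ alpha) - diagG @ alpha)
cp.Problem(obj, [cp.sum(alpha) == 1, alpha <= C]).solve()

a = alpha.value
print("alpha of the outlier:", a[-1])               # expected to sit at the bound C
print("center:", X.T @ a)                           # barely pulled by the outlier
```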

Variations over SVDD

Adaptive SVDD: the weighted-error case, for given w_i, i = 1, \dots, n:
    \min_{c \in \mathbb{R}^p,\, R \in \mathbb{R},\, \xi \in \mathbb{R}^n} R + C \sum_{i=1}^n w_i \xi_i \quad \text{with } \|x_i - c\|^2 \le R + \xi_i \text{ and } \xi_i \ge 0, \; i = 1, \dots, n

The dual of this problem is a QP [see for instance Liu et al., 2013]:
    \min_{\alpha \in \mathbb{R}^n} \alpha^\top XX^\top \alpha - \alpha^\top \mathrm{diag}(XX^\top) \quad \text{with } \sum_{i=1}^n \alpha_i = 1 \text{ and } 0 \le \alpha_i \le C w_i, \; i = 1, \dots, n

Density-induced SVDD (D-SVDD):
    \min_{c \in \mathbb{R}^p,\, R \in \mathbb{R},\, \xi \in \mathbb{R}^n} R + C \sum_{i=1}^n \xi_i \quad \text{with } w_i \|x_i - c\|^2 \le R + \xi_i \text{ and } \xi_i \ge 0, \; i = 1, \dots, n

Plan

1 Support Vector Data Description (SVDD)
   - SVDD, the smallest enclosing ball problem
   - The minimum enclosing ball problem with errors
   - The minimum enclosing ball problem in a RKHS
   - The two-class Support Vector Data Description (SVDD)

SVDD in a RKHS

The feature map: \mathbb{R}^p \to \mathcal{H}
    c \longrightarrow f(\cdot)
    x_i \longrightarrow k(x_i, \cdot)
    \|x_i - c\|^2_{\mathbb{R}^p} \le R^2 \longrightarrow \|k(x_i, \cdot) - f(\cdot)\|^2_{\mathcal{H}} \le R^2

Kernelized SVDD (in a RKHS) is also a QP:
    \min_{f \in \mathcal{H},\, R \in \mathbb{R},\, \xi \in \mathbb{R}^n} R^2 + C \sum_{i=1}^n \xi_i \quad \text{with } \|k(x_i, \cdot) - f(\cdot)\|^2_{\mathcal{H}} \le R^2 + \xi_i \text{ and } \xi_i \ge 0, \; i = 1, \dots, n

SVDD in a RKHS: KKT

    L = R^2 + C \sum_i \xi_i + \sum_i \alpha_i \left( \|k(x_i, \cdot) - f(\cdot)\|^2_{\mathcal{H}} - R^2 - \xi_i \right) - \sum_i \beta_i \xi_i
      = R^2 + C \sum_i \xi_i + \sum_i \alpha_i \left( k(x_i, x_i) - 2 f(x_i) + \|f\|^2_{\mathcal{H}} - R^2 - \xi_i \right) - \sum_i \beta_i \xi_i

KKT conditions:
   - stationarity: 2 f(\cdot) \sum_i \alpha_i - 2 \sum_i \alpha_i k(\cdot, x_i) = 0 (the representer theorem), 1 - \sum_i \alpha_i = 0, and C - \alpha_i - \beta_i = 0
   - primal admissibility: \|k(x_i, \cdot) - f(\cdot)\|^2 \le R^2 + \xi_i, \; \xi_i \ge 0
   - dual admissibility: \alpha_i \ge 0, \; \beta_i \ge 0
   - complementarity: \alpha_i \left( \|k(x_i, \cdot) - f(\cdot)\|^2 - R^2 - \xi_i \right) = 0 and \beta_i \xi_i = 0

SVDD in a RKHS: dual and R^2

    L(\alpha) = \sum_i \alpha_i k(x_i, x_i) - 2 \sum_i \alpha_i f(x_i) + \|f\|^2_{\mathcal{H}}, \quad \text{with } f(\cdot) = \sum_j \alpha_j k(\cdot, x_j)
              = \alpha^\top \mathrm{diag}(G) - \alpha^\top G \alpha, \quad \text{with } G_{ij} = k(x_i, x_j)

The dual problem:
    \min_{\alpha} \alpha^\top G \alpha - \alpha^\top \mathrm{diag}(G) \quad \text{with } e^\top \alpha = 1 \text{ and } 0 \le \alpha_i \le C, \; i = 1, \dots, n

As in the linear case, R^2 = \mu + \|f\|^2_{\mathcal{H}}, with \mu denoting the Lagrange multiplier associated with the equality constraint \sum_{i=1}^n \alpha_i = 1.

SVDD train and val in a RKHS

Train using the dual form (in: G, C; out: \alpha, \mu):
    \min_{\alpha} \alpha^\top G \alpha - \alpha^\top \mathrm{diag}(G) \quad \text{with } e^\top \alpha = 1 \text{ and } 0 \le \alpha_i \le C, \; i = 1, \dots, n

Val with the center in the RKHS, f(\cdot) = \sum_{i=1}^n \alpha_i k(\cdot, x_i):
    \Phi(x) = \|k(x, \cdot) - f(\cdot)\|^2_{\mathcal{H}} - R^2
            = \|k(x, \cdot)\|^2_{\mathcal{H}} - 2 \langle k(x, \cdot), f(\cdot) \rangle_{\mathcal{H}} + \|f(\cdot)\|^2_{\mathcal{H}} - R^2
            = k(x, x) - 2 f(x) + \|f\|^2_{\mathcal{H}} - R^2
            = k(x, x) - 2 f(x) - \mu \qquad (\text{since } R^2 = \mu + \|f\|^2_{\mathcal{H}})
            = -2 \sum_{i=1}^n \alpha_i k(x, x_i) + k(x, x) - \mu

\Phi(x) = 0 is the decision border.
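The whole train/val pipeline above fits in a few lines. The sketch below (cvxpy assumed, Gaussian kernel with an illustrative bandwidth sigma and budget C) factors G with a Cholesky decomposition so that \alpha^\top G \alpha stays an explicit convex term, then evaluates the decision function \Phi:

```python
# Sketch: kernelized SVDD, train on the dual and evaluate Phi (cvxpy assumed).
import numpy as np
import cvxpy as cp

def gauss_kernel(A, B, sigma):
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-d2 / (2 * sigma**2))

rng = np.random.default_rng(0)
X = rng.standard_normal((60, 2))
n, C, sigma = X.shape[0], 0.2, 1.0                  # illustrative C and bandwidth

# train: min a'Ga - a'diag(G)  with  e'a = 1,  0 <= a <= C
G = gauss_kernel(X, X, sigma)
L = np.linalg.cholesky(G + 1e-8 * np.eye(n))        # a'Ga = ||L'a||^2
alpha = cp.Variable(n, nonneg=True)
eq = cp.sum(alpha) == 1
cp.Problem(cp.Minimize(cp.sum_squares(L.T @ alpha) - np.diag(G) @ alpha),
           [eq, alpha <= C]).solve()
a, mu = alpha.value, abs(eq.dual_value)             # sign convention may vary

# val: Phi(x) = -2 sum_i a_i k(x, x_i) + k(x, x) - mu ;  Phi(x) <= 0 inside the ball
def Phi(Xnew):
    return -2 * gauss_kernel(Xnew, X, sigma) @ a + 1.0 - mu   # k(x, x) = 1 here

print("training points outside the ball:", int(np.sum(Phi(X) > 0)))
```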

An important theoretical result

For a well-calibrated bandwidth, the SVDD estimates the underlying distribution level set [Vert and Vert, 2006]. The level sets of a probability density function \mathbb{P}(x) are the sets C_p = \{x \in \mathbb{R}^d \mid \mathbb{P}(x) \ge p\}. They are well estimated by the empirical minimum-volume set V_p = \{x \in \mathbb{R}^d \mid \|k(x, \cdot) - f(\cdot)\|^2_{\mathcal{H}} - R^2 \le 0\}: the frontiers coincide.

SVDD: the generalization error

For a well-calibrated bandwidth, let (x_1, \dots, x_n) be i.i.d. from some fixed but unknown \mathbb{P}(x). Then [Shawe-Taylor and Cristianini, 2004], with probability at least 1 - \delta (\delta \in ]0, 1[), for any margin m > 0,

    \mathbb{P}\left( \|k(x, \cdot) - f(\cdot)\|^2_{\mathcal{H}} \ge R^2 + m \right) \le \frac{1}{mn} \sum_{i=1}^n \xi_i + \frac{6 R^2}{m \sqrt{n}} + 3 \sqrt{\frac{\ln(2/\delta)}{2n}}

Equivalence between SVDD and OCSVM for translation-invariant kernels (diagonal-constant kernels)

Theorem. Let \mathcal{H} be a RKHS on some domain \mathcal{X} endowed with kernel k. If there exists some constant c such that \forall x \in \mathcal{X}, k(x, x) = c, then the two following problems are equivalent:

    \min_{f, R, \xi} R + C \sum_i \xi_i \quad \text{with } \|k(x_i, \cdot) - f(\cdot)\|^2_{\mathcal{H}} \le R + \xi_i, \; \xi_i \ge 0, \; i = 1, \dots, n

    \min_{f, \rho, \varepsilon} \tfrac{1}{2}\|f\|^2_{\mathcal{H}} - \rho + C \sum_i \varepsilon_i \quad \text{with } f(x_i) \ge \rho - \varepsilon_i, \; \varepsilon_i \ge 0, \; i = 1, \dots, n

with \rho = \tfrac{1}{2}(c + \|f\|^2_{\mathcal{H}} - R) and \varepsilon_i = \tfrac{1}{2}\xi_i.

Proof of the equivalence between SVDD and OCSVM

    \min_{f \in \mathcal{H},\, R \in \mathbb{R},\, \xi \in \mathbb{R}^n} R + C \sum_{i=1}^n \xi_i \quad \text{with } \|k(x_i, \cdot) - f(\cdot)\|^2_{\mathcal{H}} \le R + \xi_i, \; \xi_i \ge 0, \; i = 1, \dots, n

Since \|k(x_i, \cdot) - f(\cdot)\|^2_{\mathcal{H}} = k(x_i, x_i) + \|f\|^2_{\mathcal{H}} - 2 f(x_i), this is

    \min_{f \in \mathcal{H},\, R \in \mathbb{R},\, \xi \in \mathbb{R}^n} R + C \sum_{i=1}^n \xi_i \quad \text{with } 2 f(x_i) \ge k(x_i, x_i) + \|f\|^2_{\mathcal{H}} - R - \xi_i, \; \xi_i \ge 0, \; i = 1, \dots, n.

Introducing \rho = \tfrac{1}{2}(c + \|f\|^2_{\mathcal{H}} - R), that is R = c + \|f\|^2_{\mathcal{H}} - 2\rho, and since k(x_i, x_i) is constant and equal to c, the SVDD problem becomes

    \min_{f \in \mathcal{H},\, \rho \in \mathbb{R},\, \xi \in \mathbb{R}^n} \tfrac{1}{2}\|f\|^2_{\mathcal{H}} - \rho + \tfrac{C}{2} \sum_{i=1}^n \xi_i \quad \text{with } f(x_i) \ge \rho - \tfrac{1}{2}\xi_i, \; \xi_i \ge 0, \; i = 1, \dots, n

leading to the classical one-class SVM formulation (OCSVM)

    \min_{f \in \mathcal{H},\, \rho \in \mathbb{R},\, \varepsilon \in \mathbb{R}^n} \tfrac{1}{2}\|f\|^2_{\mathcal{H}} - \rho + C \sum_{i=1}^n \varepsilon_i \quad \text{with } f(x_i) \ge \rho - \varepsilon_i, \; \varepsilon_i \ge 0, \; i = 1, \dots, n

with \varepsilon_i = \tfrac{1}{2}\xi_i. Note that by putting \nu = \frac{1}{nC} we get the so-called \nu-formulation of the OCSVM

    \min_{f' \in \mathcal{H},\, \rho' \in \mathbb{R},\, \xi' \in \mathbb{R}^n} \tfrac{1}{2}\|f'\|^2_{\mathcal{H}} - n\nu\rho' + \sum_{i=1}^n \xi'_i \quad \text{with } f'(x_i) \ge \rho' - \xi'_i, \; \xi'_i \ge 0, \; i = 1, \dots, n

with f' = Cf, \rho' = C\rho, and \xi' = C\xi.

Duality

Note that the dual of the SVDD is

    \min_{\alpha \in \mathbb{R}^n} \alpha^\top G \alpha - \alpha^\top g \quad \text{with } \sum_{i=1}^n \alpha_i = 1 \text{ and } 0 \le \alpha_i \le C, \; i = 1, \dots, n

where G is the kernel matrix of general term G_{ij} = k(x_i, x_j) and g the diagonal vector such that g_i = k(x_i, x_i) = c. The dual of the OCSVM is the following equivalent QP

    \min_{\alpha \in \mathbb{R}^n} \tfrac{1}{2} \alpha^\top G \alpha \quad \text{with } \sum_{i=1}^n \alpha_i = 1 \text{ and } 0 \le \alpha_i \le C, \; i = 1, \dots, n

Both dual forms provide the same solution \alpha, but not the same Lagrange multipliers. \rho is the Lagrange multiplier of the equality constraint of the dual of the OCSVM, and R = c + \alpha^\top G \alpha - 2\rho. Using the SVDD dual, it turns out that R = \lambda_{eq} + \alpha^\top G \alpha, where \lambda_{eq} is the Lagrange multiplier of the equality constraint of the SVDD dual form.
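A small numerical illustration of this equivalence (cvxpy assumed, Gaussian kernel so that k(x, x) = 1 for all x): since \alpha^\top g is constant on the simplex, the two duals return the same \alpha up to solver tolerance:

```python
# Sketch: SVDD dual vs. OCSVM dual for a diagonal-constant kernel (cvxpy assumed).
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
X = rng.standard_normal((40, 2))
n, C, sigma = X.shape[0], 0.2, 1.0
d2 = np.sum(X**2, 1)[:, None] + np.sum(X**2, 1)[None, :] - 2 * X @ X.T
G = np.exp(-d2 / (2 * sigma**2))                    # Gaussian kernel: G_ii = 1
L = np.linalg.cholesky(G + 1e-8 * np.eye(n))
g = np.diag(G)                                      # g_i = k(x_i, x_i) = c = 1

def solve_dual(objective):
    a = cp.Variable(n, nonneg=True)
    cp.Problem(cp.Minimize(objective(a)), [cp.sum(a) == 1, a <= C]).solve()
    return a.value

a_svdd  = solve_dual(lambda a: cp.sum_squares(L.T @ a) - g @ a)   # SVDD dual
a_ocsvm = solve_dual(lambda a: 0.5 * cp.sum_squares(L.T @ a))     # OCSVM dual
print("max |a_svdd - a_ocsvm| =", float(np.max(np.abs(a_svdd - a_ocsvm))))
```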

Plan

1 Support Vector Data Description (SVDD)
   - SVDD, the smallest enclosing ball problem
   - The minimum enclosing ball problem with errors
   - The minimum enclosing ball problem in a RKHS
   - The two-class Support Vector Data Description (SVDD)

The two-class Support Vector Data Description (SVDD)

    \min_{c, R, \xi^+, \xi^-} R^2 + C \left( \sum_{i: y_i = 1} \xi^+_i + \sum_{i: y_i = -1} \xi^-_i \right)
    \text{with } \|x_i - c\|^2 \le R^2 + \xi^+_i, \; \xi^+_i \ge 0 \text{ for } i \text{ such that } y_i = 1
    \text{and } \|x_i - c\|^2 \ge R^2 - \xi^-_i, \; \xi^-_i \ge 0 \text{ for } i \text{ such that } y_i = -1

The two-class SVDD as a QP

Expanding \|x_i - c\|^2 = \|x_i\|^2 - 2 x_i^\top c + \|c\|^2, the constraints become
    \|x_i\|^2 - 2 x_i^\top c + \|c\|^2 \le R^2 + \xi^+_i, \; \xi^+_i \ge 0 \text{ for } i \text{ such that } y_i = 1
    \|x_i\|^2 - 2 x_i^\top c + \|c\|^2 \ge R^2 - \xi^-_i, \; \xi^-_i \ge 0 \text{ for } i \text{ such that } y_i = -1
i.e.
    2 x_i^\top c \ge \|c\|^2 - R^2 + \|x_i\|^2 - \xi^+_i \text{ for } i \text{ such that } y_i = 1
    -2 x_i^\top c \ge -(\|c\|^2 - R^2) - \|x_i\|^2 - \xi^-_i \text{ for } i \text{ such that } y_i = -1
or, as a single family of constraints,
    2 y_i x_i^\top c \ge y_i (\|c\|^2 - R^2 + \|x_i\|^2) - \xi_i, \; \xi_i \ge 0, \; i = 1, \dots, n.

Change of variable \rho = \|c\|^2 - R^2:
    \min_{c, \rho, \xi} \|c\|^2 - \rho + C \sum_{i=1}^n \xi_i \quad \text{with } 2 y_i x_i^\top c \ge y_i (\rho + \|x_i\|^2) - \xi_i \text{ and } \xi_i \ge 0, \; i = 1, \dots, n

The dual of the two-class SVDD

With G_{ij} = y_i y_j x_i^\top x_j, the dual formulation is

    \min_{\alpha \in \mathbb{R}^n} \alpha^\top G \alpha - \sum_{i=1}^n \alpha_i y_i \|x_i\|^2 \quad \text{with } \sum_{i=1}^n y_i \alpha_i = 1 \text{ and } 0 \le \alpha_i \le C, \; i = 1, \dots, n
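A hedged sketch of this dual (cvxpy assumed, two synthetic clouds playing the roles of the target and negative classes); note that \alpha^\top G \alpha = \|X^\top (y \odot \alpha)\|^2 and that the center is recovered as c = \sum_i \alpha_i y_i x_i:

```python
# Sketch: the two-class SVDD dual (cvxpy assumed), labels y_i in {-1, +1}.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
Xp = rng.standard_normal((40, 2)) * 0.8             # target class, y = +1
Xm = rng.standard_normal((10, 2)) * 0.8 + 3.0       # negative class, y = -1
X, y = np.vstack([Xp, Xm]), np.hstack([np.ones(40), -np.ones(10)])
n, C = X.shape[0], 1.0

alpha = cp.Variable(n, nonneg=True)
ya = cp.multiply(y, alpha)                          # y_i alpha_i
obj = cp.sum_squares(X.T @ ya) - (y * np.sum(X**2, 1)) @ alpha   # a'Ga - sum_i a_i y_i ||x_i||^2
cp.Problem(cp.Minimize(obj), [y @ alpha == 1, alpha <= C]).solve()

c = X.T @ (y * alpha.value)                         # c = sum_i alpha_i y_i x_i
print("center of the two-class SVDD ball:", c)
```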

The two-class SVDD vs. the one-class SVDD

[Figure: the two-class SVDD (left) vs. the one-class SVDD (right)]

Small Sphere and Large Margin (SSLM) approach

Support vector data description with margin [Wu and Ye, 2009]:

    \min_{c, R, \xi \in \mathbb{R}^n} R^2 + C \left( \sum_{i: y_i = 1} \xi^+_i + \sum_{i: y_i = -1} \xi^-_i \right)
    \text{with } \|x_i - c\|^2 \le R^2 - 1 + \xi^+_i, \; \xi^+_i \ge 0 \text{ for } i \text{ such that } y_i = 1
    \text{and } \|x_i - c\|^2 \ge R^2 + 1 - \xi^-_i, \; \xi^-_i \ge 0 \text{ for } i \text{ such that } y_i = -1

Since \|x_i - c\|^2 \ge R^2 + 1 - \xi_i with y_i = -1 can be written y_i \|x_i - c\|^2 \le y_i R^2 - 1 + \xi_i, both families of constraints read y_i \|x_i - c\|^2 \le y_i R^2 - 1 + \xi_i, and the Lagrangian is

    L(c, R, \xi, \alpha, \beta) = R^2 + C \sum_i \xi_i + \sum_i \alpha_i \left( y_i \|x_i - c\|^2 - y_i R^2 + 1 - \xi_i \right) - \sum_i \beta_i \xi_i

SVDD with margin: dual formulation

    L(c, R, \xi, \alpha, \beta) = R^2 + C \sum_i \xi_i + \sum_i \alpha_i \left( y_i \|x_i - c\|^2 - y_i R^2 + 1 - \xi_i \right) - \sum_i \beta_i \xi_i

Optimality: c = \sum_{i=1}^n \alpha_i y_i x_i, \quad \sum_{i=1}^n \alpha_i y_i = 1, \quad 0 \le \alpha_i \le C

    L(\alpha) = \sum_{i=1}^n \alpha_i y_i \|x_i - \sum_{j=1}^n \alpha_j y_j x_j\|^2 + \sum_{i=1}^n \alpha_i
              = - \sum_{i=1}^n \sum_{j=1}^n \alpha_i \alpha_j y_i y_j x_j^\top x_i + \sum_{i=1}^n \alpha_i \|x_i\|^2 y_i + \sum_{i=1}^n \alpha_i

The dual SVDD is also a quadratic program:

    problem D: \quad \min_{\alpha \in \mathbb{R}^n} \alpha^\top G \alpha - e^\top \alpha - f^\top \alpha \quad \text{with } y^\top \alpha = 1 \text{ and } 0 \le \alpha_i \le C, \; i = 1, \dots, n

with G the symmetric n \times n matrix such that G_{ij} = y_i y_j x_j^\top x_i and f_i = \|x_i\|^2 y_i.
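Problem D has the same structure as the two-class SVDD dual, with the extra linear term -e^\top \alpha. A short sketch under the same assumptions (cvxpy, synthetic data):

```python
# Sketch: the SSLM dual, problem D (cvxpy assumed).
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
X = np.vstack([rng.standard_normal((40, 2)), rng.standard_normal((8, 2)) + 4.0])
y = np.hstack([np.ones(40), -np.ones(8)])
n, C = X.shape[0], 1.0
f = y * np.sum(X**2, axis=1)                        # f_i = ||x_i||^2 y_i

alpha = cp.Variable(n, nonneg=True)
ya = cp.multiply(y, alpha)
obj = cp.sum_squares(X.T @ ya) - cp.sum(alpha) - f @ alpha        # a'Ga - e'a - f'a
cp.Problem(cp.Minimize(obj), [y @ alpha == 1, alpha <= C]).solve()

print("SSLM center:", X.T @ (y * alpha.value))      # c = sum_i alpha_i y_i x_i
```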

Conclusion

Applications: outlier detection, change detection, clustering, large number of classes, variable selection, ...

A clear path: reformulation (to a standard problem), KKT, dual, bidual.

A lot of variations: L2-SVDD, two classes non-symmetric, two classes symmetric (SVM), the multi-class issue, practical problems with translation-invariant kernels.

Bibliography

Bo Liu, Yanshan Xiao, Longbing Cao, Zhifeng Hao, and Feiqi Deng. SVDD-based outlier detection on uncertain data. Knowledge and Information Systems, 34(3):597-618, 2013.

B. Schölkopf and A. J. Smola. Learning with Kernels. MIT Press, 2002.

John Shawe-Taylor and Nello Cristianini. Kernel Methods for Pattern Analysis. Cambridge University Press, 2004.

David M. J. Tax and Robert P. W. Duin. Support vector data description. Machine Learning, 54(1):45-66, 2004.

Régis Vert and Jean-Philippe Vert. Consistency and convergence rates of one-class SVMs and related algorithms. Journal of Machine Learning Research, 7:817-854, 2006.

Mingrui Wu and Jieping Ye. A small sphere and large margin approach for novelty detection using training data with outliers. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(11):2088-2092, 2009.