Outline
Basic concepts: SVM and kernels
SVM primal/dual problems
Chih-Jen Lin (National Taiwan Univ.)


Basic concepts: SVM and kernels

Data Classification
Given training data in different classes (labels known), predict the labels of test data (labels unknown): training and testing.

Support Vector Classification
Training vectors: x_i, i = 1, ..., l. These are feature vectors; for example, a patient = [height, weight, ...]^T.
Consider a simple case with two classes. Define an indicator vector y:
    y_i = 1 if x_i is in class 1,  y_i = -1 if x_i is in class 2.
Goal: a hyperplane which separates all data.

A separating hyperplane: w^T x + b = 0
    w^T x_i + b ≥ 1   if y_i = 1
    w^T x_i + b ≤ -1  if y_i = -1
Decision function: f(x) = sgn(w^T x + b), where x is a test point. Many choices of w and b are possible.
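The decision function above is a one-liner. A minimal sketch (the hyperplane w = [1, -1], b = 0 is a hypothetical example, not from the slides):

```python
import numpy as np

def decision(w, b, x):
    """Linear SVM decision function: sgn(w^T x + b)."""
    return int(np.sign(w @ x + b))

# Hypothetical separating hyperplane: w = [1, -1], b = 0.
w, b = np.array([1.0, -1.0]), 0.0
print(decision(w, b, np.array([2.0, 0.0])))   # point on the w^T x + b > 0 side
print(decision(w, b, np.array([0.0, 2.0])))   # point on the w^T x + b < 0 side
```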

Maximal Margin
The distance between w^T x + b = 1 and w^T x + b = -1 is 2/||w|| = 2/√(w^T w). Maximizing it gives a quadratic programming problem (Boser et al., 1992):
    min_{w,b} (1/2) w^T w
    subject to y_i (w^T x_i + b) ≥ 1, i = 1, ..., l.
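This QP can be solved directly with a general-purpose solver; a minimal sketch using SciPy's SLSQP on a hypothetical four-point separable set (the data are illustrative, and a real SVM solver would use the dual instead):

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical separable toy data: two points per class in R^2.
X = np.array([[2.0, 2.0], [2.0, 3.0], [0.0, 0.0], [-1.0, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

def objective(v):                       # v = [w1, w2, b]
    w = v[:2]
    return 0.5 * w @ w                  # (1/2) w^T w

# One inequality constraint per point: y_i (w^T x_i + b) - 1 >= 0.
cons = [{"type": "ineq",
         "fun": lambda v, i=i: y[i] * (X[i] @ v[:2] + v[2]) - 1.0}
        for i in range(len(y))]

res = minimize(objective, x0=np.zeros(3), constraints=cons)
w, b = res.x[:2], res.x[2]
print(w, b)
```

For this toy set the closest opposite-class points are (2, 2) and (0, 0), so the maximal-margin solution is w ≈ (0.5, 0.5), b ≈ -1.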

Data May Not Be Linearly Separable
Two remedies: allow training errors, and map data to a higher-dimensional (maybe infinite-dimensional) feature space
    φ(x) = [φ_1(x), φ_2(x), ...]^T.

Standard SVM (Boser et al., 1992; Cortes and Vapnik, 1995)
    min_{w,b,ξ} (1/2) w^T w + C Σ_{i=1}^{l} ξ_i
    subject to y_i (w^T φ(x_i) + b) ≥ 1 - ξ_i,
               ξ_i ≥ 0, i = 1, ..., l.
Example: x ∈ R^3, φ(x) ∈ R^10:
    φ(x) = [1, √2 x_1, √2 x_2, √2 x_3, x_1^2, x_2^2, x_3^2, √2 x_1 x_2, √2 x_1 x_3, √2 x_2 x_3]^T.
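The role of C (the penalty on training errors) can be seen with scikit-learn's SVC on a hypothetical 1-D toy set in which one label is "wrong", so no hyperplane separates the data; the data and the two C values below are illustrative only:

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical 1-D toy set with one mislabeled point, so it is not separable.
X = np.array([[0.0], [0.5], [3.0], [3.5], [0.2]])
y = np.array([-1, -1, 1, 1, 1])          # the point at 0.2 carries the "wrong" label

scores = {}
for C in (0.01, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    scores[C] = clf.score(X, y)          # training accuracy; 4/5 is the best possible
print(scores)
```

A large C tries hard to fit the training data (paying for each ξ_i), while a small C tolerates errors in exchange for a larger margin.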

Finding the Decision Function
w: possibly infinitely many variables. The dual problem has a finite number of variables:
    min_α (1/2) α^T Q α - e^T α
    subject to 0 ≤ α_i ≤ C, i = 1, ..., l,
               y^T α = 0,
where Q_ij = y_i y_j φ(x_i)^T φ(x_j) and e = [1, ..., 1]^T.
At the optimum, w = Σ_{i=1}^{l} α_i y_i φ(x_i).
A finite problem: #variables = #training data.
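The dual variables and their constraints can be inspected directly in scikit-learn, whose SVC solves this dual (via LIBSVM); the toy data below are hypothetical. SVC stores y_i α_i for the support vectors in `dual_coef_`:

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical 2-D toy problem with l = 6 training points.
X = np.array([[0, 0], [1, 0], [0, 1], [3, 3], [4, 3], [3, 4]], dtype=float)
y = np.array([-1, -1, -1, 1, 1, 1])

C = 1.0
clf = SVC(kernel="linear", C=C).fit(X, y)
alpha_y = clf.dual_coef_.ravel()          # entries are y_i * alpha_i, support vectors only
print(len(alpha_y), "nonzero dual variables out of l =", len(X))
print(np.abs(alpha_y).max())              # box constraint: 0 <= alpha_i <= C
print(alpha_y.sum())                      # equality constraint: y^T alpha = 0
```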

Kernel Tricks
Q_ij = y_i y_j φ(x_i)^T φ(x_j) needs a closed form. Example: x_i ∈ R^3, φ(x_i) ∈ R^10:
    φ(x_i) = [1, √2 (x_i)_1, √2 (x_i)_2, √2 (x_i)_3, (x_i)_1^2, (x_i)_2^2, (x_i)_3^2, √2 (x_i)_1 (x_i)_2, √2 (x_i)_1 (x_i)_3, √2 (x_i)_2 (x_i)_3]^T.
Then φ(x_i)^T φ(x_j) = (1 + x_i^T x_j)^2.
Kernel: K(x, y) = φ(x)^T φ(y). Common kernels:
    e^{-γ ||x_i - x_j||^2}   (radial basis function),
    (x_i^T x_j / a + b)^d    (polynomial).
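The closed form φ(x_i)^T φ(x_j) = (1 + x_i^T x_j)^2 can be verified numerically; a quick check with random vectors (the map below is the R^3 example from the slide):

```python
import numpy as np

def phi(x):
    # Explicit degree-2 feature map for x in R^3 (10 features), as on the slide.
    x1, x2, x3 = x
    s = np.sqrt(2.0)
    return np.array([1.0, s*x1, s*x2, s*x3,
                     x1*x1, x2*x2, x3*x3,
                     s*x1*x2, s*x1*x3, s*x2*x3])

rng = np.random.default_rng(0)
xi, xj = rng.standard_normal(3), rng.standard_normal(3)
lhs = phi(xi) @ phi(xj)          # inner product in feature space
rhs = (1.0 + xi @ xj) ** 2       # closed-form kernel value
print(lhs, rhs)
```

The kernel evaluates a 10-dimensional inner product at the cost of one 3-dimensional inner product.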

A kernel can be an inner product in an infinite-dimensional space. Assume x ∈ R^1 and γ > 0. Then
    e^{-γ ||x_i - x_j||^2} = e^{-γ (x_i - x_j)^2} = e^{-γ x_i^2 + 2γ x_i x_j - γ x_j^2}
    = e^{-γ x_i^2 - γ x_j^2} (1 + (2γ x_i x_j)/1! + (2γ x_i x_j)^2/2! + (2γ x_i x_j)^3/3! + ...)
    = e^{-γ x_i^2 - γ x_j^2} (1·1 + √(2γ/1!) x_i · √(2γ/1!) x_j + √((2γ)^2/2!) x_i^2 · √((2γ)^2/2!) x_j^2 + √((2γ)^3/3!) x_i^3 · √((2γ)^3/3!) x_j^3 + ...)
    = φ(x_i)^T φ(x_j),
where
    φ(x) = e^{-γ x^2} [1, √(2γ/1!) x, √((2γ)^2/2!) x^2, √((2γ)^3/3!) x^3, ...]^T.
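This expansion can be checked numerically by truncating the infinite feature map: with enough terms, the truncated inner product matches the RBF kernel value. The γ, x_i, x_j, and truncation length below are arbitrary choices for illustration:

```python
import numpy as np
from math import exp, factorial, sqrt

gamma = 0.7
xi, xj = 0.8, -0.3

def phi(x, n_terms=30):
    # Truncation of phi(x) = e^{-gamma x^2} [..., sqrt((2 gamma)^k / k!) x^k, ...]
    return np.array([exp(-gamma * x * x) * sqrt((2 * gamma) ** k / factorial(k)) * x ** k
                     for k in range(n_terms)])

approx = phi(xi) @ phi(xj)             # truncated inner product in feature space
exact = exp(-gamma * (xi - xj) ** 2)   # RBF kernel value
print(approx, exact)
```

The series converges quickly, so even a short truncation agrees with the exact kernel to machine precision here.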

Decision Function
At the optimum, w = Σ_{i=1}^{l} α_i y_i φ(x_i), so the decision value is
    w^T φ(x) + b = Σ_{i=1}^{l} α_i y_i φ(x_i)^T φ(x) + b = Σ_{i=1}^{l} α_i y_i K(x_i, x) + b.
Only the φ(x_i) with α_i > 0 are used: the support vectors.
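The kernel expansion of the decision value can be reproduced by hand from a fitted scikit-learn SVC (whose `dual_coef_` stores α_i y_i for the support vectors); the RBF data and test point below are hypothetical:

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical 2-D toy data; RBF kernel with gamma = 0.5.
X = np.array([[0, 0], [1, 1], [3, 3], [4, 4]], dtype=float)
y = np.array([-1, -1, 1, 1])
clf = SVC(kernel="rbf", gamma=0.5).fit(X, y)

def K(a, b):
    return np.exp(-0.5 * np.sum((a - b) ** 2))   # e^{-gamma ||a - b||^2}

x = np.array([2.0, 2.5])
# sum_i alpha_i y_i K(x_i, x) + b, summed over support vectors only
val = sum(coef * K(sv, x) for coef, sv in zip(clf.dual_coef_.ravel(),
                                              clf.support_vectors_)) + clf.intercept_[0]
print(val, clf.decision_function([x])[0])
```

The hand-computed sum matches the library's decision value, confirming that only support vectors enter the prediction.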

Support Vectors: More Important Data
Only the φ(x_i) with α_i > 0 are used: the support vectors.
[Figure omitted: a two-class data set in the plane, with the support vectors lying closest to the decision boundary.]

SVM primal/dual problems

Deriving the Dual
Consider the problem without ξ_i:
    min_{w,b} (1/2) w^T w
    subject to y_i (w^T φ(x_i) + b) ≥ 1, i = 1, ..., l.
Its dual:
    min_α (1/2) α^T Q α - e^T α
    subject to 0 ≤ α_i, i = 1, ..., l,
               y^T α = 0.

Lagrangian Dual
    max_{α ≥ 0} ( min_{w,b} L(w, b, α) ),
where
    L(w, b, α) = (1/2) ||w||^2 - Σ_{i=1}^{l} α_i (y_i (w^T φ(x_i) + b) - 1).
Strong duality:
    min Primal = max_{α ≥ 0} ( min_{w,b} L(w, b, α) ).

Simplify the dual. When α is fixed,
    min_{w,b} L(w, b, α) =
        -∞                                                       if Σ_{i=1}^{l} α_i y_i ≠ 0,
        min_w (1/2) w^T w - Σ_{i=1}^{l} α_i (y_i w^T φ(x_i) - 1)  if Σ_{i=1}^{l} α_i y_i = 0.
If Σ_{i=1}^{l} α_i y_i ≠ 0, we can decrease the term
    -b Σ_{i=1}^{l} α_i y_i
in L(w, b, α) to -∞ by choosing b.

If Σ_{i=1}^{l} α_i y_i = 0, the optimum of the strictly convex function
    (1/2) w^T w - Σ_{i=1}^{l} α_i (y_i w^T φ(x_i) - 1)
happens when the gradient with respect to w is zero:
    ∂L(w, b, α)/∂w = 0.
Thus
    w = Σ_{i=1}^{l} α_i y_i φ(x_i).

Note that
    w^T w = (Σ_{i=1}^{l} α_i y_i φ(x_i))^T (Σ_{j=1}^{l} α_j y_j φ(x_j)) = Σ_{i,j} α_i α_j y_i y_j φ(x_i)^T φ(x_j).
The dual is
    max_{α ≥ 0}
        Σ_{i=1}^{l} α_i - (1/2) Σ_{i,j} α_i α_j y_i y_j φ(x_i)^T φ(x_j)   if Σ_{i=1}^{l} α_i y_i = 0,
        -∞                                                                if Σ_{i=1}^{l} α_i y_i ≠ 0.

Lagrangian dual: max_{α ≥ 0} ( min_{w,b} L(w, b, α) ).
-∞ is definitely not the maximum of the dual, so a dual optimal solution cannot happen when Σ_{i=1}^{l} α_i y_i ≠ 0. The dual is simplified to
    max_{α ∈ R^l} Σ_{i=1}^{l} α_i - (1/2) Σ_{i=1}^{l} Σ_{j=1}^{l} α_i α_j y_i y_j φ(x_i)^T φ(x_j)
    subject to y^T α = 0,
               α_i ≥ 0, i = 1, ..., l.
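The simplified dual can be solved numerically and w recovered via w = Σ_i α_i y_i x_i; a sketch using SciPy's SLSQP as a stand-in QP solver, on the same hypothetical four-point separable set used earlier (a linear kernel, so φ(x) = x):

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical separable 2-D data for the hard-margin dual.
X = np.array([[2.0, 2.0], [2.0, 3.0], [0.0, 0.0], [-1.0, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
Q = (y[:, None] * y[None, :]) * (X @ X.T)     # Q_ij = y_i y_j x_i^T x_j

# Minimize (1/2) a^T Q a - e^T a  subject to  y^T a = 0, a >= 0
# (equivalently, maximize the dual objective).
res = minimize(lambda a: 0.5 * a @ Q @ a - a.sum(),
               x0=np.zeros(4),
               constraints=[{"type": "eq", "fun": lambda a: y @ a}],
               bounds=[(0, None)] * 4)
alpha = res.x
w = (alpha * y) @ X                           # w = sum_i alpha_i y_i x_i
print(w)
```

For this toy set the recovered w agrees with the primal maximal-margin solution, w ≈ (0.5, 0.5), with only the two closest points getting α_i > 0.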

Our problems may be infinite dimensional; we can still use Lagrangian duality. See a rigorous discussion in Lin (2001).

References
B. E. Boser, I. Guyon, and V. Vapnik. A training algorithm for optimal margin classifiers. In Proceedings of the Fifth Annual Workshop on Computational Learning Theory, pages 144-152. ACM Press, 1992.
C. Cortes and V. Vapnik. Support-vector network. Machine Learning, 20:273-297, 1995.
C.-J. Lin. Formulations of support vector machines: a note from an optimization point of view. Neural Computation, 13(2):307-317, 2001.