Classification / Regression: Support Vector Machines
Jeff Howbert, Introduction to Machine Learning, Winter 2012

Topics
- SVM classifiers for linearly separable classes
- SVM classifiers for non-linearly separable classes
- SVM classifiers for nonlinear decision boundaries (kernel functions)
- Other applications of SVMs
- Software

Linearly separable classes. Goal: find a linear decision boundary (hyperplane) that separates the classes.

One possible solution.

Another possible solution.

Other possible solutions.

Which one is better, B1 or B2? How do you define better?

The hyperplane that maximizes the margin will have better generalization, so B1 is better than B2.

[Figure: hyperplanes B1 and B2 with their margin boundaries (b11, b12 and b21, b22) and a test sample; B1's margin is wider than B2's.]

B1 is the hyperplane $\mathbf{w} \cdot \mathbf{x} + b = 0$; its margin boundaries $b_{11}$ and $b_{12}$ are $\mathbf{w} \cdot \mathbf{x} + b = +1$ and $\mathbf{w} \cdot \mathbf{x} + b = -1$.

$$y_i = f(\mathbf{x}) = \begin{cases} +1 & \text{if } \mathbf{w} \cdot \mathbf{x} + b \ge 1 \\ -1 & \text{if } \mathbf{w} \cdot \mathbf{x} + b \le -1 \end{cases}$$

$$\text{margin} = \frac{2}{\lVert \mathbf{w} \rVert}$$
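To make the geometry concrete, here is a minimal numeric sketch; the values of w and b are made up for illustration and are not from the slides:

```python
import numpy as np

# Hypothetical separating hyperplane: w . x + b = 0
w = np.array([1.0, 1.0])
b = -3.0

# The margin boundaries are w . x + b = +1 and w . x + b = -1,
# and the distance between them is 2 / ||w||.
margin = 2.0 / np.linalg.norm(w)
print(f"margin = {margin:.3f}")   # 2 / sqrt(2) ~ 1.414

# Classify a few points with f(x) = sign(w . x + b)
points = np.array([[0.0, 0.0], [4.0, 4.0], [1.5, 1.5]])
scores = points @ w + b
print(scores)                     # [-3.  5.  0.]  (0 lies exactly on the hyperplane)
print(np.sign(scores))
```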

We want to maximize:

$$\text{margin} = \frac{2}{\lVert \mathbf{w} \rVert}$$

which is equivalent to minimizing:

$$L(\mathbf{w}) = \frac{\lVert \mathbf{w} \rVert^2}{2}$$

subject to the following constraints:

$$y_i = f(\mathbf{x}_i) = \begin{cases} +1 & \text{if } \mathbf{w} \cdot \mathbf{x}_i + b \ge 1 \\ -1 & \text{if } \mathbf{w} \cdot \mathbf{x}_i + b \le -1 \end{cases}$$

This is a constrained convex optimization problem. Solve with numerical approaches, e.g. quadratic programming.
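As a sketch of the numerical approach, the primal can be handed to a general-purpose constrained solver. The toy dataset and the choice of scipy's SLSQP method are assumptions for illustration, not the slides' example:

```python
import numpy as np
from scipy.optimize import minimize

# Toy linearly separable data (made up): class +1 upper right, class -1 lower left
X = np.array([[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [1.0, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

# Variables: theta = [w1, w2, b]; minimize ||w||^2 / 2
objective = lambda th: 0.5 * (th[0] ** 2 + th[1] ** 2)

# Constraints: y_i (w . x_i + b) - 1 >= 0 for every training point
constraints = [
    {"type": "ineq", "fun": lambda th, xi=xi, yi=yi: yi * (xi @ th[:2] + th[2]) - 1.0}
    for xi, yi in zip(X, y)
]

res = minimize(objective, x0=np.zeros(3), method="SLSQP", constraints=constraints)
w, b = res.x[:2], res.x[2]
print("w =", w, "b =", b, "margin =", 2 / np.linalg.norm(w))
```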

Solving for the w that gives maximum margin:

1. Combine the objective function and constraints into a new objective function, using Lagrange multipliers $\lambda_i$:

$$L_{\text{primal}} = \frac{1}{2} \lVert \mathbf{w} \rVert^2 - \sum_{i=1}^{N} \lambda_i \left( y_i (\mathbf{w} \cdot \mathbf{x}_i + b) - 1 \right)$$

2. To minimize this Lagrangian, take the derivatives with respect to w and b and set them to 0:

$$\frac{\partial L_p}{\partial \mathbf{w}} = 0 \;\Rightarrow\; \mathbf{w} = \sum_{i=1}^{N} \lambda_i y_i \mathbf{x}_i \qquad\qquad \frac{\partial L_p}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^{N} \lambda_i y_i = 0$$

Solving for the w that gives maximum margin (continued):

3. Substituting and rearranging gives the dual of the Lagrangian:

$$L_{\text{dual}} = \sum_{i=1}^{N} \lambda_i - \frac{1}{2} \sum_{i,j} \lambda_i \lambda_j y_i y_j \, \mathbf{x}_i \cdot \mathbf{x}_j$$

which we try to maximize (not minimize).

4. Once we have the $\lambda_i$, we can substitute into the previous equations to get w and b.

5. This defines w and b as linear combinations of the training data.
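The dual is a quadratic program in the $\lambda_i$. A minimal sketch using the cvxopt QP solver (one solver choice among many; the toy data is made up) that also recovers w, b, and the support vectors from the solution:

```python
import numpy as np
from cvxopt import matrix, solvers

# Toy linearly separable data (made up)
X = np.array([[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [1.0, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
N = len(y)

# Dual: maximize sum(lambda) - 1/2 sum_ij lambda_i lambda_j y_i y_j x_i.x_j
# cvxopt minimizes 1/2 l'Pl + q'l  subject to  Gl <= h, Al = b
P = matrix(np.outer(y, y) * (X @ X.T) + 1e-8 * np.eye(N))  # tiny ridge for stability
q = matrix(-np.ones(N))
G = matrix(-np.eye(N))           # -lambda_i <= 0, i.e. lambda_i >= 0
h = matrix(np.zeros(N))
A = matrix(y.reshape(1, -1))     # sum_i lambda_i y_i = 0
b = matrix(0.0)

solvers.options["show_progress"] = False
sol = solvers.qp(P, q, G, h, A, b)
lam = np.array(sol["x"]).ravel()

# Support vectors have nonzero multipliers; recover w and b from them
sv = lam > 1e-6
w = (lam[sv] * y[sv]) @ X[sv]
b0 = np.mean(y[sv] - X[sv] @ w)
print("support vectors:\n", X[sv], "\nw =", w, "b =", b0)
```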

Optimizing the dual is easier: it is a function of the $\lambda_i$ only, not of the $\lambda_i$ and w, and convex optimization is guaranteed to find the global optimum. Most of the $\lambda_i$ go to zero. The $\mathbf{x}_i$ for which $\lambda_i \ne 0$ are called the support vectors; these support (lie on) the margin boundaries. The $\mathbf{x}_i$ for which $\lambda_i = 0$ lie away from the margin boundaries and are not required for defining the maximum margin hyperplane.

Example of solving for the maximum margin hyperplane.

What if the classes are not linearly separable?

Now which one is better, B1 or B2? How do you define better?

What if the problem is not linearly separable? Solution: introduce slack variables $\xi_i$.

Need to minimize:

$$L(\mathbf{w}) = \frac{\lVert \mathbf{w} \rVert^2}{2} + C \sum_{i=1}^{N} \xi_i^k$$

subject to:

$$y_i = f(\mathbf{x}_i) = \begin{cases} +1 & \text{if } \mathbf{w} \cdot \mathbf{x}_i + b \ge 1 - \xi_i \\ -1 & \text{if } \mathbf{w} \cdot \mathbf{x}_i + b \le -1 + \xi_i \end{cases}$$

C is an important hyperparameter, whose value is usually optimized by cross-validation.
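In practice the soft-margin problem and the cross-validated search over C are a few lines in standard libraries. A sketch using scikit-learn, which is an assumed tool here, on made-up noisy data:

```python
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import make_classification

# Made-up, not perfectly separable data (10% of labels flipped)
X, y = make_classification(n_samples=200, n_features=2, n_redundant=0,
                           n_informative=2, flip_y=0.1, random_state=0)

# Small C tolerates margin violations; large C penalizes them heavily.
search = GridSearchCV(SVC(kernel="linear"),
                      param_grid={"C": [0.01, 0.1, 1, 10, 100]},
                      cv=5)
search.fit(X, y)
print("best C:", search.best_params_["C"],
      " cv accuracy:", round(search.best_score_, 3))
```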

Slack variables for nonseparable data.

What if the decision boundary is not linear?

Solution: a nonlinear transform of the attributes, e.g. $\Phi: [x_1, x_2] \rightarrow [x_1, (x_1 + x_2)^4]$.

Solution: a nonlinear transform of the attributes, e.g. $\Phi: [x_1, x_2] \rightarrow [x_1^2 - x_1,\; x_2^2 - x_2]$.
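A quick numeric sketch of this second transform, on made-up points: a circular class boundary around (0.5, 0.5) becomes a straight line after mapping each coordinate x to x² − x:

```python
import numpy as np

# Class boundary (illustrative): a circle of radius 0.2 centered at (0.5, 0.5).
# In the original attributes it is nonlinear: x1^2 - x1 + x2^2 - x2 = -0.46.
X = np.array([[0.5, 0.55],   # inside the circle
              [0.45, 0.4],   # inside
              [0.9, 0.9],    # outside
              [0.1, 0.3]])   # outside
inside = (X[:, 0] - 0.5) ** 2 + (X[:, 1] - 0.5) ** 2 < 0.04

# After Phi(x1, x2) = (x1^2 - x1, x2^2 - x2) the boundary is the
# straight line u + v = -0.46, so a linear classifier suffices.
U = X ** 2 - X
for (u, v), label in zip(U, inside):
    print(f"u + v = {u + v:+.3f}   inside circle: {label}")
# Every point with u + v < -0.46 is inside the circle, and vice versa.
```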

Issues with finding useful nonlinear transforms:
- Not feasible to do manually as the number of attributes grows (i.e. any real-world problem).
- Usually involves transformation to a higher-dimensional space, which increases the computational burden of SVM optimization and invites the curse of dimensionality.
- With SVMs, all of the above can be circumvented via the kernel trick.

Kernel trick. We don't need to specify the attribute transform $\Phi(\mathbf{x})$; we only need to know how to calculate the dot product of any two transformed samples: $k(\mathbf{x}_1, \mathbf{x}_2) = \Phi(\mathbf{x}_1) \cdot \Phi(\mathbf{x}_2)$. The kernel function k is substituted into the dual of the Lagrangian, allowing determination of a maximum margin hyperplane in the (implicitly) transformed space $\Phi(\mathbf{x})$. All subsequent calculations, including predictions on test samples, are done using the kernel in place of $\Phi(\mathbf{x}_1) \cdot \Phi(\mathbf{x}_2)$.
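The equivalence can be checked numerically. For the degree-2 polynomial kernel $k(\mathbf{x}, \mathbf{z}) = (\mathbf{x} \cdot \mathbf{z})^2$ in two dimensions, the explicit feature map is small enough to write out by hand; this sketch confirms the kernel equals the dot product of the transformed samples:

```python
import numpy as np

def phi(x):
    """Explicit feature map whose dot product equals (x . z)^2 in 2D."""
    return np.array([x[0] ** 2, x[1] ** 2, np.sqrt(2) * x[0] * x[1]])

def k(x, z):
    """Degree-2 polynomial kernel, computed entirely in the original space."""
    return (x @ z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

print(phi(x) @ phi(z))   # 1.0  (9 + 4 - 12 = 1)
print(k(x, z))           # 1.0  -- same value, no explicit transform needed
```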

Common kernel functions for SVMs:
- linear: $k(\mathbf{x}_1, \mathbf{x}_2) = \mathbf{x}_1 \cdot \mathbf{x}_2$
- polynomial: $k(\mathbf{x}_1, \mathbf{x}_2) = (\gamma\, \mathbf{x}_1 \cdot \mathbf{x}_2 + c)^d$
- Gaussian (radial basis): $k(\mathbf{x}_1, \mathbf{x}_2) = \exp\left( -\gamma \lVert \mathbf{x}_1 - \mathbf{x}_2 \rVert^2 \right)$
- sigmoid: $k(\mathbf{x}_1, \mathbf{x}_2) = \tanh(\gamma\, \mathbf{x}_1 \cdot \mathbf{x}_2 + c)$
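A sketch of these four kernels in plain numpy; the parameter names gamma, c, and d mirror the formulas above, and the default values are arbitrary:

```python
import numpy as np

def linear(x1, x2):
    return x1 @ x2

def polynomial(x1, x2, gamma=1.0, c=1.0, d=3):
    return (gamma * (x1 @ x2) + c) ** d

def gaussian(x1, x2, gamma=1.0):          # a.k.a. RBF kernel
    return np.exp(-gamma * np.sum((x1 - x2) ** 2))

def sigmoid(x1, x2, gamma=1.0, c=0.0):
    return np.tanh(gamma * (x1 @ x2) + c)

x1, x2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
print(linear(x1, x2), polynomial(x1, x2), gaussian(x1, x2), sigmoid(x1, x2))
```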

For some kernels (e.g. the Gaussian) the implicit transform $\Phi(\mathbf{x})$ is infinite-dimensional! But calculations with the kernel are done in the original space, so the computational burden and the curse of dimensionality aren't a problem.


Applications of SVMs to machine learning:
- Classification: binary, multiclass, one-class
- Regression
- Transduction (semi-supervised learning)
- Ranking
- Clustering
- Structured labels

Software:
- SVMlight: http://svmlight.joachims.org/
- libsvm: http://www.csie.ntu.edu.tw/~cjlin/libsvm/ (includes a MATLAB / Octave interface)
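For libsvm, a minimal sketch of its Python interface, assuming the pip-installable libsvm package (the MATLAB/Octave interface mentioned above is analogous):

```python
# pip install libsvm   (an assumption about packaging; libsvm also ships
# MATLAB/Octave bindings, as noted above)
from libsvm.svmutil import svm_train, svm_predict

# Tiny made-up dataset: labels and feature vectors
y = [1, 1, -1, -1]
x = [[2, 2], [3, 3], [0, 0], [1, 0]]

# '-t 0' selects the linear kernel; '-c 1' sets the soft-margin parameter C
model = svm_train(y, x, "-t 0 -c 1")
labels, accuracy, values = svm_predict(y, x, model)
print(labels)
```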