Geometrical intuition behind the dual problem

Based on: K. P. Bennett and E. J. Bredensteiner, "Duality and Geometry in SVM Classifiers", Proceedings of the International Conference on Machine Learning, 2000.

Convex hulls for two sets of points A and B. [Figure: two clouds of points, A and B, in the $(x_1, x_2)$ plane, each enclosed by its convex hull.]

The convex hulls of A and B consist of all possible points $P_A$ and $P_B$ of the form

$$P_A = \sum_{i \in A} \alpha_i x_i, \qquad P_B = \sum_{i \in B} \alpha_i x_i, \qquad \alpha_i \ge 0, \quad \sum_{i \in A} \alpha_i = 1, \quad \sum_{i \in B} \alpha_i = 1.$$

For the plane separating A and B we choose the bisecting line between the specific $P_A$ and $P_B$ that have minimal distance $\|P_A - P_B\|^2$. [Figure: in the example shown, $P_A$ is a vertex of hull A, so a single coefficient $\alpha_i$ is non-zero, while $P_B$ lies on an edge of hull B, so two coefficients are non-zero.]

Minimizing over the hull coefficients, and encoding membership with labels $y_i = +1$ for $i \in A$ and $y_j = -1$ for $j \in B$:

$$\min_\alpha \|P_A - P_B\|^2 = \min_\alpha \Big\| \sum_{i \in A} \alpha_i x_i - \sum_{j \in B} \alpha_j x_j \Big\|^2 = \min_\alpha \Big\| \sum_{i \in A} \alpha_i y_i x_i + \sum_{j \in B} \alpha_j y_j x_j \Big\|^2 = \min_\alpha \sum_{i,j} \alpha_i \alpha_j y_i y_j \, x_i \cdot x_j,$$

subject to $\alpha_i \ge 0$, $\sum_{i \in A} \alpha_i = 1$, $\sum_{i \in B} \alpha_i = 1$; this is the same quadratic form that appears in the SVM dual below.
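As a concrete check of this geometric picture, here is a minimal sketch (my own illustration, not code from the slides or from Bennett and Bredensteiner) that finds the closest points of the two hulls by solving exactly this constrained minimization with scipy; the toy points are made up.

```python
# Find the closest pair of points P_A, P_B of two convex hulls by
# minimizing ||P_A - P_B||^2 over simplex weights alpha (hypothetical data).
import numpy as np
from scipy.optimize import minimize

X_A = np.array([[1.0, 2.0], [2.0, 3.0], [2.0, 1.5]])   # class A
X_B = np.array([[4.0, 0.5], [5.0, 1.0], [4.5, 2.0]])   # class B
nA, nB = len(X_A), len(X_B)

def objective(alpha):
    # alpha[:nA] are the hull weights for A, alpha[nA:] those for B
    p_A = alpha[:nA] @ X_A
    p_B = alpha[nA:] @ X_B
    return np.sum((p_A - p_B) ** 2)

constraints = [
    {"type": "eq", "fun": lambda a: np.sum(a[:nA]) - 1.0},  # weights on A sum to 1
    {"type": "eq", "fun": lambda a: np.sum(a[nA:]) - 1.0},  # weights on B sum to 1
]
bounds = [(0.0, None)] * (nA + nB)                          # alpha_i >= 0
alpha0 = np.concatenate([np.full(nA, 1 / nA), np.full(nB, 1 / nB)])
res = minimize(objective, alpha0, bounds=bounds, constraints=constraints)

p_A, p_B = res.x[:nA] @ X_A, res.x[nA:] @ X_B
print("closest points:", p_A, p_B)   # the max-margin w is parallel to p_A - p_B
```

The normal of the maximum-margin plane is parallel to $P_A - P_B$, and the plane itself bisects the segment joining the two closest points.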

Going from the Primal to the Dual: Constrained Optimization Problem

$$\min_w \ \tfrac{1}{2}\|w\|^2 \quad \text{s.t.} \quad y_i\,(w \cdot x_i + b) \ge 1, \quad i = 1, \dots, m,$$

where $y_i$ is the label and $x_i$ the input. This is a convex optimization problem (convex objective and constraints), and it has a unique solution if the data points are linearly separable.
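To make the primal concrete, here is a hedged sketch that solves this QP numerically with scipy's SLSQP on a tiny made-up dataset; a dedicated QP solver would be preferable in practice.

```python
# Solve the hard-margin primal min (1/2)||w||^2 s.t. y_i (w.x_i + b) >= 1
# on toy 2-D data (hypothetical example, not from the slides).
import numpy as np
from scipy.optimize import minimize

X = np.array([[1.0, 2.0], [2.0, 3.0], [2.0, 1.5],    # class +1
              [4.0, 0.5], [5.0, 1.0], [4.5, 2.0]])   # class -1
y = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])

def objective(theta):
    w = theta[:2]                      # theta packs (w_1, w_2, b)
    return 0.5 * w @ w                 # (1/2) ||w||^2

def margin(theta):
    w, b = theta[:2], theta[2]
    return y * (X @ w + b) - 1.0       # must be elementwise >= 0

res = minimize(objective, np.zeros(3),
               constraints=[{"type": "ineq", "fun": margin}])
w, b = res.x[:2], res.x[2]
print("w =", w, "b =", b)              # defines the max-margin hyperplane
```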

Lagrangian:

$$L(w, b, \alpha) = \tfrac{1}{2}\|w\|^2 - \sum_{i=1}^{m} \alpha_i \left[ y_i\,(w \cdot x_i + b) - 1 \right]$$

KKT stationarity conditions:

$$\nabla_w L(w, b, \alpha) = w - \sum_{i=1}^{m} \alpha_i y_i x_i = 0$$

$$\frac{\partial L(w, b, \alpha)}{\partial b} = -\sum_{i=1}^{m} \alpha_i y_i = 0$$

These yield

$$w = \sum_{i=1}^{m} \alpha_i y_i x_i, \qquad \sum_{i=1}^{m} \alpha_i y_i = 0,$$

plus the KKT complementary slackness conditions:

$$\alpha_i \left[ y_i\,(w \cdot x_i + b) - 1 \right] = 0, \quad i = 1, \dots, m.$$

Substituting $w = \sum_{i=1}^{m} \alpha_i y_i x_i$ into the Lagrangian:

$$L(w, b, \alpha) = \tfrac{1}{2}\Big\|\sum_{i=1}^{m} \alpha_i y_i x_i\Big\|^2 - \sum_{i,j=1}^{m} \alpha_i \alpha_j y_i y_j \, x_i \cdot x_j - b \sum_{i=1}^{m} \alpha_i y_i + \sum_{i=1}^{m} \alpha_i.$$

The first two terms combine into $-\tfrac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j \, x_i \cdot x_j$, and the $b$ term vanishes because $\sum_i \alpha_i y_i = 0$, so

$$L(w, b, \alpha) = -\tfrac{1}{2} \sum_{i,j=1}^{m} \alpha_i \alpha_j y_i y_j \, x_i \cdot x_j + \sum_{i=1}^{m} \alpha_i.$$

Equivalent Dual Problem:

$$\max_\alpha \ \Big\{ -\tfrac{1}{2} \sum_{i,j=1}^{m} \alpha_i \alpha_j y_i y_j \, x_i \cdot x_j + \sum_{i=1}^{m} \alpha_i \Big\}$$

$$\text{s.t.} \quad \alpha_i \ge 0, \quad i = 1, \dots, m, \qquad \sum_{i=1}^{m} \alpha_i y_i = 0.$$

Solution to the Dual Problem

The dual problem above admits the following solution:

$$h(x) = \operatorname{sign}\Big( \sum_{i=1}^{m} \alpha_i y_i \, x_i \cdot x + b \Big), \qquad b = y_i - \sum_{j=1}^{m} \alpha_j y_j \, x_j \cdot x_i \ \text{ for any } i \text{ with } \alpha_i > 0.$$
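Putting the pieces together, the following sketch (again my own illustration) solves the dual with scipy on the same toy data as the primal sketch above, recovers $w$ and $b$, and evaluates the decision function; the recovered $w$ can be compared against the primal solution.

```python
# Solve the dual QP, then recover w = sum_i alpha_i y_i x_i, b, and h(x).
import numpy as np
from scipy.optimize import minimize

X = np.array([[1.0, 2.0], [2.0, 3.0], [2.0, 1.5],    # class +1
              [4.0, 0.5], [5.0, 1.0], [4.5, 2.0]])   # class -1
y = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])
m = len(y)
Q = (y[:, None] * y[None, :]) * (X @ X.T)    # Q_ij = y_i y_j x_i.x_j

def neg_dual(alpha):
    # scipy minimizes, so negate the dual objective
    return 0.5 * alpha @ Q @ alpha - alpha.sum()

res = minimize(neg_dual, np.zeros(m),
               bounds=[(0.0, None)] * m,                         # alpha_i >= 0
               constraints=[{"type": "eq", "fun": lambda a: a @ y}])
alpha = res.x
w = (alpha * y) @ X                          # stationarity condition
sv = alpha > 1e-6                            # support vectors have alpha_i > 0
b = np.mean(y[sv] - X[sv] @ w)               # b = y_i - w.x_i, averaged over SVs

def h(x):
    # Decision function in dual form: sign(sum_i alpha_i y_i x_i.x + b)
    return np.sign((alpha * y) @ (X @ x) + b)

print(h(np.array([1.5, 2.0])), h(np.array([4.8, 1.0])))   # expect +1.0, -1.0
```

Note that only the support vectors (the points with $\alpha_i > 0$, which realize the margin) contribute to $w$ and to $h(x)$.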

What are Support Vector Machines?

- Linear classifiers
- (Mostly) binary classifiers
- Supervised training
- Good generalization with explicit bounds

Main Ideas Behind Support Vector Machines

- Maximal margin
- Dual space
- Linear classifiers in high-dimensional space using a non-linear mapping
- Kernel trick

Quadratic Programming

Using the Lagrangian

Dual Space

Strong Duality

Dual Form

Non-linear separation of datasets: in most problems the classes cannot be separated by a linear boundary. (Illustration from Prof. Mohri's lecture notes.)

Non-separable datasets. Solutions: 1) non-linear classifiers, or 2) increase the dimensionality of the dataset and add a non-linear mapping Φ (a toy example follows).
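As a toy illustration (my own example, not from the slides) of solution 2), mapping 1-D points through $\Phi(x) = (x, x^2)$ turns a problem that no threshold on $x$ can solve into a linearly separable one:

```python
# A 1-D dataset that is not separable by a threshold becomes separable
# after the non-linear map phi(x) = (x, x^2) (hypothetical example).
import numpy as np

X = np.array([-2.0, -1.0, 1.0, 2.0])
y = np.array([1.0, -1.0, -1.0, 1.0])    # +1 outside, -1 inside: no threshold works

def phi(x):
    # Lift each point to 2-D; the classes are now split by the line x^2 = 2.25
    return np.stack([x, x ** 2], axis=-1)

print(phi(X))   # [[-2, 4], [-1, 1], [1, 1], [2, 4]]
```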

Kernel Trick: a similarity measure between two data samples.
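Concretely, a kernel is just a function scoring how similar two samples are; below is a small sketch of two standard choices (the sample points are made up):

```python
# Two standard kernels as similarity measures between samples x and z.
import numpy as np

def linear_kernel(x, z):
    # Plain dot product: k(x, z) = x.z
    return x @ z

def gaussian_kernel(x, z, sigma=1.0):
    # Close to 1 when x and z are near each other, close to 0 when far apart
    return np.exp(-np.sum((x - z) ** 2) / (2 * sigma ** 2))

x, z = np.array([1.0, 2.0]), np.array([1.5, 2.5])
print(linear_kernel(x, z), gaussian_kernel(x, z))
```

The kernel trick replaces every dot product $x_i \cdot x_j$ in the dual with $k(x_i, x_j) = \Phi(x_i) \cdot \Phi(x_j)$, so the mapping $\Phi$ never has to be computed explicitly.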

Kernel Trick Illustrated 23

Curse of Dimensionality Due to the Non-Linear Mapping

Positive Semi-Definite (P.S.D.) Kernels (Mercer Condition)
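Mercer's condition can be checked numerically on any finite sample: the Gram matrix $K_{ij} = k(x_i, x_j)$ of a valid kernel must be positive semi-definite. A hedged sketch with random data:

```python
# Verify that a Gaussian Gram matrix is PSD: all eigenvalues >= 0
# (up to floating-point noise).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))                              # 10 random samples
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-sq_dists / 2.0)                               # Gaussian kernel, sigma = 1

eigvals = np.linalg.eigvalsh(K)                           # K is symmetric
print(eigvals.min() >= -1e-10)                            # True for a Mercer kernel
```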

Advantages of SVM

- Works very well in practice
- Error bounds are easy to obtain: the generalization error is small and predictable
- Fool-proof method: (mostly) three kernels to choose from: Gaussian; linear and polynomial; sigmoid
- Very small number of parameters to optimize

Limitations of SVM

- Size limitation: the size of the kernel matrix is quadratic in the number of training vectors
- Speed limitations:
  1) During training: a very large quadratic programming problem must be solved numerically. Solutions: chunking; Sequential Minimal Optimization (SMO), which breaks the QP problem into many small QP problems that are solved analytically; hardware implementations
  2) During testing: cost scales with the number of support vectors (see the sketch below). Solution: online SVM
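To see the testing-time point concretely, the sketch below (assuming scikit-learn is available; its SVC wraps libsvm's SMO-style solver) reports the number of support vectors after training, which is what each prediction must iterate over:

```python
# Train a kernel SVM and inspect how many support vectors it keeps
# (hypothetical data; requires scikit-learn).
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1.0).astype(int)   # circular decision boundary

clf = SVC(kernel="rbf", C=1.0).fit(X, y)
# Each prediction evaluates the kernel against every support vector,
# so fewer support vectors means faster testing.
print("support vectors per class:", clf.n_support_)
```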