Support Vector Machine. Natural Language Processing Lab lizhonghua


Support Vector Machine
- Introduction
- Theory
- SVM primal and dual problem
- Parameter selection and practical issues
- Comparison to other classifiers
- Conclusion and discussion

Introduction. Some notation: the training data (x_i, y_i) are generated by sampling independently from an unknown underlying distribution P(x, y).

Introduction. (Figure: two candidate separating hyperplanes.) Which one is better?

Theory. The one on the right, with the larger margin, is better. Why? A large margin requires a small ||w||; a small ||w|| gives a small VC dimension of the margin hyperplanes ([1], Theorem 5.5); and a small VC dimension gives a lower (tighter) bound on the true error.

Theory. A large margin requires a small ||w||: the distance between the two parallel margin hyperplanes is 2/||w||, so maximizing the margin means minimizing ||w||.
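As a quick check (a standard derivation, not on the original slide in this form): take points x_+ and x_- lying on the two margin hyperplanes w · x + b = +1 and w · x + b = -1, subtract the two equations, and project onto the unit normal.

```latex
% Subtracting the hyperplane equations:
%   (w \cdot x_+ + b) - (w \cdot x_- + b) = 1 - (-1) = 2
\[
  w \cdot (x_+ - x_-) = 2
  \qquad\Longrightarrow\qquad
  \underbrace{\frac{w}{\|w\|} \cdot (x_+ - x_-)}_{\text{distance along the normal}}
  = \frac{2}{\|w\|}.
\]
% Hence maximizing the margin 2/\|w\| is the same as minimizing \|w\|.
```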

Theory

Theory. A small VC dimension leads to a lower bound on the true error. With probability at least 1 - η, the inequality R(f) <= R_emp(f) + sqrt((h(ln(2L/h) + 1) - ln(η/4)) / L) holds, where L is the size of the training set and h is the VC dimension.

Derive a VC Bound. Given a fixed function f, the loss on each example is either 0 or 1. All examples are drawn independently, so the losses are independent samples of a random variable, and the Chernoff bound applies.

Derive a VC Bound. To make the bound hold for all f in F simultaneously, apply the union bound over the functions in F, then set the failure probability equal to δ and solve for ε.
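In symbols, for a finite class F on m examples (a standard reconstruction; the slide's formulas did not survive transcription):

```latex
% Chernoff bound for one fixed f with 0/1 losses:
\[
  P\bigl(\,|R_{\mathrm{emp}}(f) - R(f)| \ge \epsilon\,\bigr)
  \;\le\; 2\exp(-2m\epsilon^2).
\]
% Union bound over a finite class F:
\[
  P\Bigl(\,\sup_{f \in F} |R_{\mathrm{emp}}(f) - R(f)| \ge \epsilon\,\Bigr)
  \;\le\; 2\,|F|\,\exp(-2m\epsilon^2) \;=:\; \delta.
\]
% Solving for \epsilon: with probability at least 1 - \delta, for all f in F,
\[
  R(f) \;\le\; R_{\mathrm{emp}}(f) + \sqrt{\frac{\ln|F| + \ln(2/\delta)}{2m}}.
\]
```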

Derive a VC Bound. The cardinality of F is the number of functions from F that can be distinguished by their values on {x_1, x_2, ..., x_{2m}}: it is the number of different output patterns (y_1, y_2, ..., y_{2m}) that the functions in F can achieve on samples of a given size.

Derive a VC Bound

VC dimension. The VC dimension is a property of a set of functions {f(α)}. If a given set of l points can be labeled in all 2^l possible ways, and for each labeling a member of {f(α)} can be found that correctly assigns those labels, we say that the set of points is shattered by that set of functions. The VC dimension of {f(α)} is defined as the maximum number of training points that can be shattered by {f(α)}.

VC dimension. (Figure: the classic example from [2]: three points in the plane shattered by oriented lines.)
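A quick numerical check of that example; the three points and the use of scikit-learn are my own illustration, not from the slides.

```python
# A small sketch demonstrating shattering: three non-collinear points in the
# plane can be labeled in all 2^3 ways by a linear classifier.
import itertools
import numpy as np
from sklearn.svm import SVC

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])  # non-collinear points

for labels in itertools.product([-1, 1], repeat=3):
    y = np.array(labels)
    if len(set(labels)) == 1:
        separable = True  # a constant labeling is trivially realizable
    else:
        clf = SVC(kernel="linear", C=1e6).fit(X, y)  # large C ~ hard margin
        separable = clf.score(X, y) == 1.0
    print(labels, "separable:", separable)
# All 8 labelings succeed, so the VC dimension of lines in R^2 is at least 3.
```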

Derive a VC Bound. The capacity term is a property of the function class F, so the bound cannot be minimized over the choice of f. Instead, we introduce a structure on F and minimize the bound over the choice of the structure: structural risk minimization!

SVM primal and dual problem
- Linear Support Vector Machines: The Separable Case; The Non-Separable Case
- Nonlinear Support Vector Machines

The Separable Case

Dual Problem: The Separable Case. The primal, its dual, and the resulting decision function are reconstructed below.
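A standard reconstruction following [1] and [2], since the slide equations did not survive transcription:

```latex
% Primal (separable case):
\[
  \min_{w,\,b}\; \tfrac{1}{2}\|w\|^2
  \quad \text{s.t.} \quad y_i (w \cdot x_i + b) \ge 1, \; i = 1, \dots, L.
\]
% Dual:
\[
  \max_{\alpha}\; \sum_{i=1}^{L} \alpha_i
  - \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, x_i \cdot x_j
  \quad \text{s.t.} \quad \alpha_i \ge 0, \;\; \sum_{i=1}^{L} \alpha_i y_i = 0.
\]
% Decision function (the support vectors are the examples with \alpha_i > 0):
\[
  f(x) = \operatorname{sgn}\Bigl(\sum_{i=1}^{L} \alpha_i y_i \, x_i \cdot x + b\Bigr).
\]
```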

The Non-Separable Case

The Non-Separable Case. The standard approach is to allow the fat decision margin to make a few mistakes (some points, outliers or noisy examples, are inside or on the wrong side of the margin). We then pay a cost for each misclassified example, which depends on how far it is from meeting the margin requirement. To implement this, we introduce slack variables.

The Non-Separable Case. We want an algorithm that can tolerate a certain fraction of outliers. Introduce slack variables ξ_i >= 0, use the relaxed constraints y_i(w · x_i + b) >= 1 - ξ_i, and penalize the slack in the objective function: minimize (1/2)||w||^2 + C Σ_i ξ_i.
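A minimal sketch of this soft-margin formulation in practice, assuming scikit-learn and a synthetic two-blob dataset of my own choosing:

```python
# Train a soft-margin linear SVM on two overlapping Gaussian blobs,
# which are not linearly separable, so slack variables are needed.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, (50, 2)), rng.normal(1.0, 1.0, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)

clf = SVC(kernel="linear", C=1.0)   # C trades margin width against slack
clf.fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]
margin = 2.0 / np.linalg.norm(w)    # geometric margin width, 2/||w||
print("w =", w, "b =", b, "margin =", margin)
```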

Nonlinear Support Vector Machines

Nonlinear Support Vector Machines. The data are mapped into a feature space, x ↦ φ(x), where a linear separator is found. The dual problem and the decision function depend on the data only through inner products, so it suffices to replace each inner product x_i · x_j with a kernel evaluation K(x_i, x_j) = φ(x_i) · φ(x_j).

Nonlinear Support Vector Machines. Some common kernels: linear, K(x, y) = x · y; polynomial, K(x, y) = (x · y + 1)^d; Gaussian (RBF), K(x, y) = exp(-γ||x - y||^2). What conditions should K satisfy so that K can be a kernel function?

Nonlinear Support Vector Machines. Kernel matrix: the (i, j) element of the matrix is the inner product of training examples x_i and x_j in feature space, and we use the kernel function to compute that inner product: K_ij = K(x_i, x_j).

Nonlinear Support Vector Machines. If the kernel matrix is positive semi-definite for every finite set of examples, we say that K satisfies the finitely positive semi-definite property; then K can be a kernel function.
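A small numerical illustration of this property; the Gaussian kernel and the random data here are assumptions for the demo, not from the slides.

```python
# Build a Gaussian kernel matrix and inspect its eigenvalues: for a valid
# kernel they should all be non-negative (up to floating-point round-off).
import numpy as np

def gaussian_kernel_matrix(X, gamma=1.0):
    # K[i, j] = exp(-gamma * ||x_i - x_j||^2), the inner product in feature space
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-gamma * sq_dists)

X = np.random.default_rng(0).normal(size=(20, 3))
K = gaussian_kernel_matrix(X, gamma=0.5)

eigvals = np.linalg.eigvalsh(K)     # the matrix is symmetric, so use eigvalsh
print("smallest eigenvalue:", eigvals.min())
```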

Parameter selection. Training the SVM finds w and b, but the SVM also has hyperparameters that must be chosen beforehand: the soft-margin constant C, the width of the Gaussian kernel, and the degree of the polynomial kernel.

soft margin constant C

soft margin constant C. When C is small, it is cheap to account for some data points with slack variables, and the margin is placed so that it models the bulk of the data. As C becomes large, the classifier respects individual data points more closely at the cost of reducing the geometric margin, and the complexity of the function class increases.

degree of polynomial kernel. The lowest-degree polynomial kernel is the linear kernel, which is not sufficient when there is a non-linear relationship between the two classes. A degree-2 kernel is already enough in this example; a degree-5 kernel gives a boundary with greater curvature.

width of Gaussian kernel. A small gamma gives a smooth decision boundary; a large gamma gives a decision boundary with greater curvature. Gamma = 100 overfits the data.

A simple procedure (Chih-Jen Lin, Support Vector Machines, Machine Learning Summer School 2006): scale the data, use the Gaussian (RBF) kernel, and select C and gamma by cross-validation.
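A sketch of that procedure with scikit-learn; the dataset and parameter grid are illustrative choices of mine, not from the slides.

```python
# Scale the features, use the RBF kernel, and pick C and gamma by
# cross-validated grid search.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
param_grid = {
    "svc__C": [0.1, 1, 10, 100],          # soft-margin constant
    "svc__gamma": [0.001, 0.01, 0.1, 1],  # width of the Gaussian kernel
}
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X, y)
print("best parameters:", search.best_params_)
print("cross-validated accuracy:", search.best_score_)
```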

Compare to other classifiers
- Decision Tree: tends to overfit
- Naïve Bayes Classifier: feature-independence assumption, data sparseness
- SVM: kernel selection/design

Conclusion and Discussion
- Intuitive
- Has linear or non-linear decision boundaries
- Does not make unreasonable assumptions about the data
- Resists overfitting
- Does not have many parameters, so it is easy to model and train

References
[1] Bernhard Schölkopf, Alexander J. Smola. Learning with Kernels. MIT Press, 2002.
[2] Christopher J. C. Burges. A Tutorial on Support Vector Machines for Pattern Recognition. Data Mining and Knowledge Discovery, 2, 121-167 (1998).
[3] Asa Ben-Hur, Jason Weston. A User's Guide to Support Vector Machines.
[4] Chih-Jen Lin. Support Vector Machines. Machine Learning Summer School, 2006.

Thanks for your attention!