Support Vector Machines. Introduction to Data Mining, 2nd Edition by Tan, Steinbach, Karpatne, Kumar

Transcription:

Data Mining: Support Vector Machines
Introduction to Data Mining, 2nd Edition, by Tan, Steinbach, Karpatne, Kumar

Support Vector Machines
Find a linear hyperplane (decision boundary) that will separate the data.

Support Vector Machines
One possible solution: hyperplane B1.

Support Vector Machines
Another possible solution: hyperplane B2.

Support Vector Machines
Other possible solutions, such as B2.

Support Vector Machines
Which one is better, B1 or B2? How do you define "better"?

Support Vector Machines
(Figure: hyperplanes B1 and B2 with their margin boundaries b11, b12 and b21, b22.)
Find the hyperplane that maximizes the margin => B1 is better than B2.

Support Vector Machines
(Figure: hyperplane B1 with margin boundaries b11 and b12.)
Decision boundary: $\vec{w} \cdot \vec{x} + b = 0$
Margin boundaries: $\vec{w} \cdot \vec{x} + b = -1$ and $\vec{w} \cdot \vec{x} + b = +1$

$$f(\vec{x}) = \begin{cases} 1 & \text{if } \vec{w} \cdot \vec{x} + b \ge 1 \\ -1 & \text{if } \vec{w} \cdot \vec{x} + b \le -1 \end{cases}
\qquad
\text{Margin} = \frac{2}{\lVert \vec{w} \rVert}$$
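The margin expression follows from the distance between the two margin hyperplanes; a short derivation (standard, not spelled out on the slides) goes as follows. Take points $\vec{x}_{+}$ and $\vec{x}_{-}$ on the two boundaries that differ only along the normal direction $\vec{w}$. Subtracting $\vec{w} \cdot \vec{x}_{-} + b = -1$ from $\vec{w} \cdot \vec{x}_{+} + b = +1$ gives

$$\vec{w} \cdot (\vec{x}_{+} - \vec{x}_{-}) = 2, \qquad \vec{x}_{+} - \vec{x}_{-} = d \, \frac{\vec{w}}{\lVert \vec{w} \rVert} \;\Longrightarrow\; d = \frac{2}{\lVert \vec{w} \rVert},$$

where $d$ is the width of the margin.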

Linear SVM
Linear model:
$$f(\vec{x}) = \begin{cases} 1 & \text{if } \vec{w} \cdot \vec{x} + b \ge 1 \\ -1 & \text{if } \vec{w} \cdot \vec{x} + b \le -1 \end{cases}$$
Learning the model is equivalent to determining the values of $\vec{w}$ and $b$.
How to find $\vec{w}$ and $b$ from the training data?

Learning Linear SVM
Objective is to maximize: $\text{Margin} = \frac{2}{\lVert \vec{w} \rVert}$
Which is equivalent to minimizing: $L(\vec{w}) = \frac{\lVert \vec{w} \rVert^2}{2}$
Subject to the following constraints:
$$y_i = \begin{cases} 1 & \text{if } \vec{w} \cdot \vec{x}_i + b \ge 1 \\ -1 & \text{if } \vec{w} \cdot \vec{x}_i + b \le -1 \end{cases}$$
or equivalently $y_i (\vec{w} \cdot \vec{x}_i + b) \ge 1$ for $i = 1, 2, \ldots, N$.
This is a constrained optimization problem. Solve it using the Lagrange multiplier method.
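As a concrete, hedged illustration that is not part of the slides, the constrained problem can be handed to an off-the-shelf solver. The sketch below assumes scikit-learn is available and uses a very large penalty C to approximate the hard-margin formulation; the data set is made up.

```python
# Sketch: fit a (near) hard-margin linear SVM with scikit-learn.
# A very large C approximates min ||w||^2 / 2 s.t. y_i (w.x_i + b) >= 1.
import numpy as np
from sklearn.svm import SVC

# A tiny linearly separable data set (illustrative values only).
X = np.array([[1.0, 1.0], [2.0, 2.5], [0.5, 2.0],
              [3.0, 0.5], [4.0, 1.0], [3.5, -0.5]])
y = np.array([1, 1, 1, -1, -1, -1])

clf = SVC(kernel="linear", C=1e6)   # large C ~ hard margin
clf.fit(X, y)

w = clf.coef_[0]          # learned weight vector w
b = clf.intercept_[0]     # learned bias b
print("w =", w, "b =", b)
print("margin =", 2.0 / np.linalg.norm(w))
print("support vectors:", clf.support_vectors_)
```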

Example of Linear SVM
Training data with Lagrange multipliers (the two records with nonzero λ are the support vectors):

x1       x2       y    λ
0.3858   0.4687    1   65.5261
0.4871   0.611    -1   65.5261
0.9218   0.4103   -1   0
0.7382   0.8936   -1   0
0.1763   0.0579    1   0
0.4057   0.3529    1   0
0.9355   0.8132   -1   0
0.2146   0.0099    1   0

Learning Linear SVM
The decision boundary depends only on the support vectors. If you have a data set with the same support vectors, the decision boundary will not change.
How to classify using SVM once $\vec{w}$ and $b$ are found? Given a test record $\vec{x}_i$:
$$f(\vec{x}_i) = \begin{cases} 1 & \text{if } \vec{w} \cdot \vec{x}_i + b \ge 1 \\ -1 & \text{if } \vec{w} \cdot \vec{x}_i + b \le -1 \end{cases}$$
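To make the "depends only on the support vectors" point concrete, the sketch below (an illustration, not part of the slides) recovers $\vec{w}$ and $b$ from the two support vectors in the table using the standard dual relations $\vec{w} = \sum_i \lambda_i y_i \vec{x}_i$ and $b = y_s - \vec{w} \cdot \vec{x}_s$ for a support vector $\vec{x}_s$, then classifies a hypothetical test record by the sign of $\vec{w} \cdot \vec{x} + b$ (the usual decision rule).

```python
# Sketch: recover w and b from the support vectors in the table above,
# then classify a test record by the sign of w.x + b.
# Uses the standard dual relations w = sum_i lambda_i * y_i * x_i and
# b = y_s - w.x_s for a support vector x_s (hard-margin case).
import numpy as np

# Support vectors from the example (the rows with lambda > 0).
X_sv   = np.array([[0.3858, 0.4687],
                   [0.4871, 0.6110]])
y_sv   = np.array([1, -1])
lam_sv = np.array([65.5261, 65.5261])

w = (lam_sv * y_sv) @ X_sv          # w = sum lambda_i y_i x_i
b = np.mean(y_sv - X_sv @ w)        # average b over both support vectors

print("w =", w)                      # roughly [-6.64, -9.32]
print("b =", b)                      # roughly 7.93

# Classify a test record (hypothetical point, for illustration only).
x_test = np.array([0.2, 0.1])
print("prediction:", 1 if w @ x_test + b >= 0 else -1)
```

The rows with λ = 0 contribute nothing to the sum, which is exactly the statement above that the decision boundary depends only on the support vectors.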

Support Vector Machines
What if the problem is not linearly separable?

Support Vector Machines
What if the problem is not linearly separable? Introduce slack variables $\xi_i$.
Need to minimize:
$$L(\vec{w}) = \frac{\lVert \vec{w} \rVert^2}{2} + C \left( \sum_{i=1}^{N} \xi_i \right)^k$$
Subject to:
$$y_i = \begin{cases} 1 & \text{if } \vec{w} \cdot \vec{x}_i + b \ge 1 - \xi_i \\ -1 & \text{if } \vec{w} \cdot \vec{x}_i + b \le -1 + \xi_i \end{cases}$$
If k is 1 or 2, this leads to the same objective function as the linear SVM but with different constraints (see textbook).
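A hedged illustration of the slack/penalty trade-off (my own example, again assuming scikit-learn): a smaller C tolerates more margin violations, while a larger C penalizes them more heavily.

```python
# Sketch: soft-margin SVM, varying the penalty C on overlapping classes.
# Smaller C allows more slack (wider margin, more violations); larger C
# penalizes violations more strongly.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two overlapping Gaussian blobs -> not perfectly linearly separable.
X = np.vstack([rng.normal(loc=0.0, scale=1.0, size=(50, 2)),
               rng.normal(loc=1.5, scale=1.0, size=(50, 2))])
y = np.array([1] * 50 + [-1] * 50)

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    print(f"C={C:>6}: #support vectors = {len(clf.support_)}, "
          f"training accuracy = {clf.score(X, y):.2f}")
```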

Support Vector Machines
(Figure: hyperplanes B1 and B2 with margin boundaries b11, b12, b21, b22.)
Find the hyperplane that optimizes both factors.

Nonlinear Support Vector Machines
What if the decision boundary is not linear?

Nonlinear Support Vector Machines
Trick: Transform the data into a higher-dimensional space.
Decision boundary: $\vec{w} \cdot \Phi(\vec{x}) + b = 0$

Learning Nonlinear SVM
Optimization problem: the same formulation as before, with $\Phi(\vec{x}_i)$ in place of $\vec{x}_i$, i.e. minimize $\frac{\lVert \vec{w} \rVert^2}{2}$ subject to $y_i (\vec{w} \cdot \Phi(\vec{x}_i) + b) \ge 1$, which leads to the same set of equations (but involving $\Phi(\vec{x})$ instead of $\vec{x}$).
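As an illustration of the idea (my example, not from the slides): points on two concentric circles are not linearly separable in the original space, but become linearly separable after the explicit degree-2 mapping $\Phi(x_1, x_2) = (x_1^2, \sqrt{2}\, x_1 x_2, x_2^2)$.

```python
# Sketch: make a circular decision boundary linear via an explicit
# degree-2 feature map Phi(x1, x2) = (x1^2, sqrt(2)*x1*x2, x2^2).
# Assumes scikit-learn; the data set is illustrative.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
theta = rng.uniform(0, 2 * np.pi, 200)
r = np.concatenate([rng.uniform(0.0, 1.0, 100),    # inner class
                    rng.uniform(2.0, 3.0, 100)])   # outer class
X = np.column_stack([r * np.cos(theta), r * np.sin(theta)])
y = np.array([1] * 100 + [-1] * 100)

def phi(X):
    """Explicit degree-2 mapping into a 3-D feature space."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

# A linear SVM in the mapped space separates the two circles.
clf = SVC(kernel="linear", C=10.0).fit(phi(X), y)
print("training accuracy in mapped space:", clf.score(phi(X), y))
```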

Learning Nonlinear SVM
Issues:
What type of mapping function $\Phi$ should be used?
How to do the computation in the high-dimensional space?
Most computations involve the dot product $\Phi(\vec{x}_i) \cdot \Phi(\vec{x}_j)$.
Curse of dimensionality?

Learning Nonlinear SVM
Kernel Trick: $\Phi(\vec{x}_i) \cdot \Phi(\vec{x}_j) = K(\vec{x}_i, \vec{x}_j)$
$K(\vec{x}_i, \vec{x}_j)$ is a kernel function (expressed in terms of the coordinates in the original space).
Examples include the polynomial kernel $K(\vec{x}, \vec{y}) = (\vec{x} \cdot \vec{y} + 1)^p$ and the Gaussian (RBF) kernel $K(\vec{x}, \vec{y}) = e^{-\lVert \vec{x} - \vec{y} \rVert^2 / (2\sigma^2)}$.
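A quick numerical check of the kernel trick for the degree-2 polynomial kernel $K(\vec{x}, \vec{y}) = (\vec{x} \cdot \vec{y})^2$ (my illustration, using the same explicit mapping as above): the dot product in the mapped space equals the kernel value computed entirely in the original 2-D space.

```python
# Sketch: verify the kernel trick numerically for a degree-2 polynomial
# kernel K(x, y) = (x . y)^2 with Phi(x1, x2) = (x1^2, sqrt(2)*x1*x2, x2^2).
import numpy as np

def phi(x):
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

def K(x, y):
    return (x @ y) ** 2           # computed in the original 2-D space

x = np.array([0.3858, 0.4687])    # values borrowed from the earlier table
y = np.array([0.4871, 0.6110])

print("Phi(x) . Phi(y) =", phi(x) @ phi(y))
print("K(x, y)         =", K(x, y))   # identical up to floating point
```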

Example of Nonlinear SVM
(Figure: decision boundary of an SVM with a polynomial kernel of degree 2.)

Learning Nonlinear SVM
Advantages of using a kernel:
Don't have to know the mapping function $\Phi$.
Computing the dot product $\Phi(\vec{x}_i) \cdot \Phi(\vec{x}_j)$ in the original space avoids the curse of dimensionality.
Not all functions can be kernels:
Must make sure there is a corresponding $\Phi$ in some high-dimensional space.
Mercer's theorem (see textbook).
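For comparison with the explicit-mapping sketch above, the same circular data can be fit directly with a degree-2 polynomial kernel, without ever constructing $\Phi$ (again assuming scikit-learn; illustrative only).

```python
# Sketch: the same circular data, fit directly with a degree-2 polynomial
# kernel; no explicit feature map Phi is ever constructed.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
theta = rng.uniform(0, 2 * np.pi, 200)
r = np.concatenate([rng.uniform(0.0, 1.0, 100), rng.uniform(2.0, 3.0, 100)])
X = np.column_stack([r * np.cos(theta), r * np.sin(theta)])
y = np.array([1] * 100 + [-1] * 100)

clf = SVC(kernel="poly", degree=2, coef0=1.0, C=10.0).fit(X, y)
print("training accuracy with degree-2 polynomial kernel:", clf.score(X, y))
```

The result matches the explicit-mapping approach, but the kernel version never materializes the higher-dimensional coordinates.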

Characteristics of SVM
Since the learning problem is formulated as a convex optimization problem, efficient algorithms are available to find the global minimum of the objective function (many other methods use greedy approaches and find only locally optimal solutions).
Overfitting is addressed by maximizing the margin of the decision boundary, but the user still needs to choose the type of kernel function and the cost function.
Difficult to handle missing values.
Robust to noise.
High computational complexity for building the model.