Multi-class SVMs. Lecture 17. Aykut Erdem, April 2016, Hacettepe University


Administrative
We will have a make-up lecture on Saturday, April 23, 2016. Project progress reports are due April 21, 2016 (2 days left!). See http://web.cs.hacettepe.edu.tr/~aykut/classes/spring2016/bbm406/project.html

Recap: Support Vector Machines
Margin constraints: $\langle w, x \rangle + b \le -1$ and $\langle w, x \rangle + b \ge 1$.
Linear function: $f(x) = \langle w, x \rangle + b$.

Recap: Support Vector Machines
Margin hyperplanes: $\langle w, x \rangle + b = -1$ and $\langle w, x \rangle + b = 1$.
Optimization problem: $\max_{w,b} \frac{1}{\|w\|}$ subject to $y_i [\langle x_i, w \rangle + b] \ge 1$.

Recap: Support Vector Machines
Margin hyperplanes: $\langle w, x \rangle + b = -1$ and $\langle w, x \rangle + b = 1$.
Optimization problem: $\min_{w,b} \frac{1}{2}\|w\|^2$ subject to $y_i [\langle x_i, w \rangle + b] \ge 1$.

Recap: Support Vector Machines
Primal: $\min_{w,b} \frac{1}{2}\|w\|^2$ subject to $y_i [\langle x_i, w \rangle + b] \ge 1$.
Weight vector: $w = \sum_i \alpha_i y_i x_i$.
Dual: $\max_{\alpha} -\frac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle + \sum_i \alpha_i$ subject to $\sum_i \alpha_i y_i = 0$ and $\alpha_i \ge 0$.
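As a quick sanity check of the relation $w = \sum_i \alpha_i y_i x_i$, here is a small sketch (my own, using scikit-learn rather than anything from the lecture) that recovers the primal weight vector from the fitted dual coefficients.

```python
# Sketch (not from the lecture): verify w = sum_i alpha_i * y_i * x_i with scikit-learn.
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = np.vstack([rng.randn(20, 2) + [2, 2], rng.randn(20, 2) - [2, 2]])
y = np.array([1] * 20 + [-1] * 20)

clf = SVC(kernel="linear", C=1e6).fit(X, y)     # very large C ~ hard margin

# dual_coef_ holds alpha_i * y_i for the support vectors only
w_from_dual = clf.dual_coef_ @ clf.support_vectors_
print(np.allclose(w_from_dual, clf.coef_))       # True: same weight vector
```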

Recap: Large Margin Classifier
Margin hyperplanes: $\langle w, x \rangle + b = -1$ and $\langle w, x \rangle + b = 1$.
Support vectors: $\alpha_i > 0 \Rightarrow x_i$ is a support vector.

Recap: Soft-margin Classifier
When the data are not separable, a separator satisfying all margin constraints $\langle w, x \rangle + b \le -1$ and $\langle w, x \rangle + b \ge 1$ is impossible; we would like a minimum-error separator instead.
Theorem (Minsky & Papert): finding the minimum-error separating hyperplane is NP-hard.

Recap: Adding Slack Variables
Introduce $\xi_i \ge 0$ and relax the constraints to $\langle w, x \rangle + b \le -1 + \xi$ and $\langle w, x \rangle + b \ge 1 - \xi$.
Convex optimization problem: minimize the amount of slack.

Recap: Adding Slack Variables
For $0 < \xi \le 1$ the point lies inside the margin but is still correctly classified; for $\xi > 1$ the point is misclassified.
Relaxed constraints: $\langle w, x \rangle + b \le -1 + \xi$ and $\langle w, x \rangle + b \ge 1 - \xi$.
Convex optimization problem: minimize the amount of slack. (adapted from Andrew Zisserman)

Adding Slack Variables
Hard-margin problem: $\min_{w,b} \frac{1}{2}\|w\|^2$ subject to $y_i [\langle w, x_i \rangle + b] \ge 1$.
With slack variables: $\min_{w,b} \frac{1}{2}\|w\|^2 + C\sum_i \xi_i$ subject to $y_i [\langle w, x_i \rangle + b] \ge 1 - \xi_i$ and $\xi_i \ge 0$.
The problem is always feasible. Proof: $w = 0$, $b = 0$, $\xi_i = 1$ satisfy the constraints (and also yield an upper bound on the objective).

Soft-margin classifier
Optimization problem: $\min_{w,b} \frac{1}{2}\|w\|^2 + C\sum_i \xi_i$ subject to $y_i [\langle w, x_i \rangle + b] \ge 1 - \xi_i$ and $\xi_i \ge 0$.
C is a regularization parameter:
- small C allows constraints to be easily ignored: large margin
- large C makes constraints hard to ignore: narrow margin
- $C = \infty$ enforces all constraints: hard margin
(adapted from Andrew Zisserman)
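The following sketch (mine, using scikit-learn's soft-margin SVC; the toy data and the values of C are arbitrary choices) illustrates this trade-off: small C gives a wide margin with many slack violations, large C a narrow margin.

```python
# Hedged sketch (not part of the slides): effect of C on the soft-margin SVM.
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(1)
X = np.vstack([rng.randn(50, 2) + [1.5, 1.5], rng.randn(50, 2) - [1.5, 1.5]])
y = np.array([1] * 50 + [-1] * 50)

for C in [0.01, 1.0, 100.0]:
    clf = SVC(kernel="linear", C=C).fit(X, y)
    margin = 1.0 / np.linalg.norm(clf.coef_)   # geometric margin = 1 / ||w||
    print(f"C={C:7.2f}  margin={margin:.3f}  support vectors={clf.n_support_.sum()}")
# Small C -> wide margin, many support vectors; large C -> narrow margin, few.
```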

Demo time

This week: multi-class classification; introduction to kernels.

Multi-class classification (example figures; slide by Eric Xing)

Multi-class classification
Real-world problems often have multiple classes: text, speech, image, biological sequences. The algorithms studied so far are designed for binary classification problems. How do we design multi-class classification algorithms?
- Can the algorithms used for binary classification be generalized to multi-class classification?
- Can we reduce multi-class classification to binary classification?
(slide by Eric Xing)

Multi-class classification, continued (further example figures; slides by Eric Xing)

One versus all classification
Learn 3 classifiers:
- "-" vs. {o, +}, weights $w_-$
- "+" vs. {o, -}, weights $w_+$
- "o" vs. {+, -}, weights $w_o$
Predict the label using $\hat{y} = \arg\max_k \langle w_k, x \rangle + b_k$.
Any problems? Could we learn this dataset? (slide by Eric Xing)
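A minimal one-versus-all sketch, assuming scikit-learn's LinearSVC as the binary learner (the slide does not prescribe any particular implementation); the helper names one_vs_all_fit and one_vs_all_predict are mine.

```python
# Sketch of one-versus-all: one binary classifier per class, predict the argmax score.
import numpy as np
from sklearn.svm import LinearSVC

def one_vs_all_fit(X, y, classes):
    """Train one binary classifier per class: class k (+1) vs. the rest (-1)."""
    return {k: LinearSVC(C=1.0).fit(X, (y == k).astype(int) * 2 - 1) for k in classes}

def one_vs_all_predict(models, X, classes):
    """Predict the class whose classifier gives the highest score <w_k, x> + b_k."""
    scores = np.column_stack([models[k].decision_function(X) for k in classes])
    return np.asarray(classes)[np.argmax(scores, axis=1)]

# toy 3-class data
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(30, 2) + c for c in ([0, 4], [4, 0], [-4, -4])])
y = np.repeat([0, 1, 2], 30)
models = one_vs_all_fit(X, y, classes=[0, 1, 2])
print(one_vs_all_predict(models, X[:5], classes=[0, 1, 2]))
```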

Multi-class SVM
Simultaneously learn 3 sets of weights: $w_+$, $w_-$, $w_o$.
How do we guarantee the correct labels? We need new constraints!
The "score" of the correct class must be better than the "score" of the wrong classes: $\langle w_{y_i}, x_i \rangle + b_{y_i} \ge \langle w_k, x_i \rangle + b_k$ for all $k \ne y_i$.
(slide by Eric Xing)

Multi-class SVM
As for the binary SVM, we introduce slack variables and maximize the margin:
$\min_{w,b,\xi} \frac{1}{2}\sum_k \|w_k\|^2 + C\sum_i \xi_i$ subject to $\langle w_{y_i}, x_i \rangle + b_{y_i} \ge \langle w_k, x_i \rangle + b_k + 1 - \xi_i$ for all $k \ne y_i$, and $\xi_i \ge 0$.
To predict, we use $\hat{y} = \arg\max_k \langle w_k, x \rangle + b_k$.
Now can we learn it? (slide by Eric Xing)
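A small numpy sketch (my own) of the margin violations behind these constraints; it uses the Crammer-Singer form, where a single slack per example absorbs the worst violation, which may differ in detail from the slide's exact formulation.

```python
# Sketch: multi-class margin violations. The correct class's score should exceed every
# other class's score by at least 1; any shortfall is absorbed by a slack variable.
import numpy as np

def multiclass_hinge_slacks(scores, y):
    """scores: (n, K) matrix of <w_k, x_i> + b_k; y: (n,) correct labels.
    Returns per-example slack max_{k != y} max(0, 1 + s_k - s_y) (Crammer-Singer style)."""
    n = scores.shape[0]
    correct = scores[np.arange(n), y]              # s_{y_i}
    margins = 1.0 + scores - correct[:, None]      # 1 + s_k - s_{y_i}
    margins[np.arange(n), y] = 0.0                 # ignore the correct class itself
    return np.maximum(margins, 0.0).max(axis=1)    # slack needed per example

scores = np.array([[2.0, 0.5, -1.0],   # correct class wins by a margin -> slack 0
                   [0.2, 0.9, 0.1]])   # correct class barely loses -> positive slack
y = np.array([0, 0])
print(multiclass_hinge_slacks(scores, y))          # [0.  1.7]
```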

Kernels

Solving XOR
Map $(x_1, x_2) \mapsto (x_1, x_2, x_1 x_2)$.
XOR is not linearly separable in the original coordinates; mapping into 3 dimensions makes it easily solvable.
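A tiny numerical check (not from the slides) that the extra coordinate makes XOR linearly separable, with the inputs encoded in {-1, +1}.

```python
# Sketch: with inputs in {-1, +1}^2 and XOR label y = -x1*x2, the added coordinate
# x1*x2 already separates the two classes with a linear hyperplane in 3D.
import numpy as np

X = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]], dtype=float)
y = np.array([-1, 1, 1, -1])                      # XOR: positive iff the signs differ

phi = np.column_stack([X, X[:, 0] * X[:, 1]])     # (x1, x2, x1*x2)
w, b = np.array([0.0, 0.0, -1.0]), 0.0            # hyperplane uses only the new coordinate
print(np.sign(phi @ w + b) == y)                  # all True: linearly separable in 3D
```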

Quadratic Features
Quadratic features in $\mathbb{R}^2$: $\phi(x) := (x_1^2, \sqrt{2}\, x_1 x_2, x_2^2)$.
Dot product: $\langle \phi(x), \phi(x') \rangle = \langle (x_1^2, \sqrt{2}\, x_1 x_2, x_2^2), (x_1'^2, \sqrt{2}\, x_1' x_2', x_2'^2) \rangle = \langle x, x' \rangle^2$.
Insight: the trick works for any polynomial of order $d$ via $\langle x, x' \rangle^d$.
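A quick numerical check (my own sketch) of the identity above, $\langle \phi(x), \phi(x') \rangle = \langle x, x' \rangle^2$, for the quadratic feature map.

```python
# Sketch: verify <phi(x), phi(x')> = <x, x'>^2 for phi(x) = (x1^2, sqrt(2) x1 x2, x2^2).
import numpy as np

def phi(x):
    return np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

rng = np.random.RandomState(0)
x, xp = rng.randn(2), rng.randn(2)
print(np.isclose(phi(x) @ phi(xp), (x @ xp) ** 2))   # True
```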

Linear Separation with Quadratic Kernels (figure)

Computational Efficiency
Problem: extracting features can sometimes be very costly. Example: second-order features in 1000 dimensions already lead to about $5 \cdot 10^5$ numbers, and higher-order polynomial features are much worse.
Solution: don't compute the features; try to compute the dot products implicitly. For some features this works...
Definition: a kernel function $k: \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ is a symmetric function in its arguments for which the following property holds: $k(x, x') = \langle \phi(x), \phi(x') \rangle$ for some feature map $\phi$. If $k(x, x')$ is much cheaper to compute than $\phi(x)$...
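A rough sketch (my own code and counts, not the slide's) of why the implicit computation helps: the explicit degree-2 monomial map in d = 1000 dimensions has roughly $5 \cdot 10^5$ coordinates, while the polynomial kernel needs only one d-dimensional dot product.

```python
# Sketch: explicit second-order features vs. implicit kernel evaluation.
import numpy as np

d = 1000
n_second_order = d * (d + 1) // 2       # number of degree-2 monomials: ~5 * 10^5
print(n_second_order)                    # 500500

x, xp = np.random.randn(d), np.random.randn(d)
k = (x @ xp) ** 2                        # same inner product in feature space, O(d) work
```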

Recap: The Perceptron
initialize $w = 0$ and $b = 0$
repeat
  if $y_i [\langle w, x_i \rangle + b] \le 0$ then
    $w \leftarrow w + y_i x_i$ and $b \leftarrow b + y_i$
  end if
until all classified correctly
Nothing happens if a point is classified correctly. The weight vector is a linear combination $w = \sum_{i \in I} y_i x_i$, so the classifier is a linear combination of inner products: $f(x) = \sum_{i \in I} y_i \langle x_i, x \rangle + b$.
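A minimal numpy sketch of the pseudocode above; the data layout (X of shape (n, d), labels in {-1, +1}) and the epoch cap are my assumptions.

```python
# Sketch of the perceptron: update on mistakes, stop when everything is classified.
import numpy as np

def perceptron(X, y, max_epochs=100):
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x_i, y_i in zip(X, y):
            if y_i * (w @ x_i + b) <= 0:        # misclassified (or on the boundary)
                w, b = w + y_i * x_i, b + y_i   # perceptron update
                mistakes += 1
        if mistakes == 0:                        # all classified correctly
            break
    return w, b
```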

Recap: The Perceptron on Features
initialize $w = 0$ and $b = 0$
repeat
  pick $(x_i, y_i)$ from the data
  if $y_i (\langle w, \phi(x_i) \rangle + b) \le 0$ then
    $w \leftarrow w + y_i \phi(x_i)$ and $b \leftarrow b + y_i$
until $y_i (\langle w, \phi(x_i) \rangle + b) > 0$ for all $i$
Nothing happens if a point is classified correctly. The weight vector is a linear combination $w = \sum_{i \in I} y_i \phi(x_i)$, so the classifier is a linear combination of inner products: $f(x) = \sum_{i \in I} y_i \langle \phi(x_i), \phi(x) \rangle + b$.

The Kernel Perceptron
initialize $f = 0$
repeat
  pick $(x_i, y_i)$ from the data
  if $y_i f(x_i) \le 0$ then
    $f(\cdot) \leftarrow f(\cdot) + y_i k(x_i, \cdot) + y_i$
until $y_i f(x_i) > 0$ for all $i$
Nothing happens if a point is classified correctly. The weight vector is a linear combination $w = \sum_{i \in I} y_i \phi(x_i)$, and the classifier is a linear combination of inner products: $f(x) = \sum_{i \in I} y_i \langle \phi(x_i), \phi(x) \rangle + b = \sum_{i \in I} y_i k(x_i, x) + b$.
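A matching sketch of the kernel perceptron: instead of a weight vector we keep one coefficient per training point, so the classifier is $f(x) = \sum_i \alpha_i y_i k(x_i, x) + b$. The default kernel here (the quadratic kernel from earlier) is my choice, not something fixed by the lecture.

```python
# Sketch: kernel perceptron keeping one mistake count per training point.
import numpy as np

def kernel_perceptron(X, y, k=lambda a, b: (a @ b) ** 2, max_epochs=100):
    n = len(X)
    alpha, b = np.zeros(n), 0.0
    K = np.array([[k(X[i], X[j]) for j in range(n)] for i in range(n)])  # Gram matrix
    for _ in range(max_epochs):
        mistakes = 0
        for i in range(n):
            f_i = (alpha * y) @ K[:, i] + b
            if y[i] * f_i <= 0:          # misclassified: add y_i * k(x_i, .) to f
                alpha[i] += 1.0
                b += y[i]
                mistakes += 1
        if mistakes == 0:                # all training points classified correctly
            break
    return alpha, b
```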