
Linear classifiers Lecture 3 David Sontag New York University Slides adapted from Luke Zettlemoyer, Vibhav Gogate, and Carlos Guestrin

ML Methodology. Data: labeled instances, e.g. emails marked spam/ham, randomly allocated to a training set, a held-out set (sometimes called the validation set), and a test set, e.g. 60/20/20. Features: attribute-value pairs which characterize each x. Experimentation cycle: select a hypothesis f (tune hyperparameters on the held-out/validation set), then compute accuracy on the test set. Very important: never peek at the test set! Evaluation: accuracy = fraction of instances predicted correctly.
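A minimal sketch of this experimentation cycle, assuming scikit-learn; the data, the 60/20/20 split sizes, and the hyperparameter grid are illustrative, not from the lecture:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score

# Hypothetical data: X is a feature matrix, y the labels (e.g. spam/ham).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Randomly allocate 60/20/20 into train / held-out (validation) / test.
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

# Tune a hyperparameter (here the slack penalty C) on the held-out set only.
best_C, best_acc = None, -1.0
for C in [0.01, 0.1, 1.0, 10.0]:
    clf = LinearSVC(C=C).fit(X_train, y_train)
    acc = accuracy_score(y_val, clf.predict(X_val))
    if acc > best_acc:
        best_C, best_acc = C, acc

# Only now touch the test set, once, with the chosen hyperparameter.
final = LinearSVC(C=best_C).fit(X_train, y_train)
print("test accuracy:", accuracy_score(y_test, final.predict(X_test)))
```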

Linear Separators Which of these linear separators is optimal?

Support Vector Machine (SVM). SVMs (Vapnik, 1990s) choose the linear separator with the largest margin. Robust to outliers! Good according to intuition, theory, and practice. (Pictured: V. Vapnik.) SVMs became famous when, using images as input, they gave accuracy comparable to neural networks with hand-designed features on a handwriting recognition task.

Planes and Hyperplanes. A plane can be specified as the set of all points p = a + s u + t v, with s, t ranging over R, where a is the vector from the origin to a point in the plane and u, v are two non-parallel directions in the plane. Alternatively, it can be specified as (p - a).n = 0, i.e. p.n = a.n, where n is the normal vector (we will call this w); we only need to specify this dot product, a scalar (we will call it the offset). Barber, Section 29.1.1-4
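A quick numerical check, with illustrative values (a, u, v chosen arbitrarily), that the parametric and normal-vector descriptions pick out the same plane:

```python
import numpy as np

# Hypothetical plane through a with directions u, v (values chosen for illustration).
a = np.array([1.0, 0.0, 2.0])
u = np.array([1.0, 1.0, 0.0])
v = np.array([0.0, 1.0, 1.0])
n = np.cross(u, v)          # normal vector (this plays the role of w in w.x + b = 0)
b = -np.dot(n, a)           # offset, so that n.p + b = 0 on the plane

# Any point p = a + s*u + t*v satisfies both descriptions of the plane.
s, t = 0.7, -2.3
p = a + s * u + t * v
print(np.dot(n, p - a))     # ~0: (p - a).n = 0
print(np.dot(n, p) + b)     # ~0: n.p + b = 0
```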

Normal to a plane. w is the normal vector for the plane w.x + b = 0, and w/||w|| is the unit vector parallel to w. Projecting x_j onto the plane leaves a component along w/||w|| whose length is (w.x_j + b)/||w||, i.e. the (signed) distance of x_j from the plane is (w.x_j + b)/||w||.
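A small sketch of this distance formula; the plane and the point are illustrative, not from the slides:

```python
import numpy as np

def signed_distance(w, b, x):
    """Signed distance of point x from the hyperplane w.x + b = 0."""
    return (np.dot(w, x) + b) / np.linalg.norm(w)

# Illustrative 2D example: w has norm 5.
w = np.array([3.0, 4.0])
b = -5.0
x = np.array([3.0, 4.0])   # w.x + b = 9 + 16 - 5 = 20, so distance = 20 / 5 = 4
print(signed_distance(w, b, x))   # 4.0
```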

Scale invariance. w.x + b = 0. Any other ways of writing the same dividing line? w.x + b = 0, 2w.x + 2b = 0, 1000w.x + 1000b = 0, ...

Scale invariance. w.x + b = +1, w.x + b = 0, w.x + b = -1. During learning, we set the scale by asking that, for all t, w.x_t + b ≥ +1 for y_t = +1, and w.x_t + b ≤ -1 for y_t = -1. That is, we want to satisfy all of the linear constraints y_t (w.x_t + b) ≥ 1.

What is γ as a function of w? w.x + b = +1, w.x + b = 0, w.x + b = -1; γ is the distance from the separating line w.x + b = 0 to each canonical line. Take x_1 on w.x + b = +1 and x_2 on w.x + b = -1 with x_1 = x_2 + 2γ w/||w||. We also know that w.x_1 + b = +1 and w.x_2 + b = -1. Subtracting the two gives w.(x_1 - x_2) = 2, i.e. 2γ||w|| = 2, so γ = 1/||w||. Final result: we can maximize the margin by minimizing ||w||²!!!
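A hedged numeric check of γ = 1/||w||, with w, b, and the point x_2 chosen purely for illustration:

```python
import numpy as np

# Check that the gap between w.x + b = -1 and w.x + b = +1 is 2/||w||.
w = np.array([3.0, 4.0])   # ||w|| = 5, so gamma should be 1/5 = 0.2
b = 2.0

x2 = np.array([0.0, -0.75])                 # satisfies w.x2 + b = -1
x1 = x2 + (2 / np.linalg.norm(w)**2) * w    # step of length 2/||w|| along w/||w||
print(np.dot(w, x1) + b)                    # +1.0: x1 lies on the other canonical line
print(np.linalg.norm(x1 - x2) / 2)          # 0.2 = 1/||w|| = gamma
```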

Support vector machines (SVMs). w.x + b = +1, w.x + b = 0, w.x + b = -1; margin 2γ. Minimize w.w subject to y_t (w.x_t + b) ≥ 1 for all t. This is an example of a convex optimization problem, specifically a quadratic program, so there are polynomial-time algorithms to solve it! Support vectors: the data points on the canonical lines. The hyperplane is defined by the support vectors, and we could use them as a lower-dimensional basis to write down the line, although we haven't seen how yet (more on these later). Non-support vectors: everything else; moving them will not change w.
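A minimal sketch of this quadratic program, assuming the cvxpy solver; the tiny linearly separable dataset is made up for illustration:

```python
import numpy as np
import cvxpy as cp

# Tiny linearly separable toy data (illustrative, not from the lecture).
X = np.array([[2.0, 2.0], [3.0, 1.5], [-1.0, -1.0], [-2.0, -0.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])

w = cp.Variable(2)
b = cp.Variable()

# Hard-margin SVM: minimize w.w subject to y_t (w.x_t + b) >= 1 for all t.
objective = cp.Minimize(cp.sum_squares(w))
constraints = [cp.multiply(y, X @ w + b) >= 1]
cp.Problem(objective, constraints).solve()

print("w =", w.value, "b =", b.value)
print("margin gamma =", 1 / np.linalg.norm(w.value))
```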

What if the data is not linearly separable? What about overfitting? Add more features!!! For example, map x to φ(x) = ( x^(1), ..., x^(n), x^(1) x^(2), x^(1) x^(3), ..., e^(x^(1)), ... ).
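One possible way to write such an expanded feature map in code; the exact choice of features (pairwise products plus one exponential term) is an illustrative assumption:

```python
import numpy as np

def expand_features(x):
    """One possible expanded feature map phi(x): the original coordinates,
    all pairwise products, and an exponential feature (illustrative choice)."""
    x = np.asarray(x, dtype=float)
    pairwise = [x[i] * x[j] for i in range(len(x)) for j in range(i + 1, len(x))]
    return np.concatenate([x, pairwise, [np.exp(x[0])]])

print(expand_features([1.0, 2.0, 3.0]))
# -> [1. 2. 3. 2. 3. 6. 2.718...]
```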

What if the data is not linearly separable? First idea: jointly minimize w.w and the number of training mistakes, i.e. minimize w.w + C · #(mistakes). How do we trade off the two criteria? Pick C using validation data. Problems with trading off #(mistakes) against w.w this way: it is the 0/1 loss, so the problem is not a QP anymore; it also doesn't distinguish near misses from really bad mistakes; and it is NP-hard to find the optimal solution!!!

Allowing for slack: soft margin SVM. w.x + b = +1, w.x + b = 0, w.x + b = -1. Minimize w.w + C Σ_j ξ_j subject to y_j (w.x_j + b) ≥ 1 - ξ_j and ξ_j ≥ 0, where the ξ_j are slack variables. For each data point: if the margin is ≥ 1, don't care; if the margin is < 1, pay a linear penalty. Slack penalty C > 0: C = ∞ means we have to separate the data! C = 0 ignores the data entirely! Select C using validation data.
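A hedged sketch of the effect of extreme C values, assuming scikit-learn's linear-kernel SVC (which solves this soft-margin problem) and synthetic overlapping data:

```python
import numpy as np
from sklearn.svm import SVC

# Synthetic, slightly overlapping two-class data (illustrative only).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1.0, 1.0, size=(50, 2)), rng.normal(+1.0, 1.0, size=(50, 2))])
y = np.array([-1] * 50 + [+1] * 50)

# Large C: the optimizer tries very hard to avoid slack (approaches the hard margin).
# Small C: slack is nearly free, so the data is largely ignored.
for C in [1e-3, 1.0, 1e3]:
    clf = SVC(kernel="linear", C=C).fit(X, y)
    w, b = clf.coef_[0], clf.intercept_[0]
    slack = np.maximum(0.0, 1.0 - y * (X @ w + b))
    print(f"C={C:g}  ||w||={np.linalg.norm(w):.3f}  total slack={slack.sum():.2f}")
```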

Allowing for slack: soft margin SVM. w.x + b = +1, w.x + b = 0, w.x + b = -1. Minimize w.w + C Σ_j ξ_j subject to y_j (w.x_j + b) ≥ 1 - ξ_j, ξ_j ≥ 0. What is the (optimal) value of ξ_j as a function of w and b? If y_j (w.x_j + b) ≥ 1, then ξ_j = 0. If y_j (w.x_j + b) < 1, then ξ_j = 1 - y_j (w.x_j + b).
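As an aside, substituting this optimal ξ_j back in turns the constrained problem into the unconstrained hinge-loss objective w.w + C Σ_j max(0, 1 - y_j (w.x_j + b)). A small sketch, with illustrative data and weights (not from the lecture):

```python
import numpy as np

def soft_margin_objective(w, b, X, y, C):
    """Soft-margin SVM objective with the optimal slacks substituted in:
    w.w + C * sum_j max(0, 1 - y_j (w.x_j + b))  (hinge-loss form)."""
    slack = np.maximum(0.0, 1.0 - y * (X @ w + b))   # the optimal xi_j from the slide
    return np.dot(w, w) + C * slack.sum()

# Illustrative values: only the second point violates the margin, by 0.5.
X = np.array([[2.0, 0.0], [0.5, 0.0], [-2.0, 0.0]])
y = np.array([+1, +1, -1])
w, b = np.array([1.0, 0.0]), 0.0
print(soft_margin_objective(w, b, X, y, C=1.0))   # 1 + 1*max(0, 1 - 0.5) = 1.5
```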