Announcements - Homework


Announcements - Homework: Homework 1 is graded, please collect it at the end of lecture. Homework 2 is due today. Homework 3 is out soon (watch email); Ques 1 is midterm review.

HW1 score distribution: histogram of HW1 total scores in bins of 10 (0 to 110).

Announcements - Midterm. When: Wednesday, 10/20. Where: in class. What: you, your pencil, your textbook, your notes, course slides, your calculator, your good mood :) What NOT: no computers, iPhones, or anything else that has an internet connection. Material: everything from the beginning of the semester, until, and including SVMs and the kernel trick.

Recitation tomorrow! Boosting, SVM (convex optimization), midterm review. Strongly recommended!! Place: NSH 3305 (note: change from last time). Time: 5-6 pm. Rob

Support Vector Machines. Aarti Singh, Machine Learning 10-701/15-781, Oct 13, 2010.

At Pittsburgh G-20 summit

Linear classifiers: which line is better?

Pick the one with the largest margin!

Parameterizing the decision boundary: w·x = Σ_j w^(j) x^(j). The classifier predicts +1 where w·x + b > 0 and -1 where w·x + b < 0. Data: examples (x_i, y_i) for i = 1, 2, …, n.

Parameterizing the decision boundary: the hyperplane w·x + b = 0 separates the region w·x + b > 0 from the region w·x + b < 0.

Maximizing the margin: the margin γ is the distance of the closest examples from the line/hyperplane. If the closest examples satisfy |w·x + b| = a, then margin = γ = 2a/||w||.

Maximizing the margin: max_{w,b} γ = 2a/||w|| s.t. (w·x_j + b) y_j ≥ a for all j. Note: a is arbitrary (we can normalize the equations by a).
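The 2a/||w|| formula on this slide follows from the point-to-hyperplane distance; a brief sketch of the derivation:

```latex
% Distance from a point x to the hyperplane w \cdot x + b = 0:
d(x) = \frac{|w \cdot x + b|}{\|w\|}
% If the closest examples on each side satisfy |w \cdot x + b| = a,
% the gap between the two classes (one a/\|w\| strip per side) is
\gamma = \frac{a}{\|w\|} + \frac{a}{\|w\|} = \frac{2a}{\|w\|}
```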

Support Vector Machines: min_{w,b} w·w s.t. (w·x_j + b) y_j ≥ 1 for all j. Solve efficiently by quadratic programming (QP); well-studied solution algorithms. The linear hyperplane is defined by the support vectors.
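A minimal numerical sketch of this formulation (my own toy example, not from the lecture): instead of a QP solver, it runs subgradient descent on the equivalent penalized objective (1/2)||w||² + C Σ_j max(0, 1 − y_j(w·x_j + b)); with separable data and large C this approximates the hard-margin SVM above.

```python
import numpy as np

# Hypothetical linearly separable toy data:
# class +1 around (2, 2), class -1 around (-2, -2).
X = np.array([[2.0, 2.0], [3.0, 1.0], [2.5, 3.0],
              [-2.0, -2.0], [-3.0, -1.0], [-2.5, -3.0]])
y = np.array([1, 1, 1, -1, -1, -1])

# Subgradient descent on (1/2)||w||^2 + C * sum_j hinge_j
C, lr, T = 10.0, 0.01, 5000
w, b = np.zeros(2), 0.0
for t in range(T):
    margins = y * (X @ w + b)
    viol = margins < 1                      # points inside the margin
    grad_w = w - C * (y[viol, None] * X[viol]).sum(axis=0)
    grad_b = -C * y[viol].sum()
    w -= lr * grad_w
    b -= lr * grad_b

print(np.sign(X @ w + b))   # should agree with y on this toy set
```

In practice one would hand the QP directly to a solver; the subgradient version is only meant to make the objective concrete.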

Support Vectors: the linear hyperplane is defined by the support vectors; moving other points a little doesn't affect the decision boundary. We only need to store the support vectors to predict labels of new points. How many support vectors in the linearly separable case? m+1, where m is the input dimension.

What if data is not linearly separable? Use features of features of features of features: x1^2, x2^2, x1x2, …, exp(x1). But we run the risk of overfitting!
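A tiny illustration of this idea (a hypothetical 1-D example of mine, simpler than the x1^2, x2^2, x1x2 features on the slide): data that no linear threshold separates becomes separable after a nonlinear feature map.

```python
import numpy as np

# 1-D data that is not linearly separable:
# class +1 for |x| > 2, class -1 otherwise.
x = np.array([-3.0, -2.5, -1.0, 0.0, 1.0, 2.5, 3.0])
y = np.array([1, 1, -1, -1, -1, 1, 1])

# No single threshold on x separates the classes,
# but the mapped feature phi(x) = x^2 does:
phi = x ** 2
pred = np.where(phi > 4.0, 1, -1)   # a linear rule in feature space
print(pred)
```

The same trick with x1^2, x2^2, x1x2 makes quadratic boundaries in the original space linear in the feature space.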

What if data is still not linearly separable? Allow errors in classification: min_{w,b} w·w + C·(#mistakes) s.t. (w·x_j + b) y_j ≥ 1 for the remaining j. Maximize the margin and minimize the number of mistakes on training data; C is a tradeoff parameter. But this is not a QP, and the 0/1 loss doesn't distinguish between a near miss and a bad mistake.

What if data is still not linearly separable? Allow errors in classification (soft margin approach): min_{w,b} w·w + C Σ_j ξ_j s.t. (w·x_j + b) y_j ≥ 1 − ξ_j and ξ_j ≥ 0 for all j. The ξ_j are slack variables (> 1 if x_j is misclassified), so we pay a linear penalty for each mistake. C is a tradeoff parameter (chosen by cross-validation). This is still a QP.

Slack variables = hinge loss: in min_{w,b} w·w + C Σ_j ξ_j s.t. (w·x_j + b) y_j ≥ 1 − ξ_j, ξ_j ≥ 0 for all j, the first term is a complexity penalization and the optimal slack equals the hinge loss ξ_j = max(0, 1 − (w·x_j + b) y_j), which upper-bounds the 0-1 loss.

SVM vs. Logistic Regression: SVM uses the hinge loss; logistic regression uses the log loss (the negative log conditional likelihood). Both are convex upper bounds on the 0-1 loss.
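The loss comparison on this slide can be checked numerically; a small sketch (my own margin values, with the log loss measured in bits so that it upper-bounds the 0/1 loss):

```python
import numpy as np

# Margin values m = y * (w.x + b) for a few hypothetical points
m = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])

zero_one = (m <= 0).astype(float)        # 0/1 loss: 1 on mistakes
hinge = np.maximum(0.0, 1.0 - m)         # SVM hinge loss
log_loss = np.log2(1.0 + np.exp(-m))     # logistic log loss (base 2)

# Both surrogates sit on or above the 0/1 loss at every margin value
print(hinge >= zero_one)
print(log_loss >= zero_one)
```

Both surrogates are convex in m, which is what makes the SVM and logistic-regression objectives tractable while the raw 0/1 loss is not.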

What about multiple classes?

One against all: learn 3 classifiers separately, class k vs. the rest, giving (w_k, b_k) for k = 1, 2, 3. Predict y = arg max_k w_k·x + b_k. But the w_k's may not be on the same scale. Note: (a·w)·x + (a·b) is also a solution for any a > 0.
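A runnable sketch of the one-against-all scheme (a hypothetical toy example of mine, using simple perceptrons as the per-class binary classifiers rather than SVMs):

```python
import numpy as np

# Three well-separated 2-D clusters, one per class
X = np.array([[0.0, 3.0], [0.5, 3.5], [-0.5, 3.2],       # class 0: top
              [3.0, -2.0], [3.5, -1.5], [2.8, -2.5],      # class 1: right
              [-3.0, -2.0], [-3.5, -1.5], [-2.8, -2.5]])  # class 2: left
y = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])

# One-vs-rest: train one perceptron per class (class k vs. the rest)
W, b = np.zeros((3, 2)), np.zeros(3)
for k in range(3):
    t = np.where(y == k, 1, -1)          # relabel: class k vs. rest
    for _ in range(100):                 # perceptron update epochs
        for xi, ti in zip(X, t):
            if ti * (W[k] @ xi + b[k]) <= 0:
                W[k] += ti * xi
                b[k] += ti

# Predict with the highest-scoring classifier
pred = np.argmax(X @ W.T + b, axis=1)
print(pred)
```

Note the caveat from the slide: the arg max compares scores from independently trained classifiers, whose weights may not share a common scale; it works here only because the clusters are very well separated.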

Learn 1 classifier: Multi-class SVM. Simultaneously learn 3 sets of weights; the margin is the gap between the correct class and the nearest other class. Predict y = arg max_k w^(k)·x + b^(k).

Learn 1 classifier: Multi-class SVM. Simultaneously learn 3 sets of weights and predict y = arg max_k w^(k)·x + b^(k). Because this is a joint optimization, the w_k's have the same scale.
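One common way to write out this joint optimization (a sketch; the slide does not give the exact formulation) enforces a margin of 1 between the correct class's score and every other class's score:

```latex
\min_{\{w_k,\, b_k\}} \ \frac{1}{2} \sum_{k} \|w_k\|^2
\quad \text{s.t.} \quad
w_{y_j} \cdot x_j + b_{y_j} \;\ge\; w_k \cdot x_j + b_k + 1
\qquad \forall j,\ \forall k \ne y_j
```

Because all the w_k appear in one objective and one set of constraints, their scales are tied together, fixing the problem noted on the one-against-all slide.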

What you need to know: maximizing the margin; derivation of the SVM formulation; slack variables and hinge loss; the relationship between SVMs and logistic regression (0/1 loss, hinge loss, log loss); tackling multiple classes (one against all, multiclass SVMs).

SVMs reminder: regularization + hinge loss. min_{w,b} w·w + C Σ_j ξ_j s.t. (w·x_j + b) y_j ≥ 1 − ξ_j and ξ_j ≥ 0 for all j (soft margin approach).

Today's Lecture: learn one of the most interesting and exciting recent advancements in machine learning, the kernel trick: high-dimensional feature spaces at no extra cost! But first, a detour: constrained optimization!

Constrained Optimization

Lagrange Multiplier - Dual Variables: move the constraint into the objective function to form the Lagrangian, then solve. The constraint is tight when the dual variable α > 0.
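A tiny worked example of moving a constraint into the objective (my own, not from the slides):

```latex
% Minimize x^2 subject to x >= 1, i.e. the constraint 1 - x <= 0.
L(x, \alpha) = x^2 + \alpha (1 - x), \qquad \alpha \ge 0
% Stationarity in x:  2x - \alpha = 0  \implies  x = \alpha / 2
% Dual function:  g(\alpha) = L(\alpha/2, \alpha) = \alpha - \alpha^2 / 4
% Maximizing over \alpha \ge 0 gives \alpha^* = 2 > 0, so the
% constraint is tight at the optimum:  x^* = \alpha^* / 2 = 1.
```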

Duality. Primal problem vs. dual problem. Weak duality: the dual objective lower-bounds the primal objective at all feasible points. Strong duality (equality of the optima) holds under the KKT conditions.
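The weak-duality claim has a one-line proof sketch (standard, with the constraint written as g(x) ≤ 0):

```latex
% Primal: p^* = \min_x f(x) \ \text{s.t.}\ g(x) \le 0
% Lagrangian: L(x, \lambda) = f(x) + \lambda g(x), \qquad \lambda \ge 0
% For any feasible \tilde{x} (so g(\tilde{x}) \le 0) and any \lambda \ge 0:
d(\lambda) = \min_x L(x, \lambda)
\;\le\; L(\tilde{x}, \lambda)
\;=\; f(\tilde{x}) + \lambda g(\tilde{x})
\;\le\; f(\tilde{x})
% Taking the best \lambda on the left and the best \tilde{x} on the
% right gives  d^* \le p^*  (weak duality).
```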

Lagrange Multiplier - Dual Variables (cases b negative and b positive). Solving: when α > 0, the constraint is tight.