Day 4: Classification, support vector machines
1 Day 4: Classification, support vector machines
Introduction to Machine Learning Summer School, June 18 - June 29, 2018, Chicago
Instructor: Suriya Gunasekar, TTI Chicago
21 June 2018
2 Topics so far
Supervised learning, linear regression
Linear regression
o Overfitting, bias-variance trade-off
o Ridge and lasso regression, gradient descent
Yesterday
o Classification, logistic regression
o Regularization for logistic regression
o Multi-class classification
Today
o Maximum margin classifiers
o Kernel trick
3 Classification
Supervised learning: estimate a mapping $f$ from input $x \in \mathcal{X}$ to output $y \in \mathcal{Y}$
o Regression: $\mathcal{Y} = \mathbb{R}$ or other continuous variables
o Classification: $\mathcal{Y}$ takes a discrete set of values
Examples: $\mathcal{Y} = \{\text{spam}, \text{no spam}\}$; digits (categories, not numeric values) $\mathcal{Y} = \{0, 1, 2, \ldots, 9\}$
Many successful applications of ML in vision, speech, NLP, healthcare
4 Parametric classifiers
$\mathcal{H} = \{x \mapsto w \cdot x + b : w \in \mathbb{R}^d, b \in \mathbb{R}\}$
$\hat{y}(x) = \mathrm{sign}(\hat{w} \cdot x + \hat{b})$
o $\hat{w} \cdot x + \hat{b} = 0$ is the (linear) decision boundary or separating hyperplane; it separates $\mathbb{R}^d$ into two halfspaces (regions)
o $\hat{w} \cdot x + \hat{b} > 0$ gets label $+1$ and $\hat{w} \cdot x + \hat{b} < 0$ gets label $-1$
o more generally, $\hat{y}(x) = \mathrm{sign}(\hat{f}(x))$, so the decision boundary is $\hat{f}(x) = 0$
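For concreteness, here is a minimal NumPy sketch of this prediction rule (the function name and example data are illustrative, not from the lecture):

```python
import numpy as np

# Sketch of the linear prediction rule yhat(x) = sign(w . x + b).
def predict(w, b, X):
    """X: (N, d) array of inputs; returns labels in {-1, +1}."""
    return np.sign(X @ w + b)

# Example: points on either side of the hyperplane x1 + x2 = 0.
w, b = np.array([1.0, 1.0]), 0.0
X = np.array([[2.0, 1.0], [-1.0, -3.0]])
print(predict(w, b, X))  # [ 1. -1.]
```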
5 Surrogate losses
The correct loss to use is the 0-1 loss after thresholding: $\ell_{0\text{-}1}(f(x), y) = \mathbf{1}[\mathrm{sign}(f(x)) \neq y] = \mathbf{1}[f(x)\,y < 0]$
Linear regression uses the squared loss $\ell_{sq}(f(x), y) = (f(x) - y)^2$
[Figure: losses $\ell(f(x), y)$ plotted against the margin $f(x)\,y$]
7 Surrogate losses
Hard to optimize over $\ell_{0\text{-}1}$; find another loss $\ell(f(x), y)$ that is
o convex (for any fixed $y$), so it is easier to minimize
o an upper bound on $\ell_{0\text{-}1}$, so small $\ell$ implies small $\ell_{0\text{-}1}$
Satisfied by the squared loss, but the squared loss can be large even when $\ell_{0\text{-}1}(f(x), y) = 0$
Two more surrogate losses in this course (see the sketch below):
o Logistic loss $\ell_{\log}(f(x), y) = \log(1 + \exp(-f(x)\,y))$
o Hinge loss $\ell_{hinge}(f(x), y) = \max(0, 1 - f(x)\,y)$
[Figure: losses $\ell(f(x), y)$ plotted against the margin $f(x)\,y$]
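All four losses above can be written as functions of the margin $m = f(x)\,y$ (for $y \in \{-1, +1\}$, $(f(x) - y)^2 = (f(x)\,y - 1)^2$). A minimal NumPy sketch comparing them on a grid of margin values; the names are illustrative:

```python
import numpy as np

def zero_one_loss(m):
    return (m < 0).astype(float)     # 1[f(x) y < 0]

def squared_loss(m):
    return (m - 1.0) ** 2            # (f(x) - y)^2, rewritten via the margin

def logistic_loss(m):
    return np.log1p(np.exp(-m))      # log(1 + exp(-f(x) y))

def hinge_loss(m):
    return np.maximum(0.0, 1.0 - m)  # max(0, 1 - f(x) y)

margins = np.linspace(-2, 3, 11)
for name, fn in [("0-1", zero_one_loss), ("squared", squared_loss),
                 ("logistic", logistic_loss), ("hinge", hinge_loss)]:
    print(name, np.round(fn(margins), 2))
```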
8 Logistic regression: ERM on surrogate loss
Logistic loss $\ell(f(x), y) = \log(1 + \exp(-f(x)\,y))$
$S = \{(x_i, y_i) : i = 1, 2, \ldots, N\}$, $\mathcal{X} = \mathbb{R}^d$, $\mathcal{Y} = \{-1, 1\}$
Linear model $f(x) = w \cdot x + b$
Minimize the training loss: $\hat{w}, \hat{b} = \arg\min_{w, b} \sum_i \log(1 + \exp(-(w \cdot x_i + b)\,y_i))$
Output classifier $\hat{y}(x) = \mathrm{sign}(\hat{w} \cdot x + \hat{b})$
9 Logistic regression
$\hat{w}, \hat{b} = \arg\min_{w, b} \sum_i \log(1 + \exp(-(w \cdot x_i + b)\,y_i))$
Convex optimization problem
Can solve using gradient descent (a sketch follows)
Can also add the usual regularization: $\ell_2$, $\ell_1$
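A minimal sketch of this ERM problem solved by batch gradient descent, with optional $\ell_2$ regularization. All names and the toy data are illustrative, not the instructor's code:

```python
import numpy as np

def fit_logistic(X, y, lr=0.5, n_steps=2000, l2=0.0):
    """X: (N, d) inputs; y: (N,) labels in {-1, +1}."""
    N, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(n_steps):
        m = y * (X @ w + b)              # margins y_i (w . x_i + b)
        g = -y / (1.0 + np.exp(m))       # per-sample d/dm of log(1 + exp(-m))
        w -= lr * (X.T @ g / N + l2 * w) # gradient step on w (plus l2 penalty)
        b -= lr * g.mean()               # gradient step on b
    return w, b

# Usage on toy data whose labels come from a "true" linear rule.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.sign(X @ np.array([2.0, -1.0]) + 0.5)
w, b = fit_logistic(X, y)
print("training accuracy:", (np.sign(X @ w + b) == y).mean())
```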
10 Linear decision boundaries
$\hat{y}(x) = \mathrm{sign}(w \cdot x + b)$
$\{x : w \cdot x + b = 0\}$ is a hyperplane in $\mathbb{R}^d$
o the decision boundary
o $w$ is the direction of the normal
o $b$ is the offset
$\{x : w \cdot x + b = 0\}$ divides $\mathbb{R}^d$ into two halfspaces (regions)
o $\{x : w \cdot x + b \geq 0\}$ gets label $+1$ and $\{x : w \cdot x + b < 0\}$ gets label $-1$
The classifier maps $x$ to a 1D coordinate $x' = w \cdot x + b$
[Figure: a separating hyperplane in the $(x_1, x_2)$ plane]
12 Linear separators in 2D
[Figures: examples of linear separators in 2D. Slide credit: Nati Srebro]
15 Margin of a classifier
Margin: the distance from the closest instance point to the separating hyperplane (computed in the sketch below)
Large margins are more stable
o small perturbations of the data do not change the prediction
[Figure: two separators in the $(x_1, x_2)$ plane with small and large margins]
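As a one-function sketch (assuming labels in $\{-1, +1\}$ and $w \neq 0$), the geometric margin of a linear classifier on a dataset:

```python
import numpy as np

# Geometric margin of the linear classifier (w, b) on data (X, y):
# the smallest signed distance  y_i (w . x_i + b) / ||w||_2  over the sample.
def margin(w, b, X, y):
    return np.min(y * (X @ w + b)) / np.linalg.norm(w)
```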
16 Maximum margin classifier
$S = \{(x_i, y_i) : i = 1, 2, \ldots, N\}$, binary classes $\mathcal{Y} = \{-1, 1\}$
Assume the data is linearly separable
o $\exists\, w, b$ such that for all $i = 1, 2, \ldots, N$: $y_i = \mathrm{sign}(w \cdot x_i + b)$, i.e., $y_i (w \cdot x_i + b) > 0$
Maximum margin separator given by
$\hat{w}, \hat{b} = \arg\max_{w, b} \min_i \dfrac{y_i (w \cdot x_i + b)}{\|w\|_2}$
where $\frac{y_i (w \cdot x_i + b)}{\|w\|_2}$ is the margin of sample $i$ and the $\min_i$ picks the smallest margin
17 Maximum margin classifier
$\hat{w}, \hat{b} = \arg\max_{w, b} \min_i \dfrac{y_i (w \cdot x_i + b)}{\|w\|_2}$
Claim 1: if $(\hat{w}, \hat{b})$ is a solution, then for any $\gamma > 0$, $(\gamma \hat{w}, \gamma \hat{b})$ is also a solution
Option 1: we can fix $\|w\|_2 = 1$ to get $\hat{w}, \hat{b} = \arg\max_{\|w\|_2 = 1,\, b} \min_i y_i (w \cdot x_i + b)$
Option 2: we can also fix $\min_i y_i (w \cdot x_i + b) = 1$
o the margin is now $\frac{1}{\|w\|_2}$
o instead of increasing the margin we can reduce the norm
19 Max-margin classifier: equivalent formulation
Solve: $\tilde{w}, \tilde{b} = \arg\min_{w, b} \|w\|_2^2$ s.t. $\forall i,\ y_i (w \cdot x_i + b) \geq 1$
This is the hard margin Support Vector Machine (SVM)
Claim 2: equivalent to the previous slide: $\hat{w} = \frac{\tilde{w}}{\|\tilde{w}\|_2},\ \hat{b} = \frac{\tilde{b}}{\|\tilde{w}\|_2}$ is a solution of $\max_{\|w\|_2 = 1,\, b} \min_i y_i (w \cdot x_i + b)$
Proof:
1. Let $\min_i y_i (\hat{w} \cdot x_i + \hat{b}) = \hat{\gamma}$; then $\min_i y_i \left( \frac{\hat{w}}{\hat{\gamma}} \cdot x_i + \frac{\hat{b}}{\hat{\gamma}} \right) \geq 1$, so $(\hat{w}/\hat{\gamma}, \hat{b}/\hat{\gamma})$ is feasible for the min-norm problem
2. Hence $\|\tilde{w}\|_2 \leq \left\| \frac{\hat{w}}{\hat{\gamma}} \right\|_2 = \frac{1}{\hat{\gamma}}$
3. $\min_i y_i \left( \frac{\tilde{w}}{\|\tilde{w}\|_2} \cdot x_i + \frac{\tilde{b}}{\|\tilde{w}\|_2} \right) \geq \frac{1}{\|\tilde{w}\|_2} \geq \hat{\gamma}$, so the normalized $(\tilde{w}, \tilde{b})$ achieves the optimal margin
20 Maximum margin classifier: formulations
Original formulation: $\hat{w}, \hat{b} = \arg\max_{w, b} \min_i \dfrac{y_i (w \cdot x_i + b)}{\|w\|_2}$
Fixing $\|w\|_2 = 1$: $\hat{w}, \hat{b} = \arg\max_{w, b} \min_i y_i (w \cdot x_i + b)$ s.t. $\|w\|_2 = 1$
Fixing $\min_i y_i (w \cdot x_i + b) = 1$: $\tilde{w}, \tilde{b} = \arg\min_{w, b} \|w\|_2^2$ s.t. $\forall i,\ y_i (w \cdot x_i + b) \geq 1$
Henceforth, $b$ will be absorbed into $w$ by adding an additional feature of 1 to $x$
21 Margin and norm
$\text{margin} = \min_i \dfrac{y_i\, w \cdot x_i}{\|w\|_2}$
Remember, in regression small-norm solutions have low complexity!
o Is this true for maximum margin classifiers?
o What about classification with the logistic loss $\sum_i \log(1 + \exp(-y_i\, w \cdot x_i))$?
o How do we do capacity control when learning maximum margin classifiers?
Some places refer to $\min_i y_i\, w \cdot x_i$ as the margin; this implicitly assumes normalization
o $\min_i y_i\, w \cdot x_i$ is meaningless without knowing what $\|w\|_2$ is!
22 Solutions of hard margin SVM
$\hat{w} = \arg\min_w \|w\|_2^2$ s.t. $y_i\, w \cdot x_i \geq 1\ \forall i$
Theorem: $\hat{w} \in \mathrm{span}(\{x_i : i = 1, 2, \ldots, N\})$, i.e., there exist $\{\hat{\beta}_i : i = 1, 2, \ldots, N\}$ such that $\hat{w} = \sum_i \hat{\beta}_i x_i$
Denote $S = \mathrm{span}(\{x_i : i = 1, 2, \ldots, N\})$ and $S^\perp = \{z \in \mathbb{R}^d : \forall i,\ z \cdot x_i = 0\}$
o For any $z \in \mathbb{R}^d$, $z = z_S + z_{S^\perp}$ with $z_S \in S$ and $z_{S^\perp} \in S^\perp$
o $\|z\|_2^2 = \|z_S\|_2^2 + \|z_{S^\perp}\|_2^2$ (a numeric check follows)
Three step proof:
1. Decompose $\hat{w} = \hat{w}_S + \hat{w}_{S^\perp}$
2. $\min_i y_i\, \hat{w} \cdot x_i \geq 1 \Rightarrow \min_i y_i\, \hat{w}_S \cdot x_i \geq 1$ (because $\hat{w}_{S^\perp} \cdot x_i = 0$ for all $i$)
3. If $\hat{w}_{S^\perp} \neq 0$, then $\|\hat{w}_S\|_2^2 < \|\hat{w}\|_2^2$, so $\hat{w}_S$ would be feasible with strictly smaller norm, a contradiction
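The orthogonal decomposition used in the proof can be checked numerically; a small sketch using least squares to project onto $S = \mathrm{span}(\{x_i\})$ (the data here is random and purely illustrative):

```python
import numpy as np

# Any z in R^d splits as z = z_S + z_perp with z_S in span{x_i} and
# z_perp orthogonal to every x_i, and ||z||^2 = ||z_S||^2 + ||z_perp||^2.
rng = np.random.default_rng(2)
X = rng.normal(size=(3, 5))          # N = 3 points in d = 5 dimensions
z = rng.normal(size=5)

coef, *_ = np.linalg.lstsq(X.T, z, rcond=None)
z_S = X.T @ coef                     # orthogonal projection onto span{x_i}
z_perp = z - z_S

print(np.allclose(X @ z_perp, 0))                       # z_perp . x_i = 0 for all i
print(np.isclose(z @ z, z_S @ z_S + z_perp @ z_perp))   # Pythagoras
```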
25 Representer theorem
$\hat{w} = \arg\min_w \|w\|_2^2$ s.t. $y_i\, w \cdot x_i \geq 1\ \forall i$
Theorem: there exist $\{\hat{\beta}_i : i = 1, 2, \ldots, N\}$ such that $\hat{w} = \sum_i \hat{\beta}_i x_i$
o This is a special case of the representer theorem
Theorem (extension): additionally, $\{\hat{\beta}_i\}$ also satisfies $\hat{\beta}_i = 0$ for all $i$ such that $y_i\, \hat{w} \cdot x_i > 1$
Proof: (animation on the next slide)
26 [Figure: animation of the max-margin solution $\hat{w}$ and the constraints that hold with equality. Illustration credit: Nati Srebro]
27 Representer theorem
$\hat{w} = \arg\min_w \|w\|_2^2$ s.t. $y_i\, w \cdot x_i \geq 1\ \forall i$
Theorem: there exist $\{\hat{\beta}_i : i = 1, 2, \ldots, N\}$ such that $\hat{w} = \sum_i \hat{\beta}_i x_i$, and $\{\hat{\beta}_i\}$ also satisfies $\hat{\beta}_i = 0$ for all $i$ such that $y_i\, \hat{w} \cdot x_i > 1$
$\mathrm{SV}(\hat{w}) = \{i : y_i\, \hat{w} \cdot x_i = 1\}$, the datapoints closest to the separating hyperplane
o called support vectors
o hence "support vector machine"
$\hat{w} = \sum_{i \in \mathrm{SV}(\hat{w})} \hat{\beta}_i x_i$
How do we get $\hat{\beta}$?
29 Optimizing the SVM problem
$\hat{w} = \arg\min_w \|w\|_2^2$ s.t. $y_i\, w \cdot x_i \geq 1\ \forall i$
1. Can do sub-gradient descent (next class)
2. Special case of a quadratic program: $\min_z \frac{1}{2} z^\top P z + q^\top z$ s.t. $Gz \leq h,\ Az = b$
o Change of variables: $\hat{w} = \sum_{i=1}^N \beta_i x_i$ (or, by the representer theorem, $\hat{w} = \sum_{i \in \mathrm{SV}(\hat{w})} \hat{\beta}_i x_i$)
30 Optimizing the SVM problem
Change of variables $w = \sum_{i=1}^N \beta_i x_i$:
$\min_\beta \sum_{i,j} \beta_i \beta_j\, x_i \cdot x_j$ s.t. $\sum_j \beta_j\, y_i\, x_i \cdot x_j \geq 1\ \forall i$
$= \min_{\beta \in \mathbb{R}^N} \beta^\top G \beta$ s.t. $y_i (G\beta)_i \geq 1\ \forall i$
$G \in \mathbb{R}^{N \times N}$ with $G_{ij} = x_i \cdot x_j$ is called the Gram matrix
Convex program: quadratic programming (a sketch follows)
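A minimal sketch of this quadratic program using the cvxpy modeling package (an assumed dependency; any QP solver would do, and the toy data and tolerances are illustrative). Since $\beta^\top G \beta = \|\sum_i \beta_i x_i\|_2^2$, the objective can be written as a sum of squares:

```python
import numpy as np
import cvxpy as cp  # assumed available; any QP solver would do

# Hard-margin SVM in beta coordinates (b absorbed, data must be separable):
#   min_beta  beta^T G beta   s.t.  y_i (G beta)_i >= 1  for all i
def hard_margin_svm(X, y):
    N = X.shape[0]
    G = X @ X.T                                   # Gram matrix G_ij = x_i . x_j
    beta = cp.Variable(N)
    prob = cp.Problem(cp.Minimize(cp.sum_squares(X.T @ beta)),
                      [cp.multiply(y, G @ beta) >= 1])
    prob.solve()
    return beta.value

# Illustrative separable data: two Gaussian blobs on either side of the origin.
rng = np.random.default_rng(1)
X = np.r_[rng.normal(size=(20, 2)) + 3.0, rng.normal(size=(20, 2)) - 3.0]
y = np.r_[np.ones(20), -np.ones(20)]
beta = hard_margin_svm(X, y)
w = X.T @ beta                                    # recover w = sum_i beta_i x_i
sv = np.where(np.isclose(y * (X @ w), 1.0, atol=1e-4))[0]
print("support vectors:", sv)
print("nonzero beta:", np.where(np.abs(beta) > 1e-6)[0])
```

The two printed index sets should coincide up to numerical tolerance: by the representer theorem (extension), $\hat{\beta}_i \neq 0$ only on the support vectors.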
31 The kernel
$\min_w \|w\|_2^2$ s.t. $y_i\, w \cdot x_i \geq 1\ \forall i$ $\iff$ $\min_{\beta \in \mathbb{R}^N} \beta^\top G \beta$ s.t. $y_i (G\beta)_i \geq 1\ \forall i$
The optimization problem depends on the $x_i$ only through the values of $G_{ij} = x_i \cdot x_j$ for $i, j \in [N]$
What about prediction? $\hat{w} \cdot x = \sum_i \hat{\beta}_i\, x_i \cdot x$, again only through inner products
The function $K(x, x') = x \cdot x'$ is called the kernel
Learning non-linear classifiers using feature transformations, i.e., $f(x) = w \cdot \phi(x)$ for some $\phi(x)$:
o the only thing we need to know is $K_\phi(x, x') = K(\phi(x), \phi(x'))$
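Kernelizing the QP sketch above only requires building $G$ from a kernel function and changing the prediction rule; a minimal illustration with a degree-2 polynomial kernel (the kernel choice, constants, and names are assumptions for illustration):

```python
import numpy as np

# Polynomial kernel K(x, x') = (x . x' + c)^degree, for all pairs of rows.
def poly_kernel(A, B, degree=2, c=1.0):
    return (A @ B.T + c) ** degree

# Kernel prediction: f(x) = sum_i beta_i K(x_i, x); classify by its sign.
def kernel_predict(beta, X_train, X_test, kernel=poly_kernel):
    return np.sign(kernel(X_test, X_train) @ beta)
```

In the earlier QP sketch one would simply set G = poly_kernel(X, X) and predict with kernel_predict, never recovering $w$ (or $\phi(x)$) explicitly.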
34 Kernels as prior knowledge
If we think that positive examples can (almost) be separated by some ellipse, then we should use polynomials of degree 2
A kernel encodes a measure of similarity between objects. A bit like nearest neighbors (NN), except that it must be a valid inner product function. (Slide credit: Nati Srebro)
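For the degree-2 case on 2-D inputs, the polynomial kernel above corresponds to an explicit quadratic feature map, which is why it can realize ellipse-shaped boundaries; a quick numeric check of this correspondence (a sketch, with the standard $\sqrt{2}$ scalings):

```python
import numpy as np

# Explicit degree-2 feature map whose inner product equals (x . x' + 1)^2.
def phi(x):
    x1, x2 = x
    return np.array([1.0, np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1 * x1, np.sqrt(2) * x1 * x2, x2 * x2])

x, xp = np.array([1.0, 2.0]), np.array([-0.5, 3.0])
print(np.isclose(phi(x) @ phi(xp), (x @ xp + 1.0) ** 2))  # True
```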