Support Vector Machines. Machine Learning Fall 2017
5 Where are we?
Learning algorithms: Decision Trees, Perceptron, AdaBoost (these produce linear classifiers)
General learning principles: overfitting; mistake-bound learning; PAC learning, sample complexity; hypothesis choice & VC dimensions; training and generalization errors
Coming up (next few lectures): Learning theory! Training linear classifiers by minimizing loss; the Risk Minimization Principle
11 Big picture
Linear models: Perceptron, Winnow, Support Vector Machines
How good is a learning algorithm? Online learning; PAC, agnostic learning
12 This lecture: Support vector machines
- Training by maximizing margin
- The SVM objective
- Solving the SVM optimization problem
- Support vectors, duals and kernels
16 VC dimensions and linear classifiers: what we know so far
1. If we have m examples, then with probability 1 - δ, the true (generalization) error of a hypothesis h with training error err_S(h) is bounded by err_S(h) plus a function of the VC dimension; a low VC dimension gives a tighter bound.
2. The VC dimension of a linear classifier in d dimensions is d + 1.
But are all linear classifiers the same?
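The bound itself was a figure in the original slides; one standard statement that matches the annotations above (generalization error on the left, training error plus a function of the VC dimension on the right) is:

```latex
\mathrm{err}_D(h) \;\le\; \mathrm{err}_S(h) \;+\; \sqrt{\frac{VC(H)\left(\ln\frac{2m}{VC(H)} + 1\right) + \ln\frac{4}{\delta}}{m}}
```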
18 Recall: Margin
The margin of a hyperplane for a dataset is the distance between the hyperplane and the data point nearest to it. (The slide's figure marks this distance as the margin with respect to the hyperplane.)
20 Which line is a better choice? Why?
(The slide shows two separating hyperplanes, h1 and h2.) A new example, not from the training set, might be misclassified if the margin is smaller.
21 Data-dependent VC dimension
Intuitively, larger margins are better. Suppose we only consider linear separators with margins γ1 and γ2:
H1 = linear separators that have a margin γ1
H2 = linear separators that have a margin γ2
If γ1 > γ2, the entire set of functions H1 is better.
23 Data-dependent VC dimension
Theorem (Vapnik): Let H be the set of linear classifiers that separate the training set by a margin of at least γ. Then the VC dimension of H is bounded in terms of R/γ, where R is the radius of the smallest sphere containing the data.
Larger margin ⇒ lower VC dimension. Lower VC dimension ⇒ better generalization bound.
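The statement of the bound was an image in the original; a form of Vapnik's result often quoted for data in d dimensions is:

```latex
VC(H) \;\le\; \min\left( \left\lceil \frac{R^2}{\gamma^2} \right\rceil,\; d \right) + 1
```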
24 Learning strategy: find the linear separator that maximizes the margin.
28 Recall: The geometry of a linear classifier
Prediction = sgn(b + w1 x1 + w2 x2); the decision boundary is b + w1 x1 + w2 x2 = 0.
We only care about the sign, not the magnitude: 2b + 2w1 x1 + 2w2 x2 = 0 and 1000b + 1000w1 x1 + 1000w2 x2 = 0 define the same hyperplane.
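A quick numeric check of this scale invariance (a minimal sketch; the weights and the point are made up):

```python
import numpy as np

w = np.array([2.0, -1.0])  # illustrative weights w1, w2
b = 0.5                    # illustrative bias
x = np.array([1.0, 3.0])   # an example point

for scale in (1.0, 2.0, 1000.0):
    score = scale * b + (scale * w) @ x
    # The score's magnitude changes with the scale, but its sign never does.
    print(scale, score, np.sign(score))
```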
30 Maximizing margin
Margin = distance of the closest point from the hyperplane. We want

    max_w  min_i  y_i w^T x_i / ||w||

Some people call this the geometric margin; the numerator alone is called the functional margin.
32 Recall: The geometry of a linear classifier
Prediction = sgn(b + w1 x1 + w2 x2), and we only care about the sign, not the magnitude. So we have the freedom to scale w and b up or down so that the numerator (the functional margin of the closest point) is 1.
33 Maximizing margin
Margin = distance of the closest point from the hyperplane; we want max_w min_i y_i w^T x_i / ||w||. Since we only care about the sign of w and b in the end and not the magnitude, set the absolute score (functional margin) of the closest point to be 1 and allow w to adjust itself. Then max_w γ is equivalent to max_w 1/||w|| in this setting.
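Spelled out under the scaling convention above (a standard derivation, not verbatim from the slides), the chain of equivalences is:

```latex
\max_{w}\ \min_i \frac{y_i\, w^T x_i}{\lVert w \rVert}
\;\;\Longleftrightarrow\;\;
\max_{w}\ \frac{1}{\lVert w \rVert}
\;\;\Longleftrightarrow\;\;
\min_{w}\ \frac{1}{2}\, w^T w
\quad\text{s.t.}\quad y_i\, w^T x_i \ge 1 \;\; \forall i
```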
37 Max-margin classifiers
Learning problem:

    min_w  (1/2) w^T w    s.t.  y_i w^T x_i >= 1  for all i

Minimizing (1/2) w^T w gives us max_w 1/||w||. The constraint holds for every example, and in particular for the example closest to the separator. This is called the hard Support Vector Machine. We will look at how to solve this optimization problem later.
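As a sanity check, the hard-margin problem is a small quadratic program. Here is a minimal sketch (not part of the lecture) that solves it with the cvxpy library on made-up separable data; a constant-1 feature lets w absorb the bias b:

```python
import cvxpy as cp
import numpy as np

# Toy linearly separable data; the last column of 1s plays the role of the bias.
X = np.array([[1.0, 2.0, 1.0],
              [2.0, 3.0, 1.0],
              [-1.0, -1.0, 1.0],
              [-2.0, -1.5, 1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

w = cp.Variable(X.shape[1])
objective = cp.Minimize(0.5 * cp.sum_squares(w))   # (1/2) w^T w
constraints = [cp.multiply(y, X @ w) >= 1]         # y_i w^T x_i >= 1 for all i
cp.Problem(objective, constraints).solve()

print("w =", w.value)
```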
39 What if the data is not separable?
Hard SVM: maximize the margin, requiring every example to have a functional margin of at least 1. This is a constrained optimization problem. If the data is not separable, there is no w that will classify the data: the problem is infeasible and has no solution!
43 Dealing with non-separable data
Key idea: allow some examples to break into the margin. (The slide's figure shows a separator whose margin contains a few examples.) This separator has a large enough margin that it should generalize well. So, when computing the margin, ignore the examples that make the margin smaller or the data inseparable.
46 Soft SVM
Hard SVM: maximize the margin, with every example having a functional margin of at least 1. Now introduce one slack variable ξ_i per example, and require

    y_i w^T x_i >= 1 - ξ_i   and   ξ_i >= 0

Intuition: the slack variable allows examples to break into the margin. If the slack value is zero, then the example is either on or outside the margin.
48 Soft SVM
With the slack variables in place, the new optimization problem for learning is:

    min_{w,ξ}  (1/2) w^T w + C Σ_i ξ_i
    s.t.  y_i w^T x_i >= 1 - ξ_i  and  ξ_i >= 0  for all i
54 Soft SVM
The first term maximizes the margin; the second minimizes the total slack (i.e., allows as few examples as possible to violate the margin); the hyperparameter C trades off between the two terms. We can eliminate the slack variables to rewrite the problem in a more interpretable form.
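Eliminating the slack variables (at the optimum, ξ_i = max(0, 1 - y_i w^T x_i)) gives the form the next slides analyze; the slide's own formula was an image, but the standard rewriting is:

```latex
\min_{w}\ \frac{1}{2}\, w^T w \;+\; C \sum_{i} \max\bigl(0,\ 1 - y_i\, w^T x_i\bigr)
```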
60 Maximizing margin and minimizing loss
The first term maximizes the margin; the second is a penalty for the prediction on each example. We can consider three cases:
- Example is correctly classified and is outside the margin: penalty = 0
- Example is incorrectly classified: penalty = 1 - y_i w^T x_i
- Example is correctly classified but within the margin: penalty = 1 - y_i w^T x_i
This gives us the hinge loss function.
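The three cases collapse into the single expression max(0, 1 - y_i w^T x_i). A minimal sketch, with illustrative margin scores:

```python
def hinge_loss(score: float) -> float:
    """Hinge loss as a function of the margin score y * w^T x."""
    return max(0.0, 1.0 - score)

# Correct and outside the margin (y w^T x >= 1): no penalty.
print(hinge_loss(2.5))   # 0.0
# Correct but within the margin (0 < y w^T x < 1): penalty 1 - y w^T x.
print(hinge_loss(0.4))   # 0.6
# Incorrect (y w^T x <= 0): penalty 1 - y w^T x, growing linearly.
print(hinge_loss(-1.0))  # 2.0
```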
64 The Hinge Loss
(The slide plots loss as a function of y w^T x, comparing the hinge loss with the 0-1 loss.)
0-1 loss: if the signs of y and w^T x are different, the penalty is 1; if they are the same, there is no penalty.
Hinge loss: penalizes predictions even if they are correct but too close to the margin; incorrect predictions get a penalty that grows linearly as y w^T x decreases; there is no penalty once w^T x is beyond 1 (beyond -1 for negative examples), i.e., once y w^T x >= 1.
66 General learning principle: risk minimization
Define a notion of loss over the training data as a function of a hypothesis. Learning = find the hypothesis that has the lowest loss on the training data.
67 General learning principle: regularized risk minimization
Additionally, define a regularization function that penalizes over-complex hypotheses; this capacity control gives better generalization. Learning = find the hypothesis that has the lowest [regularizer + loss on the training data].
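In symbols (a standard formulation consistent with the slides, not copied from them):

```latex
h^{*} \;=\; \arg\min_{h \in H}\ \ \Omega(h) \;+\; C \sum_{i=1}^{m} L\bigl(h(x_i),\, y_i\bigr)
```

where Ω is the regularizer, L is the loss, and C balances the two terms.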
69 SVM objective function
Regularization term (maximizes the margin): imposes a preference over the hypothesis space and pushes for better generalization; it can be replaced with other regularization terms which impose other preferences.
Empirical loss (hinge loss): penalizes weight vectors that make mistakes; it can be replaced with other loss functions which impose other preferences.
A hyperparameter controls the tradeoff between a large margin and a small hinge loss.
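Putting the pieces together, here is a minimal subgradient-descent sketch (not from the lecture) for the full objective (1/2)||w||^2 + C Σ_i max(0, 1 - y_i w^T x_i); the dataset, learning rate, and epoch count are made up for illustration:

```python
import numpy as np

def train_svm(X, y, C=1.0, lr=0.01, epochs=200):
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        margins = y * (X @ w)
        # Subgradient of the hinge term is -y_i x_i for examples with margin < 1.
        violating = margins < 1
        grad = w - C * (y[violating, None] * X[violating]).sum(axis=0)
        w -= lr * grad
    return w

# Tiny made-up dataset; a constant-1 feature lets w absorb the bias.
X = np.array([[2.0, 2.0, 1.0], [1.5, 3.0, 1.0],
              [-1.0, -2.0, 1.0], [-2.0, -1.0, 1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w = train_svm(X, y)
print("w =", w, "predictions:", np.sign(X @ w))
```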