COMP 562: Introduction to Machine Learning
1 COMP 562: Introduction to Machine Learning
Lecture 20: Support Vector Machines, Kernels
Mahmoud Mostapha, Department of Computer Science, University of North Carolina at Chapel Hill (mahmoudm@cs.unc.edu)
October 31. Slides adapted from Eric Eaton.
2 COMP562 - Lecture 20 Plan for today: Prediction Why might prediction be wrong? Support vector machines Doing really well with linear models Kernels Making the non-linear linear
3 Why Might Prediction Be Wrong?
True non-determinism: flip a biased coin with p(heads) = θ. Estimate θ; if θ > 0.5 predict heads, else tails.
Lots of ML research on problems like this: learn a model and do the best you can in expectation.
4 Why Might Prediction Be Wrong?
Partial observability: something needed to predict y is missing from the observation x.
N-bit parity problem: x contains N-1 bits (hard PO), or x contains N bits but the learner ignores some of them (soft PO).
Noise in the observation x: measurement error, instrument limitations.
5 Why Might Prediction Be Wrong?
True non-determinism; partial observability (hard, soft); representational bias; algorithmic bias; bounded resources.
6 Representational Bias
Having the right features (x) is crucial. (Figure: the same data plotted along x and along x².)
7 Support Vector Machines Doing Really Well with Linear Decision Surfaces
8 Strengths of SVMs
Good in theory, good in practice. Work well with few training instances. Find the globally best model. Efficient algorithms. Amenable to the kernel trick.
9 Linear Separators
Training instances: $x \in \mathbb{R}^{d+1}$ with $x_0 = 1$, labels $y \in \{-1, 1\}$
Model parameters: $\theta \in \mathbb{R}^{d+1}$
Hyperplane: $\theta^T x = \langle \theta, x \rangle = 0$
Decision function: $h(x) = \operatorname{sign}(\theta^T x) = \operatorname{sign}(\langle \theta, x \rangle)$
Recall the inner (dot) product: $\langle u, v \rangle = u^T v = \sum_i u_i v_i$
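The decision function above can be sketched in a few lines of NumPy. The parameter values here are hypothetical, chosen only to illustrate the $x_0 = 1$ convention absorbing the bias term:

```python
import numpy as np

# Linear decision function h(x) = sign(<theta, x>), with x_0 = 1 absorbing the bias.
# theta is a hypothetical parameter vector for illustration: [theta_0, theta_1, theta_2].
theta = np.array([-1.0, 2.0, 1.0])

def h(x):
    """Classify a point x = [1, x_1, x_2] (note the leading 1)."""
    return int(np.sign(theta @ x))

print(h(np.array([1.0, 1.0, 1.0])))   # theta.x = -1 + 2 + 1 = 2 > 0, so +1
print(h(np.array([1.0, 0.0, 0.0])))   # theta.x = -1 < 0, so -1
```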
14 A Good Separator
15 Noise in the Observations
16 Ruling Out Some Separators
17 Lots of Noise
18 Only One Separator Remains
19 Maximizing the Margin
20 Fat Separators
21 Fat Separators margin
22 Why Maximize the Margin?
Increasing the margin reduces capacity, i.e., fewer possible models.
Lesson from learning theory: if H is sufficiently constrained in size and/or the size of the training data set n is large, then low training error is likely to be evidence of low generalization error.
23 Logistic Regression
$h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}} = g(z)$, where $z = \theta^T x$
If $y = 1$, we want $h_\theta(x) \approx 1$, i.e., $\theta^T x \gg 0$
If $y = 0$, we want $h_\theta(x) \approx 0$, i.e., $\theta^T x \ll 0$
$\min_\theta J(\theta)$, where $J(\theta) = -\sum_{i=1}^{n} \left[ y_i \log h_\theta(x_i) + (1 - y_i) \log(1 - h_\theta(x_i)) \right]$
(the two bracketed terms play the roles of $\mathrm{cost}_1(\theta^T x_i)$ and $\mathrm{cost}_0(\theta^T x_i)$ below)
Based on slide by Andrew Ng
24 Alternate View of Logistic Regression
Cost of an example: $-y_i \log h_\theta(x_i) - (1 - y_i) \log(1 - h_\theta(x_i))$, with $h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$ and $z = \theta^T x$
If $y = 1$ (want $\theta^T x \gg 0$): the cost is $-\log h_\theta(x)$
If $y = 0$ (want $\theta^T x \ll 0$): the cost is $-\log(1 - h_\theta(x))$
Based on slide by Andrew Ng
25 Logistic Regression to SVMs
Logistic regression:
$\min_\theta \; -\sum_{i=1}^{n} \left[ y_i \log h_\theta(x_i) + (1 - y_i) \log(1 - h_\theta(x_i)) \right] + \frac{\lambda}{2} \sum_{j=1}^{d} \theta_j^2$
Support vector machines:
$\min_\theta \; C \sum_{i=1}^{n} \left[ y_i \, \mathrm{cost}_1(\theta^T x_i) + (1 - y_i) \, \mathrm{cost}_0(\theta^T x_i) \right] + \frac{1}{2} \sum_{j=1}^{d} \theta_j^2$
You can think of $C$ as similar to $\frac{1}{\lambda}$
26 Support Vector Machine
$\min_\theta \; C \sum_{i=1}^{n} \left[ y_i \, \mathrm{cost}_1(\theta^T x_i) + (1 - y_i) \, \mathrm{cost}_0(\theta^T x_i) \right] + \frac{1}{2} \sum_{j=1}^{d} \theta_j^2$
If $y = 1$, we want $\theta^T x \geq 1$; if $y = 0$, we want $\theta^T x \leq -1$
This is the hinge loss: $\ell_{\mathrm{hinge}}(h(x)) = \max(0, 1 - y \cdot \theta^T x)$, for $y \in \{-1, +1\}$
Based on slide by Andrew Ng
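A minimal sketch of the hinge loss, using a hypothetical parameter vector, makes the three regimes concrete: zero loss outside the margin, positive loss inside it, and linearly growing loss on the wrong side:

```python
import numpy as np

def hinge(theta, x, y):
    """Hinge loss max(0, 1 - y * theta.x), for labels y in {-1, +1}."""
    return max(0.0, 1.0 - y * float(theta @ x))

theta = np.array([1.0, 1.0])   # hypothetical weights for illustration
# Correctly classified with margin >= 1: zero loss.
print(hinge(theta, np.array([2.0, 0.0]), +1))    # 1 - 2 = -1, clipped to 0.0
# Inside the margin: positive loss even though the sign is correct.
print(hinge(theta, np.array([0.25, 0.25]), +1))  # 1 - 0.5 = 0.5
# Misclassified: loss grows linearly with the violation.
print(hinge(theta, np.array([1.0, 0.0]), -1))    # 1 - (-1) = 2.0
```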
27 Support Vector Machine
With $C$ very large (hard constraints) and relabeling $y = 1/0$ as $y = +1/-1$, the objective
$\min_\theta \; C \sum_{i=1}^{n} \left[ y_i \, \mathrm{cost}_1(\theta^T x_i) + (1 - y_i) \, \mathrm{cost}_0(\theta^T x_i) \right] + \frac{1}{2} \sum_{j=1}^{d} \theta_j^2$
becomes
$\min_\theta \; \frac{1}{2} \sum_{j=1}^{d} \theta_j^2$ s.t. $\theta^T x_i \geq 1$ if $y_i = 1$, $\theta^T x_i \leq -1$ if $y_i = -1$
or, equivalently,
$\min_\theta \; \frac{1}{2} \sum_{j=1}^{d} \theta_j^2$ s.t. $y_i (\theta^T x_i) \geq 1$
28 Maximum Margin Hyperplane
$\mathrm{margin} = \frac{2}{\|\theta\|_2}$, the distance between the hyperplanes $\theta^T x = 1$ and $\theta^T x = -1$
29 Support Vectors
The support vectors are the training points lying on the margin hyperplanes $\theta^T x = 1$ and $\theta^T x = -1$
30 Vector Inner Product
$\|u\|_2 = \mathrm{length}(u) = \sqrt{u_1^2 + u_2^2}$
$u^T v = v^T u = u_1 v_1 + u_2 v_2 = \|u\|_2 \|v\|_2 \cos\theta = p \, \|u\|_2$, where $p = \|v\|_2 \cos\theta$ is the (signed) projection of $v$ onto $u$
Based on example by Andrew Ng
31 Understanding the Hyperplane
$\min_\theta \; \frac{1}{2} \sum_{j=1}^{d} \theta_j^2$ s.t. $\theta^T x_i \geq 1$ if $y_i = 1$, $\theta^T x_i \leq -1$ if $y_i = -1$
Assume $\theta_0 = 0$ so that the hyperplane is centered at the origin, and that $d = 2$.
Then $\theta^T x = \|\theta\|_2 \|x\|_2 \cos\gamma = p \, \|\theta\|_2$, where $p = \|x\|_2 \cos\gamma$ is the projection of $x$ onto $\theta$.
Based on example by Andrew Ng
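The projection identity above is easy to check numerically; the vectors here are hypothetical:

```python
import numpy as np

# theta.x equals ||theta||_2 times the scalar projection p of x onto theta.
theta = np.array([3.0, 4.0])   # ||theta||_2 = 5
x = np.array([2.0, 1.0])

p = (theta @ x) / np.linalg.norm(theta)   # scalar projection of x onto theta
print(theta @ x)                          # 10.0
print(p * np.linalg.norm(theta))          # also 10.0
```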
32 Maximizing the Margin
$\min_\theta \; \frac{1}{2} \sum_{j=1}^{d} \theta_j^2$ s.t. $\theta^T x_i \geq 1$ if $y_i = 1$, $\theta^T x_i \leq -1$ if $y_i = -1$
Assume $\theta_0 = 0$ so that the hyperplane is centered at the origin, and that $d = 2$. Let $p_i$ be the projection of $x_i$ onto the vector $\theta$.
When $p_i$ is small, $\|\theta\|_2$ must be large to have $p_i \|\theta\|_2 \geq 1$ (or $\leq -1$); when $p_i$ is larger, $\|\theta\|_2$ can be smaller.
Minimizing $\|\theta\|_2$ therefore pushes the separator toward large projections $p_i$, i.e., a large margin.
Based on example by Andrew Ng
33 Size of the Margin
For the support vectors, $p \, \|\theta\|_2 = \pm 1$, where $p$ is the length of the projection of the support vectors onto $\theta$.
Therefore $p = \frac{1}{\|\theta\|_2}$, and $\mathrm{margin} = 2p = \frac{2}{\|\theta\|_2}$
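The margin formula is one line of code; the weight vector below is a hypothetical example:

```python
import numpy as np

# Hard-margin SVM: the margin hyperplanes are theta.x = +1 and theta.x = -1,
# so the margin width is 2 / ||theta||_2.
theta = np.array([3.0, 4.0])          # hypothetical weights; ||theta||_2 = 5
margin = 2.0 / np.linalg.norm(theta)
print(margin)                          # 0.4
```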
34 The SVM Dual Problem
The primal SVM problem was given as
$\min_\theta \; \frac{1}{2} \sum_{j=1}^{d} \theta_j^2$ s.t. $y_i (\theta^T x_i) \geq 1 \;\; \forall i$
We can solve it more efficiently by taking the Lagrangian dual. Duality is a common idea in optimization: it transforms a difficult optimization problem into a simpler one.
Key idea: introduce a Lagrange multiplier $\alpha_i$ for each constraint; $\alpha_i$ indicates how important a particular constraint is to the solution.
35 The SVM Dual Problem
The Lagrangian is given by
$L(\theta, \alpha) = \frac{1}{2} \sum_{j=1}^{d} \theta_j^2 - \sum_{i=1}^{n} \alpha_i \left( y_i \, \theta^T x_i - 1 \right)$, s.t. $\alpha_i \geq 0 \;\; \forall i$
We must minimize over $\theta$ and maximize over $\alpha$. At the optimal solution, the partials with respect to the $\theta_j$'s are 0.
Solve by a bunch of algebra and calculus... and we obtain...
36 SVM Dual
Maximize $J(\alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle$
s.t. $\alpha_i \geq 0 \;\; \forall i$ and $\sum_i \alpha_i y_i = 0$
The decision function is given by
$h(x) = \operatorname{sign}\left( \sum_{i \in SV} \alpha_i y_i \langle x, x_i \rangle + b \right)$
where $b = \frac{1}{|SV|} \sum_{i \in SV} \left( y_i - \sum_{j \in SV} \alpha_j y_j \langle x_i, x_j \rangle \right)$
37 Understanding the Dual
Maximize $J(\alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle$ s.t. $\alpha_i \geq 0 \;\; \forall i$ and $\sum_i \alpha_i y_i = 0$
The constraint $\sum_i \alpha_i y_i = 0$ balances the weight of constraints between the two classes. The constraint weights (the $\alpha_i$'s) cannot be negative.
38 Understanding the Dual
Maximize $J(\alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle$ s.t. $\alpha_i \geq 0 \;\; \forall i$ and $\sum_i \alpha_i y_i = 0$
In the quadratic term, pairs of points with different labels increase the sum and pairs with the same label decrease it; the inner product $\langle x_i, x_j \rangle$ measures the similarity between points.
Intuitively, we should be more careful around points near the margin.
39 Understanding the Dual
Maximize $J(\alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle$ s.t. $\alpha_i \geq 0 \;\; \forall i$ and $\sum_i \alpha_i y_i = 0$
In the solution, either: $\alpha_i > 0$ and the constraint is tight ($y_i (\theta^T x_i) = 1$), so the point is a support vector; or $\alpha_i = 0$, and the point is not a support vector.
40 Employing the Solution
Given the optimal $\alpha^*$, the weights are
$\theta^* = \sum_{i \in SV} \alpha_i^* y_i x_i$
In this formulation, we have not added $x_0 = 1$. Therefore, we can solve one of the support vector constraints
$y_i (\theta^{*T} x_i + \theta_0) = 1$
to obtain $\theta_0$. Or, more commonly, take the average solution over all support vectors.
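A minimal sketch of this recovery step, on a two-point toy problem whose dual solution $\alpha = [0.5, 0.5]$ can be worked out by hand (the data and solution here are illustrative, not from the slides):

```python
import numpy as np

# Toy problem: two support vectors at (+1, 0) with y=+1 and (-1, 0) with y=-1.
X = np.array([[ 1.0, 0.0],
              [-1.0, 0.0]])
y = np.array([+1.0, -1.0])
# Dual solution for this toy problem, solved by hand: alpha = [0.5, 0.5].
alpha = np.array([0.5, 0.5])

sv = alpha > 1e-8                       # support vectors have alpha > 0
theta = (alpha[sv] * y[sv]) @ X[sv]     # theta = sum_i alpha_i y_i x_i
# y_i (theta.x_i + theta_0) = 1 gives theta_0 = y_i - theta.x_i (since y_i = ±1);
# average over all support vectors for numerical stability:
theta0 = np.mean(y[sv] - X[sv] @ theta)
print(theta, theta0)                    # [1. 0.] 0.0
```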
41 What if Data Are Not Linearly Separable? Soft margin SVMs
42 Large Margin Classifier in the Presence of Outliers
With $C$ very large, the decision boundary swings to accommodate a single outlier; with $C$ not too large, the boundary stays close to the large-margin solution.
Based on slide by Andrew Ng
43 Strengths of SVMs
Good in theory, good in practice. Work well with few training instances. Find the globally best model. Efficient algorithms. Amenable to the kernel trick.
44 What if the Surface is Non-Linear?
(Figure: X's surrounded by a ring of O's — no linear separator exists.)
45 Kernel Methods Making the Non- Linear Linear
46 When Linear Separators Fail
(Figure: data that no linear separator in the original feature space can split.)
47 Mapping into a New Feature Space
$\Phi : X \mapsto \hat{X} = \Phi(x)$
For example, with $x_i \in \mathbb{R}^2$: $\Phi([x_{i1}, x_{i2}]) = [x_{i1}, x_{i2}, x_{i1} x_{i2}, x_{i1}^2, x_{i2}^2]$
Rather than run the SVM on $x_i$, run it on $\Phi(x_i)$: a linear separator in the new space is a non-linear separator in the input space.
What if $\Phi(x_i)$ is really big? Use kernels to compute it implicitly!
Image from http://web.engr.oregonstate.edu/~afern/classes/cs534/
48 Kernels
Find a kernel $K$ such that $K(x_i, x_j) = \langle \Phi(x_i), \Phi(x_j) \rangle$
Computing $K(x_i, x_j)$ should be efficient — much more so than computing $\Phi(x_i)$ and $\Phi(x_j)$.
Use $K(x_i, x_j)$ in the SVM algorithm rather than $\langle x_i, x_j \rangle$. Remarkably, this is possible!
49 The Polynomial Kernel
Let $x_i = [x_{i1}, x_{i2}]$ and $x_j = [x_{j1}, x_{j2}]$, and consider the following function:
$K(x_i, x_j) = \langle x_i, x_j \rangle^2 = (x_{i1} x_{j1} + x_{i2} x_{j2})^2 = x_{i1}^2 x_{j1}^2 + x_{i2}^2 x_{j2}^2 + 2 x_{i1} x_{i2} x_{j1} x_{j2} = \langle \Phi(x_i), \Phi(x_j) \rangle$
where
$\Phi(x_i) = [x_{i1}^2, x_{i2}^2, \sqrt{2} \, x_{i1} x_{i2}]$ and $\Phi(x_j) = [x_{j1}^2, x_{j2}^2, \sqrt{2} \, x_{j1} x_{j2}]$
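The identity above can be verified numerically: the implicit kernel value and the explicit feature-space inner product agree exactly (the test vectors are arbitrary):

```python
import numpy as np

def phi(x):
    """Explicit feature map for the degree-2 polynomial kernel in R^2."""
    return np.array([x[0]**2, x[1]**2, np.sqrt(2.0) * x[0] * x[1]])

xi = np.array([1.0, 2.0])
xj = np.array([3.0, 4.0])

k_implicit = (xi @ xj) ** 2        # K(xi, xj) = <xi, xj>^2, no feature map needed
k_explicit = phi(xi) @ phi(xj)     # <phi(xi), phi(xj)>, explicit 3-D feature space
print(k_implicit, k_explicit)      # both 121.0
```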
50 The Polynomial Kernel
Given by $K(x_i, x_j) = \langle x_i, x_j \rangle^d$; $\Phi(x)$ contains all monomials of degree $d$.
Useful in visual pattern recognition. Example: a 16x16 pixel image with monomials of degree 5 — yet we never explicitly compute $\Phi(x)$!
Variation: $K(x_i, x_j) = (\langle x_i, x_j \rangle + 1)^d$ adds all lower-order monomials (degrees $1, \ldots, d$)!
51 The Kernel Trick
Given an algorithm which is formulated in terms of a positive definite kernel $K_1$, one can construct an alternative algorithm by replacing $K_1$ with another positive definite kernel $K_2$.
SVMs can use the kernel trick.
52 Kernels into SVM
$J(\alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle$
becomes
$J(\alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j K(x_i, x_j)$
s.t. $\alpha_i \geq 0 \;\; \forall i$ and $\sum_i \alpha_i y_i = 0$
53 The Gaussian Kernel
Also called the Radial Basis Function (RBF) kernel:
$K(x_i, x_j) = \exp\left( -\frac{\|x_i - x_j\|_2^2}{2 \sigma^2} \right)$
Has value 1 when $x_i = x_j$; the value falls off to 0 with increasing distance.
Note: need to do feature scaling before using the Gaussian kernel.
Small $\sigma^2$: lower bias, higher variance; large $\sigma^2$: higher bias, lower variance.
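The RBF kernel is a one-liner; the sketch below checks the two properties just stated (value 1 at zero distance, decay toward 0 far away):

```python
import numpy as np

def rbf(xi, xj, sigma=1.0):
    """Gaussian (RBF) kernel: exp(-||xi - xj||^2 / (2 sigma^2))."""
    d2 = np.sum((xi - xj) ** 2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

x = np.array([1.0, 2.0])
print(rbf(x, x))                     # 1.0 when xi == xj
print(rbf(x, x + 100.0) < 1e-10)     # True: the value falls off toward 0 with distance
```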
54 Other Kernels
Sigmoid kernel: $K(x_i, x_j) = \tanh(\gamma \, x_i^T x_j + c)$
Neural networks use the sigmoid as an activation function; an SVM with a sigmoid kernel is equivalent to a 2-layer perceptron.
Cosine similarity kernel: $K(x_i, x_j) = \frac{x_i^T x_j}{\|x_i\| \, \|x_j\|}$
A popular choice for measuring the similarity of text documents. The $L_2$ norm projects the vectors onto the unit sphere; their dot product is then the cosine of the angle between the vectors.
55 Other Kernels
Chi-squared kernel: $K(x_i, x_j) = \exp\left( -\gamma \sum_k \frac{(x_{ik} - x_{jk})^2}{x_{ik} + x_{jk}} \right)$
Widely used in computer vision applications; chi-squared measures the distance between probability distributions. The data is assumed to be non-negative, often with an $L_1$ norm of 1.
Also: string kernels, tree kernels, graph kernels, ...
56 An Aside: The Math Behind Kernels
What does it mean to be a kernel? $K(x_i, x_j) = \langle \Phi(x_i), \Phi(x_j) \rangle$ for some $\Phi$.
What does it take to be a kernel? The Gram matrix $G_{ij} = K(x_i, x_j)$ must be a symmetric, positive semi-definite matrix: $z^T G z \geq 0$ for every vector $z \in \mathbb{R}^n$.
Establishing kernel-hood from first principles is non-trivial.
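While proving kernel-hood analytically is non-trivial, the Gram matrix conditions are easy to spot-check empirically on a random sample, as this sketch does for the polynomial kernel (random data, not from the slides):

```python
import numpy as np

# Empirically check the kernel conditions on a random sample: the Gram matrix
# G_ij = K(x_i, x_j) should be symmetric and positive semi-definite.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))

K = lambda a, b: (a @ b + 1.0) ** 2     # polynomial kernel, d = 2, c = 1
G = np.array([[K(xi, xj) for xj in X] for xi in X])

print(np.allclose(G, G.T))              # True: symmetric
eigs = np.linalg.eigvalsh(G)
print(eigs.min() > -1e-8)               # True: no significantly negative eigenvalues
```

Failing this check on any sample proves a candidate function is not a kernel; passing it is only evidence, not proof.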
57 A Few Good Kernels...
Linear kernel: $K(x_i, x_j) = \langle x_i, x_j \rangle$
Polynomial kernel: $K(x_i, x_j) = (\langle x_i, x_j \rangle + c)^d$, where $c \geq 0$ trades off the influence of lower-order terms
Gaussian kernel: $K(x_i, x_j) = \exp\left( -\frac{\|x_i - x_j\|_2^2}{2 \sigma^2} \right)$
Sigmoid kernel: $K(x_i, x_j) = \tanh(\gamma \, x_i^T x_j + c)$
Many more: cosine similarity kernel, chi-squared kernel, string/tree/graph/wavelet/etc. kernels
58 Photo Retouching (Leyvand et al., 2008)
59 Advice for Applying SVMs
Use an SVM software package to solve for the parameters, e.g., SVMlight, libsvm, cvx (fast!), etc.
You need to specify: the choice of the parameter $C$, the choice of kernel, and the associated kernel parameters, e.g.,
$K(x_i, x_j) = (\langle x_i, x_j \rangle + c)^d$ or $K(x_i, x_j) = \exp\left( -\frac{\|x_i - x_j\|_2^2}{2 \sigma^2} \right)$
60 Multi-Class Classification with SVMs
$y \in \{1, \ldots, K\}$
Many SVM packages already have multi-class classification built in. Otherwise, use one-vs-rest: train $K$ SVMs, each of which picks out one class from the rest, yielding $\theta^{(1)}, \ldots, \theta^{(K)}$. Predict the class $i$ with the largest $(\theta^{(i)})^T x$.
Based on slide by Andrew Ng
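The one-vs-rest prediction step reduces to an argmax over per-class scores. A minimal sketch, with hypothetical per-class parameter vectors standing in for trained SVMs:

```python
import numpy as np

# One-vs-rest prediction: given K per-class parameter vectors theta^(k)
# (hypothetical values here, one row per class), predict the class whose
# score theta^(k).x is largest.
Theta = np.array([[ 1.0,  0.0],    # theta^(1)
                  [ 0.0,  1.0],    # theta^(2)
                  [-1.0, -1.0]])   # theta^(3)

def predict(x):
    return int(np.argmax(Theta @ x)) + 1   # classes numbered 1..K

print(predict(np.array([2.0, 0.5])))    # scores [2.0, 0.5, -2.5] -> class 1
print(predict(np.array([-3.0, -3.0])))  # scores [-3.0, -3.0, 6.0] -> class 3
```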
61 SVMs vs Logistic Regression (Advice from Andrew Ng)
Let $n$ = # training examples and $d$ = # features.
If $d$ is large relative to $n$ (e.g., $d > n$ with $d$ = 10,000, $n$ = 10-1,000): use logistic regression or an SVM with a linear kernel.
If $d$ is small (up to 1,000) and $n$ is intermediate (up to 10,000): use an SVM with a Gaussian kernel.
If $d$ is small (up to 1,000) and $n$ is large (50,000+): create/add more features, then use logistic regression or an SVM without a kernel.
Neural networks are likely to work well for most of these settings, but may be slower to train.
Based on slide by Andrew Ng
62 Other SVM Variations
nu-SVM: the $\nu$ parameter controls the fraction of support vectors (lower bound) and the training error rate (upper bound). E.g., $\nu = 0.05$ guarantees that at least 5% of the training points are support vectors and that the training error rate is at most 5%. Harder to optimize than C-SVM and not as scalable.
Also: SVMs for regression, one-class SVMs, SVMs for clustering, ...
63 Conclusion
SVMs find the maximum-margin linear separator; the kernel trick makes SVMs learn non-linear decision surfaces.
Strengths of SVMs: good theoretical and empirical performance; support many types of kernels.
Disadvantages of SVMs: slow to train/predict for huge data sets; need to choose the kernel (and tune its parameters).
64 Today Support Vector Machines (SVM) Kernels
More informationSupport Vector Machines
Wien, June, 2010 Paul Hofmarcher, Stefan Theussl, WU Wien Hofmarcher/Theussl SVM 1/21 Linear Separable Separating Hyperplanes Non-Linear Separable Soft-Margin Hyperplanes Hofmarcher/Theussl SVM 2/21 (SVM)
More informationFoundation of Intelligent Systems, Part I. SVM s & Kernel Methods
Foundation of Intelligent Systems, Part I SVM s & Kernel Methods mcuturi@i.kyoto-u.ac.jp FIS - 2013 1 Support Vector Machines The linearly-separable case FIS - 2013 2 A criterion to select a linear classifier:
More informationNeural networks and support vector machines
Neural netorks and support vector machines Perceptron Input x 1 Weights 1 x 2 x 3... x D 2 3 D Output: sgn( x + b) Can incorporate bias as component of the eight vector by alays including a feature ith
More informationKernel Methods and Support Vector Machines
Kernel Methods and Support Vector Machines Oliver Schulte - CMPT 726 Bishop PRML Ch. 6 Support Vector Machines Defining Characteristics Like logistic regression, good for continuous input features, discrete
More informationCS 6140: Machine Learning Spring What We Learned Last Week. Survey 2/26/16. VS. Model
Logis@cs CS 6140: Machine Learning Spring 2016 Instructor: Lu Wang College of Computer and Informa@on Science Northeastern University Webpage: www.ccs.neu.edu/home/luwang Email: luwang@ccs.neu.edu Assignment
More informationAlgorithms for NLP. Classifica(on III. Taylor Berg- Kirkpatrick CMU Slides: Dan Klein UC Berkeley
Algorithms for NLP Classifica(on III Taylor Berg- Kirkpatrick CMU Slides: Dan Klein UC Berkeley Objec(ve Func(ons What do we want from our weights? Depends! So far: minimize (training) errors: This is
More informationCS798: Selected topics in Machine Learning
CS798: Selected topics in Machine Learning Support Vector Machine Jakramate Bootkrajang Department of Computer Science Chiang Mai University Jakramate Bootkrajang CS798: Selected topics in Machine Learning
More informationMachine Learning. Kernels. Fall (Kernels, Kernelized Perceptron and SVM) Professor Liang Huang. (Chap. 12 of CIML)
Machine Learning Fall 2017 Kernels (Kernels, Kernelized Perceptron and SVM) Professor Liang Huang (Chap. 12 of CIML) Nonlinear Features x4: -1 x1: +1 x3: +1 x2: -1 Concatenated (combined) features XOR:
More informationCS 6140: Machine Learning Spring 2016
CS 6140: Machine Learning Spring 2016 Instructor: Lu Wang College of Computer and Informa?on Science Northeastern University Webpage: www.ccs.neu.edu/home/luwang Email: luwang@ccs.neu.edu Logis?cs Assignment
More informationMachine Learning. Lecture 6: Support Vector Machine. Feng Li.
Machine Learning Lecture 6: Support Vector Machine Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 2018 Warm Up 2 / 80 Warm Up (Contd.)
More informationLinear classifiers Lecture 3
Linear classifiers Lecture 3 David Sontag New York University Slides adapted from Luke Zettlemoyer, Vibhav Gogate, and Carlos Guestrin ML Methodology Data: labeled instances, e.g. emails marked spam/ham
More informationIntroduction to Machine Learning
Introduction to Machine Learning Machine Learning: Jordan Boyd-Graber University of Maryland SUPPORT VECTOR MACHINES Slides adapted from Tom Mitchell, Eric Xing, and Lauren Hannah Machine Learning: Jordan
More informationSlides modified from: PATTERN RECOGNITION AND MACHINE LEARNING CHRISTOPHER M. BISHOP
Slides modified from: PATTERN RECOGNITION AND MACHINE LEARNING CHRISTOPHER M. BISHOP Predic?ve Distribu?on (1) Predict t for new values of x by integra?ng over w: where The Evidence Approxima?on (1) The
More informationSVMC An introduction to Support Vector Machines Classification
SVMC An introduction to Support Vector Machines Classification 6.783, Biomedical Decision Support Lorenzo Rosasco (lrosasco@mit.edu) Department of Brain and Cognitive Science MIT A typical problem We have
More informationCOMS 4721: Machine Learning for Data Science Lecture 10, 2/21/2017
COMS 4721: Machine Learning for Data Science Lecture 10, 2/21/2017 Prof. John Paisley Department of Electrical Engineering & Data Science Institute Columbia University FEATURE EXPANSIONS FEATURE EXPANSIONS
More informationSupport Vector Machines Explained
December 23, 2008 Support Vector Machines Explained Tristan Fletcher www.cs.ucl.ac.uk/staff/t.fletcher/ Introduction This document has been written in an attempt to make the Support Vector Machines (SVM),
More informationSupport Vector Machines: Kernels
Support Vector Machines: Kernels CS6780 Advanced Machine Learning Spring 2015 Thorsten Joachims Cornell University Reading: Murphy 14.1, 14.2, 14.4 Schoelkopf/Smola Chapter 7.4, 7.6, 7.8 Non-Linear Problems
More informationRecita,on: Loss, Regulariza,on, and Dual*
10-701 Recita,on: Loss, Regulariza,on, and Dual* Jay- Yoon Lee 02/26/2015 *Adopted figures from 10725 lecture slides and from the book Elements of Sta,s,cal Learning Loss and Regulariza,on Op,miza,on problem
More informationDiscriminative Learning and Big Data
AIMS-CDT Michaelmas 2016 Discriminative Learning and Big Data Lecture 2: Other loss functions and ANN Andrew Zisserman Visual Geometry Group University of Oxford http://www.robots.ox.ac.uk/~vgg Lecture
More informationDeviations from linear separability. Kernel methods. Basis expansion for quadratic boundaries. Adding new features Systematic deviation
Deviations from linear separability Kernel methods CSE 250B Noise Find a separator that minimizes a convex loss function related to the number of mistakes. e.g. SVM, logistic regression. Systematic deviation
More informationKernels. Machine Learning CSE446 Carlos Guestrin University of Washington. October 28, Carlos Guestrin
Kernels Machine Learning CSE446 Carlos Guestrin University of Washington October 28, 2013 Carlos Guestrin 2005-2013 1 Linear Separability: More formally, Using Margin Data linearly separable, if there
More information