CS-621 Theory Gems                                        September 26, 2012

Lecture 3

Lecturer: Aleksander Mądry    Scribes: Yann Barbotin and Vlad Ureche

1 Introduction

The main topic of this lecture is another classical algorithm for finding linear classifiers: the Winnow algorithm. We also briefly discuss Support Vector Machines (SVMs).

2 The Winnow Algorithm

Recall the problem of linear classification we discussed in the last lecture. In this problem, we are given a collection of $m$ labeled points $\{(x^j, l^j)\}_j$ with each $x^j \in \mathbb{R}^n$. Our goal is to find a vector (linear classifier) $(w, \theta) \in \mathbb{R}^{n+1}$ such that, for all $j$,

    $\mathrm{sign}(w \cdot x^j - \theta) = l^j$.

That is, we want the hyperplane corresponding to $(w, \theta)$ to separate the positive examples from the negative ones. As we already argued previously, wlog we can constrain ourselves to the case when $\theta = 0$ (i.e., the hyperplane passes through the origin) and there are only positive examples (i.e., $l^j = 1$ for all $j$).

Last time we presented a simple algorithm for this problem called the Perceptron algorithm. Today, we will see a different (and also simple) algorithm for the same task: the Winnow algorithm.

2.1 The Algorithm

The Winnow algorithm can be seen as a direct application of the multiplicative weights update paradigm that we discussed in Lecture 1. To make this connection precise, instead of providing an explicit description of the algorithm, we will cast it as an application of the Multiplicative Weights Update (MWU) algorithm to an appropriately tailored execution of the (general) learning-from-experts framework; cf. Section 5 in the notes from Lecture 1. (As we will see later in the course, there are many, quite diverse, algorithms that can be cast in this manner; this is one reason why the multiplicative weights update paradigm has such a wide array of applications.)

Similarly to the case of the analysis of the Perceptron algorithm, we will wlog assume that our data is normalized. However, this time we will normalize it in the $\ell_\infty$ norm instead of the $\ell_2$ norm. So, from now on, $\|x^j\|_\infty = \max_i |x^j_i| = 1$ for all $j$.

Winnow algorithm: Consider the following execution of the learning-from-expert-advice framework interacting with the MWU algorithm (as described in Section 5 in the notes from Lecture 1) for $\rho = 1$ and some $0 < \varepsilon \leq 1/2$. We have $n$ experts, one for each coordinate (feature). In each round $t$:

- Let $(p^t_1, \ldots, p^t_n)$ be the convex combination supplied by the MWU algorithm;
- Set $w^t$ to be the linear classifier given by the $p^t_i$'s, i.e., $w^t \leftarrow (p^t_1, \ldots, p^t_n)$;
- Check whether there exists an $x^{j_t}$ such that $w^t \cdot x^{j_t} \leq 0$ (i.e., $x^{j_t}$ is misclassified);
- If no such $j_t$ exists, then we just stop the algorithm and output the classifier $w^t$ (as it classifies all the data correctly);
- Otherwise, we set the loss vector to be $l^t_i = -x^{j_t}_i$ for each $i$, and proceed to the next round.
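To make this execution concrete, here is a minimal runnable sketch in Python (not part of the original notes; the function name is illustrative, and the exact multiplicative update rule, here the standard $w_i \leftarrow w_i (1 - \varepsilon l^t_i)$ variant for losses in $[-1, 1]$, is an assumption, as the notes defer to the MWU description of Lecture 1):

```python
import numpy as np

def winnow(X, eps, max_rounds):
    """Winnow via MWU (a sketch). X is an (m, n) array of positive examples
    with ||x^j||_inf = 1; eps should ideally lie in [gamma_W/4, gamma_W/2]."""
    n = X.shape[1]
    weights = np.ones(n)                  # one expert per coordinate/feature
    for _ in range(max_rounds):
        p = weights / weights.sum()       # convex combination from MWU
        w = p                             # w^t <- (p^t_1, ..., p^t_n)
        misclassified = np.where(X @ w <= 0)[0]
        if misclassified.size == 0:
            return w                      # classifies all the data correctly
        j = misclassified[0]              # some misclassified point x^{j_t}
        loss = -X[j]                      # loss vector l^t_i = -x^{j_t}_i
        weights *= 1.0 - eps * loss       # MWU update w_i <- w_i (1 - eps l_i)
    return None                           # budget exhausted (eps too large?)
```

Note that, with this choice of loss vector, a misclassification multiplies the weight of feature $i$ by $1 + \varepsilon x^{j_t}_i$, which is exactly the classical multiplicative "promotion" step of the Winnow algorithm.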
Clearly, given that our stopping condition ensures the correctness of the output classifier, our only task is to bound the total number of executed rounds (provided an appropriate linear classifier exists).

Theorem 1 Let $w^*$ be a correct linear classifier such that $\|w^*\|_1 = 1$ and $w^*_i \geq 0$ for all $i$, and let $\gamma_W = \min_j x^j \cdot w^*$. If $\gamma_W/4 \leq \varepsilon \leq \gamma_W/2$, then the number $T$ of iterations of the Winnow algorithm is at most

    $T \leq \frac{8 \ln n}{\gamma_W^2}$.

Before we prove this theorem, let us discuss the assumptions that it makes, as well as provide some intuition underlying the algorithm.

As we already discussed in the last lecture, we can always assume that a correct linear classifier is normalized. On the other hand, to make sure that all coordinates of $w^*$ are non-negative, we can just apply to each $x^j$ the transformation $x^j \rightarrow (x^j, -x^j)$. This will double the number of our features, but now, as it is not hard to see, if there existed a correct linear classifier for the original data, then there is one for the transformed data that has all its coordinates non-negative and achieves the same margin. So, the assumptions made on $w^*$ do not really constrain the applicability of the theorem.

To understand the idea behind the algorithm and its analysis, note that we set up the loss vectors in such a way that, in each round $t$, the MWU algorithm suffers a loss proportional to the extent to which it misclassifies the point $x^{j_t}$: since $w^t \cdot x^{j_t} \leq 0$ for a misclassified point, the loss $-w^t \cdot x^{j_t}$ is always non-negative.

On the other hand, the existence of the correct classifier $w^*$ is a certificate that there exists a convex combination of experts that never suffers a loss larger than $-\gamma_W$ (and this quantity is strictly negative). Therefore, by the asymptotically optimal loss guarantee of the MWU algorithm, we know that such poor performance of the MWU algorithm cannot last for too long. (In particular, the convex combination $(p^t_1, \ldots, p^t_n)$ that it maintains will eventually shift its mass towards the features/experts that are most important for correct classification.)

Finally, note that the theorem requires that the MWU algorithm be used with a value of $\varepsilon$ that is within a factor of two of $\gamma_W/2$. This might be problematic, as we usually do not know how large $\gamma_W$ is. There is, however, a simple technique that allows us to circumvent this problem.

In this technique, we start with $\varepsilon = 1/2$ (which is an obvious upper bound on the value of $\gamma_W/2$) and run the Winnow algorithm for $8 \ln n / \varepsilon^2$ rounds. If the algorithm terminates by that time, we get a correct classifier and are done. Otherwise, by Theorem 1, we know that it must have been the case that our value of $\varepsilon$ was too large (i.e., $\gamma_W/2 < \varepsilon$). Thus we halve the value of $\varepsilon$ and repeat the procedure (cf. Figure 1). Clearly, this procedure will finish once $\varepsilon$ becomes less than $\gamma_W/2$ (or even earlier), and it is not hard to see that the overall number of iterations performed is at most twice as large as the one promised by Theorem 1 in the case of $\varepsilon$ being set correctly.

Figure 1: Iterative margin estimation using Theorem 1. (Flowchart: start with $\varepsilon = 1/2$; run the MWU-based Winnow algorithm for $8 \ln n / \varepsilon^2$ iterations; if it has converged, output the correct classifier $w$; otherwise set $\varepsilon \leftarrow \varepsilon/2$ and repeat.)
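A sketch of this guess-and-halve wrapper, reusing the hypothetical `winnow` function from the sketch in Section 2.1:

```python
import math

def winnow_with_halving(X):
    """Margin-guessing wrapper around Winnow, as in Figure 1.

    A sketch; assumes n >= 2 so that the iteration budget is positive, and
    that the data is linearly separable so the loop eventually terminates.
    """
    n = X.shape[1]
    eps = 0.5                                  # obvious upper bound on gamma_W / 2
    while True:
        budget = math.ceil(8 * math.log(n) / eps ** 2)
        w = winnow(X, eps, budget)             # from the sketch above
        if w is not None:
            return w                           # converged: a correct classifier
        eps /= 2                               # by Theorem 1, eps was too large
```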
We now turn to the proof of the theorem.

Proof We have that the total loss $\ell_{MWU}$ of the MWU algorithm is

    $\ell_{MWU} = \sum_t \sum_i p^t_i l^t_i = -\sum_t \sum_i w^t_i x^{j_t}_i = -\sum_t w^t \cdot x^{j_t} \geq 0$,

as each $x^{j_t}$ was chosen so that the classifier $w^t$ misclassifies it. Therefore, by the loss guarantee of the MWU algorithm (cf. Theorem 6 in the notes from Lecture 1), we have, for any feature/expert $i$, that

    $0 \leq \ell_{MWU} \leq \sum_t l^t_i + \varepsilon \sum_t |l^t_i| + \frac{\ln n}{\varepsilon} \leq -\sum_t x^{j_t}_i + \varepsilon T + \frac{\ln n}{\varepsilon}$,    (1)

as $\rho = 1$ in our setting (due to the normalization of all the $x^j$'s) and thus $|l^t_i| \leq 1$ for all $i$ and $t$.

Let us multiply both sides of each inequality (1) corresponding to some $i$ by $w^*_i$ and add them all together. As all the $w^*_i$ are non-negative and they sum up to one, we obtain

    $0 \leq \sum_i w^*_i \Big( -\sum_t x^{j_t}_i \Big) + \varepsilon T + \frac{\ln n}{\varepsilon} = -\sum_t w^* \cdot x^{j_t} + \varepsilon T + \frac{\ln n}{\varepsilon}$.

Now, by the definition of $\gamma_W$, we have that $w^* \cdot x^j \geq \gamma_W$ for each $j$. Therefore, it must be the case that

    $0 \leq -\sum_t w^* \cdot x^{j_t} + \varepsilon T + \frac{\ln n}{\varepsilon} \leq -\gamma_W T + \frac{\gamma_W}{2} T + \frac{4 \ln n}{\gamma_W}$,

as $\gamma_W/4 \leq \varepsilon \leq \gamma_W/2$. Rearranging the terms and dividing both sides by $\gamma_W/2$, we get that

    $T \leq \frac{8 \ln n}{\gamma_W^2}$,

as desired. ∎

Finally, it is worth pointing out that the MWU algorithm is treated here in a purely black-box fashion. So, it can be replaced by any other algorithm for the learning-from-expert-advice framework, as long as this algorithm offers a loss guarantee similar to the one described by Theorem 6 from Lecture 1.

2.2 Comparison of the Perceptron and Winnow Algorithms

At this point, we have seen two different algorithms (Perceptron and Winnow) for exactly the same task. It is thus natural to wonder which one of them one should use. As is often the case, there is no clear answer to that question. The performance guarantees of these two algorithms, although they look similar, are to a large extent incomparable. Recall that if we do not normalize the data, then the performance guarantees for these algorithms that we have established are:

    Algorithm     Number of iterations
    Perceptron    $1/\gamma_P^2$,           with $\gamma_P = \max_w \min_j \frac{w \cdot x^j}{\|w\|_2 \, \|x^j\|_2}$
    Winnow        $8 \ln n / \gamma_W^2$,   with $\gamma_W = \max_w \min_j \frac{w \cdot x^j}{\|w\|_1 \, \|x^j\|_\infty}$
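As a toy illustration of how differently these two margins can behave (this example is not from the notes), the snippet below computes the margin that a fixed classifier $w$ achieves under each definition, anticipating the discussion of irrelevant features that follows:

```python
import numpy as np

def l2_l2_margin(w, X):
    # min_j (w . x^j) / (||w||_2 ||x^j||_2): the l_2/l_2-margin w achieves
    return np.min((X @ w) / np.linalg.norm(X, axis=1)) / np.linalg.norm(w, 2)

def linf_l1_margin(w, X):
    # min_j (w . x^j) / (||w||_1 ||x^j||_inf): the l_inf/l_1-margin w achieves
    return np.min((X @ w) / np.abs(X).max(axis=1)) / np.linalg.norm(w, 1)

rng = np.random.default_rng(0)
X = rng.uniform(0.5, 1.0, size=(50, 2))       # data separated by w = (1, 0)
w = np.array([1.0, 0.0])
print(l2_l2_margin(w, X), linf_l1_margin(w, X))

# Append 98 irrelevant features in [-1, 1] on which w puts no weight: the
# l_inf/l_1 margin barely moves, while the l_2/l_2 margin collapses.
X_big = np.hstack([X, rng.uniform(-1.0, 1.0, size=(50, 98))])
w_big = np.concatenate([w, np.zeros(98)])
print(l2_l2_margin(w_big, X_big), linf_l1_margin(w_big, X_big))
```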
So, if both $\gamma_P$ and $\gamma_W$ are similar, using the Perceptron should be better. However, in general, $\gamma_P$ (often called the $\ell_2/\ell_2$-margin) and $\gamma_W$ (often referred to as the $\ell_\infty/\ell_1$-margin) can be quite different, as the corresponding norms have different characteristics; cf. Figure 2.

In particular, one advantage of the Winnow algorithm is that the $\ell_\infty/\ell_1$-margin is much less sensitive to the presence of irrelevant features in our data set. More precisely, if one imagines that features take values from some fixed interval, say $[-1, 1]$, then adding new features that are not really needed for correct classification (i.e., the optimal classifier does not put any weight on them) does not affect the $\ell_\infty/\ell_1$-margin, while it still might significantly change the $\ell_2/\ell_2$-margin.

The general consensus is that the Winnow algorithm is preferable when one expects the target classifier to be rather sparse (i.e., a lot of the features are irrelevant). On the other hand, if one suspects that most of the features are relevant for correct classification (i.e., the optimal classifier is non-zero on most features), then the Perceptron algorithm should perform better, as the $\ell_2$ norm of the target should be much smaller than its $\ell_1$ norm.

Figure 2: Unit balls in, respectively, the $\ell_1$, $\ell_2$, and $\ell_\infty$ norms ($\|x\|_1 = |x_1| + |x_2| = 1$, $\|x\|_2 = \sqrt{x_1^2 + x_2^2} = 1$, and $\|x\|_\infty = \max(|x_1|, |x_2|) = 1$). The intersection with the red line corresponds to the point in each of these balls that maximizes a linear objective. The $\ell_1$ solution has only one of its components non-zero (a sparse solution), while the $\ell_2$ solution is full and minimizes the energy. The $\ell_\infty$ solution favors homogeneity.

3 Support Vector Machines (SVMs)

Our general strategy so far was to relate the running time of our algorithms to the quality of the margin achieved by some optimal classifier. In this way, we were able to prove that if the data can be separated with a large margin, then our algorithms return a correct classifier. However, while doing so, we provided no guarantee on the quality of the margin of this returned classifier. This is rather unfortunate, as one would expect (and, in fact, it can be shown) that classifiers with larger margins have better generalization properties. Therefore, the ability to return a classifier with a good margin (if such a classifier exists) would be a very useful property.

To some extent, both the Perceptron and Winnow algorithms can be adjusted to provide this kind of guarantee (e.g., by treating all classifications that are correct but have small margin as misclassifications). However, it is sometimes better to take a more principled approach here and just state the search for a maximum-margin classifier as an optimization problem. Such optimization problems are called Support Vector Machines (SVMs).

Formally, in the case of the $\ell_2/\ell_2$-margin, we have our data normalized in the $\ell_2$ norm and we want to solve the following problem:

    $\min \|w\|_2^2$  s.t.  $w \cdot x^j \geq 1$ for all $j$.    (2)

(Note that the constraint is $\geq 1$ instead of $\geq 0$ here.)
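Program (2) is a convex quadratic program, so any off-the-shelf convex solver handles it; below is a minimal sketch using the cvxpy library (the choice of solver is an assumption, as the notes do not prescribe one):

```python
import cvxpy as cp

def hard_margin_svm(X):
    """Solve program (2): min ||w||_2^2  s.t.  w . x^j >= 1 for all j.
    X is an (m, n) array of l_2-normalized positive examples, one per row."""
    w = cp.Variable(X.shape[1])
    problem = cp.Problem(cp.Minimize(cp.sum_squares(w)),  # min ||w||_2^2
                         [X @ w >= 1])                    # w . x^j >= 1 for all j
    problem.solve()
    return w.value
```

By Lemma 2 below, the optimum $w^*$ satisfies $\|w^*\|_2^2 = 1/\gamma_P^2$, so the reciprocal of the $\ell_2$ norm of the returned vector recovers the best achievable $\ell_2/\ell_2$-margin.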
As the above program is convex, we can solve it efficiently. (In fact, there are very efficient methods specialized in solving such SVMs.) One can show (see below) that the optimum solution to this program indeed gives a maximum $\ell_2/\ell_2$-margin classifier. Also, one can use duality arguments to show that, similarly to the case of the Perceptron and Winnow algorithms, this optimal classifier is a combination of some of the data points $x^j$. These points are often called support vectors (as they "support" the separating hyperplane), which explains the name of SVMs.

Lemma 2 Let $w^*$ be the optimum solution to the SVM (2). The $\ell_2/\ell_2$-margin achieved by $w^*$ is equal to $\gamma_P$, and $\|w^*\|_2^2 = 1/\gamma_P^2$.

Proof First, we show that $w^*$ is a correct classifier with $\ell_2/\ell_2$-margin at least $1/\|w^*\|_2$. This follows easily by noting that, for any $j$,

    $\frac{w^* \cdot x^j}{\|w^*\|_2 \, \|x^j\|_2} \geq \frac{1}{\|w^*\|_2}$,

where we used the feasibility of $w^*$ and the fact that all the $x^j$'s are normalized.

On the other hand, if $w$ is a correct classifier with $\ell_2/\ell_2$-margin $\gamma$, then $\frac{w}{\gamma \|w\|_2}$ is a feasible solution to the SVM (2). To see this, note that for any $j$,

    $\frac{w}{\gamma \|w\|_2} \cdot x^j = \frac{w \cdot x^j}{\gamma \|w\|_2} = \frac{w \cdot x^j}{\min_{j'} w \cdot x^{j'}} \geq 1$.

Also, the objective value corresponding to $\frac{w}{\gamma \|w\|_2}$ is $1/\gamma^2$. The lemma now follows by combining the two above facts. ∎

Although at first glance very appealing, in practice, looking for the maximum-margin classifier that correctly classifies all the data might not be the best idea. This is because real-world data is usually noisy, and just a few outliers might either render the data not linearly separable, or at least significantly reduce the margin of the best linear classifier. Therefore, instead of treating the correct classification of all data points as a hard constraint, one might prefer to allow oneself to misclassify some samples if this enables separation of the remaining data with a large margin.

This type of soft trade-off between the correctness and the quality of the margin can be very conveniently expressed in the SVM framework by introducing a so-called slack variable $\xi_j$ for each data point. Formally, one considers the following augmented optimization problem:

    $\min \|w\|_2^2 + C \sum_j \xi_j$
    s.t.  $w \cdot x^j \geq 1 - \xi_j$ for all $j$,
          $\xi_j \geq 0$ for all $j$.

Here, the parameter $C$ is used to weigh the size of the margin of the obtained classifier against its classification error, measured by the total hinge loss. Clearly, if we set $C$ to be very large, we essentially recover the setting of the SVM in (2).
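A sketch of this soft-margin program in the same style (again with cvxpy as the assumed solver):

```python
import cvxpy as cp

def soft_margin_svm(X, C=1.0):
    """Soft-margin SVM: min ||w||_2^2 + C * sum_j xi_j
    s.t.  w . x^j >= 1 - xi_j  and  xi_j >= 0  for all j."""
    m, n = X.shape
    w, xi = cp.Variable(n), cp.Variable(m)
    problem = cp.Problem(cp.Minimize(cp.sum_squares(w) + C * cp.sum(xi)),
                         [X @ w >= 1 - xi, xi >= 0])
    problem.solve()
    return w.value, xi.value
```

At the optimum, the points with $\xi_j > 0$ are exactly the samples that the classifier is allowed to classify with margin below 1 (or misclassify outright); taking $C$ large drives all the slacks to zero and recovers (2).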