CS-621 Theory Gems                                                September 26, 2012

Lecture 3

Lecturer: Aleksander Mądry                Scribes: Yann Barbotin and Vlad Ureche

1 Introduction

The main topic of this lecture is another classical algorithm for finding linear classifiers: the Winnow algorithm. We also briefly discuss Support Vector Machines (SVMs).

2 The Winnow Algorithm

Recall the problem of linear classification we discussed in the last lecture. In this problem, we are given a collection of $m$ labeled points $\{(x^j, l^j)\}_j$ with each $x^j \in \mathbb{R}^n$. Our goal is to find a vector (linear classifier) $(w, \theta) \in \mathbb{R}^{n+1}$ such that, for all $j$,

$$\mathrm{sign}(w \cdot x^j - \theta) = l^j.$$

That is, we want the hyperplane corresponding to $(w, \theta)$ to separate the positive examples from the negative ones. As we already argued previously, wlog we can constrain ourselves to the case when $\theta = 0$ (i.e., the hyperplane passes through the origin) and there are only positive examples (i.e., $l^j = 1$ for all $j$).

Last time we presented a simple algorithm for this problem called the Perceptron algorithm. Today, we will see a different (and also simple) algorithm for the same task: the Winnow algorithm.

2.1 The Algorithm

The Winnow algorithm can be seen as a direct application of the multiplicative weights update paradigm that we discussed in Lecture 1. To make this connection precise, instead of providing an explicit description of the algorithm, we will cast it as an application of the Multiplicative Weights Update (MWU) algorithm to an appropriately tailored execution of the (general) learning-from-experts framework (cf. Section 5 in the notes from Lecture 1). As we will see later in the course, there are many, quite diverse, algorithms that can be cast in this manner; this is one reason why the multiplicative weights update paradigm has such a wide array of applications.

Similarly to the case of the analysis of the Perceptron algorithm, we will wlog assume that our data is normalized. However, this time we will normalize it in the $\ell_\infty$ norm instead of $\ell_2$. So, from now on, $\|x^j\|_\infty = \max_i |x^j_i| = 1$ for all $j$.

Winnow algorithm: Consider the following execution of the learning-from-expert-advice framework, interacting with the MWU algorithm (as described in Section 5 in the notes from Lecture 1) for $\rho = 1$ and some $0 < \varepsilon \leq \frac{1}{2}$. We have $n$ experts, one for each coordinate (feature). In each round $t$:

- Let $(p_1^t, \ldots, p_n^t)$ be the convex combination supplied by the MWU algorithm;
- Set $w^t$ to be the linear classifier given by the $p_i^t$s, i.e., $w^t := (p_1^t, \ldots, p_n^t)$;
- Check if there exists $x^{j_t}$ such that $w^t \cdot x^{j_t} \leq 0$ (i.e., $x^{j_t}$ is misclassified);
- If no such $j_t$ exists, then we just stop the algorithm and output the classifier $w^t$ (as it classifies all the data correctly);
- Otherwise, we set the loss vector to be $l_i^t := -x_i^{j_t}$ for each $i$, and proceed to the next round.
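To make the execution above concrete, here is a minimal Python sketch of the loop (not part of the original notes). It instantiates the MWU step with one standard update rule, $w_i \leftarrow w_i(1 - \varepsilon l_i^t)$; the precise rule from Lecture 1 may differ slightly, and all function and variable names here are our own.

```python
import numpy as np

def winnow(X, eps, max_rounds):
    """Winnow via MWU: a sketch, assuming the update w_i <- w_i * (1 - eps * l_i).

    X          -- (m, n) array of positive examples with max_i |x^j_i| = 1
    eps        -- MWU step size, 0 < eps <= 1/2
    max_rounds -- bound on the number of rounds to run
    Returns a correct classifier w, or None if none was found in time.
    """
    n = X.shape[1]
    weights = np.ones(n)                  # one expert per coordinate/feature
    for _ in range(max_rounds):
        w = weights / weights.sum()       # convex combination (p_1^t, ..., p_n^t)
        margins = X @ w
        bad = np.flatnonzero(margins <= 0)
        if bad.size == 0:                 # every point classified correctly:
            return w                      # stop and output w^t
        j = bad[0]                        # some misclassified point x^{j_t}
        loss = -X[j]                      # loss vector l_i^t = -x_i^{j_t}
        weights *= 1.0 - eps * loss       # multiplicative update (rho = 1)
    return None
```

Note that the update multiplies the weight of feature $i$ by $1 + \varepsilon x_i^{j_t}$, so features that agree with the misclassified positive example get promoted; this is exactly the classical description of Winnow.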

Clearly, given that our stopping condition ensures correctness of the output classifier, our only task is to bound the total number of executed rounds (provided an appropriate linear classifier exists).

Theorem 1. Let $w^*$ be a correct linear classifier such that $\|w^*\|_1 = 1$ and $w^*_i \geq 0$ for all $i$, and let $\gamma_W = \min_j x^j \cdot w^*$. If $\frac{\gamma_W}{4} \leq \varepsilon \leq \frac{\gamma_W}{2}$, then the number $T$ of iterations of the Winnow algorithm is at most

$$T \leq \frac{8 \ln n}{\gamma_W^2}.$$

Before we prove this theorem, let us discuss the assumptions that it makes, as well as provide some intuition underlying the algorithm.

As we already discussed in the last lecture, we can always assume that a correct linear classifier is normalized. On the other hand, to make sure that all coordinates of $w^*$ are non-negative, we can just apply to each $x^j$ the transformation $x^j \mapsto (x^j, -x^j)$. This will double the number of our features, but now, as it is not hard to see, if there existed a correct linear classifier for the original data then there is one for the transformed data that has all its coordinates non-negative and achieves the same margin. So, the assumptions made on $w^*$ do not really constrain the applicability of the theorem.

To understand the idea behind the algorithm and its analysis, note that we set up the loss vectors in such a way that, in each round $t$, the MWU algorithm suffers a loss proportional to the extent to which it misclassifies the point $x^{j_t}$. So, in particular, this loss is always non-negative.

On the other hand, the existence of the correct classifier $w^*$ is a certificate that there exists a convex combination of experts that never suffers a loss larger than $-\gamma_W$ (and this quantity is strictly negative). Therefore, by the asymptotically optimal loss guarantee of the MWU algorithm, we know that such poor performance of the MWU algorithm cannot last for too long. (In particular, the convex combination $(p_1^t, \ldots, p_n^t)$ that it maintains will eventually shift its mass towards the features/experts that are most important for correct classification.)

Finally, note that the theorem requires that the MWU algorithm is used with the value of $\varepsilon$ being within a factor of two of the value of $\frac{\gamma_W}{2}$. This might be problematic, as we usually do not know how large $\gamma_W$ is. There is, however, a simple technique that allows us to circumvent this problem.

In this technique, we start with $\varepsilon = \frac{1}{2}$ (which is an obvious upper bound on the value of $\frac{\gamma_W}{2}$) and run the Winnow algorithm for $\frac{8 \ln n}{\varepsilon^2}$ rounds. If the algorithm terminates by that time, we get a correct classifier and are done. Otherwise, by Theorem 1, we know that it must have been the case that our value of $\varepsilon$ was too large (i.e., $\frac{\gamma_W}{2} < \varepsilon$). Thus we halve the value of $\varepsilon$ and repeat the procedure (cf. Figure 1; a short sketch of this wrapper appears below). Clearly, this procedure will finish once $\varepsilon$ becomes less than $\frac{\gamma_W}{2}$ (or even earlier), and it is not hard to see that the overall number of iterations performed is at most twice as large as the one promised by Theorem 1 in case of $\varepsilon$ being set correctly.

Figure 1: Iterative margin estimation using Theorem 1. (The flowchart: start with $\varepsilon = \frac{1}{2}$; run MWU for $\frac{8 \ln n}{\varepsilon^2}$ iterations; if not converged, set $\varepsilon \leftarrow \frac{\varepsilon}{2}$ and repeat; once converged, output the correct classifier $w$.)
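The halving procedure is simple enough to state in a few lines. The following sketch (our own naming, building on the `winnow` sketch from Section 2.1) assumes the data is indeed linearly separable; otherwise the loop never terminates.

```python
import numpy as np

def winnow_unknown_margin(X):
    """Iterative margin estimation of Figure 1 (a sketch).

    Start with eps = 1/2 and halve eps whenever a budget of
    8 ln(n) / eps^2 rounds is exhausted without convergence.
    """
    n = X.shape[1]
    eps = 0.5                            # obvious upper bound on gamma_W / 2
    while True:
        budget = int(np.ceil(8 * np.log(n) / eps ** 2))
        w = winnow(X, eps, budget)       # the sketch from Section 2.1
        if w is not None:
            return w                     # converged: correct classifier found
        eps /= 2.0                       # by Theorem 1, gamma_W / 2 < eps
```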

We now turn to the proof of the theorem.

Proof. We have that the total loss $l^{\mathrm{MWU}}$ of the MWU algorithm is

$$l^{\mathrm{MWU}} = \sum_t \sum_i p_i^t l_i^t = -\sum_t \sum_i w_i^t x_i^{j_t} = -\sum_t w^t \cdot x^{j_t} \geq 0,$$

as each $x^{j_t}$ was chosen so that the classifier $w^t$ misclassifies it. Therefore, by the loss guarantee of the MWU algorithm (cf. Theorem 6 in the notes from Lecture 1), we have for any feature/expert $i$ that

$$0 \leq l^{\mathrm{MWU}} \leq \sum_t l_i^t + \varepsilon \sum_t |l_i^t| + \frac{\ln n}{\varepsilon} \leq -\sum_t x_i^{j_t} + \varepsilon T + \frac{\ln n}{\varepsilon}, \qquad (1)$$

as $\rho = 1$ in our setting (due to the normalization of all the $x^j$s) and thus $|l_i^t| \leq 1$ for all $i$ and $t$.

Let us multiply both sides of each equation (1) corresponding to some $i$ by $w^*_i$, and add them all together. As all the $w^*_i$ are non-negative and they sum up to one, we obtain

$$0 \leq \sum_i w^*_i \left( -\sum_t x_i^{j_t} \right) + \varepsilon T + \frac{\ln n}{\varepsilon} = -\sum_t w^* \cdot x^{j_t} + \varepsilon T + \frac{\ln n}{\varepsilon}.$$

Now, by definition of $\gamma_W$, we have that $w^* \cdot x^j \geq \gamma_W$ for each $j$. Therefore, it must be the case that

$$0 \leq -\sum_t w^* \cdot x^{j_t} + \varepsilon T + \frac{\ln n}{\varepsilon} \leq -\gamma_W T + \frac{\gamma_W}{2} T + \frac{4 \ln n}{\gamma_W},$$

as $\frac{\gamma_W}{4} \leq \varepsilon \leq \frac{\gamma_W}{2}$. Rearranging the terms and dividing both sides by $\frac{\gamma_W}{2}$, we get that

$$T \leq \frac{8 \ln n}{\gamma_W^2},$$

as desired.

Finally, it is worth pointing out that the MWU algorithm is treated here in a purely black-box fashion. So, it can be replaced by any other algorithm for the learning-from-expert-advice framework, as long as this algorithm offers a loss guarantee similar to the one described by Theorem 6 from Lecture 1.

2.2 Comparison of the Perceptron and Winnow Algorithms

At this point, we have seen two different algorithms (Perceptron and Winnow) for exactly the same task. It is thus natural to wonder which one of them one should use. As is often the case, there is no clear answer to that question. The performance guarantees of both these algorithms, although they look similar, are to a large extent incomparable. Recall that if we do not normalize the data, then the performance guarantees for these algorithms that we have established are:

Algorithm      Number of iterations
Perceptron     $\frac{1}{\gamma_P^2}$, with $\gamma_P = \max_w \min_j \frac{w \cdot x^j}{\|w\|_2 \, \|x^j\|_2}$
Winnow         $\frac{8 \ln n}{\gamma_W^2}$, with $\gamma_W = \max_w \min_j \frac{w \cdot x^j}{\|w\|_1 \, \|x^j\|_\infty}$
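Although the $\max_w$ in the table is over all classifiers, both margin notions are easy to evaluate for any fixed candidate $w$, which already suffices to illustrate the comparison discussed next. The following numpy sketch (our own helper names) computes both ratios and demonstrates the effect, discussed below, of padding the data with irrelevant features.

```python
import numpy as np

def l2_l2_margin(w, X):
    # min_j (w . x^j) / (||w||_2 ||x^j||_2): the gamma_P-style ratio for this w
    return np.min(X @ w / np.linalg.norm(X, axis=1)) / np.linalg.norm(w)

def linf_l1_margin(w, X):
    # min_j (w . x^j) / (||w||_1 ||x^j||_inf): the gamma_W-style ratio for this w
    return np.min(X @ w / np.abs(X).max(axis=1)) / np.abs(w).sum()

rng = np.random.default_rng(0)
X = rng.uniform(0.5, 1.0, size=(20, 4))          # 4 relevant features
X[:, 0] = 1.0                                    # ensure ||x^j||_inf = 1
w = np.array([0.5, 0.5, 0.0, 0.0])               # sparse target, ||w||_1 = 1

pad = rng.uniform(-1.0, 1.0, size=(20, 60))      # 60 irrelevant features
X_pad = np.hstack([X, pad])
w_pad = np.concatenate([w, np.zeros(60)])

# The l_inf/l_1 ratio is unchanged: w . x^j, ||w||_1 and ||x^j||_inf stay fixed.
# The l_2/l_2 ratio shrinks, since ||x^j||_2 grows with the padding.
print(linf_l1_margin(w, X), linf_l1_margin(w_pad, X_pad))
print(l2_l2_margin(w, X), l2_l2_margin(w_pad, X_pad))
```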

So, if both $\gamma_P$ and $\gamma_W$ are similar, using Perceptron should be better. However, in general, $\gamma_P$ (often called the $\ell_2/\ell_2$-margin) and $\gamma_W$ (often referred to as the $\ell_\infty/\ell_1$-margin) can be quite different, as the corresponding norms have different characteristics; cf. Figure 2.

In particular, one advantage of the Winnow algorithm is that the $\ell_\infty/\ell_1$-margin is much less sensitive to the presence of irrelevant features in our data set. More precisely, if one imagines that features take values from some fixed interval, say $[-1, 1]$, then adding new features that are not really needed for correct classification (i.e., the optimal classifier does not put any weight on them) does not affect the $\ell_\infty/\ell_1$-margin, while it still might significantly change the $\ell_2/\ell_2$-margin.

The general consensus is that the Winnow algorithm is preferable when one expects the target classifier to be rather sparse (i.e., a lot of features are irrelevant). On the other hand, if one suspects that most of the features are relevant for correct classification (i.e., the optimal classifier is non-zero on most features), then the Perceptron algorithm should perform better, as the $\ell_2$ norm of the target should be much smaller than its $\ell_1$ norm.

Figure 2: Unit balls in, respectively, the $\ell_1$ norm ($\|x\|_1 = |x_1| + |x_2| = 1$), the $\ell_2$ norm ($\|x\|_2 = \sqrt{x_1^2 + x_2^2} = 1$), and the $\ell_\infty$ norm ($\|x\|_\infty = \max(|x_1|, |x_2|) = 1$). The intersection with the red line corresponds to the point in these balls that maximizes a linear objective. The $\ell_1$ solution has only one of its components non-zero (sparse solution), while the $\ell_2$-norm solution is full and minimizes the energy. The $\ell_\infty$ solution favors homogeneity.

3 Support Vector Machines (SVMs)

Our general strategy so far was to relate the running time of our algorithms to the quality of the margin achieved by some optimal classifier. In this way, we were able to prove that if the data can be separated with a large margin, then our algorithms return a correct classifier. However, while doing so, we provided no guarantee on the quality of the margin of the returned classifier. This is rather unfortunate, as one would expect (and, in fact, it can be shown) that classifiers with larger margins have better generalization properties. Therefore, the ability to return a classifier with a good margin (if one exists) would be a very useful property.

To some extent, both the Perceptron and Winnow algorithms can be adjusted to provide this kind of guarantee (e.g., by treating all classifications that are correct but have small margin as misclassifications). However, it is sometimes better to take a more principled approach here and just state the search for the maximum-margin classifier as an optimization problem. Such optimization problems are called Support Vector Machines (SVMs).

Formally, in the case of the $\ell_2/\ell_2$-margin, we have our data normalized in the $\ell_2$ norm and we want to solve the following problem:

$$\min \|w\|_2^2 \quad \text{s.t.} \quad w \cdot x^j \geq 1 \ \ \forall j. \qquad (2)$$

(Note that the constraint is $\geq 1$ instead of $\geq 0$ here.)
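Since (2) is just a convex quadratic program, off-the-shelf solvers handle it directly. As a concrete illustration (not from the notes), here is a sketch using the cvxpy modelling library; `hard_margin_svm` is our own name, and `X` is assumed to be the matrix of ($\ell_2$-normalized, positive) examples.

```python
import cvxpy as cp

def hard_margin_svm(X):
    """Solve program (2): min ||w||_2^2  s.t.  w . x^j >= 1 for all j."""
    n = X.shape[1]
    w = cp.Variable(n)
    problem = cp.Problem(cp.Minimize(cp.sum_squares(w)),
                         [X @ w >= 1])
    problem.solve()
    return w.value
```

By Lemma 2 below, the achieved margin can then be read off as $1/\|w^*\|_2$, e.g. `1 / np.linalg.norm(hard_margin_svm(X))` with numpy.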

As the above program is convex, we can solve it efficiently. (In fact, there are very efficient methods specialized in solving such SVMs.) One can show (see below) that indeed the optimum solution to this program gives a maximum $\ell_2/\ell_2$-margin classifier. Also, one can use duality arguments to show that, similarly to the case of the Perceptron and Winnow algorithms, this optimal classifier is a combination of some of the data points $x^j$. These points are often called support vectors (as they "support" the separating hyperplane), which explains the name of SVMs.

Lemma 2. Let $w^*$ be the optimum solution to the SVM (2). The $\ell_2/\ell_2$-margin achieved by $w^*$ is equal to $\gamma_P$, and $\|w^*\|_2^2 = 1/\gamma_P^2$.

Proof. First, we show that $w^*$ is a correct classifier with $\ell_2/\ell_2$-margin at least $\frac{1}{\|w^*\|_2}$. This follows easily by noting that, for any $j$,

$$\frac{w^* \cdot x^j}{\|w^*\|_2 \, \|x^j\|_2} \geq \frac{1}{\|w^*\|_2},$$

where we used the feasibility of $w^*$ and the fact that all the $x^j$s are normalized.

On the other hand, if $w$ is a correct classifier with $\ell_2/\ell_2$-margin $\gamma$, then $\frac{w}{\gamma \|w\|_2}$ is a feasible solution to the SVM (2). To see this, note that for any $j$,

$$\frac{w}{\gamma \|w\|_2} \cdot x^j = \frac{w \cdot x^j}{\gamma \|w\|_2} = \frac{w \cdot x^j}{\min_{j'} \frac{w \cdot x^{j'}}{\|w\|_2} \, \|w\|_2} \geq 1.$$

Also, the objective value corresponding to $\frac{w}{\gamma \|w\|_2}$ is $1/\gamma^2$. The lemma now follows by combining the two above facts.

Although at first glance very appealing, in practice, looking for the maximum-margin classifier that correctly classifies all the data might not be the best idea. This is so because real-world data is usually noisy, and just a few outliers might either render the data not linearly separable, or at least significantly reduce the margin of the best linear classifier. Therefore, instead of treating the correct classification of all data points as a hard constraint, one might prefer to allow oneself to misclassify some samples if this enables separation of the remaining data with a large margin.

This type of soft trade-off between the correctness and the quality of the margin can be very conveniently expressed in the SVM framework, by introducing a so-called slack variable $\xi_j$ for each data point. Formally, one considers the following augmented optimization problem:

$$\min \|w\|_2^2 + C \sum_j \xi_j \quad \text{s.t.} \quad w \cdot x^j \geq 1 - \xi_j \ \ \forall j, \qquad \xi_j \geq 0 \ \ \forall j.$$

Here, the parameter $C$ is used to weigh the size of the margin of the obtained classifier against its classification error, measured by the total hinge loss. Clearly, if we set $C$ to be very large, we essentially recover the setting of the SVM in (2).
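A sketch of the soft-margin program in the same cvxpy style as before (again, our own naming; `C` is the trade-off parameter from the text):

```python
import cvxpy as cp

def soft_margin_svm(X, C):
    """min ||w||_2^2 + C * sum_j xi_j  s.t.  w . x^j >= 1 - xi_j, xi_j >= 0."""
    m, n = X.shape
    w = cp.Variable(n)
    xi = cp.Variable(m)                  # one slack variable per data point
    objective = cp.Minimize(cp.sum_squares(w) + C * cp.sum(xi))
    constraints = [X @ w >= 1 - xi, xi >= 0]
    cp.Problem(objective, constraints).solve()
    return w.value, xi.value
```

For very large $C$ the slacks are driven to zero and we essentially recover (2); for small $C$ the solver may prefer to misclassify a few outliers (those with $\xi_j \geq 1$) in exchange for a larger margin on the remaining points.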
