Multiclass Classification. CS446 Machine Learning, Spring 17


1 Midterms; PAC Learning; SVM; Kernels + Boosting; Decision Trees

2 Midterms
Grades are on a curve. Midterms will be available at the TA sessions this week.
Projects feedback has been sent. Recall that this is 25% of your grade!

3 Classification
So far we focused on binary classification, with linear models: Perceptron, Winnow, SVM, GD, SGD.
The prediction is simple: given an example x, predict sgn(w^T x), where w is the learned model. The output is a single bit.

4 Multi-Categorical Output Tasks
- Multi-class classification (y ∈ {1,...,K}): character recognition ("6"), document classification ("homepage")
- Multi-label classification (y ⊆ {1,...,K}): document classification ("(homepage, faculty page)")
- Category ranking (y is a ranking over {1,...,K}): user preference ("love > like > hate"), document classification ("homepage > faculty page > sports")
- Hierarchical classification (y ∈ {1,...,K}, coherent with a class hierarchy): place a document into an index where "soccer" is-a "sport"

5 Setting
Learning: given a data set D = {(x_i, y_i)}_{i=1}^m, where x_i ∈ R^n and y_i ∈ {1, 2, ..., k}.
Prediction (inference): given an example x and a learned function (model), output a single class label y.

6 Binary to Multiclass
Most schemes for multiclass classification work by reducing the problem to binary classification. There are multiple ways to decompose the multiclass prediction into multiple binary decisions:
- One-vs-all
- All-vs-all
- Error correcting codes
We will then talk about a more general scheme, Constraint Classification. It can be used to model other non-binary classification schemes and leads to Structured Prediction.

7 One-Vs-All
Assumption: each class can be separated from all the rest using a binary classifier in the hypothesis space.
Learning: decomposed into learning k independent binary classifiers, one for each class label. Let D be the set of training examples. For each label l, construct a binary classification problem as follows:
- Positive examples: elements of D with label l
- Negative examples: all other elements of D
This is a binary learning problem that we can solve, producing k binary classifiers w_1, w_2, ..., w_k.
Decision: Winner Takes All (WTA): f(x) = argmax_i w_i^T x.
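
A minimal sketch of the one-vs-all decomposition with a Winner-Takes-All decision. The perceptron helper below stands in for any binary learner from the course, and labels are assumed to be encoded as 0, ..., k-1.

```python
import numpy as np

def perceptron(X, signs, epochs=10):
    """A simple binary learner used as a stand-in; any binary algorithm works here."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(X, signs):
            if yi * (w @ xi) <= 0:      # mistake-driven update
                w += yi * xi
    return w

def train_one_vs_all(X, y, k, train_binary=perceptron):
    """Learn k independent binary separators: label l vs. the rest."""
    W = []
    for label in range(k):
        signs = np.where(y == label, 1, -1)   # elements with label l are positive
        W.append(train_binary(X, signs))
    return np.vstack(W)                        # shape (k, n)

def predict_wta(W, x):
    """Winner Takes All: f(x) = argmax_i w_i^T x."""
    return int(np.argmax(W @ x))
```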

8 Solving Multiclass with One-vs-All Learning
Multiclass classifier: a function f : R^n -> {1, 2, 3, ..., k}. Decompose into binary problems.
Caveats: it is not always possible to learn this way; there is no theoretical justification; and we need to make sure the range of all classifiers is the same (unless the problem is easy).

9 Learning via One-Versus-All (OvA) Assumption
Find v_r, v_b, v_g, v_y ∈ R^n such that
- v_r · x > 0 iff y = red
- v_b · x > 0 iff y = blue
- v_g · x > 0 iff y = green
- v_y · x > 0 iff y = yellow
Classification: f(x) = argmax_i v_i · x. Hypothesis space: H = R^{nk}.

10 All-Vs-All
Assumption: there is a separation between every pair of classes using a binary classifier in the hypothesis space.
Learning: decomposed into learning (k choose 2) ≈ k^2 independent binary classifiers, one corresponding to each pair of class labels. For the pair (i, j):
- Positive examples: all examples with label i
- Negative examples: all examples with label j
Decision: more involved, since the outputs of the binary classifiers may not cohere. Each label gets k-1 votes. Decision options:
- Majority: classify example x with label i if i wins on x more often than any other label j
- A tournament: start with k/2 pairs; continue with the winners.
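
A matching sketch of the all-vs-all decomposition with a generic binary learner; the majority-vote decision below is one of the options on the slide (the tournament variant is analogous). Labels are assumed to be encoded as 0, ..., k-1.

```python
import numpy as np
from itertools import combinations

def train_all_vs_all(X, y, k, train_binary):
    """One binary problem per label pair (i, j): label i positive, label j negative."""
    classifiers = {}
    for i, j in combinations(range(k), 2):
        mask = (y == i) | (y == j)
        signs = np.where(y[mask] == i, 1, -1)
        classifiers[(i, j)] = train_binary(X[mask], signs)
    return classifiers

def predict_majority_vote(classifiers, x, k):
    """Each of the k(k-1)/2 classifiers votes for one of its two labels."""
    votes = np.zeros(k)
    for (i, j), w in classifiers.items():
        votes[i if w @ x > 0 else j] += 1
    return int(np.argmax(votes))
```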

11 Learning via All-Versus-All (AvA) Assumption
Find v_rb, v_rg, v_ry, v_bg, v_by, v_gy ∈ R^n such that
- v_rb · x > 0 if y = red, < 0 if y = blue
- v_rg · x > 0 if y = red, < 0 if y = green
- ... (for all pairs)
It is possible to separate all k classes with the O(k^2) classifiers; H = R^{k^2 n}.
How to classify? (Individual classifiers vs. decision regions.)

12 Classifying with AvA
Tournament vs. majority vote: with votes of 1 red, 2 yellow, and 2 green, what do we predict?
All of these combination rules are post-learning and might cause weird behavior.

13 One-vs-All vs. All-vs-All
Assume m examples and k class labels; for simplicity, say m/k of each.
One vs. All: classifier f_i sees m/k positive and (k-1)m/k negative examples. Decision: evaluate k linear classifiers and do Winner Takes All (WTA): f(x) = argmax_i f_i(x) = argmax_i w_i^T x.
All vs. All: classifier f_ij sees m/k positive and m/k negative examples. More expressivity, but fewer examples to learn from. Decision: evaluate ~k^2 linear classifiers; the decision is sometimes unstable.
What type of learning methods would prefer All vs. All (efficiency-wise)? (Think about dual vs. primal.)

14 Error Correcting Codes Decomposition
1-vs-all uses k classifiers for k labels; can you use only log_2 k?
Reduce the multiclass classification to random binary problems: choose a code word for each label. For K = 8, all we need is 3 bits, i.e., three classifiers.
- Rows: an encoding of each class (k rows)
- Columns: L dichotomies of the data (e.g., P1, P2, P3, P4), each corresponding to a new binary classification problem
Extreme cases: 1-vs-all has k rows and k columns; the most compact code has k rows and log_2 k columns.
Each training example is mapped to one example per column, e.g. (x, 3) -> {(x, P1), +; (x, P2), -; (x, P3), -; (x, P4), +}.
To classify a new example x: evaluate the hypotheses on the binary problems {(x, P1), (x, P2), (x, P3), (x, P4)} and choose the label that is most consistent with the results, using Hamming (bit-wise) distance.
There are nice theoretical results as a function of the performance of the P_i's (depending on the code and its size). Potential problems: can you separate any dichotomy?
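
A small sketch of the error-correcting-codes decomposition: train one binary classifier per column of a ±1 code matrix, then decode a new example by Hamming distance to the code words. The code matrix and the binary learner are left as inputs; labels are assumed to be 0, ..., k-1.

```python
import numpy as np

def train_ecoc(X, y, code, train_binary):
    """code: (k, L) matrix with entries +1/-1; column l defines the l-th dichotomy P_l."""
    k, L = code.shape
    return np.vstack([train_binary(X, code[y, l]) for l in range(L)])   # (L, n) weights

def predict_ecoc(W, code, x):
    """Predict the label whose code word is closest (in Hamming distance) to the
    observed pattern of the L binary predictions."""
    bits = np.sign(W @ x)                        # L predicted bits
    hamming = np.sum(code != bits, axis=1)       # distance to each of the k code words
    return int(np.argmin(hamming))
```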

15 Problems with Decompositions
- Learning optimizes over local metrics: it does not guarantee good global performance, and we don't care about the performance of the local classifiers.
- A poor decomposition gives poor performance: difficult local problems, irrelevant local problems. This is especially true for Error Correcting Output Codes (another class of decomposition).
- Difficulty: how to make sure that the resulting problems are separable.
- Efficiency: e.g., All vs. All vs. One vs. All; the former has an advantage when working in the dual space.
- It is not clear how to generalize multiclass to problems with a very large number of output variables.

16 One-vs-All: Learning Architecture
k label nodes; n input features; nk weights. Evaluation: Winner Takes All.
Training: each set of n weights, corresponding to the i-th label, is trained independently, given its performance on example x, and independently of the performance of label j on x.
Hence: local learning; only the final decision is global (Winner Takes All). However, this architecture allows multiple learning algorithms; e.g., see the implementation in the SNoW/LBJava multiclass classifier.
(Architecture: targets, each an LTU; weighted edges, i.e. weight vectors; features.)

17 Recall: Winnow's Extensions
Winnow learns monotone Boolean functions. We extended it to general Boolean functions via Balanced Winnow: two weights per variable; the decision uses the effective weight, the difference between w^+ and w^-. This is equivalent to the Winner Takes All decision.
Learning: in principle, it is possible to use the 1-vs-all rule and update each set of n weights separately, but we suggested the balanced update rule that takes into account how both sets of n weights predict on example x:
If [(w^+ - w^-) · x] · y ≤ 0 (mistake), then w^+_i <- w^+_i r^{y x_i} and w^-_i <- w^-_i r^{-y x_i}.
Can this be generalized to the case of k labels, k > 2? We need a global learning approach.

18 Extending Balanced
In 1-vs-all training you have a target node that represents positive examples and a target node that represents negative examples. Typically, we train each node separately (e.g., mine/not-mine examples). Rather, given an example, we could say: this is more of a + example than a - example.
If [(w^+ - w^-) · x] · y ≤ 0, then w^+_i <- w^+_i r^{y x_i} and w^-_i <- w^-_i r^{-y x_i}.
That is, we compared the activations of the different target nodes (classifiers) on a given example (this example is more class + than class -). Can this be generalized to the case of k labels, k > 2?

19 Where are we?
- Introduction
- Combining binary classifiers: one-vs-all, all-vs-all, error correcting codes
- Training a single (global) classifier: multiclass SVM, constraint classification

20 Recall: Margin for Binary Classifiers
The margin of a hyperplane for a dataset is the distance between the hyperplane and the data point nearest to it. (Figure: margin with respect to a given hyperplane.)

21 Multiclass Margin (figure)

22 Multiclass SVM (Intuition)
Recall the binary SVM: maximize the margin; equivalently, minimize the norm of the weight vector while keeping the points closest to the hyperplane at a score of at least 1.
Multiclass SVM: each label has a different weight vector (like one-vs-all). Maximize the multiclass margin; equivalently, minimize the total norm of the weight vectors while making sure that the true label scores at least 1 more than the second best one.

23 Multiclass SVM in the Separable Case
Recall the hard binary SVM. The objective minimizes the size of the weights (effectively a regularizer), subject to the constraint that the score for the true label is higher than the score for any other label by at least 1.
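
The optimization problem itself appears only as an image on the slide; a standard way to write the hard (separable) multiclass SVM that these annotations describe is:

\[
\min_{w_1,\dots,w_K}\ \frac{1}{2}\sum_{k=1}^{K}\|w_k\|^2
\qquad \text{s.t.} \qquad
w_{y_i}^{\top}x_i - w_{k}^{\top}x_i \;\ge\; 1 \quad \forall i,\ \forall k \ne y_i .
\]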

24-25 Multiclass SVM: General Case
The objective now has two parts: the size of the weights (effectively a regularizer) and the total slack (effectively, don't allow too many examples to violate the margin constraint).
The constraints require that the score for the true label is higher than the score for any other label by at least 1 - ξ_i. The slack variables ξ_i allow some examples not to satisfy the margin constraint, and can only be positive.
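
Again, the formula is only an image in this transcription; the general (non-separable) multiclass SVM the annotations describe is usually written with slack variables ξ_i and a trade-off parameter C:

\[
\min_{w,\,\xi}\ \frac{1}{2}\sum_{k=1}^{K}\|w_k\|^2 + C\sum_{i}\xi_i
\qquad \text{s.t.} \qquad
w_{y_i}^{\top}x_i - w_{k}^{\top}x_i \;\ge\; 1-\xi_i,\quad \xi_i \ge 0 \quad \forall i,\ \forall k \ne y_i .
\]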

26 Multiclass SVM
Generalizes the binary SVM algorithm: if we have only two classes, it reduces to the binary case (up to scale). Comes with generalization guarantees similar to the binary SVM. Can be trained using different optimization methods; for example, stochastic sub-gradient descent can be generalized (try as an exercise).
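
As a sketch of that exercise, one way to generalize stochastic sub-gradient descent to the multiclass SVM objective (assuming labels 0, ..., k-1 and a fixed step size; other schedules are equally reasonable):

```python
import numpy as np

def multiclass_svm_sgd(X, y, k, lam=0.01, lr=0.1, epochs=20):
    """Stochastic sub-gradient descent on the multiclass SVM objective.
    At each example, the most violating label defines the sub-gradient."""
    W = np.zeros((k, X.shape[1]))
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            scores = W @ xi
            scores[yi] -= 1.0                  # margin of 1 required over competing labels
            y_hat = int(np.argmax(scores))
            W *= (1.0 - lr * lam)              # sub-gradient of the regularizer
            if y_hat != yi:                    # hinge term is active: promote/demote
                W[yi] += lr * xi
                W[y_hat] -= lr * xi
    return W
```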

27 Multiclass SVM: Summary
Training: optimize the global SVM objective.
Prediction: winner takes all, argmax_i w_i^T x.
With K labels and inputs in R^n, we have nK weights in all, the same as one-vs-all.
Why does it work? Why is this the right definition of multiclass margin? A theoretical justification, along with extensions to other algorithms beyond SVM, is given by Constraint Classification, which applies also to multi-label problems, ranking problems, etc. [Dav Zimak, with D. Roth and S. Har-Peled]

28 Constraint Classification
The examples we give the learner are pairs (x, y), y ∈ {1, ..., k}. The black box learner (1 vs. all) we described might be thought of as a function of x only, but actually we made use of the labels y.
How is y being used? y decides what to do with the example x; that is, which of the k classifiers should take the example as a positive example (making it a negative example for all the others).
How do we predict? Let f_y(x) = w_y^T x. Then we predict using y* = argmax_{y=1,...,k} f_y(x). Equivalently, we predict as follows:
Predict y iff for all y' ∈ {1, ..., k}, y' ≠ y: (w_y^T - w_{y'}^T) x ≥ 0. (**)
So far we did not say how we learn the k weight vectors w_y (y = 1, ..., k). Can we train in a way that better fits the way we predict? What does that mean? Is it better in any well-defined way?

29 Linear Separability for Multiclass
We showed: if pairs of labels are separable (a reasonable assumption), then in some higher-dimensional space the problem is linearly separable.
We are learning k n-dimensional weight vectors, so we can concatenate them into w = (w_1, w_2, ..., w_k) ∈ R^{nk}. Notice: this is just a representational trick; we did not say how to learn the weight vectors.
Key construction (Kesler construction; Zimak's constraint classification): we represent each example (x, y) as an nk-dimensional vector x_y, with x embedded in the y-th block (y = 1, 2, ..., k) and the other coordinates 0. E.g., x_y = (0, x, 0, 0) ∈ R^{kn} (here k = 4, y = 2).
Now we can understand the n-dimensional decision rule: predict y iff for all y' ≠ y, (w_y^T - w_{y'}^T) x ≥ 0. (**) Equivalently, in the nk-dimensional space: predict y iff for all y' ≠ y, w^T (x_y - x_{y'}) ≡ w^T x_{yy'} ≥ 0.
Conclusion: the set {(x_{yy'}, +)} is linearly separable from the set {(-x_{yy'}, -)} using the linear separator w ∈ R^{kn}. We solved the Voronoi diagram challenge.

30 Constraint Classification
Training: [We first explain via Kesler's construction; then show we don't need it.]
Given a data set {(x, y)} (m examples) with x ∈ R^n, y ∈ {1, 2, ..., k}, create a binary classification task in R^{kn}:
(x_y - x_{y'}, +) and (x_{y'} - x_y, -) for all y' ≠ y (2m(k-1) examples), where x_y ∈ R^{kn}.
Use your favorite linear learning algorithm to train a binary classifier.
Prediction: given an nk-dimensional weight vector w and a new example x, predict argmax_y w^T x_y.

31 Details: Kesler Construction and Multi-Class Separability
If (x, i) is a given n-dimensional example (that is, x is labeled i), then x_ij, for all j = 1, ..., k, j ≠ i, are positive examples in the nk-dimensional space, and -x_ij are negative examples.
(Figure: an example labeled 2 is transformed into examples expressing the preferences 2>1, 2>3, 2>4.)
The preference i > j corresponds to the chain of equivalences
f_i(x) - f_j(x) > 0  iff  w_i · x - w_j · x > 0  iff  W · X_i - W · X_j > 0  iff  W · (X_i - X_j) > 0  iff  W · X_ij > 0,
where X_i = (0, x, 0, 0) ∈ R^{kd}, X_j = (0, 0, 0, x) ∈ R^{kd}, X_ij = X_i - X_j = (0, x, 0, -x), and W = (w_1, w_2, w_3, w_4) ∈ R^{kd}.

32 Kesler's Construction (1)
y = argmax_{i ∈ {r,b,g,y}} w_i · x, with w_i, x ∈ R^n.
Find w_r, w_b, w_g, w_y ∈ R^n such that, for an example labeled red:
- w_r · x > w_b · x
- w_r · x > w_g · x
- w_r · x > w_y · x
Hypothesis space: H = R^{kn}.

33 Kesler's Construction (2)
Let w = (w_r, w_b, w_g, w_y) ∈ R^{kn} and let 0^n be the n-dimensional zero vector. Then:
- w_r · x > w_b · x  iff  w · (x, -x, 0^n, 0^n) > 0  iff  w · (-x, x, 0^n, 0^n) < 0
- w_r · x > w_g · x  iff  w · (x, 0^n, -x, 0^n) > 0  iff  w · (-x, 0^n, x, 0^n) < 0
- w_r · x > w_y · x  iff  w · (x, 0^n, 0^n, -x) > 0  iff  w · (-x, 0^n, 0^n, x) < 0

34 Kesler's Construction (3)
Let w = (w_1, ..., w_k) ∈ R^n x ... x R^n = R^{kn} and
x_ij = (0^{(i-1)n}, x, 0^{(k-i)n}) - (0^{(j-1)n}, x, 0^{(k-j)n}) ∈ R^{kn}.
Given (x, y) ∈ R^n x {1, ..., k}, for all j ≠ y (all other labels):
- add (x_yj, +1) to P+(x, y)
- add (-x_yj, -1) to P-(x, y)
P+(x, y) has k-1 positive examples (in R^{kn}); P-(x, y) has k-1 negative examples (in R^{kn}).
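
A sketch of the construction in code: kesler_vector builds x_ij = (..., x, ..., -x, ...) ∈ R^{kn}, and kesler_examples produces the P+ and P- sets for one labeled example (labels assumed to be 0, ..., k-1).

```python
import numpy as np

def kesler_vector(x, i, j, k):
    """x_ij: x in block i, -x in block j, zeros elsewhere (a kn-dimensional vector)."""
    n = x.shape[0]
    v = np.zeros(k * n)
    v[i * n:(i + 1) * n] = x
    v[j * n:(j + 1) * n] = -x
    return v

def kesler_examples(x, y, k):
    """For every other label j, P+ gets (x_yj, +1) and P- gets (-x_yj, -1)."""
    pos, neg = [], []
    for j in range(k):
        if j != y:
            v = kesler_vector(x, y, j, k)
            pos.append((v, +1))
            neg.append((-v, -1))
    return pos, neg
```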

35 Learning via Kesler's Construction
Given (x_1, y_1), ..., (x_N, y_N) ∈ R^n x {1, ..., k}, create P+ = union of P+(x_i, y_i) and P- = union of P-(x_i, y_i).
Find w = (w_1, ..., w_k) ∈ R^{kn} such that w · x separates P+ from P-.
One can use any algorithm in this space: Perceptron, Winnow, SVM, etc.
To understand how to update the weight vector in the n-dimensional space, note that w^T x_{yy'} ≥ 0 (in the nk-dimensional space) is equivalent to (w_y^T - w_{y'}^T) x ≥ 0 (in the n-dimensional space).

36 Perceptron in Kesler Construction
A perceptron update rule applied in the nk-dimensional space due to a mistake, w^T x_ij ≤ 0, or equivalently (w_i^T - w_j^T) x ≤ 0 in the n-dimensional space, implies the following update:
Given example (x, i) (example x ∈ R^n, labeled i), for all pairs (i, j), j = 1, ..., k, j ≠ i: (***)
If (w_i^T - w_j^T) x < 0 (mistaken prediction; equivalent to w^T x_ij < 0), then
w_i <- w_i + x (promotion) and w_j <- w_j - x (demotion).
Note that this is a generalization of the balanced Winnow rule, and that we promote w_i and demote up to k-1 weight vectors w_j.

37 Conservative Update
The general scheme suggests: given example (x, i) (example x ∈ R^n, labeled i), for all j ≠ i: (***) if (w_i^T - w_j^T) x < 0 (mistaken prediction; equivalent to w^T x_ij < 0), then w_i <- w_i + x (promotion) and w_j <- w_j - x (demotion). That is, promote w_i and demote up to k-1 weight vectors w_j.
A conservative update (SNoW and LBJava's implementation): in case of a mistake, only the weights corresponding to the target node i and the closest competing node j are updated. Let j* = argmax_{j ≠ i} w_j^T x (highest activation among competing labels). If (w_i^T - w_{j*}^T) x < 0 (mistaken prediction), then w_i <- w_i + x (promotion) and w_{j*} <- w_{j*} - x (demotion). The other weight vectors are not updated.
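
A sketch of one mistake-driven step with the conservative update; W is the (k, n) matrix of label weight vectors and i is the true label of x. The non-conservative variant of the previous slide would instead demote every w_j with (w_i - w_j) · x < 0.

```python
import numpy as np

def conservative_update(W, x, i):
    """Compare w_i against the most active competing label j*; on a mistake,
    promote w_i and demote only w_{j*}."""
    scores = W @ x
    scores[i] = -np.inf                       # restrict the argmax to competing labels
    j_star = int(np.argmax(scores))
    if (W[i] - W[j_star]) @ x < 0:            # mistaken prediction
        W[i] += x                             # promotion
        W[j_star] -= x                        # demotion
    return W
```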

38 Summary 1: Multiclass Classification
From the full dataset, construct three binary classifiers, one for each class:
- w_blue^T x > 0 for blue inputs
- w_org^T x > 0 for orange inputs
- w_black^T x > 0 for black inputs
(Notation: w_c^T x is the score for label c.)
Winner Takes All will predict the right answer: only the correct label will have a positive score.

39 Summary 2: One-vs-All May Not Always Work
The red points are not separable with a single binary classifier; the decomposition is not expressive enough!
- w_blue^T x > 0 for blue inputs
- w_org^T x > 0 for orange inputs
- w_black^T x > 0 for black inputs
- ??? for red inputs

40 Summary 3: Local Learning: One-vs-All Classification
Easy to learn: use any binary classifier learning algorithm.
Potential problems:
- Calibration issues: we are comparing scores produced by K classifiers trained independently; there is no reason for the scores to be in the same numerical range.
- Train vs. train: training does not account for how the final predictor will be used and does not optimize any global measure of correctness.
Yet it works fairly well in most cases, especially in high-dimensional problems (where everything is already linearly separable).

41 Summary 4: Global Multiclass Approach [Constraint Classification, Har-Peled et al. '02]
Create K classifiers w_1, w_2, ..., w_K; predict with WTA: argmax_i w_i^T x.
But train differently: for examples with label i, we want w_i^T x > w_j^T x for all j.
Training: for each training example (x_i, y_i):
  ŷ <- argmax_j w_j^T x_i
  if ŷ ≠ y_i:
    w_{y_i} <- w_{y_i} + η x_i  (promote)
    w_ŷ <- w_ŷ - η x_i  (demote)
(η is the learning rate.)

42 Significance
The hypothesis learned above is more expressive than when the OvA assumption is used.
Any linear learning algorithm can be used, and algorithm-specific properties are maintained (e.g., attribute efficiency if using Winnow). For example, the multiclass support vector machine can be implemented by learning a hyperplane that separates P(S) with maximal margin.
As a byproduct of the linear-separability observation, we get a natural notion of margin in the multiclass case, inherited from binary separability in the nk-dimensional space: given the examples x_ij ∈ R^{nk}, margin(x_ij, w) = min_{ij} w^T x_ij. Consequently, given x ∈ R^n labeled i, margin(x, w) = min_j (w_i^T - w_j^T) x.

43 Margin (figure)

44 Multiclass Margin (figure)

45 Constraint Classification
The scheme presented can be generalized to provide a uniform view of multiple types of problems: multi-class, multi-label, category ranking.
- Reduces learning to a single binary learning task
- Captures the theoretical properties of the binary algorithm
- Experimentally verified
- Naturally extends Perceptron, SVM, etc.
It is called constraint classification since it does it all by representing labels as a set of constraints, or preferences, among output labels.

46 Multi-category to Constraint Classification
The unified formulation is clear from the following examples:
- Multiclass: (x, A) -> (x, (A>B, A>C, A>D))
- Multilabel: (x, (A, B)) -> (x, (A>C, A>D, B>C, B>D))
- Label ranking: (x, (5>4>3>2>1)) -> (x, (5>4, 4>3, 3>2, 2>1))
Just like in the multiclass case we learn one w_i ∈ R^n for each label; the same is done for multi-label and ranking. The weight vectors are updated according to the requirements from y ∈ S_k.
In all cases we have examples (x, y) with y ∈ S_k, where S_k is a partial order over the class labels {1, ..., k} that defines a preference relation (>) for the class labeling. Consequently, the constraint classifier is h: X -> S_k; h(x) is a partial order, and h(x) is consistent with y if (i < j) ∈ y implies (i < j) ∈ h(x). (Consult the "Perceptron in Kesler construction" slide.)
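
A small sketch of how the three annotations above map to sets of pairwise preferences (i preferred to j), which is exactly what the Kesler/constraint-classification view trains on; the label encodings in the comments are illustrative.

```python
def preference_constraints(y, k):
    """Turn a label annotation into pairwise preferences (i, j) meaning 'i > j'."""
    if isinstance(y, int):        # multiclass: the true label beats every other label
        return {(y, j) for j in range(k) if j != y}
    if isinstance(y, set):        # multilabel: every relevant label beats every irrelevant one
        return {(i, j) for i in y for j in range(k) if j not in y}
    if isinstance(y, list):       # ranking (best first): adjacent preferences
        return {(y[t], y[t + 1]) for t in range(len(y) - 1)}
    raise TypeError("unsupported label annotation")

# Mirroring the slide, with labels A,B,C,D encoded as 0..3 and ranks 5..1 as 4..0:
# multiclass A:        preference_constraints(0, 4)           -> {(0,1), (0,2), (0,3)}
# multilabel {A, B}:   preference_constraints({0, 1}, 4)      -> {(0,2), (0,3), (1,2), (1,3)}
# ranking 5>4>3>2>1:   preference_constraints([4,3,2,1,0], 5) -> {(4,3), (3,2), (2,1), (1,0)}
```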

47 Properties of the Construction (Zimak et al. 2002, 2003)
- Can learn any argmax_i v_i · x function (even when class i isn't linearly separable from the union of the others)
- Can use any algorithm to find the linear separation:
  - Perceptron algorithm -> ultraconservative online algorithm [Crammer, Singer 2001]
  - Winnow algorithm -> multiclass Winnow [Mesterharm 2000]
- Defines a multiclass margin via the binary margin in R^{kd} -> multiclass SVM [Crammer, Singer 2001]

48 Margin Generalization Bounds
Linear hypothesis space: h(x) = argsort_i v_i · x, with v_i, x ∈ R^d; argsort returns a permutation of {1, ..., k}.
CC margin-based bound: with margin γ = min_{(x,y) ∈ S} min_{(i<j) ∈ y} (v_i · x - v_j · x), the bound is roughly of the form
err_D(h) ≤ O( (C R^2 / γ^2 + ln(1/δ)) / m ),
where m is the number of examples, R = max_x ||x||, δ is the confidence, and C is the average number of constraints per example.

49 VC-style Generalization Bounds
Linear hypothesis space: h(x) = argsort_i v_i · x, with v_i, x ∈ R^d; argsort returns a permutation of {1, ..., k}.
CC VC-based bound:
err_D(h) ≤ err(S, h) + O( sqrt( (k d log(mk/d) + ln(1/δ)) / m ) ),
where m is the number of examples, d is the dimension of the input space, δ is the confidence, and k is the number of classes.
Performance: even though this is the right thing to do, and differences can be observed in low-dimensional cases, in high-dimensional cases the impact is not always significant.

50 Beyond Multiclass Classification
- Ranking: category ranking (over classes), ordinal regression (over examples)
- Multilabel: x is both red and blue
- Complex relationships: x is more red than blue, but not green
- Millions of classes: sequence labeling (e.g. POS tagging)
The same algorithms can be applied to these problems, namely, to Structured Prediction. This observation is the starting point for CS546.

51 (more) Multi-Categorical Output Tasks
- Sequential prediction (y ∈ {1,...,K}^+): e.g. POS tagging ("(NVNNA)"): "This is a sentence." -> D V D N; e.g. phrase identification. Many labels: K^L for a length-L sentence.
- Structured output prediction (y ∈ C({1,...,K}^+)): e.g. a parse tree, multi-level phrase identification, or sequential prediction; constrained by the domain, problem, data, background knowledge, etc.

52 Semantic Role Labeling: A Structured-Output Problem
For each verb in a sentence: (1) identify all constituents that fill a semantic role, and (2) determine their roles: core arguments (e.g., Agent, Patient, or Instrument) and their adjuncts (e.g., Locative, Temporal, or Manner).
Example: "I left my pearls to my daughter-in-law in my will."
A0 (leaver): I; A1 (thing left): my pearls; A2 (benefactor): my daughter-in-law; AM-LOC: in my will.

53 Semantic Role Labeling
Just like in the multiclass case, we can think about local vs. global predictions.
- Local: each component is learned separately, without considering the other components.
- Global: learn to predict the whole structure.
The algorithm is essentially the same as CC.
Example structure: A0, A1, A2, AM-LOC over "I left my pearls to my daughter-in-law in my will." There are many possible valid outputs and many possible invalid outputs, but typically one correct output per input.
