Supervised Learning of Non-binary Problems Part I: Multiclass Categorization via Output Codes


1 Supervised Learning of Non-binary Problems Part I: Multiclass Categorization via Output Codes
Yoram Singer, Hebrew University
Based on joint work with: Koby Crammer (Hebrew Univ.), Rob Schapire (AT&T Labs), Erin Allwein (SWRI)
NATO ASI on Learning Theory and Practice, July, K.U. Leuven, Belgium

2 Problem Setting
- Given: multiclass data, e.g. predict children's favorite Teletubbies:
  (name=guy, age=5, sex=m), (name=neta, age=2, sex=f), (name=dana, age=6, sex=f), ...
- Given: binary (2-class) learning algorithm A:
  (x_1, +), (x_2, -), (x_3, +), ... -> A -> h, with h(x) ∈ {-, +}

3 Problem Setting (cont.)
- Goal: use the binary learning algorithm to build a classifier for multiclass data:
  (x_1, y_1), (x_2, y_2), (x_3, y_3), ... -> multiclass learner (built on A) -> H, with H(x) ∈ {class 1, ..., class k}

4 The One-against-all Approach [folklore]
- break the k-class problem into k binary problems and solve each separately
- (figure: the relabeled training sets for binary problems h_1, ..., h_4 over examples x_1, ..., x_5)
- how to combine predictions? evaluate all the h's and hope exactly one is +
  e.g.: if h_1(x) = h_2(x) = h_4(x) = - and h_3(x) = +, then predict class 3
- problem: the combined prediction can be wrong even if only one h errs

5 The All-pairs Approach [Hastie & Tibshirani]
- create one binary problem for each pair of classes
- (figure: the six pairwise training sets h_1, ..., h_6 over examples x_1, ..., x_5)
- not obvious how to combine predictions

6 Error-correcting Output Codes [Dietterich & Bakiri]
- reduce to binary using a coding matrix M; the rows of M are code words
- (figure: a coding matrix defining binary problems h_1, ..., h_5 over examples x_1, ..., x_5)
- given x, choose the row of M closest to h(x) = (h_1(x), ..., h_5(x))
  e.g.: if h(x) = (-, +, +, +, -) then predict the class whose code word is nearest

7 ECOC (continued)
- if the rows of M are far from one another, the code will be highly robust to errors
- potentially much faster when k (# of classes) is large
- disadvantage: the binary problems may be unnatural and hard to solve

8 Outline
- general framework for studying such reductions from multiclass to binary, particularly for margin-based binary learners
- decoding methods for combining predictions of binary classifiers: Hamming decoding, loss-based decoding
- general training error bounds
- generalization error bound when the binary learner is AdaBoost
- finding a good output coding matrix: discrete output codes, continuous output codes
- learning continuous output codes using QP
- generalization error bound for continuous output codes
- experiments with discrete and continuous codes
- variation on a theme: multilabel problems

9 A (Slightly) More General Approach
- choose a matrix M over {-1, 0, +1}
- (figure: a coding matrix with entries in {-1, 0, +1} defining binary problems h_1, ..., h_5; classes with a 0 entry are left out of that binary problem)
- generalizes all three earlier approaches
- how to do decoding?
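
To make the reduction concrete, here is a minimal sketch (my own illustration, not code from the talk) of how a coding matrix M over {-1, 0, +1} turns one multiclass training set into l binary training sets; one-against-all, all-pairs, and ECOC correspond to particular choices of M.

```python
import numpy as np

def binary_problems(X, y, M):
    """Yield one binary training set (X_s, b_s) per column s of the k-by-l code M.

    Column s relabels example i as M[y_i, s]; examples whose entry is 0 are
    dropped from that binary problem.
    """
    k, l = M.shape
    for s in range(l):
        b = M[y, s]          # relabel each example by its class's entry in column s
        keep = b != 0        # a 0 entry means this class is ignored in problem s
        yield X[keep], b[keep]

# e.g. the one-against-all code for k = 4 classes: +1 on the diagonal, -1 elsewhere
M_one_vs_all = 2 * np.eye(4, dtype=int) - 1
```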

10 Hamming Decoding (generalized)
Hamming distance between u, v ∈ {-1, 0, +1}^l:
  Δ(u, v) = (1/l) Σ_{s=1}^{l} d(u_s, v_s),  where d(u_s, v_s) = 0 if u_s = v_s and u_s v_s ≠ 0; 1 if u_s ≠ v_s and u_s v_s ≠ 0; 1/2 if u_s = 0 or v_s = 0
- to classify x: evaluate h(x) = (h_1(x), ..., h_l(x)) and choose the row r closest to h(x) in Hamming distance
- weakness: ignores the confidence of the predictions
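
A small sketch (mine, following the definition above) of the generalized Hamming distance and the resulting decoder; zero entries contribute 1/2 per coordinate and the result is averaged over the l coordinates.

```python
import numpy as np

def hamming_distance(u, v):
    """Generalized Hamming distance for vectors over {-1, 0, +1}."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return np.mean((1.0 - u * v) / 2.0)   # 0 if equal nonzero, 1 if opposite, 1/2 if a zero

def hamming_decode(h_sign, M):
    """Predict the row of M closest to the sign vector h_sign = sign(h(x))."""
    return min(range(M.shape[0]), key=lambda r: hamming_distance(M[r], h_sign))
```

For example, hamming_distance([+1, -1, 0, +1], [+1, +1, 0, +1]) evaluates to (0 + 1 + 1/2 + 0)/4 = 0.375.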

11 Margin-based Binary Learning Algorithms
- a typical binary classifier produces real-valued predictions, i.e. h(x) ∈ R rather than h(x) ∈ {-, +}
- interpretation: sign(h(x)) = predicted class (- or +), |h(x)| = confidence
- the margin of a labeled example (x, y) is y·h(x):
  y·h(x) > 0 means a correct prediction; |y·h(x)| = |h(x)| = confidence
- many learning algorithms attempt to find a classifier h minimizing Σ_{i=1}^{m} L(y_i h(x_i)), where (x_1, y_1), ..., (x_m, y_m) is the given training set and L is a loss function of the margin

12 Examples of Margin-based Learning Algorithms
  algorithm                 L(z)
  support-vector machines   max{1 - z, 0}
  logistic regression       ln(1 + e^{-z})
  AdaBoost                  e^{-z}
  least-squares regression  (1 - z)^2
  C4.5                      ln(1 + e^{-z})
  CART                      (1 - z)^2
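
Written out as code (a plain restatement of the table, with z the margin y·h(x)):

```python
import numpy as np

margin_losses = {
    "SVM (hinge)":             lambda z: np.maximum(1.0 - z, 0.0),
    "logistic regression":     lambda z: np.log1p(np.exp(-z)),
    "AdaBoost (exponential)":  lambda z: np.exp(-z),
    "least squares (squared)": lambda z: (1.0 - z) ** 2,
}
```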

13 Loss-based Decoding
- can be used instead of Hamming decoding when the binary learner is margin-based
- to classify x: evaluate h(x) = (h_1(x), ..., h_l(x))
- for each class r compute the total loss that would have been suffered if x were labeled r:
  Σ_{s=1}^{l} L(M(r, s) · h_s(x))
- choose the class r that minimizes the total loss
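
A minimal sketch of loss-based decoding (my own illustration); the default loss L(z) = e^{-z} matches AdaBoost, but any margin loss from the table above can be plugged in.

```python
import numpy as np

def loss_based_decode(h, M, L=lambda z: np.exp(-z)):
    """h: length-l vector of real-valued binary predictions; M: k-by-l code matrix.

    Returns the class r minimizing the total loss sum_s L(M[r, s] * h[s]).
    """
    totals = [sum(L(M[r, s] * h[s]) for s in range(M.shape[1]))
              for r in range(M.shape[0])]
    return int(np.argmin(totals))
```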

14 Loss-based Decoding (example)
(the example coding matrix M is omitted); h(x) = (1.9, -3.1, 4.2, 0.7, 6.2)
- compute the total loss Σ_{s} L(M(r, s) h_s(x)) for each class r
- on this example, Hamming decoding (using sign(h(x)) = (+, -, +, +, +)) and loss-based decoding with L(z) = e^{-z} predict different classes

15 Training Error Bounds
let ε̄ be the expected loss on training, averaged over all l binary problems:
  ε̄ = (1/(ml)) Σ_{i=1}^{m} Σ_{s=1}^{l} L(M(y_i, s) h_s(x_i))
let ρ be the minimum distance between pairs of rows of M:
  ρ = min{ Δ(M(r_1, ·), M(r_2, ·)) : r_1 ≠ r_2 }
assume L(0) = 1, (1/2)(L(z) + L(-z)) ≥ L(0), and other minor technical assumptions
Theorem: the fraction of training mistakes is at most ε̄/ρ for loss-based decoding, and at most 2ε̄/ρ for Hamming decoding

16 Proof Sketch
Loss-based decoding incorrectly classifies an example (x, y): there is a label r ≠ y with
  Σ_{s=1}^{l} L(M(r, s) f_s(x)) ≤ Σ_{s=1}^{l} L(M(y, s) f_s(x))
Define:
  S  = {s : M(r, s) ≠ M(y, s), M(r, s) ≠ 0, M(y, s) ≠ 0}
  S₀ = {s : M(r, s) = 0 or M(y, s) = 0}
  z_s = M(y, s) f_s(x),  z̄_s = M(r, s) f_s(x)
Then, using the misclassification condition above,
  Σ_{s=1}^{l} L(z_s) ≥ (1/2) Σ_{s=1}^{l} (L(z_s) + L(z̄_s)) ≥ (1/2) Σ_{s ∈ S ∪ S₀} (L(z_s) + L(z̄_s))

17 Proof Sketch (cont.)
  Σ_{s=1}^{l} L(z_s) ≥ Σ_{s ∈ S ∪ S₀} (1/2)(L(z_s) + L(z̄_s))
                     = Σ_{s ∈ S} (1/2)(L(z_s) + L(z̄_s)) + Σ_{s ∈ S₀} (1/2)(L(z_s) + L(z̄_s))
                     ≥ L(0) (|S| + |S₀|/2)
                     = L(0) · l · Δ(M(r, ·), M(y, ·))
                     ≥ ρ l L(0)
(for s ∈ S, z̄_s = -z_s, so the term is at least L(0); for s ∈ S₀ at least one of z_s, z̄_s is 0, so the term is at least L(0)/2)
a mistake on training example (x_i, y_i) therefore implies that
  Σ_{s=1}^{l} L(M(y_i, s) f_s(x_i)) ≥ ρ l L(0)
so the number of training mistakes is at most
  (1/(ρ l L(0))) Σ_{i=1}^{m} Σ_{s=1}^{l} L(M(y_i, s) f_s(x_i))
and the training error is at most ε̄/(ρ L(0))

18 Discussion
- want good performance on the binary problems (so that ε̄ is small)
- want the rows far apart from each other (so that ρ is large): ρ ≈ 1/2 for ECOC, ρ = 2/k for one-against-all
- but: making ρ large may make the binary problems more difficult to solve
- many zeros in the matrix make the binary problems easier and faster to solve
- but: adding zeros tends to increase the average error ε̄
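
A quick numerical illustration of the ρ side of this trade-off (my own example, not from the talk): ρ for the one-against-all code equals 2/k, while a dense random code typically has a much larger minimum row distance.

```python
import numpy as np
from itertools import combinations

def rho(M):
    """Minimum generalized Hamming distance between pairs of rows of M."""
    dist = lambda u, v: np.mean((1.0 - u * v) / 2.0)
    return min(dist(M[r1], M[r2]) for r1, r2 in combinations(range(M.shape[0]), 2))

k = 8
one_vs_all = 2 * np.eye(k) - 1
random_dense = np.random.default_rng(0).choice([-1, 1], size=(k, 30))
print(rho(one_vs_all))    # 2/k = 0.25
print(rho(random_dense))  # typically well above 2/k, approaching 1/2 as l grows
```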

19 Generalization Error for Boosting with Loss-based Decoding - Outline
- analyze performance on unseen data when the binary learner is AdaBoost
- in the binary case, margins are used to analyze boosting (Schapire, Freund, Bartlett, Lee 1998): larger empirical margins on the training data imply a better bound on the generalization error, and boosting tends to increase training-data margins
- for loss-based decoding, modify the classification rule; modifying the definition of the margin gives a two-part analysis

20 Generalization Error Bounds for AdaBoost.MO
Loss-based decoding for AdaBoost:
  H(x) = arg min_{y ∈ Y} Σ_{s=1}^{l} exp(-M(y, s) f(x, s))
       = arg min_{y ∈ Y} Σ_{s=1}^{l} exp(-M(y, s) Σ_{t=1}^{T} α_t h_t(x, s))
Scaling f to [-1, +1] and applying the transform x ↦ -(1/η) ln(x/l) to the total loss:
  η = Σ_{t=1}^{T} α_t,   f(x, s) = (1/η) Σ_{t=1}^{T} α_t h_t(x, s)
  ν(f, η, x, y) = -(1/η) ln( (1/l) Σ_{s=1}^{l} e^{-η M(y, s) f(x, s)} )
  H(x) = arg max_{y ∈ Y} ν(f, η, x, y)
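
A small sketch of the rescaled score ν (mine, following the formulas above), assuming f(x, ·) has already been normalized to [-1, +1] by dividing by η = Σ_t α_t:

```python
import numpy as np

def nu(f, eta, M, y):
    """nu(f, eta, x, y) = -(1/eta) * ln( (1/l) * sum_s exp(-eta * M[y, s] * f_s) )."""
    return -np.log(np.mean(np.exp(-eta * M[y] * f))) / eta

def decode_adaboost_mo(f, eta, M):
    """Loss-based decoding, written equivalently as arg max over the rescaled scores."""
    return int(np.argmax([nu(f, eta, M, y) for y in range(M.shape[0])]))
```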

21 Generalization Error Bounds for AdaBoost.MO (cont.)
Modified definition of margin:
  M_{f,η}(x, y) = (1/2) ( ν(f, η, x, y) - max_{r ≠ y} ν(f, η, x, r) )
H is the base-hypothesis space; its convex hull is
  co(H) = { f : x ↦ Σ_h α_h h(x) : α_h ≥ 0, Σ_h α_h = 1 }
Theorem: let D be a distribution over X × Y, S a sample of m examples, H a base-classifier space of VC-dimension d, and M a coding matrix; then w.p. at least 1 - δ, every function f ∈ co(H) and every η > 0, θ > 0 satisfy
  Pr_D[ M_{f,η}(x, y) ≤ 0 ] ≤ Pr_S[ M_{f,η}(x, y) ≤ θ ] + O( (1/√m) ( d log²(lm/d)/θ² + log(1/δ) )^{1/2} )

22 Experiments
- tested SVM on 8 benchmark problems using both decoding methods and 5 different coding matrices (one-against-all, all-pairs, complete, random dense, random sparse)
- (figure: test error difference between the two decodings vs. number of classes, one panel per coding matrix) overall, loss-based decoding is better than Hamming decoding
- (figure: test error difference between one-against-all and each of the other codes vs. number of classes) one-against-all is consistently worse than the other methods, but there is no clear pattern among the other methods

23 So far we assumed that the matrix M is predefined; problems with this:
- all columns have the same influence in the decoding process, but some binary problems might be simpler than others
- each binary classifier induces the same loss for all labels, but some labels are easier to predict than others
- correlations between the binary classifiers are not taken into account, but similar columns of M might result in highly correlated classifiers
idea: change M in accordance with the predictions of the binary classifiers

24 Code design - discrete case
The following problem is NP-complete:
  given: a training set S = {(x_1, y_1), ..., (x_m, y_m)}, a set of binary classifiers h(x) = (h_1(x), ..., h_l(x)), and an error bound q
  find: a matrix M ∈ {-1, +1}^{k×l} such that the classifier H(x) based on M and h(·) makes at most q mistakes on S
however, if l is fixed there might exist a poly-time algorithm for finding the matrix M which attains the smallest number of errors on S

25 Code design - continuous case
- analogously to the binary classifiers, let M ∈ R^{k×l}
- column t of M defines a partition of the labels into two disjoint sets: sign(M_{r,t}) is the set (+1 or -1) to which class r belongs, and |M_{r,t}| is the confidence in the associated partition
- the classification rule: H(x) = arg max_r (h(x) · M_r), where M_r is the rth row of M
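
The classification rule in code form (a one-line sketch of my own): compute the inner product of h(x) with every row of M and take the arg max.

```python
import numpy as np

def predict_continuous(h, M):
    """h: length-l vector of binary predictions; M: k-by-l real-valued code matrix."""
    return int(np.argmax(M @ h))   # arg max_r  h(x) . M_r
```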

26 Continuous codes - example
(the example continuous code matrix M is omitted); h(x) = (1.1, 3.1, 4.2, 0.7, 6.2)
compute the similarity h(x) · M_r for each class r and predict the arg max

27 Multiclass Margin
(figure: similarity scores of the labels in three situations)
- multiclass margin = (similarity of correct label) - (largest similarity of the rest)
- margin > 1: no loss; margin < 1: loss; margin < 0: misclassification
replace the discrete multiclass loss with a piecewise-linear bound:
  ε_S = (1/m) Σ_{i=1}^{m} [[H(x_i) ≠ y_i]] ≤ (1/m) Σ_{i=1}^{m} ( max_r { (h(x_i) · M_r) + 1 - δ_{y_i,r} } - (h(x_i) · M_{y_i}) )
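
A sketch (my own) of the piecewise-linear bound on the right-hand side, computed from the matrix of similarities h(x_i)·M_r:

```python
import numpy as np

def empirical_loss_bound(scores, y):
    """scores: m-by-k matrix with entry (i, r) = h(x_i) . M_r;  y: length-m integer labels.

    Returns (1/m) * sum_i ( max_r [ scores[i, r] + 1 - delta(y_i, r) ] - scores[i, y_i] ),
    an upper bound on the training error of H(x) = arg max_r h(x) . M_r.
    """
    m, k = scores.shape
    delta = np.zeros((m, k))
    delta[np.arange(m), y] = 1.0
    correct = scores[np.arange(m), y]
    return float(np.mean(np.max(scores + 1.0 - delta, axis=1) - correct))
```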

28 Optimization problem
Minimize: β ‖M‖_p + Σ_{i=1}^{m} ξ_i
subject to the constraints:
  ∀ i, r : (h(x_i) · M_{y_i}) - (h(x_i) · M_r) ≥ 1 - δ_{y_i,r} - ξ_i
Problem sizes for the norms p = 1, 2, ∞:
                                l_1        l_2      l_∞
  primal variables              m + 2kl    m + kl   m + kl + 1
  primal constraints            km + 2kl   km       km + 2kl
  dual variables                km + 2kl   km       km + 2kl
  dual inequality constraints   km + 2kl   km       km + 2kl
  dual equality constraints     m + 2kl    m        m + kl + 1
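
A minimal sketch of the p = 2 version of this program using a generic convex solver (cvxpy is my choice here, not something named in the talk; the squared Frobenius norm stands in for the l_2 regularizer):

```python
import numpy as np
import cvxpy as cp

def learn_continuous_code(H, y, k, beta=1.0):
    """H: m-by-l matrix of binary predictions h(x_i); y: integer labels in {0, ..., k-1}."""
    m, l = H.shape
    M = cp.Variable((k, l))            # the continuous code matrix
    xi = cp.Variable(m, nonneg=True)   # slack variables
    cons = []
    for i in range(m):
        yi = int(y[i])
        for r in range(k):
            delta = 1.0 if r == yi else 0.0
            # (h(x_i) . M_{y_i}) - (h(x_i) . M_r) >= 1 - delta_{y_i,r} - xi_i
            cons.append(H[i] @ M[yi] - H[i] @ M[r] >= 1.0 - delta - xi[i])
    cp.Problem(cp.Minimize(beta * cp.sum_squares(M) + cp.sum(xi)), cons).solve()
    return M.value
```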

29 Constructing the code
- from the dual program we get M_r = β^{-1} Σ_{i=1}^{m} [δ_{y_i,r} - η_{i,r}] h(x_i)
- an example i participates in the solution for M if and only if (η_{i,1}, ..., η_{i,k}) is not a point distribution concentrating on the correct label y_i
- the code is constructed from examples whose labels are hard to predict
- efficient algorithm: SPOC - Support Pattern Output Coding
  - can be combined with Mercer kernels
  - chooses a single example and efficiently solves a mini-QP problem
  - constructs a continuous code or a multiclass SVM (h(x) = x)
  - the number of iterations needed to be ε-close to the optimal solution can be bounded

30 Generalization Error for Continuous Codes and Multiclass SVM
suppose we perfectly classify a random sample S of m examples using the matrix M; then w.p. 1 - δ the generalization error is bounded by
  O( (R²/m) ( k ‖M‖² log(m) + k² log(m/δ) ) )
where R is the radius of the ball containing the support of S

31 Example
(figure: a three-class training set and the three one-against-rest problems: Square vs. Rest, Star vs. Rest, Circle vs. Rest)

32 Old and new decision boundaries
(figure: decision boundaries with the old code and with the new, learned code)

33 Old and new codes
(figure: the ID code matrix and the new, learned code matrix)

34 Experiments
(figure: relative performance (%) of the id, BCH, and random codes on satimage, shuttle, mnist, isolet, letter, vowel, glass, and soybean)

35 Multilabel Problems
Joint work with Eleazar Eskin (Columbia) and Yair Weiss (HUJI)
topics: Earnings, Forecasts, Share-Capital, Debt-Issues, Loans, Credits, Credit-Ratings, Mergers, Acquisitions, Asset-Transfers, ...
labeled documents:
  Doc#1: {Share-Capital, Credit-Ratings, Asset-Transfers}
  Doc#2: {Mergers, Acquisitions}
  Doc#3: {Earnings}
  Doc#4: {Earnings, Forecasts}
  Doc#5: ...
task: find the set of relevant topics for each new document

36 Multilabel problems - formal definition
- instances x_i ∈ X
- set of possible topics Y = {1, 2, ..., K}
- relevant topics t_i ∈ 2^Y, encoded as labels y_i ∈ {-1, +1}^K
  example: Doc#17 with relevant topics {Share-Capital, Acquisitions} becomes a sign vector with +1 in those two coordinates and -1 elsewhere
- find f : X → {-1, +1}^K

37 Naïve Approach
- train K classifiers f_i : X → {-1, +1}, and set f(x) = (f_1(x), ..., f_K(x))
- classifier i predicts whether topic i is relevant (+1) or not (-1)
- apply each classifier independently to each new instance
- pros: simple
- cons: error prone; does not account for correlations between topics
- employed by AdaBoost.MH and versions of SVM

38 Adding Fault Tolerance
- train classifiers that take topic correlations into account: if an article is about Mergers, learn whether it is also about Acquisitions; if the article is not about Mergers, abstain
- obtain K² classifiers f_{i→j} : X → {-1, +1}
- view each possible vector of labels in {-1, +1}^K as a super-label
- build the corresponding output code of size 2^K × K²
- use loss-based decoding

39 Multilabel + ECOC: example
three classes, nine classifiers f_{i→j} for i, j ∈ {1, 2, 3}; output code of size 8 × 9
but... classification time is exponential in K
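
One plausible way to build this 2^K × K² code (my assumption about the entries, since the slide does not spell them out): rows enumerate all label vectors in {-1, +1}^K, and the column for the pair (i, j) asks "is topic j relevant?" whenever topic i is relevant, abstaining (entry 0) otherwise.

```python
import numpy as np
from itertools import product

def multilabel_output_code(K):
    rows = list(product([-1, 1], repeat=K))       # the 2^K super-labels
    M = np.zeros((2 ** K, K * K))
    for r, y in enumerate(rows):
        for i in range(K):
            for j in range(K):
                M[r, i * K + j] = y[j] if y[i] == 1 else 0   # abstain if topic i absent
    return M

print(multilabel_output_code(3).shape)   # (8, 9): three classes, nine classifiers
```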

40 Bayes to the Rescue
Define the probability of the K labels corresponding to row r as
  P(r | x) ∝ exp( -Σ_{s=1}^{l} L(M(r, s) f_s(x)) )
Note that if
  Σ_{s=1}^{l} L(M(r, s) f_s(x)) > Σ_{s=1}^{l} L(M(y, s) f_s(x))
then P(r | x) < P(y | x)
- construct a belief net with loops from the constraints
- use loopy belief propagation (max-product) to find the best K labels
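
A sketch of this probabilistic reading (mine): normalize exp(-total loss) over the rows, so rows with smaller total loss get higher probability. Exact decoding would still enumerate all 2^K rows; the slide's point is that max-product on the loopy belief net avoids that.

```python
import numpy as np

def row_probabilities(h, M, L=lambda z: np.exp(-z)):
    """P(r | x) proportional to exp( -sum_s L(M[r, s] * h[s]) )."""
    total_loss = np.array([sum(L(M[r, s] * h[s]) for s in range(M.shape[1]))
                           for r in range(M.shape[0])])
    p = np.exp(-total_loss)
    return p / p.sum()
```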

41 Simulation Results
(figure: ECOC error vs. raw error)
Apps in the works: multilabel text classification, gene multi-functionality classification

42 Summary
- general framework for solving multiclass problems with margin classifiers using output codes
- general empirical loss analysis
- generalization error analysis when the binary learner is AdaBoost
- experimental results support loss-based decoding
- design of continuous output codes: efficient algorithm, generalization analysis
research directions:
- online models for multiclass problems
- hierarchical classification and coding
- VC analysis of learning with output codes
- learning output codes using Bregman-Legendre projections

43 Based On
T. G. Dietterich and G. Bakiri. Solving multiclass learning problems via error-correcting output codes. Journal of AI Research, 2:263-286, 1995.
R. E. Schapire and Y. Singer. Improved boosting algorithms using confidence-rated predictions. Machine Learning, 37(3):297-336, 1999.
E. L. Allwein, R. E. Schapire, and Y. Singer. Reducing multiclass to binary: A unifying approach for margin classifiers. Journal of ML Research, 1:113-141, 2000.
K. Crammer and Y. Singer. On the algorithmic implementation of multiclass kernel-based vector machines. Journal of ML Research, 2:265-292, 2001.
K. Crammer and Y. Singer. On the learnability and design of output codes for multiclass problems. Machine Learning, 47(2/3):201-233, 2002.
