Binary Classification, Multi-label Classification and Ranking: A Decision-theoretic Approach
1 Binary Classification, Multi-label Classification and Ranking: A Decision-theoretic Approach Krzysztof Dembczyński and Wojciech Kotłowski Intelligent Decision Support Systems Laboratory (IDSS), Poznań University of Technology, Poland PAN Summer School,
2 Agenda 1 Binary Classification 2 Bipartite Ranking 3 Multi-Label Classification 4 Reductions in Multi-Label Classification 5 Conditional Ranking The project is co-financed by the European Union from resources of the European Social Fund 1 / 44
3 Outline 1 Ranking problem 2 Multilabel ranking 3 Summary
5 Ranking problem The ranking problem from the learning perspective: train a model that sorts items according to the preferences of a subject. Problems vary in preference structure and training information: bipartite, multipartite and object ranking; ordinal classification/regression; multi-label ranking; conditional ranking.
6 Object ranking Ranking of national football teams.
7 Multi-label ranking Sort document tags by relevance: tennis, sport, Wimbledon, Poland, USA, politics.
8 Label ranking Training data: {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)}, where y_i is a ranking (permutation) of a fixed number of labels/alternatives. 1 Predict the permutation (y_π(1), y_π(2), ..., y_π(m)) for a given x. [Table: feature vectors x_1, ..., x_n with observed label rankings over Y_1, ..., Y_m; the ranking for a new x is to be predicted.] 1 E. Hüllermeier, J. Fürnkranz, W. Cheng, and K. Brinker. Label ranking by learning pairwise preferences. Artificial Intelligence, 172, 2008.
10 Collaborative filtering 2 Training data: {(u_i, m_j, y_ij)}, for some i = 1, ..., n and j = 1, ..., m, with y_ij ∈ Y = R. Predict y_ij for a given u_i and m_j. [Table: users u_1, ..., u_n by items m_1, ..., m_m with a sparse set of observed ratings y_ij.] 2 D. Goldberg, D. Nichols, B.M. Oki, and D. Terry. Using collaborative filtering to weave an information tapestry. Communications of the ACM, 35(12):61–70, 1992.
11 Dyadic prediction 3 [Table: instances x_1, ..., x_n+2 by labels y_1, ..., y_m+2 with a sparse mix of observed values and missing entries (?); predictions are needed for unobserved pairs, including entirely new rows and columns.] 3 A.K. Menon and C. Elkan. Predicting labels for dyadic data. Data Mining and Knowledge Discovery, 21(2), 2010.
12 Query-document models. Conditional ranking
13 Feedback information Different types of feedback information: utility scores: a numeric score assigned to each object x_1, ..., x_5.
14 Feedback information Different types of feedback information: total order: x_2 ≻ x_3 ≻ x_4 ≻ x_1 ≻ x_5.
15 Feedback information Different types of feedback information: partial order: x_2 ≻ x_3, x_2 ≻ x_4, x_3 ≻ x_1, x_3 ≻ x_5, x_4 ≻ x_1, x_4 ≻ x_5 (x_3 and x_4, as well as x_1 and x_5, remain incomparable).
16 Feedback information Different types of feedback information: pairwise comparisons: x_2 ≻ x_3, x_2 ≻ x_4, x_2 ≻ x_1, x_2 ≻ x_5, x_3 ≻ x_1, x_3 ≻ x_5, x_4 ≻ x_1, x_4 ≻ x_5.
17 Feedback information Different types of feedback information: ordinal labels: x_1 ↦ 1, x_2 ↦ 5, x_3 ↦ 4, x_4 ↦ 3, x_5 ↦ 1, inducing the pairwise preferences x_2 ≻ x_3, x_2 ≻ x_4, x_2 ≻ x_1, x_2 ≻ x_5, x_3 ≻ x_1, x_3 ≻ x_4, x_3 ≻ x_5, x_4 ≻ x_1, x_4 ≻ x_5.
18 Feedback information Different types of feedback information: binary labels: x_1 ↦ 0, x_2 ↦ 1, x_3 ↦ 1, x_4 ↦ 1, x_5 ↦ 0, inducing the pairwise preferences x_2 ≻ x_1, x_3 ≻ x_1, x_4 ≻ x_1, x_2 ≻ x_5, x_3 ≻ x_5, x_4 ≻ x_5.
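The reduction sketched on the last two slides — turning ordinal or binary labels into a set of pairwise preferences — can be written in a few lines of Python (function and variable names here are illustrative, not from the slides):

```python
def pairwise_preferences(labels):
    """Return all index pairs (i, j) such that object i should be ranked
    above object j, i.e. labels[i] > labels[j]. Works for binary labels
    in {0, 1} as well as ordinal grades."""
    return [(i, j)
            for i, yi in enumerate(labels)
            for j, yj in enumerate(labels)
            if yi > yj]

# Binary labels for x_1, ..., x_5 from the slide (0-indexed here):
prefs = pairwise_preferences([0, 1, 1, 1, 0])
# Every relevant object (indices 1, 2, 3) is preferred to every
# irrelevant one (indices 0, 4), exactly as listed on the slide.
```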
21 Task losses Performance measures (task losses) in ranking problems: pairwise disagreement (also referred to as rank loss), discounted cumulative gain, average precision, expected reciprocal rank, ... These measures are usually neither convex nor differentiable, which makes them hard to optimize. Learning algorithms therefore employ surrogate losses to make the optimization problem tractable. 12 / 44
22 Can we design, for a given ranking problem, a surrogate loss that provides a near-optimal solution with respect to a given task loss? 13 / 44
23 Pairwise disagreement Let r < r′ be the true ranks of two objects and r̂, r̂′ be the predicted ranks of the same objects. Pairwise disagreement can be expressed by counting errors of the type r̂ > r̂′. In general, the problem cannot be easily solved, but for some special cases it is possible. 4 4 J. Duchi, L. Mackey, and M. Jordan. On the consistency of ranking algorithms. In ICML, 2010. W. Kotłowski, K. Dembczyński, and E. Hüllermeier. Bipartite ranking through minimization of univariate loss. In International Conference on Machine Learning, 2011.
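Counting these errors is a direct double loop over object pairs; a minimal sketch in Python (names are illustrative, with rank 1 meaning the top position):

```python
def pairwise_disagreement(true_ranks, pred_ranks):
    """Count object pairs that the true ranks order one way and the
    predicted ranks order the other way (errors of the type
    r < r' but rhat > rhat')."""
    n = len(true_ranks)
    errors = 0
    for i in range(n):
        for j in range(n):
            if true_ranks[i] < true_ranks[j] and pred_ranks[i] > pred_ranks[j]:
                errors += 1
    return errors
```

For n objects there are at most n(n-1)/2 such errors, so dividing by that number gives the normalized rank loss.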
24 Discounted cumulative gain Let us assume that there are n objects to rank. Let r, r̂ ∈ {1, ..., n} represent the true and predicted rank of an object, respectively. Discounted cumulative gain can be expressed by: DCG = Σ_{i=1}^{n} (2^{n − r_i} − 1) / log(1 + r̂_i). Reduction to regression 5 or multi-class classification 6 is possible. 5 D. Cossock and T. Zhang. Statistical analysis of Bayes optimal subset ranking. IEEE Trans. Info. Theory, 54, 2008. 6 Ping Li, Christopher J. C. Burges, and Qiang Wu. McRank: Learning to rank using multiple classification and gradient boosting. In NIPS, 2007.
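A minimal sketch of this formula in Python, assuming (as in the slide's convention) that the gain of an object with true rank r_i is 2^(n − r_i) − 1 and that rank 1 is the top position; names are illustrative:

```python
import math

def dcg(true_ranks, pred_ranks):
    """Discounted cumulative gain: gain 2^(n - r_i) - 1 of each object,
    discounted by log(1 + predicted rank)."""
    n = len(true_ranks)
    return sum((2 ** (n - r) - 1) / math.log(1 + rhat)
               for r, rhat in zip(true_ranks, pred_ranks))
```

A perfect prediction places the highest-gain objects at the least-discounted positions, so it maximizes the sum.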
25 Average precision Let us assume that there are n objects to rank. Let r ∈ {0, 1} be the relevance and r̂ ∈ {1, ..., n} the predicted rank of the same object. Average precision can be expressed by: AP = (1 / |{i : r_i = 1}|) Σ_{i : r_i = 1} ( Σ_{k : r̂_k ≤ r̂_i} r_k ) / r̂_i. Theoretical analysis in terms of surrogate losses. 7 7 Clément Calauzènes, Nicolas Usunier, and Patrick Gallinari. Calibration and regret bounds for order-preserving surrogate losses in learning to rank. Machine Learning, 93(2-3):227–260, 2013.
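In words: for each relevant object, compute the precision at its predicted rank, then average over the relevant objects. A hedged Python sketch (names are illustrative; it assumes at least one relevant object):

```python
def average_precision(relevance, pred_ranks):
    """Average precision for r_i in {0, 1}: the precision at the predicted
    rank of each relevant object, averaged over all relevant objects."""
    relevant = [i for i, r in enumerate(relevance) if r == 1]
    total = 0.0
    for i in relevant:
        # number of relevant objects ranked at or above object i
        hits = sum(relevance[k] for k in range(len(relevance))
                   if pred_ranks[k] <= pred_ranks[i])
        total += hits / pred_ranks[i]
    return total / len(relevant)
```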
26 Expected reciprocal rank Let us assume that there are n objects to rank. Let r, r̂ ∈ {1, ..., n} represent the true and predicted rank of an object, respectively. Expected reciprocal rank 8 can be expressed by: ERR = Σ_{k=1}^{n} (1/k) P(user stops at position k) = Σ_{k=1}^{n} (1/k) R_k Π_{q=1}^{k−1} (1 − R_q), where R_k = (2^{n − r_k} − 1) / 2^{n−1}. Theoretical analysis in terms of surrogate losses. 9 8 O. Chapelle and Y. Chang. Yahoo! Learning to Rank Challenge overview. J. of Mach. Learn. Res., 14:1–24, 2011. 9 Clément Calauzènes, Nicolas Usunier, and Patrick Gallinari. Calibration and regret bounds for order-preserving surrogate losses in learning to rank. Machine Learning, 93(2-3):227–260, 2013.
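The product form above has a natural sequential reading: walk down the predicted list, and at each position the user stops with probability R_k. A minimal Python sketch under that reading (names are illustrative; `pred_order` lists object indices from top to bottom):

```python
def expected_reciprocal_rank(true_ranks, pred_order):
    """ERR with stopping probabilities R_k = (2^(n - r_k) - 1) / 2^(n - 1),
    where r_k is the true rank of the object shown at position k."""
    n = len(pred_order)
    err, p_continue = 0.0, 1.0
    for k, obj in enumerate(pred_order, start=1):
        R = (2 ** (n - true_ranks[obj]) - 1) / 2 ** (n - 1)
        err += p_continue * R / k   # user reaches position k, stops there
        p_continue *= (1 - R)
    return err
```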
27 Setting Objects (x, y) are generated from an unknown distribution P(x, y). The risk (expected loss) of a function h(x) is L_l(h) := E_{(x,y)}[l(y, h(x))], where l is a loss function. The regret of a classifier is Reg_l(h) = L_l(h) − L_l(h*), where h* = arg min_h L_l(h) is the Bayes classifier. 18 / 44
28 Setting Since task losses are usually neither convex nor differentiable, we use surrogate (or proxy) losses that are easier to optimize. We say that a surrogate loss l̃ is consistent (calibrated) with the task loss l when the following holds: Reg_l̃(h) → 0 implies Reg_l(h) → 0. 19 / 44
29 Outline 1 Ranking problem 2 Multilabel ranking 3 Summary
30 Multilabel ranking Training data: {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)}, where y_i ∈ {0, 1}^m. Sort the labels from the most to the least relevant for a given x. [Table: training feature vectors with binary label vectors (Y_1, ..., Y_m); for a new x the model predicts scores h_1, ..., h_m and ranks the labels accordingly, e.g., h_2 > h_1 > ... > h_m, aiming at the true relevance order y_2 ≻ y_1 ≻ ... ≻ y_m.]
33 Multilabel ranking Rank loss: l(y, h(x)) = w(y) Σ_{(i,j): y_i > y_j} ( ⟦h_i(x) < h_j(x)⟧ + (1/2) ⟦h_i(x) = h_j(x)⟧ ), where ⟦·⟧ denotes the indicator function and w(y) ≤ w_max is a weight function. The weight function w(y) is usually used to normalize the range of the rank loss to [0, 1]: w(y) = 1 / (n_+ n_-), i.e., it is equal to the inverse of the total number of pairwise comparisons between relevant (n_+) and irrelevant (n_-) labels.
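The normalized rank loss above can be sketched in Python (names are illustrative; it assumes at least one relevant and one irrelevant label so the weight is defined):

```python
def rank_loss(y, h):
    """Normalized multilabel rank loss: fraction of (relevant, irrelevant)
    label pairs that the scores h order incorrectly; ties count 1/2."""
    pos = [i for i, yi in enumerate(y) if yi == 1]
    neg = [j for j, yj in enumerate(y) if yj == 0]
    w = 1.0 / (len(pos) * len(neg))          # w(y) = 1 / (n_+ * n_-)
    loss = 0.0
    for i in pos:
        for j in neg:
            if h[i] < h[j]:
                loss += 1.0
            elif h[i] == h[j]:
                loss += 0.5
    return w * loss
```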
35 Pairwise surrogate losses The most intuitive approach is to use pairwise convex surrogate losses of the form l_φ(y, h) = w(y) Σ_{(i,j): y_i > y_j} φ(h_i − h_j), where φ is an exponential function (BoosTexter) 10: φ(f) = e^{−f}, a logistic function (LLLR) 11: φ(f) = log(1 + e^{−f}), or a hinge function (RankSVM) 12: φ(f) = max(0, 1 − f). 10 R. E. Schapire and Y. Singer. BoosTexter: A boosting-based system for text categorization. Machine Learning, 39(2/3), 2000. 11 O. Dekel, Ch. Manning, and Y. Singer. Log-linear models for label ranking. In NIPS. MIT Press, 2003. 12 A. Elisseeff and J. Weston. A kernel method for multi-labelled classification. In NIPS, 2001.
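These three surrogates differ only in the convex function φ applied to each score difference. A small Python sketch, using the normalizing weight w(y) from the rank-loss slide (names are illustrative):

```python
import math

def exponential(f):          # BoosTexter
    return math.exp(-f)

def logistic(f):             # LLLR
    return math.log(1 + math.exp(-f))

def hinge(f):                # RankSVM
    return max(0.0, 1.0 - f)

def pairwise_surrogate(y, h, phi):
    """l_phi(y, h) = w(y) * sum over pairs (i, j) with y_i > y_j
    of phi(h_i - h_j), with w(y) the inverse number of such pairs."""
    pairs = [(i, j) for i in range(len(y)) for j in range(len(y)) if y[i] > y[j]]
    w = 1.0 / len(pairs)
    return w * sum(phi(h[i] - h[j]) for i, j in pairs)
```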
36 Surrogate losses [Figure: the 0/1 (Boolean test), exponential, logistic, and hinge losses φ plotted as functions of f.]
37 Multilabel ranking The pairwise approach is, unfortunately, inconsistent for the most commonly used convex surrogates. 13 There exists a class of pairwise surrogates that is consistent. We will show, however, that simple univariate (pointwise) variants of the exponential and logistic loss are consistent with the multi-label rank loss. 13 J. Duchi, L. Mackey, and M. Jordan. On the consistency of ranking algorithms. In ICML, 2010. W. Gao and Z. Zhou. On the consistency of multi-label learning. In COLT, 2011.
38 Multilabel ranking Let us denote: Δ_ij^{uv} = Σ_{y: y_i = u, y_j = v} w(y) P(y | x). For w(y) ≡ 1, Δ_ij^{uv} reduces to P(Y_i = u, Y_j = v | x). Moreover, Δ_ij^{uv} = Δ_ji^{vu} for all (i, j). Let W = E[w(Y) | x] = Σ_y w(y) P(y | x). Then Δ_ij^{00} + Δ_ij^{01} + Δ_ij^{10} + Δ_ij^{11} = W. [Table: a toy distribution P(y) with weights w over labels Y_1, Y_2, Y_3.]
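These quantities are easy to compute for a small toy distribution, which also lets us check the two identities numerically. A hedged sketch (the uniform distribution and names below are made up for illustration; w(y) ≡ 1, so W = 1):

```python
from itertools import product

# Toy conditional distribution P(y | x) over m = 3 labels (illustrative).
P = {y: 1 / 8 for y in product([0, 1], repeat=3)}

def w(y):
    return 1.0            # constant weight function, so W = 1

def delta(i, j, u, v):
    """Delta_ij^{uv} = sum over y with y_i = u, y_j = v of w(y) P(y | x)."""
    return sum(w(y) * p for y, p in P.items() if y[i] == u and y[j] == v)

W = sum(w(y) * p for y, p in P.items())
```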
39 Multilabel ranking The conditional risk can be written as: L_rnk(h | x) = Σ_{i>j} ( Δ_ij^{10} ⟦h_i < h_j⟧ + Δ_ij^{01} ⟦h_i > h_j⟧ + (1/2)(Δ_ij^{10} + Δ_ij^{01}) ⟦h_i = h_j⟧ ). Ideally, we would like to find h for which: L_rnk(h | x) = Σ_{i>j} min{Δ_ij^{10}, Δ_ij^{01}}. 27 / 44
40 Reduction to weighted binary relevance The Bayes ranker can be obtained by sorting labels according to: 14 Δ_i^1 = Σ_{y: y_i = 1} w(y) P(y | x). For w(y) ≡ 1, the labels should be sorted according to their marginal probabilities, since Δ_i^u reduces to P(y_i = u | x) in this case. 14 K. Dembczyński, W. Kotłowski, and E. Hüllermeier. Consistent multilabel ranking through univariate losses. In International Conference on Machine Learning, 2012.
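For a concrete toy distribution the Bayes ranker is just a sort on the marginal weights. A sketch in Python (the joint distribution below is invented for illustration and has deliberately dependent labels; w(y) ≡ 1, so Δ_i^1 is the marginal P(y_i = 1 | x)):

```python
# Toy joint P(y | x) over m = 3 labels (illustrative, sums to 1).
P = {(0, 0, 0): 0.1, (1, 0, 0): 0.3, (1, 1, 0): 0.2,
     (0, 1, 1): 0.2, (1, 1, 1): 0.2}

def delta1(i):
    """Delta_i^1 = sum over y with y_i = 1 of w(y) P(y | x), with w = 1
    simply the marginal probability P(y_i = 1 | x)."""
    return sum(p for y, p in P.items() if y[i] == 1)

# The Bayes ranker sorts the labels by Delta_i^1, largest first.
order = sorted(range(3), key=delta1, reverse=True)
```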
41 Reduction to weighted binary relevance The Bayes risk is indeed: L_rnk(h* | x) = Σ_{i>j} min{Δ_ij^{10}, Δ_ij^{01}}. Since Δ_i^1 = Δ_ij^{10} + Δ_ij^{11}, we have: Δ_i^1 − Δ_j^1 = (Δ_ij^{10} + Δ_ij^{11}) − (Δ_ij^{01} + Δ_ij^{11}) = Δ_ij^{10} − Δ_ij^{01}, so sorting the labels by Δ_i^1 orders every pair according to the sign of Δ_ij^{10} − Δ_ij^{01}.
42 Reduction to weighted binary relevance Consider the univariate (weighted) exponential and logistic loss: l_exp(y, h) = w(y) Σ_{i=1}^{m} e^{−(2y_i − 1) h_i}, l_log(y, h) = w(y) Σ_{i=1}^{m} log(1 + e^{−(2y_i − 1) h_i}). The risk minimizer of these losses is: h_i*(x) = (1/c) log(Δ_i^1 / Δ_i^0) = (1/c) log(Δ_i^1 / (W − Δ_i^1)), which is a strictly increasing transformation of Δ_i^1, where W = E[w(Y) | x] = Σ_y w(y) P(y | x). 30 / 44
43 Reduction to weighted binary relevance A vertical reduction: solve m independent binary classification problems. Many algorithms that minimize the (weighted) exponential or logistic surrogate, such as AdaBoost or logistic regression, can be applied. Besides its simplicity and efficiency, this approach is consistent. 31 / 44
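A minimal sketch of this vertical reduction, using intercept-only logistic models fitted by gradient descent (an illustrative simplification; real weighted binary relevance would train any binary learner per label on the features x). For an intercept-only model the minimizer is h_i = log(p_i / (1 − p_i)), the log-odds of the label's empirical marginal p_i, so sorting labels by h_i sorts them by relevance:

```python
import math

# Toy multilabel training set: label vectors y in {0, 1}^3 (illustrative).
Y = [(1, 0, 1), (1, 1, 0), (1, 0, 0), (0, 0, 1)]

def fit_label(i, steps=2000, lr=0.5):
    """Fit an intercept-only logistic model for label i by gradient
    descent on the (unweighted) logistic loss."""
    h = 0.0
    for _ in range(steps):
        grad = sum(1 / (1 + math.exp(-h)) - y[i] for y in Y) / len(Y)
        h -= lr * grad
    return h

scores = [fit_label(i) for i in range(3)]
ranking = sorted(range(3), key=lambda i: scores[i], reverse=True)
```

Here the empirical marginals are 3/4, 1/4, and 2/4, so the fitted scores approach log 3, −log 3, and 0, and the induced ranking follows the marginals.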
44 Regret bound 15 Theorem: Let Reg_rnk(h) be the regret for the rank loss, and Reg_exp(h) and Reg_log(h) be the regrets for the exponential and logistic losses, respectively. Then √6 · Reg_rnk(h) ≤ 4C · √(Reg_exp(h)), √2 · Reg_rnk(h) ≤ 2C · √(Reg_log(h)), where C ≤ m √(m w_max). 15 K. Dembczyński, W. Kotłowski, and E. Hüllermeier. Consistent multilabel ranking through univariate losses. In International Conference on Machine Learning, 2012.
45 Main result: Sketch of proof The main idea is to exploit similar regret bounds obtained for bipartite ranking. 16 Reduce the multilabel ranking problem horizontally to bipartite ranking for each x separately. Since the labels are independent in bipartite ranking, transform the original label distribution into a new auxiliary one with independent labels. Then adapt the bounds for the reduced problem with the auxiliary distribution. Finally, return to the original problem. 16 W. Kotłowski, K. Dembczyński, and E. Hüllermeier. Bipartite ranking through minimization of univariate loss. In International Conference on Machine Learning, 2011.
46 Main result: Horizontal reduction For a given x, we define a bipartite ranking problem by setting X̃ = {1, ..., m}: the objects (instances) to be ranked correspond to the label indices of the MLR problem and are of the form x̃ = i (i = 1, ..., m); the corresponding label for x̃ = i is ỹ_i = y_i. [Table: a row (x, y_1, ..., y_m) of the MLR problem becomes m bipartite-ranking examples (x̃ = i, ỹ_i).] Unfortunately, the labels y_i are not necessarily independent. 34 / 44
47 Main result: Transformation The rank regret depends solely on the marginal weights Δ_i^1: replace the original distribution P_x by a distribution P̃ for which the labels are conditionally independent, P̃(Ỹ = 1 | X̃ = i) = Δ_i^1 / W, P̃(X̃ = i) = 1/m, and replace the original weights by w̃(y) ≡ W. The resulting problem has the same marginal weights Δ_i^1. 35 / 44
48 Main result: Regret bound for an auxiliary problem We adapt the known results for bipartite ranking: Theorem: Let Reg_br(h̃, P̃) be the regret of the (unnormalized) bipartite ranking problem, and Reg_exp(h̃, P̃) and Reg_log(h̃, P̃) the corresponding exponential and logistic loss regrets. Then it holds that: Reg_br(h̃, P̃) ≤ √(3/2) · √(Reg_exp(h̃, P̃)), Reg_br(h̃, P̃) ≤ √2 · √(Reg_log(h̃, P̃)). 36 / 44
49 Main result: Tracing back We trace back from Reg_l(h̃, P̃) to Reg_rnk(h), where l stands for either the exponential or the logistic loss: bipartite ranking (distribution P̃): Reg_br(h̃, P̃) bounded via Reg_l(h̃, P̃); conditional MLR (P_x): Reg_rnk(h | x) bounded via Reg_l(h | x); taking the expectation over x (MLR): Reg_rnk(h) bounded via Reg_l(h). 37 / 44
50 Inconsistency of the pairwise approach The conditional risk of a pairwise surrogate loss is: L_φ(h, P | x) = Σ_{i>j} ( Δ_ij^{10} φ(h_i − h_j) + Δ_ij^{01} φ(h_j − h_i) ), and a necessary condition for consistency is that the Bayes classifier h* for the φ-loss is also the Bayes ranker, i.e., sgn(h_i* − h_j*) = sgn(Δ_ij^{10} − Δ_ij^{01}). The (nonlinear monotone) transformation φ applies to the differences h_i − h_j, so the minimization of pairwise convex losses results in a complicated solution h*, where h_i* generally depends on all Δ_jk^{10} (1 ≤ j, k ≤ m), and not only on Δ_i^1. The only case in which the above convex pairwise loss is consistent is when the labels are independent (the case of bipartite ranking). 38 / 44
51 Experimental results: Synthetic data [Figure: rank loss vs. the number of learning examples for WBR LR and LLLR, with the Bayes risk as a baseline. Left: independent data. Right: dependent data.] Label independence: the methods perform more or less on par. Label dependence: WBR shows small but consistent improvements. 39 / 44
52 Experimental results: Benchmark data [Table: rank loss of WBR-AdaBoost vs. AdaBoost.MR (left) and of WBR-LR vs. LLLR (right) on the image, emotions, scene, yeast, and mediamill datasets.] WBR is at least competitive with state-of-the-art algorithms defined on pairwise surrogates. 40 / 44
53 Outline 1 Ranking problem 2 Multilabel ranking 3 Summary
54 Summary Ranking problem: different settings. Multi-label ranking. Consistency of multi-label rankers.
55 Conclusions Take-away message: Multi-label ranking can be solved by a variant of binary relevance (BR). Pairwise approaches are inconsistent. Multi-label ranking is the simplest variant of conditional ranking problems. For more check:
56 Thank you for your attention! The project is co-financed by the European Union from resources of the European Social Fund. 44 / 44
More informationBoosting. CAP5610: Machine Learning Instructor: Guo-Jun Qi
Boosting CAP5610: Machine Learning Instructor: Guo-Jun Qi Weak classifiers Weak classifiers Decision stump one layer decision tree Naive Bayes A classifier without feature correlations Linear classifier
More informationOn the Bayes-Optimality of F-Measure Maximizers
Journal of Machine Learning Research 15 (2014) 3513-3568 Submitted 10/13; Revised 6/14; Published 11/14 On the Bayes-Optimality of F-Measure Maximizers Willem Waegeman willem.waegeman@ugent.be Department
More informationABC-Boost: Adaptive Base Class Boost for Multi-class Classification
ABC-Boost: Adaptive Base Class Boost for Multi-class Classification Ping Li Department of Statistical Science, Cornell University, Ithaca, NY 14853 USA pingli@cornell.edu Abstract We propose -boost (adaptive
More informationOnline Passive-Aggressive Algorithms. Tirgul 11
Online Passive-Aggressive Algorithms Tirgul 11 Multi-Label Classification 2 Multilabel Problem: Example Mapping Apps to smart folders: Assign an installed app to one or more folders Candy Crush Saga 3
More informationSUPPORT VECTOR MACHINE
SUPPORT VECTOR MACHINE Mainly based on https://nlp.stanford.edu/ir-book/pdf/15svm.pdf 1 Overview SVM is a huge topic Integration of MMDS, IIR, and Andrew Moore s slides here Our foci: Geometric intuition
More informationCorrLog: Correlated Logistic Models for Joint Prediction of Multiple Labels
CorrLog: Correlated Logistic Models for Joint Prediction of Multiple Labels Wei Bian Bo Xie Dacheng Tao Georgia Tech Center for Music Technology, Georgia Institute of Technology bo.xie@gatech.edu Centre
More informationECS289: Scalable Machine Learning
ECS289: Scalable Machine Learning Cho-Jui Hsieh UC Davis Oct 27, 2015 Outline One versus all/one versus one Ranking loss for multiclass/multilabel classification Scaling to millions of labels Multiclass
More informationVoting (Ensemble Methods)
1 2 Voting (Ensemble Methods) Instead of learning a single classifier, learn many weak classifiers that are good at different parts of the data Output class: (Weighted) vote of each classifier Classifiers
More informationStatistical Properties of Large Margin Classifiers
Statistical Properties of Large Margin Classifiers Peter Bartlett Division of Computer Science and Department of Statistics UC Berkeley Joint work with Mike Jordan, Jon McAuliffe, Ambuj Tewari. slides
More informationClassification and Pattern Recognition
Classification and Pattern Recognition Léon Bottou NEC Labs America COS 424 2/23/2010 The machine learning mix and match Goals Representation Capacity Control Operational Considerations Computational Considerations
More informationFoundations of Machine Learning Ranking. Mehryar Mohri Courant Institute and Google Research
Foundations of Machine Learning Ranking Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu Motivation Very large data sets: too large to display or process. limited resources, need
More informationECE 5424: Introduction to Machine Learning
ECE 5424: Introduction to Machine Learning Topics: Ensemble Methods: Bagging, Boosting PAC Learning Readings: Murphy 16.4;; Hastie 16 Stefan Lee Virginia Tech Fighting the bias-variance tradeoff Simple
More informationClick-Through Rate prediction: TOP-5 solution for the Avazu contest
Click-Through Rate prediction: TOP-5 solution for the Avazu contest Dmitry Efimov Petrovac, Montenegro June 04, 2015 Outline Provided data Likelihood features FTRL-Proximal Batch algorithm Factorization
More informationPrediction of Citations for Academic Papers
000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050
More informationBregman Divergences for Data Mining Meta-Algorithms
p.1/?? Bregman Divergences for Data Mining Meta-Algorithms Joydeep Ghosh University of Texas at Austin ghosh@ece.utexas.edu Reflects joint work with Arindam Banerjee, Srujana Merugu, Inderjit Dhillon,
More informationML4NLP Multiclass Classification
ML4NLP Multiclass Classification CS 590NLP Dan Goldwasser Purdue University dgoldwas@purdue.edu Social NLP Last week we discussed the speed-dates paper. Interesting perspective on NLP problems- Can we
More informationKnowledge Discovery in Data: Overview. Naïve Bayesian Classification. .. Spring 2009 CSC 466: Knowledge Discovery from Data Alexander Dekhtyar..
Spring 2009 CSC 466: Knowledge Discovery from Data Alexander Dekhtyar Knowledge Discovery in Data: Naïve Bayes Overview Naïve Bayes methodology refers to a probabilistic approach to information discovery
More informationStatistical Data Mining and Machine Learning Hilary Term 2016
Statistical Data Mining and Machine Learning Hilary Term 2016 Dino Sejdinovic Department of Statistics Oxford Slides and other materials available at: http://www.stats.ox.ac.uk/~sejdinov/sdmml Naïve Bayes
More informationEfficient and Principled Online Classification Algorithms for Lifelon
Efficient and Principled Online Classification Algorithms for Lifelong Learning Toyota Technological Institute at Chicago Chicago, IL USA Talk @ Lifelong Learning for Mobile Robotics Applications Workshop,
More information6.036 midterm review. Wednesday, March 18, 15
6.036 midterm review 1 Topics covered supervised learning labels available unsupervised learning no labels available semi-supervised learning some labels available - what algorithms have you learned that
More informationIntelligent Systems Discriminative Learning, Neural Networks
Intelligent Systems Discriminative Learning, Neural Networks Carsten Rother, Dmitrij Schlesinger WS2014/2015, Outline 1. Discriminative learning 2. Neurons and linear classifiers: 1) Perceptron-Algorithm
More informationEvaluation. Andrea Passerini Machine Learning. Evaluation
Andrea Passerini passerini@disi.unitn.it Machine Learning Basic concepts requires to define performance measures to be optimized Performance of learning algorithms cannot be evaluated on entire domain
More informationStatistical Machine Learning Theory. From Multi-class Classification to Structured Output Prediction. Hisashi Kashima.
http://goo.gl/xilnmn Course website KYOTO UNIVERSITY Statistical Machine Learning Theory From Multi-class Classification to Structured Output Prediction Hisashi Kashima kashima@i.kyoto-u.ac.jp DEPARTMENT
More informationIntroduction to Machine Learning
Introduction to Machine Learning Machine Learning: Jordan Boyd-Graber University of Maryland SUPPORT VECTOR MACHINES Slides adapted from Tom Mitchell, Eric Xing, and Lauren Hannah Machine Learning: Jordan
More informationOn the Consistency of Ranking Algorithms
John C. Duchi jduchi@cs.berkeley.edu Lester W. Mackey lmackey@cs.berkeley.edu Computer Science Division, University of California, Berkeley, CA 94720, USA Michael I. Jordan jordan@cs.berkeley.edu Computer
More informationMachine Learning for NLP
Machine Learning for NLP Linear Models Joakim Nivre Uppsala University Department of Linguistics and Philology Slides adapted from Ryan McDonald, Google Research Machine Learning for NLP 1(26) Outline
More informationLecture 2 Machine Learning Review
Lecture 2 Machine Learning Review CMSC 35246: Deep Learning Shubhendu Trivedi & Risi Kondor University of Chicago March 29, 2017 Things we will look at today Formal Setup for Supervised Learning Things
More informationClick Prediction and Preference Ranking of RSS Feeds
Click Prediction and Preference Ranking of RSS Feeds 1 Introduction December 11, 2009 Steven Wu RSS (Really Simple Syndication) is a family of data formats used to publish frequently updated works. RSS
More informationClass Prior Estimation from Positive and Unlabeled Data
IEICE Transactions on Information and Systems, vol.e97-d, no.5, pp.1358 1362, 2014. 1 Class Prior Estimation from Positive and Unlabeled Data Marthinus Christoffel du Plessis Tokyo Institute of Technology,
More informationLearning from Corrupted Binary Labels via Class-Probability Estimation
Learning from Corrupted Binary Labels via Class-Probability Estimation Aditya Krishna Menon Brendan van Rooyen Cheng Soon Ong Robert C. Williamson xxx National ICT Australia and The Australian National
More informationPAC-learning, VC Dimension and Margin-based Bounds
More details: General: http://www.learning-with-kernels.org/ Example of more complex bounds: http://www.research.ibm.com/people/t/tzhang/papers/jmlr02_cover.ps.gz PAC-learning, VC Dimension and Margin-based
More informationText Mining. Dr. Yanjun Li. Associate Professor. Department of Computer and Information Sciences Fordham University
Text Mining Dr. Yanjun Li Associate Professor Department of Computer and Information Sciences Fordham University Outline Introduction: Data Mining Part One: Text Mining Part Two: Preprocessing Text Data
More informationStochastic Top-k ListNet
Stochastic Top-k ListNet Tianyi Luo, Dong Wang, Rong Liu, Yiqiao Pan CSLT / RIIT Tsinghua University lty@cslt.riit.tsinghua.edu.cn EMNLP, Sep 9-19, 2015 1 Machine Learning Ranking Learning to Rank Information
More informationEvaluation requires to define performance measures to be optimized
Evaluation Basic concepts Evaluation requires to define performance measures to be optimized Performance of learning algorithms cannot be evaluated on entire domain (generalization error) approximation
More informationOnline Advertising is Big Business
Online Advertising Online Advertising is Big Business Multiple billion dollar industry $43B in 2013 in USA, 17% increase over 2012 [PWC, Internet Advertising Bureau, April 2013] Higher revenue in USA
More informationSupport Vector Machines, Kernel SVM
Support Vector Machines, Kernel SVM Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms February 27, 2017 1 / 40 Outline 1 Administration 2 Review of last lecture 3 SVM
More informationScaling Neighbourhood Methods
Quick Recap Scaling Neighbourhood Methods Collaborative Filtering m = #items n = #users Complexity : m * m * n Comparative Scale of Signals ~50 M users ~25 M items Explicit Ratings ~ O(1M) (1 per billion)
More information