Binary Classification, Multi-label Classification and Ranking: A Decision-theoretic Approach


Binary Classification, Multi-label Classification and Ranking: A Decision-theoretic Approach Krzysztof Dembczyński and Wojciech Kotłowski Intelligent Decision Support Systems Laboratory (IDSS) Poznań University of Technology, Poland PAN Summer School, 29.06-04.07.2014

Agenda 1 Binary Classification 2 Bipartite Ranking 3 Multi-Label Classification 4 Reductions in Multi-Label Classification 5 Conditional Ranking The project is co-financed by the European Union from resources of the European Social Fund 1 / 44

Outline 1 Ranking problem 2 Multilabel ranking 3 Summary 2 / 44

Outline 1 Ranking problem 2 Multilabel ranking 3 Summary 3 / 44

Ranking problem Ranking problem from the learning perspective: train a model that sorts items according to the preferences of a subject. Problems vary in the preference structure and training information: Bipartite, multipartite and object ranking, Ordinal classification/regression, Multi-label ranking, Conditional ranking. 4 / 44

Object ranking Ranking of national football teams. 5 / 44

Multi-label ranking Sort document tags by relevance (e.g., tennis, sport, Wimbledon, Poland, USA, politics). 6 / 44

Label ranking Training data: {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)}, where y_i is a ranking (permutation) of a fixed number of labels/alternatives. [1] Predict a permutation (y_π(1), y_π(2), ..., y_π(m)) for a given x. Example (features X_1, X_2 and label ranks Y_1, Y_2, ..., Y_m): x_1: (5.0, 4.5) → (1, 3, 2); x_2: (2.0, 2.5) → (2, 1, 3); ...; x_n: (3.0, 3.5) → (3, 1, 2); query x: (4.0, 2.5) → (?, ?, ?). [1] E. Hüllermeier, J. Fürnkranz, W. Cheng, and K. Brinker. Label ranking by learning pairwise preferences. Artificial Intelligence, 172:1897-1916, 2008 7 / 44

Label ranking Training data: {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)}, where y_i is a ranking (permutation) of a fixed number of labels/alternatives. [1] Predict a permutation (y_π(1), y_π(2), ..., y_π(m)) for a given x. Example (features X_1, X_2 and label ranks Y_1, Y_2, ..., Y_m): x_1: (5.0, 4.5) → (1, 3, 2); x_2: (2.0, 2.5) → (2, 1, 3); ...; x_n: (3.0, 3.5) → (3, 1, 2); query x: (4.0, 2.5) → (1, 2, 3). [1] E. Hüllermeier, J. Fürnkranz, W. Cheng, and K. Brinker. Label ranking by learning pairwise preferences. Artificial Intelligence, 172:1897-1916, 2008 7 / 44

Collaborative filtering [2] Training data: {(u_i, m_j, y_ij)}, for some i = 1, ..., n and j = 1, ..., m, y_ij ∈ Y = R. Predict y_ij for a given u_i and m_j. Example: a sparse n × m user-movie rating matrix (rows u_1, ..., u_n, columns m_1, ..., m_m) with only a few observed entries. [2] D. Goldberg, D. Nichols, B.M. Oki, and D. Terry. Using collaborative filtering to weave an information tapestry. Communications of the ACM, 35(12):61-70, 1992 8 / 44

Dyadic prediction [3] Example: a matrix with rows (instances) x_1, ..., x_n, x_{n+1}, x_{n+2} and columns (labels) y_1, ..., y_m, y_{m+1}, y_{m+2}, where some entries are observed and the remaining ones (marked ?) have to be predicted, including entries for new rows and new columns. [3] A.K. Menon and C. Elkan. Predicting labels for dyadic data. Data Mining and Knowledge Discovery, 21(2), 2010 9 / 44

Query-document models. Conditional ranking 10 / 44

Feedback information Different types of feedback information: utility scores: x_1: 0.19, x_2: 0.93, x_3: 0.71, x_4: 0.52, x_5: 0.09. 11 / 44

Feedback information Different types of feedback information: total order: x_2 ≻ x_3 ≻ x_4 ≻ x_1 ≻ x_5. 11 / 44

Feedback information Different types of feedback information: partial order over x_1, ..., x_5 (shown as a preference graph on the slide). 11 / 44

Feedback information Different types of feedback information: pairwise comparisons: x_2 ≻ x_3, x_2 ≻ x_4, x_2 ≻ x_1, x_2 ≻ x_5, x_3 ≻ x_1, x_3 ≻ x_5, x_4 ≻ x_1, x_4 ≻ x_5. 11 / 44

Feedback information Different types of feedback information: ordinal labels: x_1: 1, x_2: 5, x_3: 4, x_4: 3, x_5: 1, which induce x_2 ≻ x_3, x_2 ≻ x_4, x_2 ≻ x_1, x_2 ≻ x_5, x_3 ≻ x_1, x_3 ≻ x_4, x_3 ≻ x_5, x_4 ≻ x_1, x_4 ≻ x_5. 11 / 44

Feedback information Different types of feedback information: binary labels: x_1: 0, x_2: 1, x_3: 1, x_4: 1, x_5: 0, which induce x_2 ≻ x_1, x_3 ≻ x_1, x_4 ≻ x_1, x_2 ≻ x_5, x_3 ≻ x_5, x_4 ≻ x_5. 11 / 44

Task losses Performance measures (task losses) in the ranking problems: Pairwise disagreement (also referred to as rank loss), Discounted cumulative gain, Average precision, Expected reciprocal rank, ... 12 / 44

Task losses Performance measures (task losses) in the ranking problems: Pairwise disagreement (also referred to as rank loss), Discounted cumulative gain, Average precision, Expected reciprocal rank, ... These measures are usually neither convex nor differentiable, and hence hard to optimize. 12 / 44

Task losses Performance measures (task losses) in the ranking problems: Pairwise disagreement (also referred to as rank loss), Discounted cumulative gain, Average precision, Expected reciprocal rank, ... These measures are usually neither convex nor differentiable, and hence hard to optimize. Learning algorithms therefore employ surrogate losses to make the optimization problem tractable. 12 / 44

Can we design, for a given ranking problem, a surrogate loss that provides a near-optimal solution with respect to a given task loss? 13 / 44

Pairwise disagreement Let r, r′ be the true ranks of two objects and r̂, r̂′ the predicted ranks of the same objects. Pairwise disagreement can be expressed by counting errors of the type: r < r′ but r̂ > r̂′. In general, the problem cannot be easily solved, but for some special cases it is possible. [4] [4] J. Duchi, L. Mackey, and M. Jordan. On the consistency of ranking algorithms. In ICML, pages 327-334, 2010; W. Kotłowski, K. Dembczyński, and E. Hüllermeier. Bipartite ranking through minimization of univariate loss. In International Conference on Machine Learning, pages 1113-1120, 2011 14 / 44
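
As an illustration (not part of the slides), a minimal sketch that counts pairwise disagreements between true and predicted rank positions; counting ties as half errors is an illustrative choice:

```python
# Minimal sketch (illustration only): count pairwise disagreements between
# true and predicted rank positions; ties in the prediction count as 1/2.
from itertools import combinations

def pairwise_disagreement(true_ranks, pred_ranks):
    errors = 0.0
    for i, j in combinations(range(len(true_ranks)), 2):
        if true_ranks[i] == true_ranks[j]:
            continue  # no preference between i and j
        # orient the pair so that i is the truly better (lower-rank) object
        if true_ranks[i] > true_ranks[j]:
            i, j = j, i
        if pred_ranks[i] > pred_ranks[j]:
            errors += 1.0          # predicted order contradicts the true order
        elif pred_ranks[i] == pred_ranks[j]:
            errors += 0.5          # tie counted as half an error
    return errors

# Example: true ranks 1..5, prediction swaps two objects -> 1 disagreement
print(pairwise_disagreement([1, 2, 3, 4, 5], [1, 3, 2, 4, 5]))  # 1.0
```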

Discounted cumulative gain Let us assume that there are n objects to rank. Let r, r̂ ∈ {1, ..., n} represent the true and predicted rank of an object, respectively. Discounted cumulative gain can be expressed by DCG = Σ_{i=1}^n (2^{n - r_i} - 1) / log(1 + r̂_i). Example: r̂: 1 2 3 4 5 6 7 8 9; r: 1 5 3 4 2 6 7 8 9. Reduction to regression [5] or multi-class classification [6] is possible. [5] D. Cossock and T. Zhang. Statistical analysis of Bayes optimal subset ranking. IEEE Trans. Info. Theory, 54:5140-5154, 2008 [6] Ping Li, Christopher J. C. Burges, and Qiang Wu. McRank: Learning to rank using multiple classification and gradient boosting. In NIPS, 2007 15 / 44
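
To make the formula concrete, a minimal sketch (illustration only; the gain 2^{n - r_i} - 1 and the natural-log discount are assumptions of the reconstruction above):

```python
import math

def dcg(true_ranks, pred_ranks):
    """DCG = sum_i (2^(n - r_i) - 1) / log(1 + r_hat_i); natural log assumed."""
    n = len(true_ranks)
    return sum((2 ** (n - r) - 1) / math.log(1 + r_hat)
               for r, r_hat in zip(true_ranks, pred_ranks))

# Example from the slide: predicted order 1..9, true ranks with 2 and 5 swapped
r_hat = [1, 2, 3, 4, 5, 6, 7, 8, 9]
r     = [1, 5, 3, 4, 2, 6, 7, 8, 9]
print(dcg(r, r_hat))   # DCG of the shown prediction
print(dcg(r, r))       # DCG of the ideal ordering (upper bound)
```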

Average precision Let us assume that there are n objects to rank. Let r ∈ {0, 1} be the relevance of an object and r̂ ∈ {1, ..., n} its predicted rank. With objects indexed in predicted-rank order, average precision can be expressed by: AP = (1 / |{i : r_i = 1}|) Σ_{i : r_i = 1} (Σ_{k=1}^{r̂_i} r_k) / r̂_i. Example: r̂: 1 2 3 4 5 6 7 8 9; r: 1 1 0 0 1 0 1 0 0. Theoretical analysis in terms of surrogate losses. [7] [7] Clément Calauzènes, Nicolas Usunier, and Patrick Gallinari. Calibration and regret bounds for order-preserving surrogate losses in learning to rank. Machine Learning, 93(2-3):227-260, 2013 16 / 44
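
A small sketch of the AP computation for the example above; it assumes, as in the reconstruction, that objects are listed in predicted-rank order:

```python
def average_precision(relevance_in_pred_order):
    """AP for binary relevance; the list is ordered by predicted rank (1, 2, ...)."""
    hits, precisions = 0, []
    for pos, rel in enumerate(relevance_in_pred_order, start=1):
        if rel == 1:
            hits += 1
            precisions.append(hits / pos)   # precision at this relevant position
    return sum(precisions) / len(precisions) if precisions else 0.0

# Example from the slide: r = 1 1 0 0 1 0 1 0 0 in predicted-rank order
print(average_precision([1, 1, 0, 0, 1, 0, 1, 0, 0]))
```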

Expected reciprocal rank Let us assume that there are n objects to rank. Let r, r̂ ∈ {1, ..., n} represent the true and predicted rank of an object, respectively. Expected reciprocal rank [8] can be expressed by: ERR = Σ_{k=1}^n (1/k) P(user stops at position k) = Σ_{k=1}^n (1/k) R_k Π_{q=1}^{k-1} (1 - R_q), where R_k = (2^{n - r_k} - 1) / (2^n - 1). Theoretical analysis in terms of surrogate losses. [9] [8] O. Chapelle and Y. Chang. Yahoo! learning to rank challenge overview. J. of Mach. Learn. Res., 14:1-24, 2011 [9] Clément Calauzènes, Nicolas Usunier, and Patrick Gallinari. Calibration and regret bounds for order-preserving surrogate losses in learning to rank. Machine Learning, 93(2-3):227-260, 2013 17 / 44
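
A sketch of the ERR computation under the reconstructed stop probability R_k (an assumption of the reconstruction); the input is hypothetical:

```python
def err(true_ranks_in_pred_order):
    """Expected reciprocal rank with R_k = (2^(n - r_k) - 1) / (2^n - 1),
    where r_k is the true rank of the object shown at position k."""
    n = len(true_ranks_in_pred_order)
    value, prob_not_stopped = 0.0, 1.0
    for k, r_k in enumerate(true_ranks_in_pred_order, start=1):
        R_k = (2 ** (n - r_k) - 1) / (2 ** n - 1)   # stop probability at position k
        value += prob_not_stopped * R_k / k
        prob_not_stopped *= (1.0 - R_k)
    return value

# Hypothetical example: 4 objects shown exactly in the order of their true ranks
print(err([1, 2, 3, 4]))
```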

Setting Objects (x, y) are generated from an unknown distribution P(x, y). Risk (expected loss) of the function h(x): L_l(h) := E_{(x,y)}[l(y, h(x))], where l is a loss function. The regret of a classifier: Reg_l(h) = L_l(h) - L_l(h*), where h* = arg min_h L_l(h) is the Bayes classifier. 18 / 44

Setting Since task losses are usually neither convex nor differentiable, we use surrogate (or proxy) losses that are easier to optimize. We say that a surrogate loss l̃ is consistent (calibrated) with the task loss l when the following holds: Reg_l̃(h) → 0 implies Reg_l(h) → 0. 19 / 44

Outline 1 Ranking problem 2 Multilabel ranking 3 Summary 20 / 44

Multilabel ranking Training data: {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)}, where y_i ∈ {0, 1}^m. Sort labels from the most to the least relevant for a given x. Example (features X_1, X_2 and labels Y_1, Y_2, ..., Y_m): x_1: (5.0, 4.5) → (1, 1, 0); x_2: (2.0, 2.5) → (0, 1, 0); ...; x_n: (3.0, 3.5) → (0, 1, 1); query x: (4.0, 2.5) → ? 21 / 44

Multilabel ranking Training data: {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)}, where y_i ∈ {0, 1}^m. Sort labels from the most to the least relevant for a given x. Example: as above; for the query x: (4.0, 2.5) the model outputs scores with h_2 > h_1 > ... > h_m. 21 / 44

Multilabel ranking Training data: {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)}, where y_i ∈ {0, 1}^m. Sort labels from the most to the least relevant for a given x. Example: as above; the predicted order for x is y_2 ≻ y_1 ≻ ... ≻ y_m. 21 / 44

Multilabel ranking Rank loss: l(y, h(x)) = w(y) Σ_{(i,j): y_i > y_j} ( ⟦h_i(x) < h_j(x)⟧ + (1/2) ⟦h_i(x) = h_j(x)⟧ ), where w(y) < w_max is a weight function. Example: for x: (4.0, 2.5) with y = (1, 0, 0), the prediction h_2 > h_1 > ... > h_m misorders the pair (1, 2). 22 / 44

Multilabel ranking Rank loss: l(y, h(x)) = w(y) Σ_{(i,j): y_i > y_j} ( ⟦h_i(x) < h_j(x)⟧ + (1/2) ⟦h_i(x) = h_j(x)⟧ ), where w(y) < w_max is a weight function. The weight function w(y) is usually used to normalize the range of the rank loss to [0, 1]: w(y) = 1 / (n_+ n_-), where n_+ and n_- are the numbers of relevant and irrelevant labels in y, i.e., w(y) is the inverse of the total number of pairwise comparisons between labels. 22 / 44
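
A minimal sketch (illustrative) evaluating this weighted rank loss for a single example, with the normalizing weight w(y) = 1/(n_+ n_-) introduced above:

```python
def rank_loss(y, h):
    """Weighted multilabel rank loss with w(y) = 1 / (n_plus * n_minus)."""
    relevant   = [i for i, yi in enumerate(y) if yi == 1]
    irrelevant = [j for j, yj in enumerate(y) if yj == 0]
    if not relevant or not irrelevant:
        return 0.0                      # no pairs to compare
    w = 1.0 / (len(relevant) * len(irrelevant))
    loss = 0.0
    for i in relevant:                  # pairs (i, j) with y_i > y_j
        for j in irrelevant:
            if h[i] < h[j]:
                loss += 1.0
            elif h[i] == h[j]:
                loss += 0.5
    return w * loss

# Example from the slide: y = (1, 0, 0) but h ranks label 2 first
print(rank_loss([1, 0, 0], [0.6, 0.9, 0.1]))   # 0.5: one of two pairs is misordered
```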

Pairwise surrogate losses The most intuitive approach is to use pairwise convex surrogate losses of the form l_φ(y, h) = w(y) Σ_{(i,j): y_i > y_j} φ(h_i - h_j), where φ is an exponential function (BoosTexter) [10]: φ(f) = e^{-f}, a logistic function (LLLR) [11]: φ(f) = log(1 + e^{-f}), or a hinge function (RankSVM) [12]: φ(f) = max(0, 1 - f). [10] R. E. Schapire and Y. Singer. BoosTexter: A Boosting-based System for Text Categorization. Machine Learning, 39(2/3):135-168, 2000 [11] O. Dekel, Ch. Manning, and Y. Singer. Log-linear models for label ranking. In NIPS. MIT Press, 2004 [12] A. Elisseeff and J. Weston. A kernel method for multi-labelled classification. In NIPS, pages 681-687, 2001 23 / 44
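
A sketch applying the three surrogates to the same toy example; the "BoosTexter-style", "LLLR-style", and "RankSVM-style" functions below only mimic the loss shapes, not the full algorithms, and reusing w(y) = 1/(n_+ n_-) is an illustrative choice:

```python
import math

def pairwise_surrogate(y, h, phi):
    """l_phi(y, h) = w(y) * sum over (i, j) with y_i > y_j of phi(h_i - h_j)."""
    relevant   = [i for i, yi in enumerate(y) if yi == 1]
    irrelevant = [j for j, yj in enumerate(y) if yj == 0]
    if not relevant or not irrelevant:
        return 0.0
    w = 1.0 / (len(relevant) * len(irrelevant))   # same weight as for the rank loss
    return w * sum(phi(h[i] - h[j]) for i in relevant for j in irrelevant)

exp_phi   = lambda f: math.exp(-f)                  # BoosTexter-style
log_phi   = lambda f: math.log(1.0 + math.exp(-f))  # LLLR-style
hinge_phi = lambda f: max(0.0, 1.0 - f)             # RankSVM-style

y, h = [1, 0, 0], [0.6, 0.9, 0.1]
for name, phi in [("exp", exp_phi), ("log", log_phi), ("hinge", hinge_phi)]:
    print(name, pairwise_surrogate(y, h, phi))
```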

Surrogate losses [Figure: the 0/1 (Boolean test), exponential, logistic, and hinge losses φ(f) plotted as functions of f.] 24 / 44

Multilabel ranking The pairwise approach is, unfortunately, inconsistent for the most commonly used convex surrogates. [13] There exists a class of pairwise surrogates that is consistent. We will show, however, that the simple univariate (pointwise) variants of the exponential and logistic loss are consistent with the multi-label rank loss. [13] J. Duchi, L. Mackey, and M. Jordan. On the consistency of ranking algorithms. In ICML, pages 327-334, 2010; W. Gao and Z. Zhou. On the consistency of multi-label learning. In COLT, pages 341-358, 2011 25 / 44

Multilabel ranking Let us denote: Δ_{ij}^{uv} = Σ_{y : y_i = u, y_j = v} w(y) P(y | x). Δ_{ij}^{uv} reduces to P(Y_i = u, Y_j = v | x) for w(y) ≡ 1. Δ_{ij}^{uv} = Δ_{ji}^{vu} for all (i, j). Let W = E[w(Y) | x] = Σ_y w(y) P(y | x). Then Δ_{ij}^{00} + Δ_{ij}^{01} + Δ_{ij}^{10} + Δ_{ij}^{11} = W. Example:

P(y)  w  Y_1 Y_2 Y_3
 0.0  1   0   0   0
 0.0  1   0   0   1
 0.2  1   0   1   0
 0.2  1   0   1   1
 0.4  1   1   0   0
 0.1  1   1   0   1
 0.1  1   1   1   0
 0.0  1   1   1   1

Δ_{12}^{10} = 0.5, Δ_{12}^{01} = 0.4, Δ_{13}^{10} = 0.5, Δ_{13}^{01} = 0.2, Δ_{23}^{10} = 0.3, Δ_{23}^{01} = 0.1. 26 / 44

Multilabel ranking The conditional risk can be written as: L_rnk(h | x) = Σ_{i>j} ( Δ_{ij}^{10} ⟦h_i < h_j⟧ + Δ_{ij}^{01} ⟦h_i > h_j⟧ + (1/2)(Δ_{ij}^{10} + Δ_{ij}^{01}) ⟦h_i = h_j⟧ ). Ideally, we would like to find h for which: L_rnk(h | x) = Σ_{i>j} min{Δ_{ij}^{10}, Δ_{ij}^{01}}. 27 / 44

Reduction to weighted binary relevance The Bayes ranker can be obtained by sorting labels according to: [14] Δ_i^1 = Σ_{y : y_i = 1} w(y) P(y | x). For w(y) ≡ 1, the labels should be sorted according to their marginal probabilities, since Δ_i^u reduces to P(y_i = u | x) in this case. [14] K. Dembczyński, W. Kotłowski, and E. Hüllermeier. Consistent multilabel ranking through univariate losses. In International Conference on Machine Learning, 2012 28 / 44

Reduction to weighted binary relevance The Bayes risk is indeed: L_rnk(h* | x) = Σ_{i>j} min{Δ_{ij}^{10}, Δ_{ij}^{01}}. Since Δ_i^1 = Δ_{ij}^{10} + Δ_{ij}^{11}, we have: Δ_i^1 - Δ_j^1 = Δ_{ij}^{10} + Δ_{ij}^{11} - Δ_{ij}^{01} - Δ_{ij}^{11} = Δ_{ij}^{10} - Δ_{ij}^{01}, so sorting by Δ_i^1 incurs min{Δ_{ij}^{10}, Δ_{ij}^{01}} on every pair and attains the Bayes risk. Example (same distribution as before):

P(y)  w  Y_1 Y_2 Y_3
 0.0  1   0   0   0
 0.0  1   0   0   1
 0.2  1   0   1   0
 0.2  1   0   1   1
 0.4  1   1   0   0
 0.1  1   1   0   1
 0.1  1   1   1   0
 0.0  1   1   1   1

Δ_1^1 = 0.6, Δ_2^1 = 0.5, Δ_3^1 = 0.3; Δ_{12}^{10} = 0.5, Δ_{12}^{01} = 0.4, Δ_{13}^{10} = 0.5, Δ_{13}^{01} = 0.2, Δ_{23}^{10} = 0.3, Δ_{23}^{01} = 0.1. 29 / 44
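
The Δ values in this example can be recomputed directly from the joint distribution; a short sketch using the table's distribution and w(y) ≡ 1:

```python
# Joint conditional distribution P(y | x) from the table (w(y) = 1 for all y)
P = {(0, 0, 0): 0.0, (0, 0, 1): 0.0, (0, 1, 0): 0.2, (0, 1, 1): 0.2,
     (1, 0, 0): 0.4, (1, 0, 1): 0.1, (1, 1, 0): 0.1, (1, 1, 1): 0.0}
w = lambda y: 1.0

def delta_pair(i, j, u, v):
    """Delta_{ij}^{uv} = sum over y with y_i = u, y_j = v of w(y) P(y | x)."""
    return sum(w(y) * p for y, p in P.items() if y[i] == u and y[j] == v)

def delta_single(i):
    """Delta_i^1 = sum over y with y_i = 1 of w(y) P(y | x)."""
    return sum(w(y) * p for y, p in P.items() if y[i] == 1)

print(round(delta_pair(0, 1, 1, 0), 1), round(delta_pair(0, 1, 0, 1), 1))  # 0.5 0.4
print(round(delta_pair(0, 2, 1, 0), 1), round(delta_pair(0, 2, 0, 1), 1))  # 0.5 0.2
print(round(delta_pair(1, 2, 1, 0), 1), round(delta_pair(1, 2, 0, 1), 1))  # 0.3 0.1
print([round(delta_single(i), 1) for i in range(3)])                       # [0.6, 0.5, 0.3]
```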

Reduction to weighted binary relevance Consider the univariate (weighted) exponential and logistic loss: l_exp(y, h) = w(y) Σ_{i=1}^m e^{-(2y_i - 1) h_i}, l_log(y, h) = w(y) Σ_{i=1}^m log(1 + e^{-(2y_i - 1) h_i}). The risk minimizer of these losses is: h_i*(x) = (1/c) log(Δ_i^1 / Δ_i^0) = (1/c) log(Δ_i^1 / (W - Δ_i^1)), which is a strictly increasing transformation of Δ_i^1, where W = E[w(Y) | x] = Σ_y w(y) P(y | x). 30 / 44

Reduction to weighted binary relevance Vertical reduction: Solving m independent classification problems. Many algorithms that minimize (weighted) exponential or logistic surrogate, such as AdaBoost or logistic regression, can be applied. Besides its simplicity and efficiency, this approach is consistent. 31 / 44
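
A compact sketch of the vertical reduction (weighted binary relevance with w(y) ≡ 1): one logistic model per label, with labels ranked by the resulting scores. scikit-learn's LogisticRegression is used here as one possible plug-in learner, and the data is synthetic:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, d, m = 500, 10, 5
X = rng.normal(size=(n, d))
W_true = rng.normal(size=(d, m))
Y = (X @ W_true + rng.normal(size=(n, m)) > 0).astype(int)   # synthetic labels

# Vertical reduction: one binary (logistic) problem per label
models = [LogisticRegression().fit(X, Y[:, i]) for i in range(m)]

def rank_labels(x):
    """Sort labels by the per-label scores h_i(x) (here: estimated P(y_i = 1 | x))."""
    scores = np.array([mdl.predict_proba(x.reshape(1, -1))[0, 1] for mdl in models])
    return np.argsort(-scores), scores

order, scores = rank_labels(X[0])
print("label order (most to least relevant):", order)
print("scores:", np.round(scores, 3))
```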

Regret bound [15] Theorem: Let Reg_rnk(h) be the regret for the rank loss, and Reg_exp(h) and Reg_log(h) the regrets for the exponential and logistic losses, respectively. Then Reg_rnk(h) ≤ (√6 / 4) C √(Reg_exp(h)), Reg_rnk(h) ≤ (√2 / 2) C √(Reg_log(h)), where C ≤ m √(m w_max). [15] K. Dembczyński, W. Kotłowski, and E. Hüllermeier. Consistent multilabel ranking through univariate losses. In International Conference on Machine Learning, 2012 32 / 44

Main result: Sketch of Proof The main idea: exploit similar regret bounds obtained for bipartite ranking. [16] Reduce the multilabel ranking problem horizontally to bipartite ranking for each x separately. Since the labels are independent in bipartite ranking, transform the original label distribution into a new auxiliary one with independent labels. Then adapt the bounds for the reduced problem with the auxiliary distribution. Finally, return to the original problem. [16] W. Kotłowski, K. Dembczyński, and E. Hüllermeier. Bipartite ranking through minimization of univariate loss. In International Conference on Machine Learning, pages 1113-1120, 2011 33 / 44

Main result: Horizontal reduction For a given x (e.g., the training example x_1 with labels (1, 1, ..., 0)), we define a bipartite ranking problem by setting X̃ = {1, ..., m}: the objects (instances) to be ranked correspond to the label indices of the MLR problem and are of the form x̃ = i (i = 1, ..., m); the corresponding label for x̃ = i is y_i. Here: x̃_1 = 1 with ỹ_1 = 1, x̃_2 = 2 with ỹ_2 = 1, ..., x̃_m = m with ỹ_m = 0. Unfortunately, the labels y_i are not necessarily independent. 34 / 44

Main result: Transformation The rank regret depends solely on the marginal weights Δ_i^1: Replace the original distribution P_x by a distribution P̃ for which the labels are conditionally independent, P̃(Ỹ = 1 | X̃ = i) = Δ_i^1 / W, P̃(X̃ = i) = 1/m, and replace the original weights by w̃(y) ≡ W. The resulting problem has the same Δ_i^1. 35 / 44

Main result: Regret bound for an auxiliary problem We adapt the known results for bipartite ranking: Theorem: Let Reg_br(h̃, P̃) be the regret of the (unnormalized) bipartite ranking problem, and Reg_exp(h̃, P̃) and Reg_log(h̃, P̃) the corresponding exponential and logistic loss regrets. Then it holds: Reg_br(h̃, P̃) ≤ √((3/2) Reg_exp(h̃, P̃)), Reg_br(h̃, P̃) ≤ √(2 Reg_log(h̃, P̃)). 36 / 44

Main result Tracing back We trace back from Reg_l̃(h̃, P̃) to Reg_rnk(h), where l stands for either the exponential or the logistic loss. The chain of bounds: under P̃ (bipartite ranking), Reg_br(h̃, P̃) is bounded via Reg_l̃(h̃, P̃); under P_x (conditional MLR), Reg_rnk(h | x) via Reg_l(h | x); taking the expectation over x (MLR), Reg_rnk(h) via Reg_l(h). 37 / 44

Inconsistency of the pairwise approach The conditional risk of a pairwise surrogate loss is: L_φ(h, P | x) = Σ_{i>j} ( Δ_{ij}^{10} φ(h_i - h_j) + Δ_{ji}^{10} φ(h_j - h_i) ), and a necessary condition for consistency is that the Bayes classifier h* for the φ-loss is also the Bayes ranker, i.e., sgn(h_i* - h_j*) = sgn(Δ_{ij}^{10} - Δ_{ij}^{01}). The (nonlinear, monotone) transformation φ applies to the differences h_i - h_j, so minimization of the pairwise convex losses results in a complicated solution h*, where h_i* generally depends on all Δ_{jk}^{10} (1 ≤ j, k ≤ m), and not only on Δ_i^1. The only case in which the above convex pairwise loss is consistent is when the labels are independent (the case of bipartite ranking). 38 / 44

Experimental results: Synthetic data [Figure: rank loss as a function of the number of learning examples (250 to 16000) for WBR-LR and LLLR, with the Bayes risk as a reference; left panel: independent data, right panel: dependent data.] Label independence: the methods perform more or less on par. Label dependence: WBR shows small but consistent improvements. 39 / 44

Experimental results: Benchmark data Table: WBR-AdaBoost vs. AdaBoost.MR (left) and WBR-LR vs. LLLR (right).

dataset     AB.MR    WBR-AB   LLLR     WBR-LR
image       0.2081   0.2041   0.2047   0.2065
emotions    0.1703   0.1699   0.1743   0.1657
scene       0.0720   0.0792   0.0861   0.0793
yeast       0.2072   0.1820   0.1728   0.1736
mediamill   0.0665   0.0609   0.0614   0.0472

WBR is at least competitive with state-of-the-art algorithms defined on pairwise surrogates. 40 / 44

Outline 1 Ranking problem 2 Multilabel ranking 3 Summary 41 / 44

Summary Ranking problem: different settings. Multi-label ranking. Consistency of multi-label rankers. 42 / 44

Conclusions Take-away message: Multi-label ranking can be solved by a variant of BR. Pairwise approaches are inconsistent. Multi-label ranking is the simplest variant of conditional ranking problems. For more check: http://www.cs.put.poznan.pl/kdembczynski 43 / 44

Thank you for your attention! The project is co-financed by the European Union from resources of the European Social Fund. 44 / 44