Ranking-Oriented Evaluation Metrics

Similar documents
Collaborative Recommendation with Multiclass Preference Context

Decoupled Collaborative Ranking

A Gradient-based Adaptive Learning Framework for Efficient Personal Recommendation

Ranking and Filtering

Semestrial Project - Expedia Hotel Ranking

Recommender Systems EE448, Big Data Mining, Lecture 10. Weinan Zhang Shanghai Jiao Tong University

Collaborative topic models: motivations cont

Point-of-Interest Recommendations: Learning Potential Check-ins from Friends

Learning to Recommend Point-of-Interest with the Weighted Bayesian Personalized Ranking Method in LBSNs

Context-aware factorization methods for implicit feedback based recommendation problems

Recommendation Systems

NCDREC: A Decomposability Inspired Framework for Top-N Recommendation

Personalized Ranking for Non-Uniformly Sampled Items

Collaborative Filtering

SQL-Rank: A Listwise Approach to Collaborative Ranking

Counterfactual Model for Learning Systems

Preliminaries. Data Mining. The art of extracting knowledge from large bodies of structured data. Let's put it to use!

Dynamic Poisson Factorization

Recommendation Systems

Item Recommendation for Emerging Online Businesses

Domokos Miklós Kelen. Online Recommendation Systems. Eötvös Loránd University. Faculty of Natural Sciences. Advisor:

Large-scale Collaborative Ranking in Near-Linear Time

Probabilistic Neighborhood Selection in Collaborative Filtering Systems

CS249: ADVANCED DATA MINING

Recommendation Systems

Optimizing Factorization Machines for Top-N Context-Aware Recommendations

Probabilistic Matrix Factorization

Counterfactual Evaluation and Learning

Recommender Systems. Dipanjan Das Language Technologies Institute Carnegie Mellon University. 20 November, 2007

Andriy Mnih and Ruslan Salakhutdinov

Alternating Least Squares for Personalized Ranking

Designing Information Devices and Systems I Spring 2016 Elad Alon, Babak Ayazifar Homework 12

Collaborative Filtering. Radek Pelánek

Asymmetric Correlation Regularized Matrix Factorization for Web Service Recommendation

arxiv: v2 [cs.ir] 4 Jun 2018

Web Search and Text Mining. Learning from Preference Data

Learning Binary Codes with Neural Collaborative Filtering for Efficient Recommendation Systems

Data Mining Techniques

Collaborative Filtering on Ordinal User Feedback

Time-aware Collaborative Topic Regression: Towards Higher Relevance in Textual Item Recommendation

Offline Evaluation of Ranking Policies with Click Models

Towards modeling implicit feedback with quantum entanglement

Introduction to Computational Advertising

Facing the information flood in our daily lives, search engines mainly respond

Recommender systems, matrix factorization, variable selection and social graph data

Information Retrieval

Statistical Ranking Problem

Factor Modeling for Advertisement Targeting

Matrix Factorization and Factorization Machines for Recommender Systems

A Novel Click Model and Its Applications to Online Advertising

CSE 258, Winter 2017: Midterm

Recurrent Latent Variable Networks for Session-Based Recommendation

Equating Tests Under The Nominal Response Model Frank B. Baker

A Modified PMF Model Incorporating Implicit Item Associations

Math 239: Discrete Mathematics for the Life Sciences Spring Lecture 14 March 11. Scribe/ Editor: Maria Angelica Cueto/ C.E.

Probabilistic Field Mapping for Product Search

Location Regularization-Based POI Recommendation in Location-Based Social Networks

Matrix Factorization Techniques for Recommender Systems

Clustering based tensor decomposition

Statistical Optimality in Multipartite Ranking and Ordinal Regression

Evaluating GIS for Disaster Management

Lecture 11 Sections 4.5, 4.7. Wed, Feb 18, 2009

Retrieval by Content. Part 2: Text Retrieval Term Frequency and Inverse Document Frequency. Srihari: CSE 626 1

Machine Learning and Computational Statistics, Spring 2017 Homework 2: Lasso Regression

Maximum Margin Matrix Factorization for Collaborative Ranking

Matrix Factorization and Collaborative Filtering

Evaluation Metrics. Jaime Arguello INLS 509: Information Retrieval March 25, Monday, March 25, 13

Performance Evaluation

Unifying Nearest Neighbors Collaborative Filtering

LEC 5: Two Dimensional Linear Discriminant Analysis

IMU Experiment in IR4QA at NTCIR-8

6.034 Introduction to Artificial Intelligence

arxiv: v2 [cs.ir] 14 May 2018

An Extended Frank-Wolfe Method, with Application to Low-Rank Matrix Completion

Name: Matriculation Number: Tutorial Group: A B C D E

Data Science Mastery Program

Factorization Models for Context-/Time-Aware Movie Recommendations

DOCUMENTS DE TRAVAIL CEMOI / CEMOI WORKING PAPERS. A SAS macro to estimate Average Treatment Effects with Matching Estimators

Designing Information Devices and Systems I Spring 2018 Homework 5


Preference Relation-based Markov Random Fields for Recommender Systems

Item Response Theory (IRT) Analysis of Item Sets

Time-aware Point-of-interest Recommendation

SUPERVISED LEARNING: INTRODUCTION TO CLASSIFICATION

APPLICATIONS OF MINING HETEROGENEOUS INFORMATION NETWORKS

Collaborative Filtering via Different Preference Structures

Mixture-Rank Matrix Approximation for Collaborative Filtering

Matrix Factorization Techniques for Recommender Systems

Lecture Notes 1: Matrix Algebra Part C: Pivoting and Matrix Decomposition

On the Foundations of Diverse Information Retrieval. Scott Sanner, Kar Wai Lim, Shengbo Guo, Thore Graepel, Sarvnaz Karimi, Sadegh Kharazmi

arxiv: v1 [physics.soc-ph] 12 May 2015

Calibration and regret bounds for order-preserving surrogate losses in learning to rank

DIFFERENTIAL MANIFOLDS HW Exercise Employing the summation convention, we have: [u, v] i = ui x j vj vi. x j u j

Machine Learning and Adaptive Systems. Lectures 3 & 4

TFMAP: Optimizing MAP for Top-N Context-aware Recommendation

LambdaFM: Learning Optimal Ranking with Factorization Machines Using Lambda Surrogates

RETRIEVAL MODELS. Dr. Gjergji Kasneci Introduction to Information Retrieval WS

Circle-based Recommendation in Online Social Networks

ELEC6910Q Analytics and Systems for Social Media and Big Data Applications Lecture 3 Centrality, Similarity, and Strength Ties

Department of Computer Science, Guiyang University, Guiyang , GuiZhou, China

Transcription:

Ranking-Oriented Evaluation Metrics
Weike Pan
College of Computer Science and Software Engineering, Shenzhen University

Outline
1 Introduction
2 Evaluation Metrics
3 Experiments
4 Conclusion

Introduction: Recommendation with Implicit Feedback
We may represent users' implicit feedback in a matrix form. If we can estimate the missing values (denoted as ?) in the matrix, or rank the items directly, we can make recommendations for each user.

Introduction: Notations (1/2)
Table: Some notations.
    $n$                             user number
    $m$                             item number
    $u \in \{1, 2, \ldots, n\}$     user ID
    $i, j \in \{1, 2, \ldots, m\}$  item ID
    $R = \{(u, i)\}$                (user, item) pairs in training data
    $y_{ui} \in \{1, 0\}$           indicator variable, $y_{ui} = 1$ if $(u, i) \in R$
    $I_u$                           preferred items by user $u$ in training data
    $I$                             the whole item set
    $U$                             the whole user set

Introduction: Notations (2/2)
Table: Some notations.
    $R^{te} = \{(u, i)\}$   (user, item) pairs in test data
    $I_u^{te}$              preferred items by user $u$ in test data
    $U^{te}$                user set in test data
    $\hat{r}_{ui}$          predicted rating of user $u$ on item $i$
    $I_u^{re}$              recommended items for user $u$

Evaluation Metrics: Top-k Recommended Items
Assume we have a ranked recommendation list of items for user $u$ as generated by some recommendation method,
$$I_u^{re} = \{i(1), \ldots, i(l), \ldots, i(k)\} \subseteq I \setminus I_u,$$
where $i(l)$ represents the item located at position $l$.
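
As a concrete illustration, here is a minimal Python sketch of how such a top-k list could be built from predicted ratings; the dictionary scores (item ID to predicted rating) and the set train_items (the user's $I_u$) are hypothetical names, not part of the slides:

    def top_k_list(scores, train_items, k):
        """Build the top-k list I_u^re: candidate items ranked by predicted rating."""
        candidates = [i for i in scores if i not in train_items]  # restrict to I minus I_u
        candidates.sort(key=lambda i: scores[i], reverse=True)    # rank by predicted rating
        return candidates[:k]                                     # [i(1), ..., i(k)]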

Evaluation Metrics: How Shall We Evaluate the Recommendation Performance?
Compare the ranked recommendation list of items for user $u$, i.e., $I_u^{re}$, with the preferred items by user $u$ in test data, i.e., $I_u^{te}$.

Evaluation Metrics: Pre@k
The precision of user $u$ is defined as
$$Pre_u@k = \frac{1}{k} \sum_{l=1}^{k} \delta(i(l) \in I_u^{te}),$$
where $\delta(x) = 1$ if $x$ is true and $\delta(x) = 0$ otherwise. The sum $\sum_{l=1}^{k} \delta(i(l) \in I_u^{te}) = |I_u^{re} \cap I_u^{te}|$ thus denotes the number of items in $I_u^{re} \cap I_u^{te}$.
Then, we have
$$Pre@k = \sum_{u \in U^{te}} Pre_u@k / |U^{te}|.  (1)$$
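
A minimal Python sketch of $Pre_u@k$, assuming rec_list is the ranked list $I_u^{re}$ and test_items is the set $I_u^{te}$ (both hypothetical variable names):

    def precision_at_k(rec_list, test_items, k):
        """Pre_u@k: fraction of the top-k recommended items that are preferred in the test data."""
        hits = sum(1 for item in rec_list[:k] if item in test_items)  # size of the intersection
        return hits / k

Pre@k in Eq. (1) is then simply the mean of this value over all users in $U^{te}$.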

Evaluation Metrics: Rec@k
The recall of user $u$ is defined as
$$Rec_u@k = \frac{1}{|I_u^{te}|} \sum_{l=1}^{k} \delta(i(l) \in I_u^{te}),$$
which measures how many of the preferred items in $I_u^{te}$ are also in $I_u^{re}$.
Then, we have
$$Rec@k = \sum_{u \in U^{te}} Rec_u@k / |U^{te}|.  (2)$$
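
A corresponding sketch for $Rec_u@k$, under the same assumed variable names; users with an empty $I_u^{te}$ are typically excluded from $U^{te}$, so the guard below is only a safety net:

    def recall_at_k(rec_list, test_items, k):
        """Rec_u@k: fraction of the preferred test items that appear in the top-k recommendations."""
        hits = sum(1 for item in rec_list[:k] if item in test_items)
        return hits / len(test_items) if test_items else 0.0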

Evaluation Metrics: F1@k
The F1 score of user $u$ is defined as
$$F1_u@k = \frac{2 \, Pre_u@k \, Rec_u@k}{Pre_u@k + Rec_u@k}.$$
Then, we have
$$F1@k = \sum_{u \in U^{te}} F1_u@k / |U^{te}|.  (3)$$
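
A sketch of $F1_u@k$ reusing the precision_at_k and recall_at_k helpers above; the slides do not say how to handle the case $Pre_u@k + Rec_u@k = 0$, so returning 0 there is an assumption:

    def f1_at_k(rec_list, test_items, k):
        """F1_u@k: harmonic mean of Pre_u@k and Rec_u@k."""
        p = precision_at_k(rec_list, test_items, k)
        r = recall_at_k(rec_list, test_items, k)
        return 2 * p * r / (p + r) if (p + r) > 0 else 0.0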

Evaluation Metrics: NDCG@k
The NDCG of user $u$ is defined as
$$NDCG_u@k = \frac{1}{Z_u} DCG_u@k, \quad DCG_u@k = \sum_{l=1}^{k} \frac{2^{\delta(i(l) \in I_u^{te})} - 1}{\log(l+1)},$$
where $Z_u$ is the best $DCG_u@k$ score, obtained with the preferred items in $I_u^{te}$ placed at the beginning of $I_u^{re}$.
Then, we have
$$NDCG@k = \sum_{u \in U^{te}} NDCG_u@k / |U^{te}|.  (4)$$
NDCG emphasizes the items ranked at the beginning of the list (i.e., it is location dependent).
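
A sketch of $NDCG_u@k$ under the same assumed variable names; since $DCG_u@k$ and $Z_u$ use the same logarithm, the base cancels in the ratio, so the natural log is used below:

    import math

    def ndcg_at_k(rec_list, test_items, k):
        """NDCG_u@k with binary relevance: gain 2^delta - 1 and discount log(l + 1) for 1-based position l."""
        hits = [1 if rec_list[l] in test_items else 0 for l in range(min(k, len(rec_list)))]
        dcg = sum((2 ** h - 1) / math.log(l + 2) for l, h in enumerate(hits))   # l is 0-based here
        ideal_hits = min(len(test_items), k)                         # best case: all hits at the top
        z = sum(1.0 / math.log(l + 2) for l in range(ideal_hits))    # Z_u, the ideal DCG_u@k
        return dcg / z if z > 0 else 0.0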

Evaluation Metrics: 1-call@k
The 1-call of user $u$ is defined as
$$1\text{-call}_u@k = \delta\Big(\sum_{l=1}^{k} \delta(i(l) \in I_u^{te}) \geq 1\Big),$$
which indicates whether there is at least one preferred item in $I_u^{re}$.
Then, we have
$$1\text{-call}@k = \sum_{u \in U^{te}} 1\text{-call}_u@k / |U^{te}|.  (5)$$
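
A one-line sketch of $1\text{-call}_u@k$ with the same assumed variable names:

    def one_call_at_k(rec_list, test_items, k):
        """1-call_u@k: 1 if at least one preferred item appears in the top-k list, else 0."""
        return 1 if any(item in test_items for item in rec_list[:k]) else 0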

Evaluation Metrics: Mean Reciprocal Rank (MRR)
The reciprocal rank of user $u$ is defined as
$$RR_u = \frac{1}{\min_{i \in I_u^{te}} p_{ui}},$$
where $\min_{i \in I_u^{te}} p_{ui}$ is the position of the first preferred item in $I_u^{re}$.
Then, we have
$$MRR = \sum_{u \in U^{te}} RR_u / |U^{te}|.  (6)$$
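
A sketch of $RR_u$; the slides do not specify what to do when no preferred item appears in the ranked list, so returning 0 in that case is an assumption (a common convention):

    def reciprocal_rank(rec_list, test_items):
        """RR_u: reciprocal of the position of the first preferred item in the ranked list."""
        for pos, item in enumerate(rec_list, start=1):   # positions p_ui are 1-based
            if item in test_items:
                return 1.0 / pos
        return 0.0                                       # assumption: no hit contributes 0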

Evaluation Metrics: Mean Average Precision (MAP)
The average precision of user $u$ is defined as
$$AP_u = \frac{1}{|I_u^{te}|} \sum_{i \in I_u^{te}} \frac{1}{p_{ui}} \Big( \sum_{j \in I_u^{te}} \delta(p_{uj} < p_{ui}) + 1 \Big),$$
where $p_{ui}$ is the ranking position of item $i$, and $p_{uj} < p_{ui}$ means that item $j$ is ranked before item $i$ for user $u$.
Then, we have
$$MAP = \sum_{u \in U^{te}} AP_u / |U^{te}|.  (7)$$
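
A sketch of $AP_u$, assuming rec_list ranks all candidate items $I \setminus I_u$ so that every preferred test item has a position $p_{ui}$; walking the list in order lets the running hit count play the role of $\sum_{j} \delta(p_{uj} < p_{ui}) + 1$:

    def average_precision(rec_list, test_items):
        """AP_u as in Eq. (7), with 1-based positions p_ui taken from the full ranked list."""
        ap, hits = 0.0, 0
        for pos, item in enumerate(rec_list, start=1):
            if item in test_items:
                hits += 1            # number of preferred items at or before this position
                ap += hits / pos
        return ap / len(test_items) if test_items else 0.0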

Evaluation Metrics: Average Relative Position (ARP)
The relative position of user $u$ is defined as
$$RP_u = \frac{1}{|I_u^{te}|} \sum_{i \in I_u^{te}} \frac{p_{ui}}{|I \setminus I_u|},$$
where $\frac{p_{ui}}{|I \setminus I_u|}$ is the relative position of item $i$.
Then, we have
$$ARP = \sum_{u \in U^{te}} RP_u / |U^{te}|.  (8)$$
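
A sketch of $RP_u$, again assuming rec_list is a full ranking of the candidate items $I \setminus I_u$ (so its length equals $|I \setminus I_u|$):

    def relative_position(rec_list, test_items):
        """RP_u: mean of p_ui / |candidate set| over the preferred test items."""
        n_candidates = len(rec_list)                 # size of the candidate set
        total = sum(pos / n_candidates
                    for pos, item in enumerate(rec_list, start=1)
                    if item in test_items)
        return total / len(test_items) if test_items else 0.0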

Evaluation Metrics: Area Under the Curve (AUC)
The AUC of user $u$ is defined as
$$AUC_u = \frac{1}{|R^{te}(u)|} \sum_{(i, j) \in R^{te}(u)} \delta(\hat{r}_{ui} > \hat{r}_{uj}),$$
where $R^{te}(u) = \{(i, j) \mid (u, i) \in R^{te}, (u, j) \notin R \cup R^{te}\}$.
Then, we have
$$AUC = \sum_{u \in U^{te}} AUC_u / |U^{te}|.  (9)$$
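
A sketch of $AUC_u$, assuming scores is a hypothetical dictionary mapping every candidate item to its predicted rating $\hat{r}_{ui}$, test_items is $I_u^{te}$, train_items is $I_u$, and all_items is $I$; the double loop is quadratic and kept only for clarity:

    def auc_user(scores, test_items, train_items, all_items):
        """AUC_u: fraction of (preferred, unobserved) item pairs ranked correctly by the scores."""
        negatives = [j for j in all_items
                     if j not in train_items and j not in test_items]   # (u, j) outside R and R^te
        correct = total = 0
        for i in test_items:                                            # (u, i) in R^te
            for j in negatives:
                total += 1
                if scores[i] > scores[j]:
                    correct += 1
        return correct / total if total > 0 else 0.0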

Experiments: PopRank
We may use the bias of each item as the predicted rating,
$$\hat{r}_{ui} = b_i = \sum_{u=1}^{n} y_{ui} / n - \mu,  (10)$$
where $\mu = \sum_{u=1}^{n} \sum_{i=1}^{m} y_{ui} / (n m)$.
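
A sketch of PopRank as given in Eq. (10), assuming train_pairs is the set $R$ of (user, item) pairs and item IDs run from 1 to m; since $\mu$ is the same constant for every item, subtracting it does not change the ranking, so PopRank gives every user the same item ordering apart from the removal of that user's own training items:

    def poprank_scores(train_pairs, n, m):
        """PopRank (Eq. 10): score each item by its popularity bias b_i = sum_u y_ui / n - mu."""
        item_count = [0] * (m + 1)             # item IDs assumed to be 1..m
        for (u, i) in train_pairs:             # each training pair has y_ui = 1
            item_count[i] += 1
        mu = sum(item_count) / (n * m)         # global average of y_ui
        return {i: item_count[i] / n - mu for i in range(1, m + 1)}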

Experiments: Data Set
We use the files u1.base and u1.test of MovieLens100K (http://grouplens.org/datasets/) as our training data and test data, respectively.
- user number: n = 943; item number: m = 1682.
- u1.base (training data): 80000 rating records; the density is 80000/943/1682 = 5.04%.
- u1.test (test data): 20000 rating records.
Pre-processing (for simulation): we only keep the (user, item) pairs with ratings of 4 or 5 in u1.base and u1.test as preferred (user, item) pairs and remove all other records. Finally, we obtain u1.base.occf and u1.test.occf.
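
A minimal sketch of this pre-processing step, assuming the standard tab-separated MovieLens100K record format (user ID, item ID, rating, timestamp); the function name and the in-memory output form are hypothetical:

    def load_occf(path):
        """Keep only the (user, item) pairs rated 4 or 5, as in u1.base.occf / u1.test.occf."""
        pairs = set()
        with open(path) as f:
            for line in f:
                u, i, rating, _timestamp = line.strip().split('\t')
                if int(rating) >= 4:
                    pairs.add((int(u), int(i)))
        return pairs

    train_pairs = load_occf('u1.base')   # -> u1.base.occf
    test_pairs = load_occf('u1.test')    # -> u1.test.occf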

Experiments: Results
Table: Prediction performance of PopRank on MovieLens100K (u1.base.occf, u1.test.occf).
    Metric      PopRank
    Pre@5       0.2338
    Rec@5       0.0571
    F1@5        0.0775
    NDCG@5      0.2568
    1-call@5    0.5877
    MRR         0.4657
    MAP         0.1516
    ARP         0.1551
    AUC         0.8516

Conclusion
We have introduced a set of ranking-oriented evaluation metrics, each with a different emphasis.

Conclusion: Homework
- Implement the ranking-oriented evaluation metrics and conduct empirical studies of PopRank on u2.base.occf and u2.test.occf of MovieLens100K, with similar pre-processing.
- Design some new evaluation metrics.
- Design a new recommendation method that beats PopRank.