Ranking-Oriented Evaluation Metrics

Weike Pan
College of Computer Science and Software Engineering, Shenzhen University
Outline

1 Introduction
2 Evaluation Metrics
3 Experiments
4 Conclusion
Recommendation with Implicit Feedback

We may represent users' implicit feedback in a matrix form. If we can estimate the missing values (denoted as ?) in the matrix, or rank the items directly, we can make recommendations for each user.
Notations (1/2)

Table: Some notations.

n                      user number
m                      item number
u ∈ {1, 2, ..., n}     user ID
i, j ∈ {1, 2, ..., m}  item ID
R = {(u, i)}           (user, item) pairs in training data
y_ui ∈ {1, 0}          indicator variable, y_ui = 1 if (u, i) ∈ R
I_u                    preferred items by user u in training data
I                      the whole item set
U                      the whole user set
Notations (2/2)

Table: Some notations.

R^te = {(u, i)}   (user, item) pairs in test data
I_u^te            preferred items by user u in test data
U^te              user set in test data
r̂_ui              predicted rating of user u on item i
I_u^re            recommended items for user u
Top-k Recommended Items

Assume we have a ranked recommendation list of items for user u as generated by some recommendation method,

  I_u^re = {i(1), ..., i(l), ..., i(k)} ⊆ I \ I_u,

where i(l) represents the item located at position l.
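For illustration, a minimal Python sketch of how such a top-k list could be built, assuming the predicted scores r̂_ui for user u are stored in a dictionary scores and the training items I_u in a set train_items (these names are assumptions for the sketch, not from the slides):

    def top_k_list(scores, train_items, k):
        """Rank the unobserved items by predicted score and keep the top k."""
        candidates = [i for i in scores if i not in train_items]   # I \ I_u
        candidates.sort(key=lambda i: scores[i], reverse=True)     # highest score first
        return candidates[:k]                                      # I_u^re = {i(1), ..., i(k)}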
How Shall We Evaluate the Recommendation Performance?

Compare the ranked recommendation list of items for user u, i.e., I_u^re, with the preferred items by user u in test data, i.e., I_u^te.
Pre@k

The precision of user u is defined as

  Pre_u@k = (1/k) Σ_{l=1}^{k} δ(i(l) ∈ I_u^te),

where δ(x) = 1 if x is true and δ(x) = 0 otherwise. Σ_{l=1}^{k} δ(i(l) ∈ I_u^te) = |I_u^re ∩ I_u^te| thus denotes the number of items in I_u^re ∩ I_u^te. Then, we have

  Pre@k = Σ_{u ∈ U^te} Pre_u@k / |U^te|.   (1)
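A small Python sketch of Pre_u@k for one user (the argument names rec_list, i.e., the ranked list I_u^re, and test_items, i.e., the set I_u^te, are assumptions for illustration):

    def precision_at_k(rec_list, test_items, k):
        """Pre_u@k: fraction of the top-k recommended items that are preferred in the test data."""
        hits = sum(1 for item in rec_list[:k] if item in test_items)   # |I_u^re ∩ I_u^te|
        return hits / k

Pre@k is then the average of precision_at_k over all users in U^te.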
Rec@k

The recall of user u is defined as

  Rec_u@k = (1/|I_u^te|) Σ_{l=1}^{k} δ(i(l) ∈ I_u^te),

which measures what fraction of the preferred items in I_u^te are also in I_u^re. Then, we have

  Rec@k = Σ_{u ∈ U^te} Rec_u@k / |U^te|.   (2)
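With the same assumed inputs as before (rec_list for I_u^re and test_items for I_u^te), Rec_u@k could be sketched as:

    def recall_at_k(rec_list, test_items, k):
        """Rec_u@k: fraction of the preferred test items that appear in the top-k recommendations."""
        hits = sum(1 for item in rec_list[:k] if item in test_items)
        return hits / len(test_items)   # assumes user u has at least one preferred test item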
F1@k

The F1 score of user u is defined as

  F1_u@k = 2 · Pre_u@k · Rec_u@k / (Pre_u@k + Rec_u@k).

Then, we have

  F1@k = Σ_{u ∈ U^te} F1_u@k / |U^te|.   (3)
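A self-contained sketch of F1_u@k with the same assumed rec_list / test_items inputs; returning 0 when both precision and recall are 0 is a common convention added here:

    def f1_at_k(rec_list, test_items, k):
        """F1_u@k: harmonic mean of Pre_u@k and Rec_u@k (0 when both are 0)."""
        hits = sum(1 for item in rec_list[:k] if item in test_items)
        p = hits / k                   # Pre_u@k
        r = hits / len(test_items)     # Rec_u@k
        return 2 * p * r / (p + r) if p + r > 0 else 0.0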
NDCG@k

The NDCG of user u is defined as

  NDCG_u@k = DCG_u@k / Z_u,

where DCG_u@k = Σ_{l=1}^{k} (2^{δ(i(l) ∈ I_u^te)} − 1) / log(l + 1), and Z_u is the best DCG_u@k score, obtained when the preferred items in I_u^te are placed at the beginning of I_u^re. Then, we have

  NDCG@k = Σ_{u ∈ U^te} NDCG_u@k / |U^te|.   (4)

NDCG emphasizes the items ranked at the beginning of the list (i.e., it is location dependent).
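A sketch of NDCG_u@k with the same assumed inputs; the logarithm base does not matter here because it cancels in the DCG_u@k / Z_u ratio:

    import math

    def ndcg_at_k(rec_list, test_items, k):
        """NDCG_u@k: DCG_u@k divided by the best achievable DCG_u@k (Z_u)."""
        # position l is 1-based in the formula, so log(l + 1) becomes math.log(idx + 2) for 0-based idx
        dcg = sum((2 ** (1 if rec_list[idx] in test_items else 0) - 1) / math.log(idx + 2)
                  for idx in range(min(k, len(rec_list))))
        # Z_u: every top position holds a preferred item, up to min(k, |I_u^te|) of them
        z = sum(1.0 / math.log(idx + 2) for idx in range(min(k, len(test_items))))
        return dcg / z if z > 0 else 0.0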
1-call@k

The 1-call of user u is defined as

  1-call_u@k = δ(Σ_{l=1}^{k} δ(i(l) ∈ I_u^te) ≥ 1),

which indicates whether there is at least one preferred item in I_u^re. Then, we have

  1-call@k = Σ_{u ∈ U^te} 1-call_u@k / |U^te|.   (5)
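A one-line sketch, with the same assumed rec_list / test_items inputs:

    def one_call_at_k(rec_list, test_items, k):
        """1-call_u@k: 1 if at least one preferred item appears in the top-k list, 0 otherwise."""
        return 1 if any(item in test_items for item in rec_list[:k]) else 0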
Mean Reciprocal Rank (MRR)

The reciprocal rank of user u is defined as

  RR_u = 1 / min_{i ∈ I_u^te} p_ui,

where min_{i ∈ I_u^te} p_ui is the position of the first preferred item in I_u^re. Then, we have

  MRR = Σ_{u ∈ U^te} RR_u / |U^te|.   (6)
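A sketch of RR_u, assuming rec_list ranks the candidate items and test_items is I_u^te; returning 0 when no preferred item is found is a convention added here, since the formula assumes at least one preferred item appears in the list:

    def reciprocal_rank(rec_list, test_items):
        """RR_u: reciprocal of the position p_ui of the first preferred item in the ranked list."""
        for pos, item in enumerate(rec_list, start=1):
            if item in test_items:
                return 1.0 / pos
        return 0.0   # no preferred item found (added convention)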
Mean Average Precision (MAP)

The average precision of user u is defined as

  AP_u = (1/|I_u^te|) Σ_{i ∈ I_u^te} ( Σ_{j ∈ I_u^te} δ(p_uj < p_ui) + 1 ) / p_ui,

where p_ui is the ranking position of item i, and p_uj < p_ui means that item j is ranked before item i for user u. Then, we have

  MAP = Σ_{u ∈ U^te} AP_u / |U^te|.   (7)
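A sketch of AP_u under the assumption that rec_list ranks all candidate items in I \ I_u, so that every preferred test item has a position p_ui in it:

    def average_precision(rec_list, test_items):
        """AP_u: for each preferred item, the precision at its position, averaged over |I_u^te|."""
        hits, total = 0, 0.0
        for pos, item in enumerate(rec_list, start=1):
            if item in test_items:
                hits += 1            # Σ_j δ(p_uj < p_ui) + 1, i.e., preferred items at or before pos
                total += hits / pos
        return total / len(test_items)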
Average Relative Position (ARP)

The relative position of user u is defined as

  RP_u = (1/|I_u^te|) Σ_{i ∈ I_u^te} p_ui / |I \ I_u|,

where p_ui / |I \ I_u| is the relative position of item i. Then, we have

  ARP = Σ_{u ∈ U^te} RP_u / |U^te|.   (8)
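A sketch of RP_u, again assuming rec_list ranks all candidate items in I \ I_u (so its length equals |I \ I_u|) and contains every preferred test item; note that smaller ARP values are better:

    def relative_position(rec_list, test_items):
        """RP_u: average position of the preferred items, normalized by |I \\ I_u|."""
        n_candidates = len(rec_list)   # |I \ I_u|
        positions = [pos for pos, item in enumerate(rec_list, start=1) if item in test_items]
        return sum(p / n_candidates for p in positions) / len(test_items)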
Area Under the Curve (AUC)

The AUC of user u is defined as

  AUC_u = (1/|R^te(u)|) Σ_{(i,j) ∈ R^te(u)} δ(r̂_ui > r̂_uj),

where R^te(u) = {(i, j) | (u, i) ∈ R^te, (u, j) ∉ R ∪ R^te}. Then, we have

  AUC = Σ_{u ∈ U^te} AUC_u / |U^te|.   (9)
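A sketch of AUC_u, assuming scores is a dictionary of predicted ratings r̂_ui for every item, test_items is I_u^te, train_items is I_u, and all_items is I (all names are assumptions for illustration):

    def auc_user(scores, test_items, train_items, all_items):
        """AUC_u: fraction of (preferred, irrelevant) item pairs ranked in the correct order."""
        negatives = [j for j in all_items
                     if j not in train_items and j not in test_items]   # (u, j) ∉ R ∪ R^te
        correct = sum(1 for i in test_items for j in negatives if scores[i] > scores[j])
        total = len(test_items) * len(negatives)                        # |R^te(u)|
        return correct / total if total > 0 else 0.0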
PopRank

We may use the bias of each item as the predicted rating,

  r̂_ui = b_i = Σ_{u=1}^{n} y_ui / n − µ,   (10)

where µ = Σ_{u=1}^{n} Σ_{i=1}^{m} y_ui / n / m.
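A sketch of the PopRank scores, assuming the training pairs R are given as a list of (user, item) tuples with item IDs in {1, ..., m} (the function and argument names are illustrative):

    def poprank_scores(R, n, m):
        """PopRank: score every item by its centered popularity, b_i = Σ_u y_ui / n − µ."""
        counts = {i: 0 for i in range(1, m + 1)}
        for (u, i) in R:
            counts[i] += 1                 # Σ_u y_ui
        mu = len(R) / n / m                # global average of y_ui
        return {i: counts[i] / n - mu for i in counts}

Since b_i does not depend on u, every user receives the same ranking of the candidate items; recommendations differ between users only through the excluded training items I_u.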
Data Set

We use the files u1.base and u1.test of MovieLens100K (http://grouplens.org/datasets/) as our training data and test data, respectively.

user number: n = 943; item number: m = 1682
u1.base (training data): 80000 rating records, and the density (or sparsity) is 80000/943/1682 = 5.04%
u1.test (test data): 20000 rating records
Pre-processing (for simulation): we only keep the (user, item) pairs with ratings of 4 or 5 in u1.base and u1.test as preferred (user, item) pairs, and remove all other records. Finally, we obtain u1.base.occf and u1.test.occf.
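A sketch of this pre-processing step, assuming the standard MovieLens100K file format of tab-separated "user id, item id, rating, timestamp" lines; writing out only the (user, item) pair is a choice made here, not specified in the slides:

    def make_occf(path_in, path_out, threshold=4):
        """Keep only the (user, item) pairs whose rating is at least `threshold` (i.e., 4 or 5)."""
        with open(path_in) as fin, open(path_out, "w") as fout:
            for line in fin:
                user, item, rating, _timestamp = line.strip().split("\t")
                if int(rating) >= threshold:
                    fout.write(f"{user}\t{item}\n")

    make_occf("u1.base", "u1.base.occf")
    make_occf("u1.test", "u1.test.occf")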
Results

Table: Prediction performance of PopRank on MovieLens100K (u1.base.occf, u1.test.occf).

Metric      PopRank
Pre@5       0.2338
Rec@5       0.0571
F1@5        0.0775
NDCG@5      0.2568
1-call@5    0.5877
MRR         0.4657
MAP         0.1516
ARP         0.1551
AUC         0.8516
Conclusion

Different ranking-oriented evaluation metrics place their emphasis on different aspects of the ranked recommendation list (e.g., NDCG and MRR emphasize the top positions, Rec@k the coverage of the preferred items, and AUC the ordering of item pairs).
Homework

Implement the ranking-oriented evaluation metrics and conduct empirical studies of PopRank on u2.base.occf and u2.test.occf of MovieLens100K, with similar pre-processing.
Design some new evaluation metrics.
Design a new recommendation method to beat PopRank.