Predicting the Performance of Collaborative Filtering Algorithms

Pawel Matuszyk and Myra Spiliopoulou
Knowledge Management and Discovery, Otto-von-Guericke University Magdeburg, Germany
4 June 2014

Motivation
- CF is a widely used family of algorithms for recommender systems, e.g.
  - matrix factorization
  - neighbourhood-based methods
- CF is not appropriate for all applications
- How do we know whether CF is applicable? Either by
  - implementing a CF method
  - running expensive experiments
  - tuning
  - evaluation
- OR by predicting the performance of CF, given a dataset

Visual Analysis
- a method for a visual assessment of dataset characteristics
- mapping of users into equivalence classes: $[u] = \{u_x \in U : |R(u_x)| = |R(u)|\}$
  - $U$ = the set of users
  - $R(u_x)$ = the set of ratings of user $u_x$
- building of a co-rating matrix
  - one cell = average number of co-ratings between $([u_x], [u_y])$
  - a heatmap as visualization
- histogram of cardinalities of user classes
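The construction on this slide can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. It assumes that an equivalence class groups users with the same number of ratings, that a co-rating between two users is an item both have rated, and that a user is not paired with itself when a class is compared with itself. The function names `user_classes` and `co_rating_matrix` and the toy data are hypothetical.

```python
from collections import defaultdict
from itertools import product


def user_classes(ratings):
    """Group users into classes keyed by their number of ratings.

    ratings: dict mapping a user id to the set of item ids the user rated.
    """
    classes = defaultdict(list)
    for user, items in ratings.items():
        classes[len(items)].append(user)
    return dict(classes)


def co_rating_matrix(ratings):
    """Average number of co-rated items for every pair of user classes.

    Returns a dict keyed by (class_size_x, class_size_y).
    Assumption: self-pairs (u, u) are skipped; a cell without any
    valid pair is set to 0.0.
    """
    classes = user_classes(ratings)
    matrix = {}
    for size_x, size_y in product(sorted(classes), repeat=2):
        pairs = [(u, v) for u in classes[size_x]
                 for v in classes[size_y] if u != v]
        if pairs:
            total = sum(len(ratings[u] & ratings[v]) for u, v in pairs)
            matrix[(size_x, size_y)] = total / len(pairs)
        else:
            matrix[(size_x, size_y)] = 0.0
    return matrix


# Hypothetical toy data: three users, three items.
toy_ratings = {"a": {1, 2}, "b": {2, 3}, "c": {1, 2, 3}}
toy_matrix = co_rating_matrix(toy_ratings)
```

The resulting cells are what a heatmap would display, and the class sizes from `user_classes` give the histogram of class cardinalities.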

MovieLens 1M Dataset
Figure: Visualization of the MovieLens 1M dataset.

Epinions Dataset
Figure: Visualization of the Epinions dataset.

Characteristics of Datasets
- measures for quantifying the characteristics of a dataset $D$
- $\mathrm{sparsity}(D) = 1 - \dfrac{\sum_{u \in U} |R(u)|}{|U| \cdot |Items|}$ [AB11]
- quantifying the distribution of co-ratings (high values better), where $cor([u_x],[u_y])$ is a cell of the co-rating matrix:
  - $\mathrm{Entropy}(D) = -\sum_{x,y} \dfrac{cor([u_x],[u_y])}{\sum_{x',y'} cor([u_{x'}],[u_{y'}])} \log_2 \dfrac{cor([u_x],[u_y])}{\sum_{x',y'} cor([u_{x'}],[u_{y'}])}$
  - $\mathrm{GiniIndex}(D) = 1 - \sum_{x,y} \left( \dfrac{cor([u_x],[u_y])}{\sum_{x',y'} cor([u_{x'}],[u_{y'}])} \right)^2$
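The three measures can be computed along the following lines. This is a hedged Python sketch with hypothetical function names. It assumes the co-rating matrix is passed as a flat list of non-negative cell values and that empty cells contribute nothing to the entropy (the usual convention 0 log 0 = 0).

```python
import math


def sparsity(ratings, n_items):
    """1 - (number of ratings) / (|U| * |Items|).

    ratings: dict mapping a user id to the set of rated item ids.
    """
    n_ratings = sum(len(items) for items in ratings.values())
    return 1.0 - n_ratings / (len(ratings) * n_items)


def _normalise(cells):
    """Turn co-rating cells into a distribution that sums to 1."""
    total = sum(cells)
    return [c / total for c in cells]


def entropy(cells):
    """Shannon entropy (base 2) of the normalised co-rating cells."""
    return -sum(p * math.log2(p) for p in _normalise(cells) if p > 0)


def gini_index(cells):
    """Gini index: 1 minus the sum of squared normalised cells."""
    return 1.0 - sum(p * p for p in _normalise(cells))


# Hypothetical example: four equally filled cells.
uniform_cells = [1.0, 1.0, 1.0, 1.0]
example_entropy = entropy(uniform_cells)
example_gini = gini_index(uniform_cells)
```

Both measures are largest when co-ratings are spread evenly over the cells, which matches the slide's remark that high values are better.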

Building a Training Dataset
- mapping of the dataset measures to the RMSE values
- for each dataset $D$: sparsity(D), Entropy(D), Gini(D)
- two CF methods:
  - user-based CF with cosine similarity
  - SVD++ [Kor08]
- two target attributes: {RMSE_UB-CF, RMSE_MF}
- target value = best value of RMSE found by a grid search
- 4 datasets, i.e. 4 learning instances

Results of the Grid Search
Figure: RMSE of the methods MF and UB-CF on the datasets Epinions, Flixter, ML1M and Netflix.

Correlation of our measures with RMSE
Table: Pearson's product-moment correlation coefficients between our measures and the RMSE. Significance at level < 0.03 was marked in red on the slide; this holds for every p-value except those of Sparsity.

Measure             Corr. UB-CF     Corr. MF        Alternative hypothesis   p-value UB-CF   p-value MF
1-Gini               0.9969276652    0.9760339396   true correl > 0          0.001536        0.01198
Entropy             -0.9488311629   -0.9849657734   true correl < 0          0.02558         0.007517
Sparsity             0.737570808     0.7795095148   true correl > 0          0.1312          0.1102
(1-Gini) Sparsity    0.9969300691    0.9758969096   true correl > 0          0.001535        0.01205
Entropy Sparsity    -0.9409525839   -0.9733224195   true correl < 0          0.02952         0.01334

- strong linear correlation of our measures with the RMSE
- the correlation is based on only 4 instances, but the p-values indicate significance
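The p-values in the table appear consistent with a one-sided t-test on Pearson's r with n = 4 instances, i.e. 2 degrees of freedom. For 2 degrees of freedom the Student t CDF has the closed form F(t) = 1/2 + t / (2 sqrt(2 + t^2)), so the check needs no statistics library. The Python sketch below uses hypothetical function names and is an illustration under that assumption, not the authors' procedure.

```python
import math


def pearson_r(xs, ys):
    """Pearson's product-moment correlation of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


def one_sided_p_value_df2(r, alternative):
    """One-sided p-value of the correlation t-test for n = 4 (df = 2).

    alternative: "greater" tests true correlation > 0,
                 "less" tests true correlation < 0.
    Not defined for |r| == 1 (the t statistic is infinite).
    """
    t = r * math.sqrt(2.0) / math.sqrt(1.0 - r * r)
    cdf = 0.5 + t / (2.0 * math.sqrt(2.0 + t * t))  # Student t CDF, df = 2
    return 1.0 - cdf if alternative == "greater" else cdf


# Correlations taken from the UB-CF column of the table above.
p_gini = one_sided_p_value_df2(0.9969276652, "greater")
p_entropy = one_sided_p_value_df2(-0.9488311629, "less")
p_sparsity = one_sided_p_value_df2(0.737570808, "greater")
```

With 2 degrees of freedom the expression simplifies to p = (1 - |r|) / 2 whenever the sign of r matches the alternative hypothesis; for 1-Gini and UB-CF this gives (1 - 0.9969276652) / 2 ≈ 0.001536, the value reported in the table.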

CF-Performance Predictor
- linear regression (LR) as predictor of RMSE: $\mathrm{RMSE} = \alpha\,(1 - \mathrm{Gini}) + \beta\,\mathrm{Sparsity} + \gamma$
- evaluation: learn LR on RMSE_UB-CF and use it to predict RMSE_MF, and vice versa
- evaluation measure: Pearson's product-moment correlation between predictions and real values
- learnt parameters:

Method   α           β       γ
UB-CF    1158.0332   0.925    0.1004
MF        621.7      1.13    -0.2
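Applying the learnt regression is a one-line computation. The Python sketch below plugs the parameters from this slide into the stated formula; the function name `predict_rmse` and the input values are hypothetical and serve only to show the arithmetic.

```python
# Learnt parameters (alpha, beta, gamma) as listed on the slide.
LEARNT_PARAMS = {
    "UB-CF": (1158.0332, 0.925, 0.1004),
    "MF": (621.7, 1.13, -0.2),
}


def predict_rmse(gini, sparsity, method):
    """RMSE = alpha * (1 - Gini) + beta * Sparsity + gamma."""
    alpha, beta, gamma = LEARNT_PARAMS[method]
    return alpha * (1.0 - gini) + beta * sparsity + gamma


# Hypothetical dataset statistics, not taken from the slides.
rmse_ub_cf = predict_rmse(gini=0.9999, sparsity=0.95, method="UB-CF")
rmse_mf = predict_rmse(gini=0.9999, sparsity=0.95, method="MF")
```

The large value of α suggests that 1 - Gini is very small on the training datasets, so small changes in the Gini index move the prediction noticeably.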

Experimental Results
Table: Pearson's product-moment correlation coefficients of RMSE predictions with real values.

Regression on   UB-CF: Corr.   UB-CF: p-value   MF: Corr.   MF: p-value
UB-CF           0.99967        0.000167         0.98455     0.00773
MF              0.99649        0.00175          0.98768     0.00616

Conclusions
- based only on dataset statistics, we built a performance predictor
- results highly and significantly correlate with real values
- an alternative to implementing and running expensive experiments
- limitation: we tested the method on datasets with a rating range between 1 and 5
- rating behaviour with a different rating range is different (our future work)

References
[AB11] Deepa Anand and Kamal Bharadwaj. Utilizing various sparsity measures for enhancing accuracy of collaborative recommender systems based on local and global similarities. Expert Syst. Appl., 38(5):5101-5109, 2011.
[Kor08] Y. Koren. Factorization meets the neighborhood: a multifaceted collaborative filtering model. In 14th ACM SIGKDD, pages 426-434. ACM, 2008.