A Gradient-based Adaptive Learning Framework for Efficient Personal Recommendation

Similar documents
Decoupled Collaborative Ranking

Large-scale Collaborative Ranking in Near-Linear Time

Maximum Margin Matrix Factorization for Collaborative Ranking

Matrix Factorization Techniques for Recommender Systems

Recommender Systems EE448, Big Data Mining, Lecture 10. Weinan Zhang Shanghai Jiao Tong University

Algorithms for Collaborative Filtering

Ranking-Oriented Evaluation Metrics

CS249: ADVANCED DATA MINING

Case Study 1: Estimating Click Probabilities. Kakade Announcements: Project Proposals: due this Friday!

Restricted Boltzmann Machines for Collaborative Filtering

Data Mining Techniques

ECS171: Machine Learning

Large-Scale Matrix Factorization with Distributed Stochastic Gradient Descent

Linear and Logistic Regression. Dr. Xiaowei Huang

SQL-Rank: A Listwise Approach to Collaborative Ranking

ECE 5424: Introduction to Machine Learning

Ranking and Filtering

Collaborative Filtering. Radek Pelánek

arxiv: v2 [cs.ir] 4 Jun 2018

Recurrent Latent Variable Networks for Session-Based Recommendation

Collaborative Filtering via Different Preference Structures

Machine Learning for NLP

Learning Task Grouping and Overlap in Multi-Task Learning

SCMF: Sparse Covariance Matrix Factorization for Collaborative Filtering

Collaborative Filtering on Ordinal User Feedback

Binary Principal Component Analysis in the Netflix Collaborative Filtering Task

Linear classifiers: Overfitting and regularization

Personalized Ranking for Non-Uniformly Sampled Items

Probabilistic Neighborhood Selection in Collaborative Filtering Systems

Dynamic Poisson Factorization

Gradient Boosting (Continued)

Collaborative Recommendation with Multiclass Preference Context

Exploiting Geographic Dependencies for Real Estate Appraisal

Stochastic Gradient Descent

Collaborative topic models: motivations cont

Generative Models for Discrete Data

Logistic Regression Logistic

Semestrial Project - Expedia Hotel Ranking

Andriy Mnih and Ruslan Salakhutdinov

Ad Placement Strategies

CS60021: Scalable Data Mining. Large Scale Machine Learning

ECE 5984: Introduction to Machine Learning

Learning to Recommend Point-of-Interest with the Weighted Bayesian Personalized Ranking Method in LBSNs

Predictive Discrete Latent Factor Models for large incomplete dyadic data

CS260: Machine Learning Algorithms

Online Learning and Sequential Decision Making

Circle-based Recommendation in Online Social Networks

ECS171: Machine Learning

Delta Boosting Machine and its application in Actuarial Modeling Simon CK Lee, Sheldon XS Lin KU Leuven, University of Toronto

Large-scale Information Processing, Summer Recommender Systems (part 2)

Lessons Learned from the Netflix Contest. Arthur Dunbar

Predicting the Performance of Collaborative Filtering Algorithms

Introduction to Machine Learning. Regression. Computer Science, Tel-Aviv University,

NCDREC: A Decomposability Inspired Framework for Top-N Recommendation

arxiv: v2 [cs.ir] 14 May 2018

Recommendation Systems

Factor Modeling for Advertisement Targeting

Review: Probabilistic Matrix Factorization. Probabilistic Matrix Factorization (PMF)

Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Recommendation. Tobias Scheffer

Introduction to Logistic Regression

VBM683 Machine Learning

Mixed Membership Matrix Factorization

An Extended Frank-Wolfe Method, with Application to Low-Rank Matrix Completion

Impact of Data Characteristics on Recommender Systems Performance

Point-of-Interest Recommendations: Learning Potential Check-ins from Friends

CSE 258, Winter 2017: Midterm

Preference Relation-based Markov Random Fields for Recommender Systems

2018 EE448, Big Data Mining, Lecture 4. (Part I) Weinan Zhang Shanghai Jiao Tong University

What is Happening Right Now... That Interests Me?

ECS289: Scalable Machine Learning

Introduction to Machine Learning (67577) Lecture 7

Scaling Neighbourhood Methods

Domokos Miklós Kelen. Online Recommendation Systems. Eötvös Loránd University. Faculty of Natural Sciences. Advisor:

Item Recommendation for Emerging Online Businesses

Using SVD to Recommend Movies

Mixture-Rank Matrix Approximation for Collaborative Filtering

Context-aware factorization methods for implicit feedback based recommendation problems

arxiv: v2 [cs.ir] 22 Feb 2018

Matrix Factorization and Factorization Machines for Recommender Systems

Boosting. CAP5610: Machine Learning Instructor: Guo-Jun Qi

EXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING

A Statistical View of Ranking: Midway between Classification and Regression

Learning Optimal Ranking with Tensor Factorization for Tag Recommendation

Linear classifiers: Logistic regression

Recommendation Systems

Matrix Factorization Techniques for Recommender Systems

Supplementary material for UniWalk: Explainable and Accurate Recommendation for Rating and Network Data

Matrix Factorization In Recommender Systems. Yong Zheng, PhDc Center for Web Intelligence, DePaul University, USA March 4, 2015

Department of Computer Science, Guiyang University, Guiyang , GuiZhou, China

Crowd-Learning: Improving the Quality of Crowdsourcing Using Sequential Learning

* Matrix Factorization and Recommendation Systems

Scalable Asynchronous Gradient Descent Optimization for Out-of-Core Models

Gradient Boosting, Continued

CTJLSVM: Componentwise Triple Jump Acceleration for Training Linear SVM

Matrix Factorization and Collaborative Filtering

Optimization Methods for Machine Learning

Mixed Membership Matrix Factorization

Big Data Analytics. Lucas Rego Drumond

Rating Prediction with Topic Gradient Descent Method for Matrix Factorization in Recommendation

Joint user knowledge and matrix factorization for recommender systems

Transcription:

A Gradient-based Adaptive Learning Framework for Efficient Personal Recommendation. Yue Ning (1), Yue Shi (2), Liangjie Hong (2), Huzefa Rangwala (3), Naren Ramakrishnan (1). (1) Virginia Tech, (2) Yahoo Research (Yue Shi is now with Facebook; Liangjie Hong is now with Etsy), (3) George Mason University. August 27, 2017.

Outline: Introduction (Problem, Challenges); The Proposed Framework; Applications (Adaptive Logistic Regression, Adaptive Gradient Boosting Decision Tree, Adaptive Matrix Factorization); Experimental Evaluation (Datasets & Metrics, Comparison Methods, Ranking Scores); Summary.

Challenges in Personalized Recommender Systems: Alleviate the "average" experience that a single global model gives users. Lack of generic empirical frameworks that apply across different models. Distributed model learning with limited access to data.

Example of Personal Models

Figure: An example of global and personal models. The left panel plots each user's nDCG score under the global model (y-axis) against the personal model (x-axis); the right panel shows the same comparison for MAP scores.

System Framework

Figure: System framework. Component $C_1$ trains a global model $w^{(0)}$ and stores the gradient sequence $g^{(1)}, g^{(2)}, \ldots, g^{(t)}$. Component $C_2$ generates a hash table mapping user ids to $t_u$ based on the users' data distribution. A user requests $t_u$ from $C_2$, and $C_1$ returns the subsequence of gradients $g^{(0:t_u)}$ to the user, which seeds that user's personal model.

Adaptation Mechanism

Global update:
$\theta^{(T)} = \theta^{(0)} - \eta_1 \sum_{t=1}^{T} g^{(t)}(\theta)$

Local update:
$\theta_u = \theta^{(0)} - \eta_1 \sum_{t=1}^{t_u - 1} g^{(t)}(\theta) - \eta_2 \sum_{t=t_u}^{T} g^{(t)}(\theta_u)$

$\theta$: the global model parameter. $\theta_u$: the personal model parameter. $u$: the index for one user. $t_u$: the index of global gradients for user $u$. $g^{(t)}(\theta)$: global gradients. $g^{(t)}(\theta_u)$: personal gradients.
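
This two-phase update is straightforward to sketch in code. Below is a minimal illustration, not the authors' implementation; the names (adapt_personal_model, grad, user_data) and the learning-rate handling are assumptions for the sketch.

```python
import numpy as np

def adapt_personal_model(theta0, global_grads, t_u, user_data, grad, eta1, eta2):
    """Minimal sketch of the adaptation mechanism.

    theta0       : theta^(0), the shared initialization
    global_grads : stored global gradients g^(1), ..., g^(T)
    t_u          : stop index for user u's consumption of global gradients
    grad         : function (theta, data) -> personal gradient g^(t)(theta_u)
    """
    # Phase 1: replay the first t_u - 1 global gradients (shared knowledge).
    theta_u = np.copy(theta0)
    for g in global_grads[: t_u - 1]:
        theta_u -= eta1 * g

    # Phase 2: continue with SGD on user u's own data (personalization).
    T = len(global_grads)
    for _ in range(T - t_u):
        theta_u -= eta2 * grad(theta_u, user_data)
    return theta_u
```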

How do we decide $t_u$? Group users into $C$ groups based on their data sizes, in descending order. Compute the position $p_u = i/C$, where $C$ is the number of groups and $i$ is the group assignment for user $u$; the first group ($i = 1$) contains the users with the most data. Set $t_u = T \cdot p_u$, where $T$ is the total number of iterations of the global SGD algorithm. Users with the most data thus stop consuming global gradients earliest, as the sketch below illustrates.
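
A small sketch of this assignment rule (variable names are illustrative, not from the paper):

```python
import numpy as np

def assign_stop_indices(data_sizes, C, T):
    """Assign each user a stop index t_u = T * p_u, with p_u = i / C.

    data_sizes : per-user example counts
    C          : number of user groups
    T          : total iterations of the global SGD run
    """
    n = len(data_sizes)
    order = np.argsort(-np.asarray(data_sizes))   # descending by data size
    group = np.empty(n, dtype=int)
    for rank, user in enumerate(order):
        group[user] = 1 + (rank * C) // n         # i = 1: most data, i = C: least
    p_u = group / C
    return (T * p_u).astype(int)                  # most data -> earliest stop
```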

Adaptive Logistic Regression

Objective:
$\min_w L(w) = f(w) + \lambda r(w)$   (1)

$f(w)$ is the negative log-likelihood. $r(w)$ is a regularization function.

Adaptation Procedure:

Global update: $w_u^{(0)} = w^{(0)} - \eta_1 \sum_{t=1}^{t_u - 1} g^{(t)}(w)$   (2)

Local update: $w_u^{(T)} = w_u^{(0)} - \eta_2 \sum_{t=1}^{T - t_u} g^{(t)}(w_u)$   (3)
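
The only model-specific ingredient is the gradient $g^{(t)}(w)$. A minimal sketch for L2-regularized logistic regression (binary labels in {0, 1}; taking $r(w) = \|w\|^2/2$ so the regularization gradient is $\lambda w$; names are illustrative):

```python
import numpy as np

def logreg_gradient(w, X, y, lam):
    """Gradient of f(w) + lam * r(w), with f the averaged negative
    log-likelihood under p = sigmoid(X @ w) and r(w) = ||w||^2 / 2."""
    p = 1.0 / (1.0 + np.exp(-X @ w))        # predicted click probabilities
    return X.T @ (p - y) / len(y) + lam * w
```

The global phase in Eq. (2) evaluates this gradient on all users' data; the personal phase in Eq. (3) evaluates it on user $u$'s data only.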

Adaptive Gradient Boosting Decision Tree

Objective:
$L^{(t)} = \sum_{d=1}^{N} l\big(y_d, F_d^{(t-1)} + \rho h^{(t)}\big) + \Omega(h^{(t)}) = \sum_{d=1}^{N} l\big(y_d, F_d^{(0)} + \rho h^{(0:t)}\big) + \Omega(h^{(t)})$   (4)

Adaptation Procedure:
$F_u^{(0)} = F^{(0)} + \rho h^{(0:t_u)}$   (5)
$F_u^{(T)} = F_u^{(0)} + \rho h_u^{(t_u:T)}$   (6)
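
In tree-ensemble terms, Eqs. (5)-(6) say: take the first $t_u$ trees from the global ensemble, then grow the remaining trees on the user's own data. One way to approximate this with an off-the-shelf library is warm-started boosting; the sketch below uses scikit-learn's GradientBoostingClassifier with synthetic placeholder data, and is an illustration of the idea rather than the authors' implementation.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
X_all = rng.normal(size=(1000, 8))       # pooled data from all users
y_all = rng.integers(0, 2, size=1000)
X_u = rng.normal(size=(60, 8))           # one user's data
y_u = rng.integers(0, 2, size=60)        # assumed to contain both classes

t_u, T = 20, 50                          # t_u global trees, then T - t_u personal trees

# Global phase: the shared ensemble F^(0) + rho * h^(0:t_u).
model = GradientBoostingClassifier(n_estimators=t_u, learning_rate=0.1,
                                   warm_start=True)
model.fit(X_all, y_all)

# Personal phase: keep the global trees, grow new trees on the user's data.
model.n_estimators = T
model.fit(X_u, y_u)                      # warm_start fits only the added trees
```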

Adaptive Matrix Factorization

Objective:
$\min_{q,p,b} \sum_{u,i} \big(r_{ui} - \mu - b_u - b_i - q_u^T p_i\big)^2 + \lambda\big(\|q_u\|^2 + \|p_i\|^2 + b_u^2 + b_i^2\big)$   (7)

Adaptation Procedure:
$q_u^{(0)} = q^{(0)} - \eta_1 \sum_{t=0}^{t_u} g^{(t)}(q_u), \qquad q_u^{(T)} = q_u^{(0)} - \eta_2 \sum_{t=0}^{T - t_u} g^{(t)}(\tilde{q}_u)$   (8)

$b_u^{(0)} = b^{(0)} - \eta_1 \sum_{t=0}^{t_u} g^{(t)}(b_u), \qquad b_u^{(T)} = b_u^{(0)} - \eta_2 \sum_{t=0}^{T - t_u} g^{(t)}(\tilde{b}_u)$   (9)
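
A minimal sketch of the personal phase of Eqs. (8)-(9): after replaying the global gradient subsequence to obtain $q_u^{(0)}$ and $b_u^{(0)}$, the user continues SGD on their own ratings. Holding the item factors and biases fixed during the personal phase is an assumption of this sketch, as are all names:

```python
import numpy as np

def personalize_mf(q_u0, b_u0, user_ratings, P, b_item, mu, lam, eta2, steps):
    """Personal-phase SGD for one user's latent factor q_u and bias b_u.

    q_u0, b_u0   : q_u^(0), b_u^(0) from replaying the global gradients
    user_ratings : list of (item_index, rating) pairs for user u
    P, b_item    : item factors p_i and biases b_i (held fixed here)
    steps        : number of personal iterations, T - t_u
    """
    q_u, b_u = np.copy(q_u0), b_u0
    for _ in range(steps):
        for i, r_ui in user_ratings:
            err = r_ui - (mu + b_u + b_item[i] + q_u @ P[i])
            # SGD steps from the squared-error objective in Eq. (7).
            q_u += eta2 * (err * P[i] - lam * q_u)
            b_u += eta2 * (err - lam * b_u)
    return q_u, b_u
```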

Properties Generality: The framework is generic to a variety of machine learning models that can be optimized by gradient-based approaches. Extensibility: The framework can be extended to more sophisticated use cases. Scalability: In this framework, the training process of a personal model for one user is independent of all other users, so personal models can be trained in parallel.

Datasets

Table: Dataset Statistics

News Portal:
  # users: 54,845
  # features: 351
  # click events: 2,378,918
  # view events: 26,916,620
  avg # click events per user: 43

Movie Ratings:              Netflix      Movielens
  # users                   478,920      1,721
  # items                   17,766       3,331
  avg # events per user     534
  sparsity                  0.00942      0.039

For LogReg and GBDT: the News Portal dataset. For Matrix Factorization: the movie rating datasets (Netflix, Movielens).

Metrics MAP: Mean Average Precision. MRR: Mean Reciprocal Rank. AUC: Area Under the ROC Curve. nDCG: Normalized Discounted Cumulative Gain. RMSE: Root Mean Square Error. MAE: Mean Absolute Error.
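
For reference, the ranking metrics reported below can be computed from a relevance list sorted by predicted score. A minimal sketch of per-user AP and nDCG for binary relevance (conventions vary across papers, e.g. in log base and cutoff):

```python
import numpy as np

def average_precision(relevance):
    """AP for one user; `relevance` holds 0/1 labels in predicted-score order."""
    rel = np.asarray(relevance, dtype=float)
    hits = np.cumsum(rel)
    precision_at_k = hits / (np.arange(len(rel)) + 1)
    return float((precision_at_k * rel).sum() / max(rel.sum(), 1.0))

def ndcg(relevance):
    """nDCG for one user with binary relevance and a log2 discount."""
    rel = np.asarray(relevance, dtype=float)
    discounts = 1.0 / np.log2(np.arange(len(rel)) + 2)
    dcg = float((rel * discounts).sum())
    idcg = float((np.sort(rel)[::-1] * discounts).sum())
    return dcg / idcg if idcg > 0 else 0.0
```

MAP and the reported nDCG are then averages of these per-user scores; MRR averages the reciprocal rank of each user's first relevant item.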

Comparison Methods

Table: Objective functions for different methods.

LogReg:
  Global: $\sum_{d=1}^{N} f(w) + \lambda \|w\|_2^2$
  Local:  $\sum_{j=1}^{N_u} f(w_u) + \lambda \|w_u\|_2^2$
  MTL:    $\sum_{j}^{N_u} f(w_u) + \frac{\lambda_1}{2} \|w_u - w\|^2 + \frac{\lambda_2}{2} \|w_u\|^2$

GBDT:
  Global: $\sum_{d}^{N} l\big(y_d, F_d^{(0)} + \rho h^{(0:t)}\big) + \Omega(h^{(t)})$
  Local:  $\sum_{j}^{N_u} l\big(y_j, F_j^{(0)} + \rho h^{(0:t)}\big) + \Omega(h^{(t)})$
  MTL:    -

MF:
  Global: $\sum_{u,i} \big(r_{ui} - \mu - b_u - b_i - q_u^T p_i\big)^2 + \lambda\big(\|q_u\|^2 + \|p_i\|^2 + b_u^2 + b_i^2\big)$
  Local:  $\sum_{i}^{N_u} \big(r_{ui} - \mu - b_u - b_i - q_u^T p_i\big)^2 + \lambda\big(\|q_u\|^2 + \|p_i\|^2 + b_u^2 + b_i^2\big)$
  MTL:    global $+ \lambda_2 \big[(q_u - q)^2 + (p_i - p)^2 + (b_u - A_u)^2 + (b_i - A_i)^2\big]$

Global: models are trained on all users' data. Local: models are learned locally on each user's own data. MTL: user models are regularized toward a shared global parameter.

Ranking Performance - LogReg

Figure: AUC, MAP, MRR and nDCG scores on the test dataset with varying training epochs for the Global, Local, MTL and Adaptive LogReg models ((a) AUC, (b) MAP, (c) MRR, (d) nDCG). The proposed adaptive LogReg models achieve higher scores with fewer epochs; global models perform the worst.

Ranking Performance - GBDT

Table: Performance comparison based on MAP, MRR, AUC and nDCG for GBDT. Each value is the average of 10 runs, with the standard deviation in parentheses.

Global-GBDT
  #Trees    MAP            MRR            AUC            nDCG
  20        0.2094 (1e-3)  0.3617 (2e-3)  0.6290 (1e-3)  0.5329 (6e-4)
  50        0.2137 (1e-3)  0.3726 (1e-3)  0.6341 (1e-3)  0.5372 (6e-4)
  100       0.2150 (8e-3)  0.3769 (1e-3)  0.6356 (8e-4)  0.5392 (6e-4)
  200       0.2161 (5e-4)  0.3848 (1e-3)  0.6412 (6e-4)  0.5415 (5e-4)

Local-GBDT
  #Trees    MAP            MRR            AUC            nDCG
  20        0.2262 (2e-3)  0.4510 (5e-3)  0.6344 (3e-3)  0.5604 (2e-3)
  50        0.2319 (2e-3)  0.4446 (4e-3)  0.6505 (2e-3)  0.5651 (2e-3)
  100       0.2328 (1e-3)  0.4465 (5e-3)  0.6558 (2e-3)  0.5651 (2e-3)
  200       0.2322 (2e-3)  0.4431 (2e-3)  0.6566 (1e-3)  0.5649 (1e-3)

Adaptive-GBDT
  #Trees    MAP            MRR            AUC            nDCG
  20+50     0.2343 (2e-3)  0.4474 (4e-3)  0.6555 (2e-3)  0.5661 (2e-3)
  50+50     0.2325 (2e-3)  0.4472 (1e-4)  0.6561 (8e-4)  0.5666 (6e-4)
  10+100    0.2329 (2e-3)  0.4423 (3e-3)  0.6587 (1e-3)  0.5650 (3e-3)

Ranking Performance - GBDT

Figure: MAP comparison of the Global, Local, and Adaptive GBDT methods for Group 1 (users with the least data) and Group 7 (users with the most data). Adaptive-GBDT outperforms both the global and local GBDT models in terms of MAP for all groups of users.

Ranking Performance - LogReg vs GBDT

Figure: AUC scores with the fraction of training samples varied from 20% to 100%: (a) Global-, Local-, MTL- and Adaptive-LogReg; (b) Global-, Local- and Adaptive-GBDT. In terms of average AUC, Adaptive-GBDT performs better than the other methods. As the number of training samples increases, the GBDT-based methods tend to improve, while the LogReg methods achieve relatively stable scores.

Results - MF

Figure: Quartile analysis of the group-level RMSE and MAE for the different MF models (Global, Local, MTL, Adaptive) on the MovieLens (ML) and Netflix datasets ((a) ML-RMSE, (b) ML-MAE, (c) Netflix-RMSE, (d) Netflix-MAE). Gold: Adaptive-MF.

Summary We effectively and efficiently build personal models that improve recommendation performance over both the global model and the local model. Personal models are learned adaptively by exploiting the global gradients according to each individual's characteristics. Our experiments demonstrate the usefulness of our framework across a wide scope, in terms of both model classes and application domains.

Thank you! Q&A