CS249: ADVANCED DATA MINING

Similar documents
Recommendation Systems

CS249: ADVANCED DATA MINING

Collaborative Filtering. Radek Pelánek

Recommender Systems EE448, Big Data Mining, Lecture 10. Weinan Zhang Shanghai Jiao Tong University

Preliminaries. Data Mining. The art of extracting knowledge from large bodies of structured data. Let's put it to use!

Decoupled Collaborative Ranking

A Gradient-based Adaptive Learning Framework for Efficient Personal Recommendation

CS145: INTRODUCTION TO DATA MINING

CS6220: DATA MINING TECHNIQUES

Matrix Factorization and Recommendation Systems

APPLICATIONS OF MINING HETEROGENEOUS INFORMATION NETWORKS

Ranking and Filtering

Matrix Factorization Techniques for Recommender Systems

CS425: Algorithms for Web Scale Data

Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Recommendation. Tobias Scheffer

Scaling Neighbourhood Methods

Data Mining Techniques

Lecture 11. Kernel Methods

Data Science Mastery Program

Collaborative topic models: motivations cont

Collaborative Filtering with Temporal Dynamics with Using Singular Value Decomposition

Generative Models for Discrete Data

Collaborative Filtering

Algorithms for Collaborative Filtering

Matrix Factorization and Collaborative Filtering

ELEC6910Q Analytics and Systems for Social Media and Big Data Applications Lecture 3 Centrality, Similarity, and Strength Ties

Binary Principal Component Analysis in the Netflix Collaborative Filtering Task

Point-of-Interest Recommendations: Learning Potential Check-ins from Friends

Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function.

Chapter 6: Classification

Large-Scale Matrix Factorization with Distributed Stochastic Gradient Descent

MIDTERM: CS 6375 INSTRUCTOR: VIBHAV GOGATE October,

Techniques for Dimensionality Reduction. PCA and Other Matrix Factorization Methods

CS246 Final Exam, Winter 2011

Click Prediction and Preference Ranking of RSS Feeds

CS 6375 Machine Learning

Andriy Mnih and Ruslan Salakhutdinov

A Modified PMF Model Incorporating Implicit Item Associations

Prediction of Citations for Academic Papers

Introduction to Logistic Regression

Collaborative Filtering on Ordinal User Feedback

Lecture 6. Notes on Linear Algebra. Perceptron

Stochastic Modeling of a Computer System with Software Redundancy

Linear and Logistic Regression. Dr. Xiaowei Huang

CS 175: Project in Artificial Intelligence. Slides 4: Collaborative Filtering

What is Happening Right Now... That Interests Me?

Joint user knowledge and matrix factorization for recommender systems

Collaborative Filtering via Different Preference Structures

Large-scale Collaborative Ranking in Near-Linear Time

Mining of Massive Datasets. Jure Leskovec, Anand Rajaraman, Jeff Ullman. Stanford University

Neural Networks. Single-layer neural network. CSE 446: Machine Learning Emily Fox University of Washington March 10, /9/17

Location Regularization-Based POI Recommendation in Location-Based Social Networks

Personalized Ranking for Non-Uniformly Sampled Items

Department of Computer Science, Guiyang University, Guiyang , GuiZhou, China

Stat 406: Algorithms for classification and prediction. Lecture 1: Introduction. Kevin Murphy. Mon 7 January,

A Comparative Study of Matrix Factorization and Random Walk with Restart in Recommender Systems

Learning to Recommend Point-of-Interest with the Weighted Bayesian Personalized Ranking Method in LBSNs

Matrix Factorization Techniques For Recommender Systems. Collaborative Filtering

arxiv: v2 [cs.ir] 14 May 2018

Large Scale Data Analysis Using Deep Learning

RaRE: Social Rank Regulated Large-scale Network Embedding

Lecture 3. STAT161/261 Introduction to Pattern Recognition and Machine Learning Spring 2018 Prof. Allie Fletcher

Context-aware factorization methods for implicit feedback based recommendation problems

Item Recommendation for Emerging Online Businesses

Recurrent Latent Variable Networks for Session-Based Recommendation

Collaborative Filtering Matrix Completion Alternating Least Squares

SQL-Rank: A Listwise Approach to Collaborative Ranking

Worksheets for GCSE Mathematics. Algebraic Expressions. Mr Black 's Maths Resources for Teachers GCSE 1-9. Algebra

Matrix Factorization with Content Relationships for Media Personalization

Impact of Data Characteristics on Recommender Systems Performance

RETRIEVAL MODELS. Dr. Gjergji Kasneci Introduction to Information Retrieval WS

Matrix Factorization In Recommender Systems. Yong Zheng, PhDc Center for Web Intelligence, DePaul University, USA March 4, 2015

Collaborative Recommendation with Multiclass Preference Context

Side Information Aware Bayesian Affinity Estimation

Content-based Recommendation

Ad Placement Strategies

NOWADAYS, Collaborative Filtering (CF) [14] plays an

Review: Probabilistic Matrix Factorization. Probabilistic Matrix Factorization (PMF)

Support Vector Machines. CSE 4309 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington

Lecture 10: Logistic Regression

Recommender Systems. Dipanjan Das Language Technologies Institute Carnegie Mellon University. 20 November, 2007

UVA CS 4501: Machine Learning

FINAL: CS 6375 (Machine Learning) Fall 2014

Matrix Factorization and Factorization Machines for Recommender Systems

Transcription:

CS249: ADVANCED DATA MINING
Recommender Systems
Instructor: Yizhou Sun (yzsun@cs.ucla.edu)
May 17, 2017

Methods Learnt / Methods to Learn

  Vector Data:
    Classification: Decision Tree; Naïve Bayes; Logistic Regression; SVM; NN
    Clustering: K-means; hierarchical clustering; DBSCAN; Mixture Models; kernel k-means
    Prediction: Linear Regression; GLM
  Text Data:
    Clustering: PLSA; LDA
    Feature Representation: Word embedding
  Recommender System:
    Clustering: Matrix Factorization
    Prediction: Collaborative Filtering
  Graph & Network:
    Classification: Label Propagation
    Clustering: SCAN; Spectral Clustering
    Ranking: PageRank
    Feature Representation: Network embedding

Recommender Systems (outline): What is a Recommender System? / Collaborative Filtering / Content-based Recommendation / Hybrid Methods / Evaluation Metrics / Summary

Recommender Systems Application areas

In the Social Web

Why use Recommender Systems?
Value for the customer:
- Find things that are interesting
- Narrow down the set of choices
- Help me explore the space of options
- Discover new things
- Entertainment
Value for the provider:
- Additional and probably unique personalized service for the customer
- Increase trust and customer loyalty
- Increase sales, click-through rates, conversion, etc.
- Opportunities for promotion and persuasion
- Obtain more knowledge about customers

Matrix Representation: Explicit Feedback (a sparse user-item rating matrix)

Implicit Feedback: we only know whether a user and an item have interacted

A Network Point of View: can recommendation be seen as a link prediction problem?

Methods:
- Collaborative filtering
- Content-based recommendation
- Hybrid methods

Recommender Systems (outline): What is a Recommender System? / Collaborative Filtering / Content-based Recommendation / Hybrid Methods / Evaluation Metrics / Summary

Collaborative Filtering (CF)
The most prominent approach to generating recommendations:
- used by large, commercial e-commerce sites
- well understood; various algorithms and variations exist
- applicable in many domains (books, movies, DVDs, ...)
Approach: use the "wisdom of the crowd" to recommend items.
Basic assumption and idea:
- Users give ratings to catalog items (implicitly or explicitly)
- Customers who had similar tastes in the past will have similar tastes in the future

Major Methods for CF
Memory-based Collaborative Filtering:
- User-based CF: compute similarities between users and the active user, and use similar users' ratings as the prediction
- Item-based CF: compute similarities between items, and predict ratings similar to those the active user has given to similar items before
Model-based Collaborative Filtering

User-based Collaborative Filtering
1. Define similarity between users according to the rating history matrix
2. Decide how many peers to consider
3. Use the peers' ratings to predict the active user's rating for an item

(1) Define Similarities between Users
Pearson correlation between users a and b:
$sim(a,b) = \frac{\sum_{p \in P}(r_{a,p} - \bar{r}_a)(r_{b,p} - \bar{r}_b)}{\sqrt{\sum_{p \in P}(r_{a,p} - \bar{r}_a)^2}\,\sqrt{\sum_{p \in P}(r_{b,p} - \bar{r}_b)^2}}$
where
- $r_{a,p}$: rating of user a to item p
- $P$: the set of items that are rated by both a and b
- $\bar{r}_a$, $\bar{r}_b$: average ratings of users a and b
Or, $sim(a,b) = \frac{cov(r_a, r_b)}{\sigma(r_a)\,\sigma(r_b)}$
- $cov(r_a, r_b)$: covariance between a and b
- $\sigma(r_a)$, $\sigma(r_b)$: standard deviations of a and b

Example: $sim(Alice, User1)$
$\bar{r}_{Alice} = (5+3+4+4)/4 = 4$; $\sigma(r_{Alice}) = 0.8165$
$\bar{r}_{User1} = (3+1+2+3)/4 = 2.25$; $\sigma(r_{User1}) = 0.9574$
$cov(r_{Alice}, r_{User1}) = 0.6667$
$\Rightarrow sim(Alice, User1) = \frac{0.6667}{0.8165 \times 0.9574} = 0.85$
(All statistics are sample statistics over the four co-rated items.)

        Item1  Item2  Item3  Item4  Item5
Alice     5      3      4      4      ?
User1     3      1      2      3      3     sim(Alice, User1) =  0.85
User2     4      3      4      3      5     sim(Alice, User2) =  0.70
User3     3      3      1      5      4     sim(Alice, User3) =  0.00
User4     1      5      5      2      1     sim(Alice, User4) = -0.79
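
As a sanity check, here is a minimal NumPy sketch (not from the slides) that reproduces these similarity values from the co-rated columns Item1-Item4:

```python
import numpy as np

ratings = {  # co-rated items Item1-Item4; Item5 is the target for Alice
    "Alice": [5, 3, 4, 4], "User1": [3, 1, 2, 3], "User2": [4, 3, 4, 3],
    "User3": [3, 3, 1, 5], "User4": [1, 5, 5, 2],
}

def pearson(a, b):
    """Pearson correlation between two equal-length rating vectors."""
    da = np.asarray(a, float) - np.mean(a)
    db = np.asarray(b, float) - np.mean(b)
    return (da @ db) / np.sqrt((da @ da) * (db @ db))

for u in ["User1", "User2", "User3", "User4"]:
    print(u, pearson(ratings["Alice"], ratings[u]))
# User1 0.853..., User2 0.707..., User3 0.0, User4 -0.792...
```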

(2) Decide how many peers to use. Usually only the top-K most similar users are used for prediction, i.e., the prediction is based on the top-K most similar users' ratings for the item.

(3) Predict the rating
A common prediction function:
- Calculate whether the neighbors' ratings for the unseen item i are higher or lower than their averages
- Combine the rating differences, using the similarities as weights
- Add/subtract the neighbors' bias from the active user's average and use this as the prediction
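
In formula form (this is exactly what the next example instantiates):
$pred(a, p) = \bar{r}_a + \frac{\sum_{b \in N} sim(a, b)\,(r_{b,p} - \bar{r}_b)}{\sum_{b \in N} sim(a, b)}$
where N is the set of selected peers of a who have rated item p.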

Example: use the top-2 neighbors for prediction
Alice's top-2 neighbors are User1 and User2.
$pred(Alice, Item5) = \bar{r}_{Alice} + \frac{sim(Alice, User1)(r_{User1,Item5} - \bar{r}_{User1}) + sim(Alice, User2)(r_{User2,Item5} - \bar{r}_{User2})}{sim(Alice, User1) + sim(Alice, User2)}$
$= 4 + \frac{0.85 \times (3 - 2.25) + 0.70 \times (5 - 3.5)}{0.85 + 0.70} = 5.0887$

Model-based Collaborative Filtering
User-based CF is said to be "memory-based":
- the rating matrix is directly used to find neighbors and make predictions
- this does not scale for most real-world scenarios; large e-commerce sites have tens of millions of customers and millions of items
Model-based approaches:
- based on an offline pre-processing or "model-learning" phase
- at run time, only the learned model is used to make predictions
- models are updated / re-trained periodically
- a large variety of techniques is used
- model building and updating can be computationally expensive

Matrix Factorization for Recommendation
Map users and items into the same latent space.
Reference: Koren et al., Matrix Factorization Techniques for Recommender Systems, Computer (Volume 42, Issue 8), 2009

Now users and items are comparable. Recommendation: find items that are close to the user in the new latent space.

Procedure
- Training stage: use the existing rating matrix to learn the latent feature vectors for both users and items by matrix factorization
- Recommendation stage: predict the score for unknown (user, item) pairs

Training Stage
- $r_{ui}$: the rating from user u to item i
- $p_u$: the latent feature vector for user u
- $q_i$: the latent feature vector for item i
- $\hat{r}_{ui}$: score function for (u, i), $\hat{r}_{ui} = q_i^T p_u$
Objective function:
$\min_{p,q} \sum_{(u,i) \in D} \left[ (r_{ui} - q_i^T p_u)^2 + \lambda(\|q_i\|^2 + \|p_u\|^2) \right]$

Learning Algorithm: Stochastic Gradient Descent
For each rating (u, i):
- update $p_u$: $p_u \leftarrow p_u + \eta\,((r_{ui} - \hat{r}_{ui})\,q_i - \lambda p_u)$
- update $q_i$: $q_i \leftarrow q_i + \eta\,((r_{ui} - \hat{r}_{ui})\,p_u - \lambda q_i)$
where $\eta$ is the learning rate.
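
A minimal NumPy sketch of this training loop (not from the slides; the `train_mf` name, matrix layout, and hyperparameter values are illustrative):

```python
import numpy as np

def train_mf(ratings, n_users, n_items, k=8, eta=0.01, lam=0.1, epochs=100, seed=0):
    """Learn latent factors P (users) and Q (items) with the SGD updates above.

    ratings: iterable of (u, i, r_ui) triples, the observed set D.
    """
    rng = np.random.default_rng(seed)
    P = 0.1 * rng.standard_normal((n_users, k))   # rows are p_u
    Q = 0.1 * rng.standard_normal((n_items, k))   # rows are q_i
    for _ in range(epochs):
        for u, i, r in ratings:
            pu, qi = P[u].copy(), Q[i].copy()
            err = r - qi @ pu                     # r_ui - \hat{r}_ui
            P[u] += eta * (err * qi - lam * pu)   # update p_u
            Q[i] += eta * (err * pu - lam * qi)   # update q_i
    return P, Q

# Prediction for an unseen pair (u, i): Q[i] @ P[u]
```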

Prediction Stage
For an unseen pair (u, i): $\hat{r}_{ui} = q_i^T p_u = p_u^T q_i$
Example: $\hat{r}_{A,W} = p_A^T q_W = 1.2 \times 1.5 + 0.8 \times 1.7 = 3.16$

Variations
Adding biases:
- $b_{ui} = \mu + b_i + b_u$
- $\hat{r}_{ui} = \mu + b_i + b_u + q_i^T p_u$
- Objective function:
$\min_{p,q,b} \sum_{(u,i) \in D} (r_{ui} - \mu - b_i - b_u - q_i^T p_u)^2 + \lambda\left(\|q_i\|^2 + \|p_u\|^2 + \sum_u b_u^2 + \sum_i b_i^2\right)$
Adding temporal dynamics:
- $\hat{r}_{ui}(t) = \mu + b_i(t) + b_u(t) + q_i^T p_u(t)$
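
The SGD updates extend accordingly; a sketch (the gradient derivation mirrors the unbiased case and is not spelled out on the slide):
$b_u \leftarrow b_u + \eta\,\big((r_{ui} - \hat{r}_{ui}) - \lambda b_u\big), \qquad b_i \leftarrow b_i + \eta\,\big((r_{ui} - \hat{r}_{ui}) - \lambda b_i\big)$
with $p_u$ and $q_i$ updated as before, now using the biased $\hat{r}_{ui}$.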

Results (figure)

Implicit Feedback Models
Only implicit signals are received, e.g., click-through, music streaming plays.
Methods:
- Turn it into a binary classification problem: Logistic Matrix Factorization (Johnson, Logistic Matrix Factorization for Implicit Feedback Data, NIPS workshop 2014)
- Turn it into a ranking problem: BPR, Bayesian Personalized Ranking (Rendle et al., BPR: Bayesian Personalized Ranking from Implicit Feedback, UAI '09)

Logistic MF
Model (following the cited Johnson 2014 paper): the interaction probability is a logistic function of the latent factors, $p(l_{ui} = 1) = \sigma(p_u^T q_i + \beta_u + \beta_i)$.
Loss function, for each user-item pair:
$J_{ui} = -\mathbb{1}[l_{ui} = 1]\,\log p(l_{ui} = 1) - \mathbb{1}[l_{ui} = 0]\,\log p(l_{ui} = 0)$

Bayesian Ranking
Data re-arrangement: $D_S = \{(u, i, j) \mid i \in I_u^+ \text{ and } j \in I \setminus I_u^+\}$, i.e., user u ranks item i higher than item j.
Model: $p(i >_u j \mid \Theta) = \sigma(x_{uij}(\Theta))$, where $x_{uij} = x_{ui} - x_{uj}$ and $x_{ui} = \langle w_u, h_i \rangle$
Loss function (BPR-Opt, maximized; from the cited Rendle et al. paper):
$\sum_{(u,i,j) \in D_S} \ln \sigma(x_{uij}) - \lambda_\Theta \|\Theta\|^2$
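
A sketch of one stochastic gradient step under these definitions, ascending $\ln \sigma(x_{uij}) - \lambda\|\Theta\|^2$ (the `bpr_step` name and hyperparameter values are illustrative):

```python
import numpy as np

def bpr_step(W, H, u, i, j, eta=0.01, lam=0.01):
    """One SGD step on a sampled triple (u, i, j) from D_S: u prefers i over j.

    W[u] is the user factor w_u, H[i] the item factor h_i, x_ui = <w_u, h_i>.
    """
    x_uij = W[u] @ (H[i] - H[j])        # x_ui - x_uj
    e = 1.0 / (1.0 + np.exp(x_uij))     # sigma(-x_uij) = d/dx ln sigma(x)
    wu, hi, hj = W[u].copy(), H[i].copy(), H[j].copy()
    W[u] += eta * (e * (hi - hj) - lam * wu)
    H[i] += eta * (e * wu - lam * hi)
    H[j] += eta * (-e * wu - lam * hj)
```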

Issues of CF
- Cold start: there need to be enough other users already in the system to find a match.
- Sparsity: if there are many items to be recommended, then even with many users the user/rating matrix is sparse, and it is hard to find users that have rated the same items.
- First rater: cannot recommend an item that has not been previously rated (new items, esoteric items).
- Popularity bias: cannot recommend items to someone with unique tastes; tends to recommend popular items.

Recommender Systems (outline): What is a Recommender System? / Collaborative Filtering / Content-based Recommendation / Hybrid Methods / Evaluation Metrics / Summary

Content-based Recommendation
Collaborative filtering does NOT require any information about the content; however, it might be reasonable to exploit such information, e.g., recommend fantasy novels to people who liked fantasy novels in the past.
What do we need:
- some information about the available items, such as the genre ("content")
- some sort of user profile describing what the user likes (the preferences)
The task:
- learn user preferences
- locate/recommend items that are "similar" to the user preferences

Content Representation and Item Similarities
Simple approach: compute the similarity of an unseen item with the user profile based on keyword overlap, e.g., using the Dice coefficient:
$sim(b_i, b_j) = \frac{2 \times |keywords(b_i) \cap keywords(b_j)|}{|keywords(b_i)| + |keywords(b_j)|}$
More advanced similarity measures can also be used.
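
A one-function sketch of the Dice coefficient on keyword sets (the example keywords are illustrative):

```python
def dice_sim(keywords_i, keywords_j):
    """Dice coefficient: 2 * |intersection| / (|set_i| + |set_j|)."""
    ki, kj = set(keywords_i), set(keywords_j)
    return 2 * len(ki & kj) / (len(ki) + len(kj))

# dice_sim({"fantasy", "dragons", "quest"}, {"fantasy", "quest", "romance"})
# -> 2 * 2 / (3 + 3) = 0.667
```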

Recommender Systems (outline): What is a Recommender System? / Collaborative Filtering / Content-based Recommendation / Hybrid Methods / Evaluation Metrics / Summary

Hybrid Methods
Combine both user-item interactions and other external sources of information.
One example: Factorization Machines (Steffen Rendle, Factorization Machines, ICDM '10, Sydney, Australia)

Factorization Machines
Treat each user-item transaction as one data point:
U = {Alice (A), Bob (B), Charlie (C), ...}
I = {Titanic (TI), Notting Hill (NH), Star Wars (SW), Star Trek (ST), ...}
S = {(A, TI, 2010-1, 5), (A, NH, 2010-2, 3), (A, SW, 2010-4, 1), (B, SW, 2009-5, 4), (B, ST, 2009-8, 5), (C, TI, 2009-9, 1), (C, SW, 2009-12, 5)}

FM: Feature Preparation. Each data point has a feature vector x and a target value (e.g., a rating score).

The Model
Model second-order interactions to overcome sparsity (the FM model from the cited Rendle paper):
$\hat{y}(x) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle v_i, v_j \rangle\, x_i x_j$
- $w_0$: global bias
- $w_i$: strength of the i-th variable
- $w_{ij} = \langle v_i, v_j \rangle$: strength of the interaction between the i-th and j-th variables, e.g., the interaction between Alice and Titanic, or Alice and Bob

O(kn) Time Complexity of the Second-Order Interaction (k: dimension of v; n: dimension of x), using the identity
$\sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle v_i, v_j \rangle\, x_i x_j = \frac{1}{2} \sum_{f=1}^{k} \left[ \left( \sum_{i=1}^{n} v_{i,f}\, x_i \right)^2 - \sum_{i=1}^{n} v_{i,f}^2\, x_i^2 \right]$
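
A small NumPy check of this identity (not from the slides): the naive O(k n^2) double loop and the O(k n) reformulation agree.

```python
import numpy as np

def pairwise_naive(V, x):
    """sum_{i<j} <v_i, v_j> x_i x_j, computed in O(k n^2)."""
    n = len(x)
    return sum(V[i] @ V[j] * x[i] * x[j]
               for i in range(n) for j in range(i + 1, n))

def pairwise_fast(V, x):
    """Same term in O(k n): 0.5 * sum_f ((sum_i v_if x_i)^2 - sum_i v_if^2 x_i^2)."""
    s = V.T @ x                   # shape (k,): per-factor weighted sums
    s2 = (V ** 2).T @ (x ** 2)    # shape (k,): per-factor sums of squares
    return 0.5 * float(np.sum(s ** 2 - s2))

rng = np.random.default_rng(0)
V, x = rng.standard_normal((6, 3)), rng.standard_normal(6)  # n = 6, k = 3
assert np.isclose(pairwise_naive(V, x), pairwise_fast(V, x))
```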

Apply to Recommendation
- Explicit feedback: treat it as a prediction task, with mean squared error loss
- Implicit feedback: treat it as a binary classification or ranking task, with logistic loss or pairwise logistic loss
Learning: stochastic gradient descent

Recommender Systems (outline): What is a Recommender System? / Collaborative Filtering / Content-based Recommendation / Hybrid Methods / Evaluation Metrics / Summary

Accuracy Measures: Explicit Feedback
Datasets with items rated by users:
- MovieLens datasets: 100K-10M ratings
- Netflix: 100M ratings
Historic user ratings constitute the ground truth; metrics measure the error rate:
- Mean Absolute Error (MAE) computes the deviation between predicted ratings and actual ratings
- Root Mean Square Error (RMSE) is similar to MAE, but places more emphasis on larger deviations
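
In symbols, over a test set T of (user, item) pairs (formulas implied by the definitions above, not printed on the slide):
$MAE = \frac{1}{|T|} \sum_{(u,i) \in T} |\hat{r}_{ui} - r_{ui}|, \qquad RMSE = \sqrt{\frac{1}{|T|} \sum_{(u,i) \in T} (\hat{r}_{ui} - r_{ui})^2}$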

Implicit Feedback: Precision and Recall
- Precision: a measure of exactness; the fraction of relevant items among all retrieved items, e.g., the proportion of recommended movies that are actually good
- Recall: a measure of completeness; the fraction of relevant items retrieved out of all relevant items, e.g., the proportion of all good movies that are recommended
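
In symbols:
$Precision = \frac{|\{\text{relevant}\} \cap \{\text{retrieved}\}|}{|\{\text{retrieved}\}|}, \qquad Recall = \frac{|\{\text{relevant}\} \cap \{\text{retrieved}\}|}{|\{\text{relevant}\}|}$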

More Implicit Feedback Measures
- Precision@k; Recall@k
- AUC: area under the ROC curve
- Area under the Precision-Recall curve
- MRR: mean reciprocal rank over a set of queries Q:
$MRR = \frac{1}{|Q|} \sum_{i=1}^{|Q|} \frac{1}{rank_i}$, where $rank_i$ is the rank position of the first relevant item for the i-th query
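
A minimal sketch computing precision@k and MRR from ranked recommendation lists (function names and data layout are illustrative, not from the slides):

```python
def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k recommended items that are relevant."""
    return sum(1 for item in ranking[:k] if item in relevant) / k

def mrr(rankings, relevant_sets):
    """Mean reciprocal rank of the first relevant item, averaged over queries."""
    total = 0.0
    for ranking, rel in zip(rankings, relevant_sets):
        for pos, item in enumerate(ranking, start=1):
            if item in rel:
                total += 1.0 / pos
                break
    return total / len(rankings)

# mrr([["a", "b", "c"], ["x", "y"]], [{"b"}, {"x"}]) -> (1/2 + 1/1) / 2 = 0.75
```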

Recommender Systems (outline): What is a Recommender System? / Collaborative Filtering / Content-based Recommendation / Hybrid Methods / Evaluation Metrics / Summary

Recommendation Summary
- User-based CF; matrix factorization-based CF
- Explicit feedback; implicit feedback
- Content-based recommendation
- Hybrid methods
- Evaluation

References
- http://ijcai13.org/files/tutorial_slides/td3.pdf
- http://research.microsoft.com/pubs/115396/EvaluationMetrics.TR.pdf
- https://datajobs.com/data-science-repo/recommender-systems-[netflix].pdf
- http://www.librec.net/tutorial.html