Recommender Systems. Dipanjan Das Language Technologies Institute Carnegie Mellon University. 20 November, 2007

Similar documents
Matrix Factorization Techniques for Recommender Systems

Collaborative Filtering. Radek Pelánek

Binary Principal Component Analysis in the Netflix Collaborative Filtering Task

Matrix Factorization Techniques For Recommender Systems. Collaborative Filtering

Recommendation Systems

Recommendation Systems

Matrix Factorization In Recommender Systems. Yong Zheng, PhDc Center for Web Intelligence, DePaul University, USA March 4, 2015

Andriy Mnih and Ruslan Salakhutdinov

Collaborative Filtering

Recommendation Systems

Recommender Systems EE448, Big Data Mining, Lecture 10. Weinan Zhang Shanghai Jiao Tong University

Using SVD to Recommend Movies

Preliminaries. Data Mining. The art of extracting knowledge from large bodies of structured data. Let s put it to use!

Collaborative Filtering Applied to Educational Data Mining

a Short Introduction

* Matrix Factorization and Recommendation Systems

CS425: Algorithms for Web Scale Data

Matrix Factorization and Collaborative Filtering

DATA MINING LECTURE 8. Dimensionality Reduction PCA -- SVD

A Modified PMF Model Incorporating Implicit Item Associations

6.034 Introduction to Artificial Intelligence

Matrix Factorization Techniques for Recommender Systems

Decoupled Collaborative Ranking

Principal Component Analysis (PCA) for Sparse High-Dimensional Data

Mining of Massive Datasets Jure Leskovec, AnandRajaraman, Jeff Ullman Stanford University

Data Science Mastery Program

Lecture Notes 10: Matrix Factorization

Matrix Factorization & Latent Semantic Analysis Review. Yize Li, Lanbo Zhang

CS 175: Project in Artificial Intelligence. Slides 4: Collaborative Filtering

CS425: Algorithms for Web Scale Data

Domokos Miklós Kelen. Online Recommendation Systems. Eötvös Loránd University. Faculty of Natural Sciences. Advisor:

Algorithms for Collaborative Filtering

Collaborative Filtering with Temporal Dynamics with Using Singular Value Decomposition

Introduction to Computational Advertising

Generative Models for Discrete Data

Collaborative Recommendation with Multiclass Preference Context

Ranking and Filtering

Jeffrey D. Ullman Stanford University

The BigChaos Solution to the Netflix Prize 2008

Collaborative Filtering Matrix Completion Alternating Least Squares

Scaling Neighbourhood Methods

Recommender Systems. From Content to Latent Factor Analysis. Michael Hahsler


Impact of Data Characteristics on Recommender Systems Performance

Data Mining Techniques

Structured matrix factorizations. Example: Eigenfaces

Low Rank Matrix Completion Formulation and Algorithm

EE 381V: Large Scale Learning Spring Lecture 16 March 7

Content-based Recommendation

Probabilistic Low-Rank Matrix Completion with Adaptive Spectral Regularization Algorithms

Recommender System for Yelp Dataset CS6220 Data Mining Northeastern University

Regression. Goal: Learn a mapping from observations (features) to continuous labels given a training set (supervised learning)

Regression. Goal: Learn a mapping from observations (features) to continuous labels given a training set (supervised learning)

Collaborative Filtering on Ordinal User Feedback

Collaborative Topic Modeling for Recommending Scientific Articles

Recommender systems, matrix factorization, variable selection and social graph data

Large-scale Collaborative Ranking in Near-Linear Time

Probabilistic Low-Rank Matrix Completion with Adaptive Spectral Regularization Algorithms

Collaborative Filtering

The Pragmatic Theory solution to the Netflix Grand Prize

Scalable Hierarchical Recommendations Using Spatial Autocorrelation

Collaborative Filtering: A Machine Learning Perspective

Context-aware Ensemble of Multifaceted Factorization Models for Recommendation Prediction in Social Networks

SVD, Power method, and Planted Graph problems (+ eigenvalues of random matrices)

Matrix and Tensor Factorization from a Machine Learning Perspective

Collaborative topic models: motivations cont

Dimensionality Reduction

Large-scale Ordinal Collaborative Filtering

Circle-based Recommendation in Online Social Networks

INFO 4300 / CS4300 Information Retrieval. slides adapted from Hinrich Schütze s, linked from

Predicting the Performance of Collaborative Filtering Algorithms

The BellKor Solution to the Netflix Grand Prize

Lecture 2 Part 1 Optimization

Data Mining Techniques

Joint user knowledge and matrix factorization for recommender systems

Data Mining and Matrices

Large-Scale Matrix Factorization with Distributed Stochastic Gradient Descent

Matrix completion: Fundamental limits and efficient algorithms. Sewoong Oh Stanford University

Matrix Factorization and Factorization Machines for Recommender Systems

Collaborative Filtering for Implicit Feedback

Problems. Looks for literal term matches. Problems:

14 Singular Value Decomposition

Recommender Systems: Overview and. Package rectools. Norm Matloff. Dept. of Computer Science. University of California at Davis.

arxiv: v2 [cs.ir] 14 May 2018

Introduction PCA classic Generative models Beyond and summary. PCA, ICA and beyond

Quick Introduction to Nonnegative Matrix Factorization

CS281 Section 4: Factor Analysis and PCA

Adaptive one-bit matrix completion

2.3. Clustering or vector quantization 57

MATRIX RECOVERY FROM QUANTIZED AND CORRUPTED MEASUREMENTS

ISyE 691 Data mining and analytics

Restricted Boltzmann Machines for Collaborative Filtering

Dimensionality Reduction: PCA. Nicholas Ruozzi University of Texas at Dallas

SQL-Rank: A Listwise Approach to Collaborative Ranking

Machine Learning. Principal Components Analysis. Le Song. CSE6740/CS7641/ISYE6740, Fall 2012

Dimension Reduction and Iterative Consensus Clustering

Clustering based tensor decomposition

Kernelized Matrix Factorization for Collaborative Filtering

15 Singular Value Decomposition

Transcription:


Today's Outline What are Recommender Systems? Two approaches: Content Based Methods and Collaborative Filtering. Details of the Netflix Progress Prize paper.


Recommender Systems Aim: measure a user's interest in items or products, and provide personalized recommendations suiting her taste. Bell et al., 2007

Recommender Systems Broadly: profiling user preferences modeling user-product interaction Bell et al, 2007

Content Based Approaches Building a profile for each user and product. User profile: demographic information, answers to a questionnaire. Product profile: movie genre, actors, box office popularity, ... Bell et al., 2007

Content Based Approaches Building profile for each user and product Resulting profiles help find a match between users and products Bell et al, 2007

Content Based Approaches Cons: requires gathering external information (genre, popularity at box office, etc.), which is not easy to create. Bell et al., 2007

Collaborative Filtering The term was coined by Goldberg et al., 1992. Basic principles: analysis of user-product dependencies to identify new user-product associations; no need to create explicit user profiles. Bell et al., 2007

Collaborative Filtering Identification of pairs of items rated similarly or rated by like-minded users Only requirement is the past behavior of users Domain independent but addresses elusive aspects of data Bell et al, 2007

Collaborative Filtering Users Items

Collaborative Filtering Ratings Users Items

Collaborative Filtering User u Item i

Collaborative Filtering Estimation Problem User u Item i

Robert M. Bell, Yehuda Koren and Chris Volinsky. Modeling relationships at multiple scales to improve accuracy of large recommender systems. Proceedings of KDD 2007

Paper Outline Introduction Neighborhood Based Approaches Factorization Based Approaches Experiments Results Conclusion


Bell et al, 2007 The Netflix progress prize winning team Netflix Data >100 million movie ratings ~480,000 real customers 17,770 movies

Bell et al, 2007 Netflix data is many times larger than data used in previous research Potential to reduce gap between scientific research and real world CF systems

Paper Outline Introduction Neighborhood Based Approaches Factorization Based Approaches Experiments Results Conclusion

Neighborhood Based Approaches In order to estimate r_ui, the rating of user u for item i, a set of neighboring users N(u;i) is used: these users tend to rate items similarly to u, and they actually rated item i (i.e., r_vi is known for v ∈ N(u;i)).

Collaborative Filtering User u Item i

Collaborative Filtering v1 v2 v3 v4 User u Item i

Neighborhood Based Approaches A weighted average of the neighbors' ratings:

r_{ui} = \frac{\sum_{v \in N(u;i)} s_{uv} r_{vi}}{\sum_{v \in N(u;i)} s_{uv}}

The similarities s_{uv} are often Pearson's correlation coefficient or cosine similarity.
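The user-oriented weighted average can be sketched in a few lines of Python. The toy ratings matrix and the choice to keep only positively correlated neighbors are illustrative assumptions, not details from the paper:

```python
import numpy as np

# Toy ratings matrix: rows = users, columns = items; 0 marks "unrated".
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 1],
    [1, 1, 5, 4],
    [5, 4, 1, 0],   # we will predict user 3's rating for item 3
], dtype=float)

def pearson(a, b):
    """Pearson correlation between two users over their co-rated items."""
    mask = (a > 0) & (b > 0)
    x, y = a[mask] - a[mask].mean(), b[mask] - b[mask].mean()
    denom = np.sqrt((x ** 2).sum() * (y ** 2).sum())
    return (x * y).sum() / denom if denom > 0 else 0.0

def predict(R, u, i):
    """r_ui = sum_v s_uv r_vi / sum_v s_uv over neighbors who rated i."""
    num = den = 0.0
    for v in range(R.shape[0]):
        if v == u or R[v, i] == 0:
            continue                 # a neighbor must have rated item i
        s = pearson(R[u], R[v])
        if s > 0:                    # keep only positively correlated users
            num += s * R[v, i]
            den += s
    return num / den if den > 0 else np.nan

print(predict(R, 3, 3))  # 1.0
```

Here both positively correlated neighbors happened to rate item 3 with a 1, so the weighted average is exactly 1.0.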

Neighborhood Based Approaches An analogous alternative is the item-oriented approach: a set of neighboring items N(i;u) is used, items that are rated similarly to i by other users.

Collaborative Filtering User u Item i

Collaborative Filtering User u j1 Item i j2 j3

Neighborhood Based Approaches

r_{ui} = \frac{\sum_{j \in N(i;u)} s_{ij} r_{uj}}{\sum_{j \in N(i;u)} s_{ij}}

Again, the similarities s_{ij} are Pearson's correlation coefficient or cosine similarity.

Neighborhood Based Approaches Sarwar et al. (2000) found that the item-oriented approach worked better. It is also computationally efficient, because the number of items is much smaller than the number of users. These are extremely popular methods.

Neighborhood Based Approaches Problems: the similarity values s_uv or s_ij are heuristic in nature, and different rating algorithms use different measures.

Neighborhood Based Approaches Problems: these methods do not account for interactions between neighbors. Each similarity s_ij between i and j ∈ N(i;u) is computed independently of the other similarities s_ik for k ∈ N(i;u) − {j}.

Neighborhood Based Approaches Problems: for example, if the three movies of the LOTR trilogy are all in the neighbor set, the algorithm ignores the similarity among the three when predicting the rating for another movie.

Neighborhood Based Approaches Bell et al. provide solutions. They use an item-oriented approach, but instead of similarities they use weights w_ij. The weights are computed jointly, so dependencies between neighbors are taken care of.

Neighborhood Based Approaches In the first step, neighbors are selected Among all items rated by u, the g most similar to i are selected Similarity is by correlation coefficient This set is called N(i;u) as before

Neighborhood Based Approaches The revised definition:

r_{ui} = \frac{\sum_{j \in N(i;u)} w_{ij} r_{uj}}{\sum_{j \in N(i;u)} w_{ij}}, \quad w_{ij} \ge 0

The nonnegativity constraint on the weights prevents overfitting.

Neighborhood Based Approaches Let U(i) be the set of users who rated item i (of course, user u ∉ U(i)). For each user v ∈ U(i), let N(i;u,v) denote the subset of N(i;u) that includes the items rated by v.

Neighborhood Based Approaches For each user v ∈ U(i), we seek weights that would perfectly interpolate the rating of i from the ratings of the given neighbors. Therefore,

r_{vi} = \frac{\sum_{j \in N(i;u,v)} w_{ij} r_{vj}}{\sum_{j \in N(i;u,v)} w_{ij}}

The weights w_{ij} are the only unknowns; N(i;u,v) is the set of items that v has rated, a subset of N(i;u). One equation with |N(i;u,v)| unknowns: many solutions.

Neighborhood Based Approaches From the exact per-user equations

r_{vi} = \frac{\sum_{j \in N(i;u,v)} w_{ij} r_{vj}}{\sum_{j \in N(i;u,v)} w_{ij}}

to weights that work well for all users, a least squares problem:

\min_w \sum_{v \in U(i)} \left( r_{vi} - \frac{\sum_{j \in N(i;u,v)} w_{ij} r_{vj}}{\sum_{j \in N(i;u,v)} w_{ij}} \right)^2

However, this treats all users in U(i) equally. We should give more weight to users who rated many items of N(i;u). Further to:

\min_w \frac{\sum_{v \in U(i)} c_v \left( r_{vi} - \frac{\sum_{j \in N(i;u,v)} w_{ij} r_{vj}}{\sum_{j \in N(i;u,v)} w_{ij}} \right)^2}{\sum_{v \in U(i)} c_v}

where c_v = \left( \sum_{j \in N(i;u,v)} w_{ij} \right)^2. At this point, the authors switch to matrix notation.
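A minimal sketch of the idea in Python, heavily simplified: it solves an unnormalized least-squares system with NumPy and clips negative weights, whereas the paper keeps the normalization and user weighting c_v and solves a proper quadratic program. All data here is hypothetical:

```python
import numpy as np

# Hypothetical setup: item i has 2 neighbors; each row is a user in U(i).
R_neighbors = np.array([  # r_vj for the neighbors j in N(i;u)
    [4.0, 5.0],
    [2.0, 1.0],
    [5.0, 4.0],
    [1.0, 2.0],
])
r_i = np.array([4.5, 1.5, 4.5, 1.5])  # r_vi: each user's rating of item i

# One weight vector that works well for all users: a least squares problem.
w, *_ = np.linalg.lstsq(R_neighbors, r_i, rcond=None)
w = np.clip(w, 0, None)  # crude stand-in for the w_ij >= 0 constraint

# Predict r_ui for a new user u who rated the neighbors [3, 4]:
r_uj = np.array([3.0, 4.0])
pred = (w @ r_uj) / w.sum()
print(np.round(w, 3), round(pred, 2))  # [0.5 0.5] 3.5
```

Both neighbors carry equal information in this symmetric toy data, so the fitted weights are equal and the prediction is their plain average.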

Neighborhood Based Approaches The authors next convert the optimization problem into a quadratic program They claim that the solution is found in 3-4 iterations They also provide a revised model that tries to deal with the sparsity problem of the rating matrix

Neighborhood Based Approaches The revised model assumes that the matrix is dense, but accounts for the sparseness by shrinking. Shrinking is the process of penalizing parameters that have less data associated with them. In another revised model, the authors use user-user similarities along with item-item similarities.
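A tiny illustration of shrinking; the n/(n + alpha) form is the standard recipe, and the constant alpha and the numbers are made up for illustration:

```python
# Shrinkage penalizes parameters supported by little data: an estimate
# based on n observations is pulled toward 0 by a factor n / (n + alpha).
def shrink(estimate, n, alpha=25.0):
    return estimate * n / (n + alpha)

# A correlation of 0.9 computed from only 5 co-ratings is barely trusted,
# while the same correlation from 500 co-ratings survives almost intact.
print(round(shrink(0.9, 5), 3))    # 0.15
print(round(shrink(0.9, 500), 3))  # 0.857
```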

Neighborhood Based Approaches The authors also remove global effects from the data. An example is the tendency of ratings of some items, or by some users, to differ systematically from the average. These effects are removed, and the residual data gives better results.

Paper Outline Introduction Neighborhood Based Approaches Factorization Based Approaches Experiments Results Conclusion

Factorization Based Approaches Limited set of features computed for all users and items Allows linking users with items and then estimating the associated ratings Feature example: movie genres

Factorization Based Approaches A goal can be placing each movie and each user within these genre-oriented scales When given a certain user-movie pair, the rating can be estimated by the closeness of the features representing the movie and the user

Factorization Based Approaches This pertains to the same aim as content based methods: the goal is to uncover latent features of the given data that explain the ratings, a surrogate for external information. Techniques: Singular Value Decomposition (SVD) or Principal Component Analysis (PCA).

Factorization Based Approaches Given an m × n matrix R, SVD computes the best rank-f approximation R^f, defined as the product of two rank-f matrices P (m × f) and Q (n × f); in other words, R^f = PQ^T. R^f captures the f most prominent features of the data, leaving out noisy portions.

Factorization Based Approaches Each unknown rating r_ui is estimated as R^f_ui, the dot product of the u-th row of P and the i-th row of Q. However, SVD computation only works when all entries of R are known; the goal of SVD is undefined when many entries of R are missing.
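The fully observed case can be sketched directly with NumPy's SVD; the small ratings matrix is a made-up example:

```python
import numpy as np

# Rank-f approximation of a fully known ratings matrix via SVD
# (plain SVD applies exactly in this setting: no missing entries).
R = np.array([
    [5.0, 4.0, 1.0, 1.0],
    [4.0, 5.0, 1.0, 2.0],
    [1.0, 1.0, 5.0, 4.0],
    [2.0, 1.0, 4.0, 5.0],
])

f = 2
U, s, Vt = np.linalg.svd(R, full_matrices=False)
P = U[:, :f] * s[:f]  # m x f user factors
Q = Vt[:f, :].T       # n x f item factors
R_f = P @ Q.T         # best rank-f approximation in the Frobenius norm

# Each estimated rating R_f[u, i] is the dot product of the u-th row
# of P and the i-th row of Q.
print(np.linalg.matrix_rank(R_f))  # 2
```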

Factorization Based Approaches The authors provide a solution: the EM algorithm for PCA (Roweis, 1997). We compute rank-f matrices P and Q that minimize \|R - PQ^T\|_F (the Frobenius norm).

Factorization Based Approaches We can fix the matrix P as some matrix \hat{P}. Minimizing \|R - \hat{P} Q^T\|_F is then equivalent to the least squares solution of R = \hat{P} Q^T. Similarly, we can fix the matrix Q as some matrix \hat{Q}; the minimization is then equivalent to the least squares solution of R = P \hat{Q}^T.

Factorization Based Approaches These least squares problems are minimized by setting Q^T = (\hat{P}^T \hat{P})^{-1} \hat{P}^T R and P = R \hat{Q} (\hat{Q}^T \hat{Q})^{-1}. This yields an iterative process that recomputes the matrices P and Q.

Factorization Based Approaches The alternating updates are thus:

Q^T \leftarrow (P^T P)^{-1} P^T R \qquad P \leftarrow R Q (Q^T Q)^{-1}

It can be shown that there is one possible global minimum (Roweis, 1997), and this iteration has an ability to deal with missing values.
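The alternating updates can be sketched as follows; the toy matrix with a clear rank-2 structure is my own construction, chosen so the iteration converges quickly:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy matrix: two random rank-1 components plus small noise, so the
# rank-2 optimum is well separated from the rest of the spectrum.
R = (np.outer(rng.normal(size=6), rng.normal(size=5))
     + np.outer(rng.normal(size=6), rng.normal(size=5))
     + 0.01 * rng.normal(size=(6, 5)))
f = 2

# Alternate the two closed-form least squares updates:
#   Q^T <- (P^T P)^(-1) P^T R,   P <- R Q (Q^T Q)^(-1)
P = rng.normal(size=(6, f))
errs = []
for _ in range(50):
    Qt = np.linalg.solve(P.T @ P, P.T @ R)  # fix P, solve for Q^T
    Q = Qt.T
    P = R @ Q @ np.linalg.inv(Q.T @ Q)      # fix Q, solve for P
    errs.append(np.linalg.norm(R - P @ Q.T))

# Compare against the best rank-f approximation given by a full SVD
U, s, Vt = np.linalg.svd(R)
best = np.linalg.norm(R - (U[:, :f] * s[:f]) @ Vt[:f, :])
print(abs(errs[-1] - best) < 1e-6)
```

The recorded error is non-increasing and reaches the unique global minimum, matching the Roweis (1997) result quoted above.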

Factorization Based Approaches Like (Roweis, 1997), the authors minimize the squared error over the set K of known ratings:

Err(P, Q) \stackrel{\text{def}}{=} \sum_{(u,i) \in K} (r_{ui} - p_u^T q_i)^2

where p_u is the u-th row of P and q_i is the i-th row of Q.

Factorization Based Approaches What should be the value of f, the rank of P and Q? As f grows, there is more flexibility in minimizing the error, but it was found that achieving a low error can result in overfitting; using f > 2 resulted in bad estimation quality. The authors used shrinkage to alleviate the overfitting problem: they computed the factors one by one, shrinking the results after each step.

ComputeNextFactor(Known ratings r_ui; User factors P (m × f); Item factors Q (n × f))
% Compute the f-th column of matrices P and Q to fit the given ratings
% Columns 1, ..., f−1 of P and Q were already computed
Constants: α = 25, ε = 10^−4
% Compute residuals: the portion not explained by previous factors
for each given rating r_ui do
  res_ui ← r_ui − Σ_{l=1}^{f−1} P_ul Q_il
  res_ui ← n_ui res_ui / (n_ui + αf)   % shrinkage
% Compute the f-th factor for each user and item by solving
% many least squares problems, each with a single unknown
while Err(P_new, Q_new) / Err(P_old, Q_old) < 1 − ε do
  for each user u = 1, ..., m do
    P_uf ← Σ_{i:(u,i)∈K} res_ui Q_if / Σ_{i:(u,i)∈K} Q_if^2
  for each item i = 1, ..., n do
    Q_if ← Σ_{u:(u,i)∈K} res_ui P_uf / Σ_{u:(u,i)∈K} P_uf^2
return P, Q

This way, we compute f factors by calling the function ComputeNextFactor f times.
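A Python rendering of this routine may help. The exact meaning of the support counts n_ui is an assumption here (a per-rating count passed in by the caller), and f is treated as 0-indexed, so the shrinkage constant uses f + 1:

```python
import numpy as np

def compute_next_factor(R, mask, P, Q, f, n_sup, alpha=25.0, eps=1e-4):
    """Sketch of ComputeNextFactor: fit column f (0-indexed) of P and Q
    to the known ratings; columns 0..f-1 are already fixed. mask is True
    where r_ui is known; n_sup holds per-rating support counts used by
    the shrinkage step (their exact definition is assumed)."""
    # Residuals: the portion of each known rating not explained by
    # the previously computed factors
    res = np.where(mask, R - P[:, :f] @ Q[:, :f].T, 0.0)
    res *= n_sup / (n_sup + alpha * (f + 1))  # shrink the residuals

    p = np.zeros(R.shape[0])
    q = np.ones(R.shape[1])
    prev = np.inf
    while True:
        # Each factor value solves a least squares problem in one unknown
        p = (res @ q) / np.maximum(mask @ q**2, 1e-12)
        q = (res.T @ p) / np.maximum(mask.T @ p**2, 1e-12)
        err = ((res - mask * np.outer(p, q)) ** 2).sum()
        if not err / prev < 1.0 - eps:  # stop once improvement stalls
            break
        prev = err
    P[:, f], Q[:, f] = p, q
    return P, Q

# Toy usage: recover a rank-1 ratings matrix (all entries observed,
# huge support counts so shrinkage is negligible).
true_p = np.array([1.0, 2.0, 0.5, -1.0, 3.0])
true_q = np.array([2.0, 1.0, 4.0, 3.0])
R = np.outer(true_p, true_q)
mask = np.ones_like(R, dtype=bool)
P, Q = compute_next_factor(R, mask, np.zeros((5, 1)), np.zeros((4, 1)),
                           f=0, n_sup=np.full(R.shape, 1e9))
approx = np.outer(P[:, 0], Q[:, 0])
print(np.abs(approx - R).max() < 1e-3)
```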

Neighborhood Aware Factorization The factorization based method above describes a user u as a fixed linear combination of the f movie factors. The fixed linear combination is transformed into a more adaptive one that changes as a function of the item i to be rated by u.

Paper Outline Introduction Neighborhood Based Approaches Factorization Based Approaches Experiments Results Conclusion

Experiments Evaluated on the Netflix data. Performance is measured as Root Mean Squared Error (RMSE), which puts more emphasis on large errors than averaged absolute error.
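A quick illustration of why RMSE emphasizes large errors more than averaged absolute error (made-up numbers):

```python
import numpy as np

def rmse(pred, true):
    return float(np.sqrt(np.mean((pred - true) ** 2)))

def mae(pred, true):
    return float(np.mean(np.abs(pred - true)))

true = np.array([4.0, 3.0, 5.0, 2.0])
small_errs = true + np.array([0.5, -0.5, 0.5, -0.5])  # uniform small errors
one_big_err = true + np.array([2.0, 0.0, 0.0, 0.0])   # one large error

# Same MAE, but RMSE penalizes the single large error more heavily.
print(mae(small_errs, true), mae(one_big_err, true))    # 0.5 0.5
print(rmse(small_errs, true), rmse(one_big_err, true))  # 0.5 1.0
```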

Experiments Two datasets: the Probe Set and the Quiz Set, each containing 1.4 million user ratings. The Probe Set was a part of the training data with known true ratings. Benchmark: Cinematch, Netflix's own recommender, with RMSE = 0.9514.

Paper Outline Introduction Neighborhood Based Approaches Factorization Based Approaches Experiments Results Conclusion

Results on Probe Set: Neighborhood Based Approach
[Plot: RMSE vs. #Neighbors for correlation-based weights, model-based weights, and model-based weights with shrunk correlation coefficients. The curves reach their peak at a moderate neighborhood size, and the gap between the model-based and correlation-based curves shows the benefit of modeling neighbor-neighbor relations.]

Results on Probe Set: Factorization Approach
[Plot: RMSE vs. #Factors, comparing the factorization method with and without shrinkage.]

Quiz Set Results
[Table: quiz-set RMSE of global effects, user-user, factorization, movie-movie, factorization with movie-movie, factorization with movie-movie and user-user, and the final combination; each combined model achieves a significant further decrease of the RMSE.]

Conclusions Good results on a real world dataset. Shrinkage prevents overfitting of parameters. Interactions between users and movies are modeled with jointly optimized parameter estimates. Local, neighborhood based estimates are incorporated into a factorization model.

Questions