Recommender Systems Dipanjan Das Language Technologies Institute Carnegie Mellon University 20 November, 2007
Today's Outline What are Recommender Systems? Two approaches: Content Based Methods and Collaborative Filtering Details of the Netflix Progress Prize Paper
Recommender Systems Aim to Measure a user's interest in items or products Provide personalized recommendations suiting her taste Bell et al, 2007
Recommender Systems Broadly: profiling user preferences modeling user-product interaction Bell et al, 2007
Content Based Approaches Building a profile for each user and product user profile: demographic information, answers to a questionnaire product profile: movie genre, actors, box office popularity... Bell et al, 2007
Content Based Approaches Building profile for each user and product Resulting profiles help find a match between users and products Bell et al, 2007
Content Based Approaches Cons Requires gathering external information genre, popularity at box office, etc. Not easy to collect Bell et al, 2007
Collaborative Filtering Coined by Goldberg et al, 1992. Basic principles Analysis of user-product dependencies to identify new user-product associations No need to create explicit user profiles Bell et al, 2007
Collaborative Filtering Identification of pairs of items rated similarly or rated by like-minded users Only requirement is the past behavior of users Domain independent but addresses elusive aspects of data Bell et al, 2007
Collaborative Filtering [Figure: a matrix of ratings with users as rows and items as columns; the estimation problem is to predict the unknown rating of item i by user u]
Robert M. Bell, Yehuda Koren and Chris Volinsky. Modeling relationships at multiple scales to improve accuracy of large recommender systems. In Proceedings of KDD 2007.
Paper Outline Introduction Neighborhood Based Approaches Factorization Based Approaches Experiments Results Conclusion
Bell et al, 2007 The Netflix progress prize winning team Netflix Data >100 million movie ratings ~480,000 real customers 17,770 movies
Bell et al, 2007 Netflix data is many times larger than data used in previous research Potential to reduce gap between scientific research and real world CF systems
Paper Outline Introduction Neighborhood Based Approaches Factorization Based Approaches Experiments Results Conclusion
Neighborhood Based Approaches In order to estimate r_ui, the rating of item i by user u, a set of neighboring users N(u;i) is used these users tend to rate items similarly to u and they actually rated item i, i.e. r_vi is known for v ∈ N(u;i)
Collaborative Filtering [Figure: neighboring users v1, v2, v3, v4 of user u, each of whom has rated item i]
Neighborhood Based Approaches A weighted average of the neighbors' ratings: r_ui = Σ_{v∈N(u;i)} s_uv r_vi / Σ_{v∈N(u;i)} s_uv, where the similarities s_uv are often Pearson's correlation coefficient or cosine similarity
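A minimal sketch of this user-oriented scheme in NumPy, using Pearson's correlation as s_uv. The rating matrix, the k cutoff, and the function names here are illustrative, not from the paper.

```python
import numpy as np

def pearson_sim(a, b):
    """Pearson correlation between two users over their co-rated items
    (np.nan marks a missing rating)."""
    mask = ~np.isnan(a) & ~np.isnan(b)
    if mask.sum() < 2:
        return 0.0
    x = a[mask] - a[mask].mean()
    y = b[mask] - b[mask].mean()
    denom = np.sqrt((x ** 2).sum() * (y ** 2).sum())
    return float(x @ y / denom) if denom > 0 else 0.0

def predict_user_based(R, u, i, k=10):
    """Estimate r_ui as the similarity-weighted average of the ratings
    of the k users most similar to u who actually rated item i."""
    sims = sorted(
        ((pearson_sim(R[u], R[v]), v)
         for v in range(R.shape[0])
         if v != u and not np.isnan(R[v, i])),
        reverse=True)[:k]
    den = sum(s for s, _ in sims)
    return sum(s * R[v, i] for s, v in sims) / den if den else np.nan
```

Note that negative correlations can make the denominator small or zero, one symptom of the heuristic nature of these similarity weights that the paper criticizes.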
Neighborhood Based Approaches An analogous alternative is the item-oriented approach A set of neighboring items N(i;u) is used These items are rated similarly to i by other users
Collaborative Filtering [Figure: neighboring items j1, j2, j3 of item i, each of which has been rated by user u]
Neighborhood Based Approaches r_ui = Σ_{j∈N(i;u)} s_ij r_uj / Σ_{j∈N(i;u)} s_ij, where the similarities s_ij are again Pearson's correlation coefficient or cosine similarity
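The item-oriented variant can be sketched the same way, here with cosine similarity as s_ij; again the toy matrix, the k cutoff, and the names are illustrative.

```python
import numpy as np

def cosine_sim(x, y):
    """Cosine similarity over co-rated entries (np.nan = missing)."""
    mask = ~np.isnan(x) & ~np.isnan(y)
    if not mask.any():
        return 0.0
    x, y = x[mask], y[mask]
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    return float(x @ y / denom) if denom > 0 else 0.0

def predict_item_based(R, u, i, k=10):
    """Estimate r_ui from u's own ratings of the k items most similar
    to i (columns of R are items)."""
    sims = sorted(
        ((cosine_sim(R[:, i], R[:, j]), j)
         for j in range(R.shape[1])
         if j != i and not np.isnan(R[u, j])),
        reverse=True)[:k]
    den = sum(s for s, _ in sims)
    return sum(s * R[u, j] for s, j in sims) / den if den else np.nan
```

Only u's own past ratings enter the average, which is why this variant tends to give more explainable recommendations.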
Neighborhood Based Approaches Sarwar et al. (2000) found that the item oriented approach worked better The item oriented approach is also computationally efficient because the number of items is much smaller than the number of users Extremely popular methods
Neighborhood Based Approaches Problems: The heuristic nature of the similarity values s_uv or s_ij Different rating algorithms use different measures
Neighborhood Based Approaches Problems: These methods do not account for interaction between neighbors Each similarity s_ij between i and j ∈ N(i;u) is computed independently of the other similarities s_ik for k ∈ N(i;u) \ {j}
Neighborhood Based Approaches Problems For example, if three movies in the set form the LOTR trilogy, the algorithm ignores the similarity among the three movies when predicting the rating for another movie
Neighborhood Based Approaches Bell et al. provide solutions They use an item oriented approach Instead of similarities, they use weights w_ij The weights are computed jointly, so dependencies between neighbors are taken into account
Neighborhood Based Approaches In the first step, neighbors are selected Among all items rated by u, the g most similar to i are selected Similarity is by correlation coefficient This set is called N(i;u) as before
Neighborhood Based Approaches The revised definition: r_ui = Σ_{j∈N(i;u)} w_ij r_uj / Σ_{j∈N(i;u)} w_ij, with w_ij ≥ 0; the nonnegativity constraint prevents overfitting
Neighborhood Based Approaches Let U(i) be the set of users who rated item i [of course, user u ∉ U(i)] For each user v ∈ U(i), let N(i;u,v) denote the subset of N(i;u) that includes items rated by v
Neighborhood Based Approaches For each user v ∈ U(i), we seek weights that will perfectly interpolate the rating of i from the ratings of the given neighbors: r_vi = Σ_{j∈N(i;u,v)} w_ij r_vj / Σ_{j∈N(i;u,v)} w_ij
Neighborhood Based Approaches Here the weights w_ij are the only unknowns; N(i;u,v) is the set of items that v has rated and is a subset of N(i;u)
Neighborhood Based Approaches This gives one equation with |N(i;u,v)| unknowns, hence many solutions
Neighborhood Based Approaches From the interpolation equation r_vi = Σ_{j∈N(i;u,v)} w_ij r_vj / Σ_{j∈N(i;u,v)} w_ij to the least squares problem min_w Σ_{v∈U(i)} ( r_vi − Σ_{j∈N(i;u,v)} w_ij r_vj / Σ_{j∈N(i;u,v)} w_ij )²: weights that work well for all users
Neighborhood Based Approaches However, this treats all users in U(i) equally; we should give more weight to users who rated many items of N(i;u)
Neighborhood Based Approaches The objective is therefore refined to min_w Σ_{v∈U(i)} c_v ( r_vi − Σ_{j∈N(i;u,v)} w_ij r_vj / Σ_{j∈N(i;u,v)} w_ij )² / Σ_{v∈U(i)} c_v, where c_v = (Σ_{j∈N(i;u,v)} w_ij)²; at this point, the authors switch to matrix notation
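To see why fitting the weights jointly is natural, note that if we momentarily drop the normalization and the nonnegativity constraint, recovering one weight vector from all users in U(i) is an ordinary linear least squares problem. A toy sketch (the rating matrix and the "ideal" weights are invented for illustration; the paper's actual solver handles the normalized, constrained problem as a quadratic program):

```python
import numpy as np

# Rows: users v in U(i); columns: their ratings r_vj of three neighbor
# items j in N(i;u). All values are made up for illustration.
R_nbr = np.array([[5, 3, 1],
                  [4, 4, 2],
                  [2, 5, 3],
                  [1, 2, 5],
                  [3, 1, 4]], dtype=float)
true_w = np.array([0.5, 0.3, 0.2])   # hypothetical "ideal" weights
r_i = R_nbr @ true_w                 # the users' ratings of item i itself

# Jointly solve min_w sum_v (r_vi - sum_j w_j r_vj)^2 over all users:
w, *_ = np.linalg.lstsq(R_nbr, r_i, rcond=None)
```

Because every user constrains the same weight vector, the weights are no longer estimated independently per item pair, which is exactly the departure from heuristic similarities.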
Neighborhood Based Approaches The authors next convert the optimization problem into a quadratic program They claim that the solution is found in 3-4 iterations They also provide a revised model that tries to deal with the sparsity problem of the rating matrix
Neighborhood Based Approaches The revised model assumes that the matrix is dense, but accounts for the sparseness by shrinking Shrinking is the process of penalizing parameters that have less data associated with them In another revised model, the authors use user-user similarities along with item-item similarities
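Shrinkage can be sketched as pulling an estimate toward zero in proportion to how little data supports it; the functional form and the constant alpha below are illustrative, not the paper's exact choices.

```python
def shrink(estimate, n_support, alpha=50.0):
    """Pull an estimate toward 0 when it rests on little data; as
    n_support grows, the estimate is left nearly untouched.
    alpha is an illustrative shrinkage constant."""
    return (n_support / (n_support + alpha)) * estimate
```

A correlation of 0.8 computed from only 50 common ratings is thus halved, while the same correlation backed by thousands of ratings survives almost intact.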
Neighborhood Based Approaches The authors also remove global effects from the data An example is the tendency for ratings of some items and by some users to differ systematically from the average These effects are removed and the residual data gives better results
Paper Outline Introduction Neighborhood Based Approaches Factorization Based Approaches Experiments Results Conclusion
Factorization Based Approaches Limited set of features computed for all users and items Allows linking users with items and then estimating the associated ratings Feature example: movie genres
Factorization Based Approaches A goal can be placing each movie and each user within these genre-oriented scales When given a certain user-movie pair, the rating can be estimated by the closeness of the features representing the movie and the user
Factorization Based Approaches Pertains to the aim of content based methods Goal is to uncover latent features of the given data that explain the ratings A surrogate for external information Techniques like Singular Value Decomposition (SVD) or Principal Component Analysis (PCA)
Factorization Based Approaches Given an m × n matrix R, SVD computes the best rank-f approximation R^f R^f is defined as the product of two rank-f matrices P_{m×f} and Q_{n×f} In other words, R^f = PQ^T R^f captures the f most prominent features of the data, leaving out noisy portions
Factorization Based Approaches Each unknown rating r_ui is estimated as R^f_ui, the dot product of the u-th row of P and the i-th row of Q However, SVD computation only works when all entries of R are known The goal of SVD is undefined when many entries in R are missing
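A minimal NumPy sketch of the rank-f approximation on a fully observed toy matrix (the matrix and the split of singular values into P are illustrative conventions, not the paper's):

```python
import numpy as np

# A fully known toy rating matrix (plain SVD requires all entries).
R = np.array([[5, 4, 1],
              [4, 5, 1],
              [1, 1, 5],
              [2, 1, 4]], dtype=float)

f = 2
U, s, Vt = np.linalg.svd(R, full_matrices=False)
P = U[:, :f] * s[:f]      # user factors, m x f
Q = Vt[:f, :].T           # item factors, n x f
R_f = P @ Q.T             # best rank-f approximation of R

# A rating is then estimated by a dot product of factor rows:
r_hat = P[0] @ Q[1]       # estimate for (user 0, item 1)
```

By the Eckart-Young theorem, no other rank-f matrix is closer to R in Frobenius norm, which is the sense in which R^f keeps the f most prominent features.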
Factorization Based Approaches The authors provide a solution: the EM algorithm for PCA (Roweis, 1997) We compute rank-f matrices P and Q that minimize ‖R − PQ^T‖_F (the Frobenius norm)
Factorization Based Approaches We can fix the matrix P at its current value; minimizing ‖R − PQ^T‖_F over Q is then an ordinary least squares problem for Q Similarly, we can fix the matrix Q; the minimization over P is then a least squares problem for P
Factorization Based Approaches These least squares problems are solved in closed form by setting Q^T = (P^T P)^{-1} P^T R and P = R Q (Q^T Q)^{-1} This gives an iterative process that recomputes matrices P and Q
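The two closed-form updates can be alternated directly; a minimal sketch for a fully observed R (the random initialization, iteration count, and function name are my choices):

```python
import numpy as np

def alternate_ls(R, f=2, iters=10, seed=0):
    """Alternate the two closed-form updates from the slide:
    fix P, set Q^T = (P^T P)^{-1} P^T R; fix Q, set P = R Q (Q^T Q)^{-1}."""
    P = np.random.default_rng(seed).standard_normal((R.shape[0], f))
    for _ in range(iters):
        Qt = np.linalg.solve(P.T @ P, P.T @ R)   # f x n, fixes P
        Q = Qt.T
        P = R @ Q @ np.linalg.inv(Q.T @ Q)       # fixes Q
    return P, Q
```

When R is exactly rank f, a single full sweep already reproduces R, since the first update pulls Q into the row space of R and the second then projects R onto it exactly.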
Factorization Based Approaches The iteration alternates Q^T ← (P^T P)^{-1} P^T R and P ← R Q (Q^T Q)^{-1} It can be shown that there is one possible global minimum (Roweis, 1997) This formulation has the ability to deal with missing values
Factorization Based Approaches Like (Roweis, 1997), the authors minimize the squared error: Err(P, Q) := Σ_{(u,i)∈K} (r_ui − p_u^T q_i)², where p_u is the u-th row of P, q_i is the i-th row of Q, and K is the set of known ratings
Factorization Based Approaches What should be the value of f, the rank of P and Q? As f grows, there is more flexibility in minimizing the error, but achieving a low error may result in overfitting; it was found that using f > 2 resulted in bad estimation quality
Factorization Based Approaches The authors used shrinkage to alleviate the overfitting problem: they computed the factors one by one, shrinking the results after each step
ComputeNextFactor(known ratings r_ui; user factors P_{m×f}; item factors Q_{n×f})
  % Compute the f-th column of matrices P and Q to fit the given ratings
  % Columns 1, ..., f−1 of P and Q were already computed
  Constants: α = 25, ε = 1e−4
  % Compute residuals: the portion not explained by previous factors
  for each given rating r_ui:
      res_ui ← r_ui − Σ_{l=1}^{f−1} P_ul Q_il
      res_ui ← n_ui · res_ui / (n_ui + αf)    % shrinkage
  % Compute the f-th factor for each user and item by solving
  % many least squares problems, each with a single unknown
  repeat until Err(P_new, Q_new) / Err(P_old, Q_old) ≥ 1 − ε:
      for each user u = 1, ..., m:
          P_uf ← Σ_{i:(u,i)∈K} res_ui Q_if / Σ_{i:(u,i)∈K} Q_if²
      for each item i = 1, ..., n:
          Q_if ← Σ_{u:(u,i)∈K} res_ui P_uf / Σ_{u:(u,i)∈K} P_uf²
  return P, Q
This way, we compute f factors by calling ComputeNextFactor repeatedly
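A NumPy sketch of one such factor-fitting step, under simplifying assumptions: a boolean mask stands in for the set K, every user and item has at least one rating, the new factor column gets a fixed all-ones start, and factors are 0-indexed (so the shrinkage term becomes alpha*(f+1)). Variable names are mine, not the paper's.

```python
import numpy as np

def compute_next_factor(R, mask, P, Q, f, alpha=25.0, eps=1e-4):
    """Fit column f of P and Q to the shrunk residuals left over by
    columns 0..f-1, in the spirit of the ComputeNextFactor routine."""
    # Residual portion of each known rating not explained so far
    res = np.where(mask, R - P[:, :f] @ Q[:, :f].T, 0.0)
    n_u = mask.sum(axis=1, keepdims=True)        # ratings per user
    res = n_u * res / (n_u + alpha * (f + 1))    # shrink sparse rows harder

    Q[:, f] = 1.0                                # simple deterministic start
    prev_err = np.inf
    while True:
        # Each entry of P[:, f] / Q[:, f] is a 1-unknown least squares solve
        P[:, f] = (res * Q[:, f]).sum(1) / (mask * Q[:, f] ** 2).sum(1)
        Q[:, f] = (res.T * P[:, f]).sum(1) / (mask.T * P[:, f] ** 2).sum(1)
        err = ((res - mask * np.outer(P[:, f], Q[:, f])) ** 2).sum()
        if err < 1e-12 or err / prev_err > 1 - eps:
            break
        prev_err = err
    return P, Q
```

With a fully observed rank-1 matrix, the fitted factor reproduces the shrunk residuals exactly: with n ratings per user the residuals are scaled by n/(n + α), and the single rank-1 factor then fits them perfectly.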
Neighborhood Aware Factorization The factorization based method above describes a user u as a fixed linear combination of the f movie factors This fixed linear combination is transformed into a more adaptive linear combination that changes as a function of the item i to be rated by u
Paper Outline Introduction Neighborhood Based Approaches Factorization Based Approaches Experiments Results Conclusion
Experiments Evaluated on the Netflix data Performance measured as Root Mean Squared Error (RMSE), which puts more emphasis on large errors than mean absolute error does
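The metric itself is a one-liner; a minimal sketch (the function name is mine):

```python
import numpy as np

def rmse(predicted, actual):
    """Root Mean Squared Error; squaring penalizes large errors more
    heavily than mean absolute error does."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return float(np.sqrt(np.mean((predicted - actual) ** 2)))
```

For example, one 2-star miss among otherwise perfect predictions hurts RMSE more than two 1-star misses of the same total magnitude.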
Experiments Two datasets: the Probe Set and the Quiz Set, each containing about 1.4 million user ratings The Probe Set is part of the training data, so its true ratings are known Benchmark: Netflix's own Cinematch system with RMSE = 0.9514
Paper Outline Introduction Neighborhood Based Approaches Factorization Based Approaches Experiments Results Conclusion
Results on Probe Set: Neighborhood Based Approach [Figure: RMSE vs. number of neighbors for three variants: correlation based, model based, and model based with user-user similarities; annotations highlight the shrunk correlation coefficients, the point where each curve reaches its peak, and a difference between the curves attributed to modeling neighbor-neighbor relations]
Results on Probe Set: Factorization Approach [Figure: RMSE on the Probe set vs. number of factors f, comparing two factorization variants]
Quiz Set Results [Table: RMSE on the Quiz set for global effects, user-user, factorization, factorization w/ movie-movie, factorization w/ movie-movie + user-user, and their combination] The combination achieves a significant further decrease of the RMSE
Conclusions Good results on a real world dataset Shrinkage used to prevent overfitting of parameters Interactions between users and movies modeled while jointly optimizing parameter estimates Local, neighborhood based estimates incorporated into a factorization model
Questions?