Collaborative Filtering

Case Study 4: Collaborative Filtering
Collaborative Filtering, Matrix Completion, Alternating Least Squares
Machine Learning/Statistics for Big Data, CSE599C1/STAT592, University of Washington
Carlos Guestrin, February 28th, 2013

Collaborative Filtering
Goal: find movies of interest to a user based on movies watched by the user and others.
Methods: matrix factorization, GraphLab.

What do I recommend?
[Figure: movies shown are Women on the Verge of a Nervous Breakdown, The Celebration, City of God, Wild Strawberries, and La Dolce Vita.]

Cold-Start Problem
Challenge: the cold-start problem (a new movie or a new user).
Methods: use features of the movie/user.
[Figure: movies listed as "In Theaters".]

Netflix Prize
Given 100 million ratings on a scale of 1 to 5, predict 3 million ratings to highest accuracy.
Total movies times total users gives over 8 billion total ratings.
How to fill in the blanks?
Figures from Ben Recht.

Matrix Completion Problem
Matrix completion = filling in missing data? Write the ratings matrix as $X$.
$X_{ij}$ is known for black cells.
$X_{ij}$ is unknown for white cells.
Rows index users; columns index movies.

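Interpreting Low-Rank Matrix Completion (aka Matrix Factorization)
$X = L R^\top$, where row $L_u$ of $L$ holds the factors of user $u$ and row $R_v$ of $R$ holds the factors of movie $v$.

Matrix Completion via Rank Minimization
Given observed values: the ratings $r_{uv}$, where $r_{uv} = ?$ marks an unobserved entry.
Find matrix: $\Theta$.
Such that: $\Theta_{uv} = r_{uv}$ for all $r_{uv} \neq ?$.
But: many matrices agree with the observed entries.
Introduce bias: prefer the lowest-rank solution,
$$\min_{\Theta} \ \operatorname{rank}(\Theta) \quad \text{s.t.} \quad \Theta_{uv} = r_{uv}, \ \forall\, r_{uv} \neq ?$$
Two issues: rank minimization is NP-hard, and exact equality constraints leave no room for noise in the ratings.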

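Approximate Matrix Completion
Minimize squared error (other loss functions are possible):
$$\min_{\Theta} \sum_{(u,v):\, r_{uv} \neq ?} (\Theta_{uv} - r_{uv})^2$$
Choose rank $k$: $\Theta = L R^\top$, with $k$ factors per user (rows $L_u$) and $k$ factors per movie (rows $R_v$).
Optimization problem:
$$\min_{L,R} \sum_{(u,v):\, r_{uv} \neq ?} (L_u \cdot R_v - r_{uv})^2$$

Coordinate Descent for Matrix Factorization
For the optimization problem above: fix movie factors, optimize for user factors.
First observation: with $R$ fixed, the objective splits into one independent problem per user.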

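Minimizing Over User Factors
For each user $u$, with $V_u$ the set of movies rated by $u$:
$$\min_{L_u} \sum_{v \in V_u} (L_u \cdot R_v - r_{uv})^2$$
In matrix form: stack the rows $R_v$, $v \in V_u$, into a matrix $R_{V_u}$ and the corresponding ratings into a vector $r_u$, giving $\min_{L_u} \| R_{V_u} L_u - r_u \|_2^2$.
Second observation: this is an ordinary linear least-squares problem, so solve it by any least-squares method.

Coordinate Descent for Matrix Factorization: Alternating Least-Squares
$$\min_{L,R} \sum_{(u,v):\, r_{uv} \neq ?} (L_u \cdot R_v - r_{uv})^2$$
Fix movie factors, optimize for user factors. Independent least-squares over users:
$$\min_{L_u} \sum_{v \in V_u} (L_u \cdot R_v - r_{uv})^2$$
Fix user factors, optimize for movie factors. Independent least-squares over movies, with $U_v$ the set of users who rated $v$:
$$\min_{R_v} \sum_{u \in U_v} (L_u \cdot R_v - r_{uv})^2$$
System may be underdetermined: use regularization.
Converges to: a local optimum.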
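The alternating updates above can be sketched in NumPy as follows. This is a minimal illustration, not the lecture's implementation: the function names `als` and `objective`, the ridge weight `lam` (added because the system may be underdetermined), the random initialization, and the toy ratings in the usage note are all assumptions.

```python
import numpy as np

def als(ratings, n_users, n_movies, k=2, lam=0.1, n_iters=20, seed=0):
    """Alternating least squares on observed (user, movie, rating) triples."""
    rng = np.random.default_rng(seed)
    L = rng.normal(scale=0.1, size=(n_users, k))    # user factors, rows L_u
    R = rng.normal(scale=0.1, size=(n_movies, k))   # movie factors, rows R_v

    # Index the observed ratings by user (V_u) and by movie (U_v).
    by_user = {u: [] for u in range(n_users)}
    by_movie = {v: [] for v in range(n_movies)}
    for u, v, r in ratings:
        by_user[u].append((v, r))
        by_movie[v].append((u, r))

    def solve(fixed, observed):
        # Ridge least squares for one row: (A^T A + lam I) x = A^T b.
        if not observed:
            return np.zeros(k)
        idx = [i for i, _ in observed]
        b = np.array([r for _, r in observed], dtype=float)
        A = fixed[idx]
        return np.linalg.solve(A.T @ A + lam * np.eye(k), A.T @ b)

    for _ in range(n_iters):
        for u in range(n_users):      # fix movie factors, update each user
            L[u] = solve(R, by_user[u])
        for v in range(n_movies):     # fix user factors, update each movie
            R[v] = solve(L, by_movie[v])
    return L, R

def objective(ratings, L, R, lam):
    """Squared error on observed ratings plus the ridge penalty."""
    sq = sum((L[u] @ R[v] - r) ** 2 for u, v, r in ratings)
    return sq + lam * (np.sum(L ** 2) + np.sum(R ** 2))
```

For example, `als([(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0)], n_users=2, n_movies=2)` returns the two factor matrices, and `L[u] @ R[v]` is the predicted rating for a user-movie pair. Because each row update exactly minimizes the regularized objective with the other block held fixed, the objective never increases from one sweep to the next.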

Effect of Regularization
$$\min_{L,R} \sum_{(u,v):\, r_{uv} \neq ?} (L_u \cdot R_v - r_{uv})^2, \qquad X = L R^\top$$

What you need to know
Matrix completion problem for collaborative filtering.
Over-determined -> low-rank approximation.
Rank minimization is NP-hard.
Minimize least-squares prediction error for known values, for a given rank of the matrix.
Must use regularization.
Coordinate descent algorithm = Alternating Least Squares.

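Case Study 4: Collaborative Filtering
SGD for Matrix Completion, Matrix-norm Minimization
Machine Learning/Statistics for Big Data, CSE599C1/STAT592, University of Washington
Carlos Guestrin, March 7th, 2013

Stochastic Gradient Descent
$$\min_{L,R} \ \frac{1}{2} \sum_{r_{uv}} (L_u \cdot R_v - r_{uv})^2 + \frac{\lambda_u}{2} \|L\|_F^2 + \frac{\lambda_v}{2} \|R\|_F^2$$
Observe one rating $r_{uv}$ at a time.
Gradient observing $r_{uv}$, with prediction error $\epsilon_t = L_u^{(t)} \cdot R_v^{(t)} - r_{uv}$:
$$\nabla_{L_u} = \epsilon_t R_v^{(t)} + \lambda_u L_u^{(t)}, \qquad \nabla_{R_v} = \epsilon_t L_u^{(t)} + \lambda_v R_v^{(t)}$$
Updates, with step size $\eta_t$:
$$L_u^{(t+1)} \leftarrow (1 - \eta_t \lambda_u) L_u^{(t)} - \eta_t \epsilon_t R_v^{(t)}$$
$$R_v^{(t+1)} \leftarrow (1 - \eta_t \lambda_v) R_v^{(t)} - \eta_t \epsilon_t L_u^{(t)}$$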
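A single stochastic gradient update for one observed rating can be sketched in NumPy as below. The function names `sgd_step` and `sgd`, the constant step size, the shared regularization weight `lam`, and the random initialization are assumptions made for illustration; the update itself follows the regularized squared-error objective with prediction error `eps = L[u] @ R[v] - r`.

```python
import numpy as np

def sgd_step(L, R, u, v, r, eta, lam_u, lam_v):
    """Update L[u] and R[v] in place after observing rating r for (u, v)."""
    eps = L[u] @ R[v] - r                  # prediction error
    L_u_old = L[u].copy()                  # both updates use the old factors
    L[u] = (1 - eta * lam_u) * L[u] - eta * eps * R[v]
    R[v] = (1 - eta * lam_v) * R[v] - eta * eps * L_u_old
    return eps

def sgd(ratings, n_users, n_movies, k=2, eta=0.05, lam=0.01, n_epochs=50, seed=0):
    """Run several passes over the ratings in random order."""
    rng = np.random.default_rng(seed)
    L = rng.normal(scale=0.1, size=(n_users, k))
    R = rng.normal(scale=0.1, size=(n_movies, k))
    for _ in range(n_epochs):
        for i in rng.permutation(len(ratings)):
            u, v, r = ratings[i]
            sgd_step(L, R, u, v, r, eta, lam, lam)
    return L, R
```

Only the two rows touched by the observed rating change in each step, which is what makes the method cheap per update. A decaying step size is a common alternative to the constant `eta` used here.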

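Local Optima v. Global Optima
We are solving:
$$\min_{L,R} \sum_{r_{uv}} (L_u \cdot R_v - r_{uv})^2 + \lambda_u \|L\|_F^2 + \lambda_v \|R\|_F^2$$
We (kind of) wanted to solve:
$$\min_{\Theta} \ \operatorname{rank}(\Theta) \quad \text{s.t.} \quad \Theta_{uv} = r_{uv}, \ \forall\, r_{uv} \neq ?$$
which is NP-hard. How do these things relate?

Eigenvalue Decompositions for PSD Matrices
Given a (square) symmetric positive semidefinite matrix: $\Theta \succeq 0$.
Eigenvalues: $\lambda_1(\Theta) \geq \lambda_2(\Theta) \geq \dots \geq \lambda_n(\Theta) \geq 0$.
Thus rank is: the number of nonzero eigenvalues, $\operatorname{rank}(\Theta) = \sum_i \mathbb{1}[\lambda_i(\Theta) > 0]$.
Approximation: replace the count of nonzero eigenvalues by their sum, $\sum_i \lambda_i(\Theta)$.
Property of trace: $\operatorname{tr}(\Theta) = \sum_i \lambda_i(\Theta)$.
Thus, approximate rank minimization by: minimizing $\operatorname{tr}(\Theta)$ over $\Theta \succeq 0$, subject to the same constraints on the observed entries.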
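The two facts used in the trace argument, that the rank of a PSD matrix is its number of nonzero eigenvalues and that its trace is the sum of its eigenvalues, can be checked numerically. The snippet below is a small sketch on a randomly generated matrix; the matrix size, the seed, and the tolerance `1e-9` for "nonzero" are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.normal(size=(5, 2))
Theta = B @ B.T                        # symmetric PSD, rank 2 by construction

eigvals = np.linalg.eigvalsh(Theta)    # real eigenvalues of a symmetric matrix
rank = int(np.sum(eigvals > 1e-9))     # count eigenvalues that are not (numerically) zero
trace = float(np.trace(Theta))

print("rank from eigenvalues:", rank)
print("trace:", trace, "sum of eigenvalues:", float(eigvals.sum()))
```

The count of nonzero eigenvalues is a discontinuous function of the matrix, while the trace is linear in its entries, which is why the trace is the easier quantity to minimize.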

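Generalizing the Trace Trick
Non-square matrices ain't got no trace.
For (square) positive definite matrices, matrix factorization: $\Theta = U \Lambda U^\top$, with the eigenvalues on the diagonal of $\Lambda$.
For rectangular matrices, singular value decomposition: $\Theta = U \Sigma V^\top$, with singular values $\sigma_i \geq 0$ on the diagonal of $\Sigma$.
Nuclear norm: $\|\Theta\|_* = \sum_i \sigma_i$.

Nuclear Norm Minimization
Optimization problem:
$$\min_{\Theta} \ \|\Theta\|_* \quad \text{s.t.} \quad \Theta_{uv} = r_{uv}, \ \forall\, r_{uv} \neq ?$$
Possible to relax equality constraints:
$$\min_{\Theta} \sum_{r_{uv}} (\Theta_{uv} - r_{uv})^2 + \lambda \|\Theta\|_*$$
Both are convex problems! (Solved by semidefinite programming.)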

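Analysis of Nuclear Norm
Nuclear norm minimization is a convex relaxation of the rank minimization problem:
$$\min_{\Theta} \ \operatorname{rank}(\Theta) \ \ \text{s.t.} \ \ \Theta_{uv} = r_{uv}, \ \forall\, r_{uv} \neq ? \qquad \longrightarrow \qquad \min_{\Theta} \ \|\Theta\|_* \ \ \text{s.t.} \ \ \Theta_{uv} = r_{uv}, \ \forall\, r_{uv} \neq ?$$
Theorem [Candes, Recht 08]: If there is a true matrix of rank $k$, and we observe at least $C k n^{1.2} \log n$ random entries of the true matrix, then the true matrix is recovered exactly with high probability by convex nuclear norm minimization, under certain conditions.

Nuclear Norm Minimization versus Direct (Bilinear) Low Rank Solutions
Nuclear norm minimization:
$$\min_{\Theta} \sum_{r_{uv}} (\Theta_{uv} - r_{uv})^2 + \lambda \|\Theta\|_*$$
Annoying because: the optimization variable is the full matrix $\Theta$, with one entry per user-movie pair.
Instead:
$$\min_{L,R} \sum_{r_{uv}} (L_u \cdot R_v - r_{uv})^2 + \lambda_u \|L\|_F^2 + \lambda_v \|R\|_F^2$$
Annoying because: the problem is not jointly convex in $L$ and $R$.
But:
$$\|\Theta\|_* = \inf_{L,R} \left\{ \tfrac{1}{2} \|L\|_F^2 + \tfrac{1}{2} \|R\|_F^2 \ : \ \Theta = L R^\top \right\}$$
So the Frobenius-norm penalties on $L$ and $R$ act like a nuclear norm penalty on $\Theta = L R^\top$, and under certain conditions [Burer, Monteiro 04] the bilinear problem recovers the nuclear norm solution.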
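The identity relating the nuclear norm to the Frobenius norms of the factors can be checked numerically. The sketch below uses a random low-rank matrix and builds the factorization $L = U \sqrt{\Sigma}$, $R = V \sqrt{\Sigma}$ from the SVD, which attains the infimum; a rescaled factorization of the same matrix gives a strictly larger value. Matrix sizes, the seed, and the rescaling factor 2 are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
Theta = rng.normal(size=(6, 3)) @ rng.normal(size=(3, 4))   # rank at most 3

U, s, Vt = np.linalg.svd(Theta, full_matrices=False)
nuclear = float(s.sum())                   # nuclear norm = sum of singular values

# Balanced factorization from the SVD: Theta = L @ R.T
L = U * np.sqrt(s)                         # scales column i of U by sqrt(s[i])
R = Vt.T * np.sqrt(s)
balanced = 0.5 * np.sum(L ** 2) + 0.5 * np.sum(R ** 2)

# Another factorization of the same matrix: (2 L) @ (R / 2).T
L2, R2 = 2.0 * L, R / 2.0
unbalanced = 0.5 * np.sum(L2 ** 2) + 0.5 * np.sum(R2 ** 2)

print(nuclear, balanced, unbalanced)
```

The balanced factorization reproduces the nuclear norm exactly, while the rescaled one pays $\tfrac{1}{2}(4 + \tfrac{1}{4}) = 2.125$ times as much, which illustrates why the infimum is taken over all factorizations.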

What you need to know
Stochastic gradient descent for matrix factorization.
Norm minimization as a convex relaxation of rank minimization: trace norm for PSD matrices, nuclear norm in general.
Intuitive relationship between nuclear norm minimization and direct (bilinear) minimization.

Case Study 4: Collaborative Filtering
Nonnegative Matrix Factorization, Projected Gradient
Machine Learning/Statistics for Big Data, CSE599C1/STAT592, University of Washington
Carlos Guestrin, March 7th, 2013

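Matrix factorization solutions can be unintuitive
Many, many, many applications of matrix factorization.
E.g., in text data, can do topic modeling (alternative to LDA): $X = L R^\top$.
Would like: factors that read as nonnegative weights, such as how strongly a topic is present.
But: an unconstrained factorization can return negative entries in $L$ and $R$.

Nonnegative Matrix Factorization
$X = L R^\top$. Just like before, but with $L \geq 0$, $R \geq 0$:
$$\min_{L \geq 0,\, R \geq 0} \sum_{r_{uv}} (L_u \cdot R_v - r_{uv})^2 + \lambda_u \|L\|_F^2 + \lambda_v \|R\|_F^2$$
Constrained optimization problem.
Many, many, many, many solution methods; we'll check out a simple one.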

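Projected Gradient
Standard optimization: want to minimize $\min_{\Theta} f(\Theta)$. Use gradient updates:
$$\Theta^{(t+1)} \leftarrow \Theta^{(t)} - \eta_t \nabla f(\Theta^{(t)})$$
Constrained optimization: given a convex set $C$ of feasible solutions, want to find minima within $C$:
$$\min_{\Theta} f(\Theta) \quad \text{s.t.} \quad \Theta \in C$$
Projected gradient:
Take a gradient step (ignoring constraints): $\tilde{\Theta}^{(t+1)} \leftarrow \Theta^{(t)} - \eta_t \nabla f(\Theta^{(t)})$
Projection into feasible set: $\Theta^{(t+1)} \leftarrow \Pi_C(\tilde{\Theta}^{(t+1)}) = \arg\min_{\Theta \in C} \|\Theta - \tilde{\Theta}^{(t+1)}\|_2^2$

Projected Stochastic Gradient Descent for Nonnegative Matrix Factorization
$$\min_{L \geq 0,\, R \geq 0} \ \frac{1}{2} \sum_{r_{uv}} (L_u \cdot R_v - r_{uv})^2 + \frac{\lambda_u}{2} \|L\|_F^2 + \frac{\lambda_v}{2} \|R\|_F^2$$
Gradient step observing $r_{uv}$, ignoring constraints, with $\epsilon_t = L_u^{(t)} \cdot R_v^{(t)} - r_{uv}$:
$$\begin{bmatrix} \tilde{L}_u^{(t+1)} \\ \tilde{R}_v^{(t+1)} \end{bmatrix} \leftarrow \begin{bmatrix} (1 - \eta_t \lambda_u) L_u^{(t)} - \eta_t \epsilon_t R_v^{(t)} \\ (1 - \eta_t \lambda_v) R_v^{(t)} - \eta_t \epsilon_t L_u^{(t)} \end{bmatrix}$$
Convex set: $C = \{ (L, R) : L \geq 0, \ R \geq 0 \}$, with the inequalities taken elementwise.
Projection step: clip negative entries to zero, $L_u^{(t+1)} \leftarrow \max(0, \tilde{L}_u^{(t+1)})$ and $R_v^{(t+1)} \leftarrow \max(0, \tilde{R}_v^{(t+1)})$, elementwise.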
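One projected stochastic gradient step for nonnegative matrix factorization can be sketched in NumPy as below: an unconstrained gradient step on the two affected rows, followed by projection onto the nonnegative orthant, which is elementwise clipping at zero. The function names `projected_sgd_step` and `nmf_sgd`, the constant step size, the shared regularization weight, and the nonnegative random initialization are assumptions made for illustration.

```python
import numpy as np

def projected_sgd_step(L, R, u, v, r, eta, lam_u, lam_v):
    """Gradient step on L[u], R[v] for one rating, then clip at zero."""
    eps = L[u] @ R[v] - r
    # Gradient step, ignoring the nonnegativity constraints.
    L_u_tilde = (1 - eta * lam_u) * L[u] - eta * eps * R[v]
    R_v_tilde = (1 - eta * lam_v) * R[v] - eta * eps * L[u]
    # Projection onto {L >= 0, R >= 0}: elementwise max with 0.
    L[u] = np.maximum(L_u_tilde, 0.0)
    R[v] = np.maximum(R_v_tilde, 0.0)

def nmf_sgd(ratings, n_users, n_movies, k=2, eta=0.05, lam=0.01, n_epochs=50, seed=0):
    """Projected SGD over several random passes through the ratings."""
    rng = np.random.default_rng(seed)
    L = rng.uniform(0.0, 0.5, size=(n_users, k))    # start inside the feasible set
    R = rng.uniform(0.0, 0.5, size=(n_movies, k))
    for _ in range(n_epochs):
        for i in rng.permutation(len(ratings)):
            u, v, r = ratings[i]
            projected_sgd_step(L, R, u, v, r, eta, lam, lam)
    return L, R
```

Because every step ends with the projection, the factors stay nonnegative throughout training, whatever the step size. The projection is this simple only because the feasible set is a product of nonnegativity constraints; other convex sets need their own projection operator.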

What you need to know
In many applications, we want the factors to be nonnegative.
This corresponds to a constrained optimization problem.
There are many possible approaches to solve it, e.g., projected gradient.
