Collaborative Filtering Matrix Completion Alternating Least Squares

1 Case Study 4: Collaborative Filtering
Collaborative Filtering, Matrix Completion, Alternating Least Squares
Machine Learning for Big Data CSE547/STAT548, University of Washington
Sham Kakade, May 19, 2016

2 Collaborative Filtering
Goal: Find movies of interest to a user based on movies watched by the user and others
Methods: matrix factorization

3 What do I recommend???
Women on the Verge of a Nervous Breakdown; The Celebration; City of God; Wild Strawberries; La Dolce Vita

4 Netflix Prize
Given 100 million ratings on a scale of 1 to 5, predict 3 million ratings to highest accuracy
17,770 total movies, 480,189 total users
Over 8 billion possible ratings (one per user-movie pair)
How to fill in the blanks?
Figures from Ben Recht

5 Matrix Completion Problem
X: ratings matrix, with X_ij known for black cells and X_ij unknown for white cells
Rows index users
Columns index movies
Filling missing data?

6 Interpreting Low-Rank Matrix Completion (aka Matrix Factorization)
X = L R

7 Identifiability of Factors
X = L R
If r_uv is described by L_u, R_v, what happens if we redefine the topics as
Then,
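The equations on this slide did not survive extraction. A minimal NumPy sketch of the standard identifiability argument, assuming user factors are the rows of L, movie factors are the rows of R, and the prediction is r_uv = L_u . R_v (so X = L R^T in code). The invertible matrix A is a hypothetical choice used only for illustration: any invertible k x k matrix would do.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_movies, k = 4, 5, 2

L = rng.normal(size=(n_users, k))   # user factors, one row per user
R = rng.normal(size=(n_movies, k))  # movie factors, one row per movie
X = L @ R.T                         # predicted ratings, X[u, v] = L[u] . R[v]

# Any invertible k x k matrix A "redefines the topics" (hypothetical choice).
A = np.array([[2.0, 1.0],
              [0.0, 0.5]])
L_new = L @ A
R_new = R @ np.linalg.inv(A).T

# (L A)(R A^{-T})^T = L A A^{-1} R^T = L R^T, so predictions are unchanged.
X_new = L_new @ R_new.T
same_predictions = np.allclose(X, X_new)
```

The factors change but every prediction stays the same, so the data alone cannot pin down L and R.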

8 Matrix Completion via Rank Minimization
Given observed values:
Find matrix
Such that:
But
Introduce bias:
Two issues:

9 Approximate Matrix Completion
Minimize squared error:
(Other loss functions are possible)
Choose rank k:
Optimization problem:
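The objective itself is missing from the extracted slide. One common way to write it, sketched here under assumptions: ratings arrive as (u, v, r_uv) triples, L and R hold the rank-k user and movie factors as rows, and `lam` is an optional Frobenius-norm penalty (an assumption here, motivated by the later slides on regularization). The function name `squared_error` is hypothetical.

```python
import numpy as np

def squared_error(L, R, ratings, lam=0.0):
    """Sum of squared errors over observed (u, v, r_uv) triples, plus an
    optional penalty lam * (||L||_F^2 + ||R||_F^2)."""
    err = sum((L[u] @ R[v] - r) ** 2 for u, v, r in ratings)
    return err + lam * (np.sum(L ** 2) + np.sum(R ** 2))
```

Only observed entries contribute to the error term; the unknown cells of X never appear in the sum.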

10 Coordinate Descent for Matrix Factorization
Fix movie factors, optimize for user factors
First observation:

11 Minimizing Over User Factors
For each user u:
In matrix form:
Second observation:
Solve by
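With the movie factors fixed, each user's problem is an ordinary least-squares problem. A sketch of one such solve, assuming a ridge penalty `lam` on the user factor and solving the normal equations directly; the slide's own solver is not recoverable from the text, and the function name `solve_user_factor` is hypothetical.

```python
import numpy as np

def solve_user_factor(R, movie_ids, ratings, lam=0.1):
    """Ridge least-squares for one user's factor with movie factors R fixed.

    Minimizes sum_v (L_u . R[v] - r_uv)^2 + lam * ||L_u||^2 over the movies
    this user rated."""
    R_obs = R[movie_ids]                    # factors of the movies this user rated
    k = R.shape[1]
    A = R_obs.T @ R_obs + lam * np.eye(k)   # k x k normal-equations matrix
    b = R_obs.T @ np.asarray(ratings, dtype=float)
    return np.linalg.solve(A, b)
```

Each user's solve touches only that user's ratings, which is why the solves can run independently.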

12 Coordinate Descent for Matrix Factorization: Alternating Least-Squares
Fix movie factors, optimize for user factors
Independent least-squares over users
Fix user factors, optimize for movie factors
Independent least-squares over movies
System may be underdetermined:
Converges to
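The alternation described on this slide can be sketched as follows. This is an illustrative implementation under assumptions, not the course's reference code: ratings are (user, movie, rating) triples, a ridge penalty `lam` handles underdetermined systems, and a user or movie with no observations gets a zero factor. The names `als` and `ridge` are hypothetical.

```python
import numpy as np

def als(ratings, n_users, n_movies, k=2, lam=0.1, n_iters=20, seed=0):
    """Alternating least squares on a list of (user, movie, rating) triples."""
    rng = np.random.default_rng(seed)
    L = rng.normal(scale=0.1, size=(n_users, k))    # user factors
    R = rng.normal(scale=0.1, size=(n_movies, k))   # movie factors
    by_user = [[] for _ in range(n_users)]
    by_movie = [[] for _ in range(n_movies)]
    for u, v, r in ratings:
        by_user[u].append((v, r))
        by_movie[v].append((u, r))

    def ridge(F, obs):
        # Solve min_x sum_j (x . F[j] - r_j)^2 + lam * ||x||^2 over observed pairs.
        if not obs:
            return np.zeros(k)
        idx, r = zip(*obs)
        F_obs = F[list(idx)]
        return np.linalg.solve(F_obs.T @ F_obs + lam * np.eye(k),
                               F_obs.T @ np.array(r))

    for _ in range(n_iters):
        for u in range(n_users):       # fix R: independent solve per user
            L[u] = ridge(R, by_user[u])
        for v in range(n_movies):      # fix L: independent solve per movie
            R[v] = ridge(L, by_movie[v])
    return L, R
```

Each half-step exactly minimizes the regularized objective over one block of factors, so the objective never increases from one sweep to the next.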

13 Effect of Regularization
X = L R

14 What you need to know
Matrix completion problem for collaborative filtering
Under-determined -> low-rank approximation
Rank minimization is NP-hard
Minimize least-squares prediction for known values for given rank of matrix
Must use regularization
Coordinate descent algorithm = Alternating Least Squares

15 Case Study 4: Collaborative Filtering
SGD for Matrix Completion, more algorithms (Matrix-norm Minimization)

16 Stochastic Gradient Descent
Observe one rating at a time r_uv
Gradient observing r_uv:
Updates:
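The gradient and update formulas are missing from the extracted text. A sketch of one SGD step, assuming the per-rating loss (1/2)(L_u . R_v - r_uv)^2 plus a penalty (lam/2)(||L_u||^2 + ||R_v||^2); the step size `eta`, the penalty weight `lam`, and the function name `sgd_step` are assumptions for illustration.

```python
import numpy as np

def sgd_step(L, R, u, v, r_uv, eta=0.05, lam=0.01):
    """One in-place SGD update on the two factors touched by rating r_uv."""
    err = L[u] @ R[v] - r_uv      # prediction error on this rating
    L_u_old = L[u].copy()         # both updates use the same starting point
    L[u] -= eta * (err * R[v] + lam * L[u])
    R[v] -= eta * (err * L_u_old + lam * R[v])
    return err
```

Only row u of L and row v of R change, so a single observed rating costs O(k) work.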

17 Local Optima vs. Global Optima
We are solving:
We (kind of) wanted to solve:
Which is NP-hard
How do these things relate???

18 Eigenvalue Decompositions for PSD Matrices
Given a (square) symmetric positive semidefinite matrix:
Eigenvalues:
Thus rank is:
Approximation:
Property of trace:
Thus, approximate rank minimization by:
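The two facts this slide relies on (rank counts the nonzero eigenvalues, trace sums them) can be checked numerically. A small sketch on a randomly generated PSD matrix; the matrix and the tolerance 1e-10 are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.normal(size=(5, 2))
X = B @ B.T                          # 5 x 5 symmetric PSD matrix of rank 2

eigvals = np.linalg.eigvalsh(X)      # real eigenvalues in ascending order
rank = int(np.sum(eigvals > 1e-10))  # rank = number of nonzero eigenvalues
trace = np.trace(X)                  # trace = sum of the eigenvalues
```

Rank counts nonzero eigenvalues while trace adds them up, which is the sense in which trace serves as a convex surrogate for rank on PSD matrices.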

19 Generalizing the Trace Trick
Non-square matrices have no trace
For (square) positive semidefinite matrices, eigendecomposition:
For rectangular matrices, singular value decomposition:
Nuclear norm:
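The nuclear norm is the sum of the singular values, and for a symmetric PSD matrix it coincides with the trace. A short sketch; the helper name `nuclear_norm` and the example matrices are hypothetical.

```python
import numpy as np

def nuclear_norm(X):
    """Sum of the singular values of X (works for rectangular X)."""
    return np.linalg.svd(X, compute_uv=False).sum()

# Rectangular example: singular values are 4 and 3.
X_rect = np.array([[3.0, 0.0, 0.0],
                   [0.0, -4.0, 0.0]])
nuc_rect = nuclear_norm(X_rect)

# PSD example: singular values equal eigenvalues, so the norm equals the trace.
B = np.array([[1.0, 2.0], [0.0, 1.0], [1.0, 0.0]])
P = B @ B.T
nuc_psd, trace_psd = nuclear_norm(P), np.trace(P)
```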

20 Nuclear Norm Minimization
Optimization problem:
Possible to relax equality constraints:
Both are convex problems! (solved by semidefinite programming)

21 Nuclear Norm Minimization vs. Direct (Bilinear) Low Rank Solutions
Nuclear norm minimization:
Annoying because:
Instead:
Annoying because:
But
So
And
Under certain conditions [Burer, Monteiro 04]
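The link between the two formulations is usually stated through a known identity: the nuclear norm of X equals the minimum of (1/2)(||L||_F^2 + ||R||_F^2) over all factorizations X = L R^T. The slide's own equations are missing, so the sketch below only checks that identity numerically on a hypothetical random matrix, using the factorization built from the SVD.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 2)) @ rng.normal(size=(2, 4))   # 6 x 4 matrix of rank 2

U, s, Vt = np.linalg.svd(X, full_matrices=False)
nuc = s.sum()                                           # nuclear norm of X

# Balanced factorization: split each singular value evenly between the factors.
L = U * np.sqrt(s)          # scales column j of U by sqrt(s[j])
R = Vt.T * np.sqrt(s)
penalty = 0.5 * (np.sum(L ** 2) + np.sum(R ** 2))

# An unbalanced factorization of the same X pays a larger penalty.
L2, R2 = 3.0 * L, R / 3.0
penalty2 = 0.5 * (np.sum(L2 ** 2) + np.sum(R2 ** 2))
```

The balanced factors attain the nuclear norm exactly, which is why Frobenius-regularized bilinear fitting is often read as a nonconvex stand-in for nuclear norm minimization.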

23 Theory
Suppose true matrix is exactly low rank, what might we hope for?
Exact recovery? Statistically? Computationally?
Is this possible?
Assumptions:

24 Analysis of Nuclear Norm
Nuclear norm minimization = convex relaxation of rank minimization:
Theorem [Candes, Recht 08]:
If there is a true matrix of rank k,
And, we observe at least random entries of true matrix
Then true matrix is recovered exactly with high probability via convex nuclear norm minimization!
Under certain conditions

25 Alternating Minimization
Alternating minimization is used in practice.
It was widely thought to be a search heuristic. Does it work?

26 SGD
SGD also widely used in practice (streaming, fast)
Does it work?

27 What you need to know
Stochastic gradient descent for matrix factorization
Norm minimization as convex relaxation of rank minimization
Trace norm for PSD matrices
Nuclear norm in general
Intuitive relationship between nuclear norm minimization and direct (bilinear) minimization

28 Case Study 4: Collaborative Filtering
Nonnegative Matrix Factorization, Projected Gradient
Machine Learning for Big Data CSE547/STAT548, University of Washington
Emily Fox, May 6th, 2015

29 Matrix factorization solutions can be unintuitive
Many, many, many applications of matrix factorization
E.g., in text data, can do topic modeling (alternative to LDA):
X = L R
Would like:
But

30 Nonnegative Matrix Factorization
X = L R
Just like before, but
Constrained optimization problem
Many, many, many, many solution methods; we'll check out a simple one

31 Recall: Projected Gradient
Standard optimization:
Want to minimize:
Use, e.g., gradient updates:
Constrained optimization:
Given convex set C of feasible solutions
Want to find minima within C:
Projected gradient:
Take a gradient step (ignoring constraints):
Projection into feasible set:
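The two-step recipe on this slide (gradient step, then projection) can be sketched generically. The toy problem below, minimizing ||x - c||^2 over the nonnegative orthant, is a hypothetical example chosen because its answer, max(c, 0), is easy to verify; the function name and step size are also assumptions.

```python
import numpy as np

def projected_gradient(grad, project, x0, eta=0.1, n_iters=200):
    """Repeat: take a gradient step ignoring constraints, then project into C."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iters):
        x = project(x - eta * grad(x))
    return x

# Hypothetical example: minimize ||x - c||^2 subject to x >= 0.
c = np.array([1.5, -2.0, 0.5])
x_star = projected_gradient(
    grad=lambda x: 2.0 * (x - c),
    project=lambda x: np.maximum(x, 0.0),   # Euclidean projection onto x >= 0
    x0=np.zeros(3),
)
```

For the nonnegative orthant the projection is a coordinate-wise clip at zero, which keeps each iteration as cheap as a plain gradient step.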

32 Projected Stochastic Gradient Descent for Nonnegative Matrix Factorization
Gradient step observing r_uv ignoring constraints:
Convex set:
Projection step:
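Combining the SGD update with a projection onto nonnegative factors gives a step like the following. This is a sketch under assumptions: the same per-rating squared loss with penalty weight `lam`, and a projection that clips negative entries to zero. The function name `projected_sgd_step` is hypothetical.

```python
import numpy as np

def projected_sgd_step(L, R, u, v, r_uv, eta=0.05, lam=0.01):
    """SGD step on rating r_uv followed by projection onto L >= 0, R >= 0."""
    err = L[u] @ R[v] - r_uv
    L_u_old = L[u].copy()
    # Gradient step ignoring the constraints, then clip negatives to zero.
    L[u] = np.maximum(L[u] - eta * (err * R[v] + lam * L[u]), 0.0)
    R[v] = np.maximum(R[v] - eta * (err * L_u_old + lam * R[v]), 0.0)
    return err
```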

33 What you need to know
In many applications, want factors to be nonnegative
Corresponds to constrained optimization problem
Many possible approaches to solve, e.g., projected gradient

34 Case Study 4: Collaborative Filtering
Cold Start Problem

35 Cold-Start Problem
Challenge: Cold-start problem (new movie or user)
Methods: use features of movie/user

36 Cold-Start Problem More Formally
Consider a new user u and predicting that user's ratings
No previous observations
Objective considered so far:
Optimal user factor:
Predicted user ratings:
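Assuming the objective in question is the regularized least-squares objective, a user with no observed ratings is left with only the penalty term, so the optimal factor is zero and every predicted rating is zero. A small numerical sketch; the movie factors are hypothetical random values.

```python
import numpy as np

k, lam = 3, 0.1
rng = np.random.default_rng(0)
R = rng.normal(size=(5, k))      # learned movie factors (hypothetical values)

R_obs = np.empty((0, k))         # the new user has rated no movies
r_obs = np.empty(0)

# Ridge solution (R_obs^T R_obs + lam I)^{-1} R_obs^T r_obs with zero observations.
L_new = np.linalg.solve(R_obs.T @ R_obs + lam * np.eye(k), R_obs.T @ r_obs)
predictions = R @ L_new          # predicted rating for every movie
```

The factor model therefore gives a new user the same uninformative prediction for every movie.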

37 An Alternative Formulation
A simpler model for collaborative filtering
We would not have this issue if we assumed all users were identical
What about for new movies?
What if we had side information?
What dimension should w be?
Fit linear model:
Minimize:
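One way to read this slide: predict every rating from movie features alone, r_uv ~ w . phi(v), with a single weight vector w shared by all users and of the same dimension as the feature vector. The sketch below fits w by ridge regression; the feature matrix `phi`, the penalty `lam`, and the function name are assumptions, since the slide's formulas are missing.

```python
import numpy as np

def fit_global_weights(phi, ratings, lam=0.1):
    """Ridge fit of r_uv ~ w . phi[v], pooling the ratings of all users.

    phi is an (n_movies, d) array of movie features; w has dimension d."""
    F = np.array([phi[v] for _, v, _ in ratings])    # one feature row per rating
    r = np.array([r for _, _, r in ratings], dtype=float)
    d = phi.shape[1]
    return np.linalg.solve(F.T @ F + lam * np.eye(d), F.T @ r)
```

Because w depends only on features, a movie with no ratings still gets a prediction, w . phi(v), as soon as its features are known.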

38 Personalization
If we don't have any observations about a user, use wisdom of the crowd
Address cold-start problem
Clearly, not all users are the same
Just as in personalized click prediction, consider model with global and user-specific parameters
As we gain more information about the user, forget the crowd

39 User Features
In addition to movie features, may have information about the user:
Combine with features of movie:
Unified linear model:

40 Feature-Based Approach vs. Matrix Factorization
Feature-based approach:
Feature representation of user and movies fixed
Can address cold-start problem
Matrix factorization approach:
Suffers from cold-start problem
User & movie features are learned from data
A unified model:

41 Unified Collaborative Filtering via SGD
Gradient step observing r_uv
For L, R:
For w and w_u:
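The update formulas did not survive extraction. A sketch of one step, assuming the unified prediction is L_u . R_v + (w + w_u) . phi(u, v), with squared loss and a single penalty weight `lam` shared by all parameters. The names `unified_sgd_step`, `W_user`, and `phi_uv` are hypothetical, and the original slides may use separate penalty weights.

```python
import numpy as np

def unified_sgd_step(L, R, w, W_user, phi_uv, u, v, r_uv, eta=0.05, lam=0.01):
    """One in-place SGD step on factors, global weights w, and user weights W_user[u]."""
    pred = L[u] @ R[v] + (w + W_user[u]) @ phi_uv
    err = pred - r_uv
    L_u_old = L[u].copy()
    # Factor updates: same form as plain matrix-factorization SGD.
    L[u] -= eta * (err * R[v] + lam * L[u])
    R[v] -= eta * (err * L_u_old + lam * R[v])
    # Feature-weight updates: global and user-specific share the same gradient direction.
    w -= eta * (err * phi_uv + lam * w)
    W_user[u] -= eta * (err * phi_uv + lam * W_user[u])
    return err
```

A new user starts with zero factors and zero user-specific weights, so early predictions come from the global weights w alone.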

42 What you need to know
Cold-start problem
Feature-based methods for collaborative filtering
Help address cold-start problem
Unified approach
