Collaborative Filtering


Case Study 4: Collaborative Filtering
Matrix Completion and Alternating Least Squares
Machine Learning/Statistics for Big Data, CSE599C1/STAT592, University of Washington
Carlos Guestrin, February 28th, 2013

Collaborative Filtering
Goal: find movies of interest to a user, based on the movies watched by that user and by other users.
Methods: matrix factorization, GraphLab.

What do I recommend?
Example: a user has watched Women on the Verge of a Nervous Breakdown and The Celebration. What do I recommend? Recommend, e.g., City of God, Wild Strawberries, La Dolce Vita.

Cold-Start Problem
Challenge: the cold-start problem (a new movie or a new user has no ratings to learn from).
Method: use features of the movie/user.

Netflix Prize
Given 100 million ratings on a scale of 1 to 5, predict 3 million ratings to the highest accuracy.
17,770 total movies; 480,189 total users; over 8 billion total (user, movie) pairs.
How to fill in the blanks? (Figures from Ben Recht.)

Matrix Completion Problem
Matrix completion = filling in missing data: the rating $r_{uv}$ is known for black cells and unknown for white cells. Rows index users; columns index movies.

Interpreting Low-Rank Matrix Completion (aka Matrix Factorization)
Write the ratings matrix as a product, $X = L R$: each rating is approximated by an inner product of a user factor and a movie factor, $r_{uv} \approx L_u \cdot R_v$.

Matrix Completion via Rank Minimization
Given the observed values $r_{uv}$, find the matrix $\Theta$ of minimum rank such that $\Theta_{uv} = r_{uv}$ for all observed entries. Two issues: rank minimization is NP-hard, and forcing exact equality overfits noisy ratings.

Approximate Matrix Completion
Minimize the squared error on the observed entries (other loss functions are possible). Choose a rank $k$ and solve the optimization problem:

$\min_{L,R} \sum_{(u,v):\, r_{uv} \neq ?} (L_u \cdot R_v - r_{uv})^2$

Coordinate Descent for Matrix Factorization
Fix the movie factors $R$ and optimize for the user factors $L$. First observation: with $R$ fixed, the problem decouples across users.

Minimizing Over User Factors
For each user $u$:

$\min_{L_u} \sum_{v \in V_u} (L_u \cdot R_v - r_{uv})^2$

where $V_u$ is the set of movies rated by user $u$. In matrix form, this is an ordinary least-squares problem. Second observation: solve it in closed form via the normal equations.

Coordinate Descent for Matrix Factorization: Alternating Least Squares

$\min_{L,R} \sum_{(u,v):\, r_{uv} \neq ?} (L_u \cdot R_v - r_{uv})^2$

Fix movie factors, optimize for user factors: independent least-squares problems over users, $\min_{L_u} \sum_{v \in V_u} (L_u \cdot R_v - r_{uv})^2$.
Fix user factors, optimize for movie factors: independent least-squares problems over movies, $\min_{R_v} \sum_{u \in U_v} (L_u \cdot R_v - r_{uv})^2$.
The system may be underdetermined. The algorithm converges to a local optimum of the objective.
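The alternating least-squares scheme above can be sketched in a few lines of NumPy. This is a minimal illustration, not the course's reference implementation: the function name `als`, the NaN encoding of missing ratings, and the (ridge-regularized) normal-equation solves are my own choices.

```python
import numpy as np

def als(ratings, k=2, lam=0.1, iters=50, seed=0):
    """Alternating least squares for matrix completion (illustrative sketch).

    ratings: (n_users, n_movies) array with np.nan at unobserved entries.
    Returns L (n_users, k) and R (n_movies, k) so that observed entries
    are approximated by L @ R.T.
    """
    rng = np.random.default_rng(seed)
    n, m = ratings.shape
    mask = ~np.isnan(ratings)           # True where r_uv is observed
    X = np.nan_to_num(ratings)
    L = rng.normal(scale=0.1, size=(n, k))
    R = rng.normal(scale=0.1, size=(m, k))
    for _ in range(iters):
        # Fix movie factors R: one independent least-squares problem per user.
        for u in range(n):
            Vu = mask[u]                # movies rated by user u
            A = R[Vu].T @ R[Vu] + lam * np.eye(k)
            L[u] = np.linalg.solve(A, R[Vu].T @ X[u, Vu])
        # Fix user factors L: one independent least-squares problem per movie.
        for v in range(m):
            Uv = mask[:, v]             # users who rated movie v
            A = L[Uv].T @ L[Uv] + lam * np.eye(k)
            R[v] = np.linalg.solve(A, L[Uv].T @ X[Uv, v])
    return L, R
```

Each inner solve is a small $k \times k$ linear system, which is why the per-user and per-movie subproblems are cheap and trivially parallel.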

Effect of Regularization
Add Frobenius-norm penalties to the least-squares objective:

$\min_{L,R} \sum_{(u,v):\, r_{uv} \neq ?} (L_u \cdot R_v - r_{uv})^2 + \lambda_u \|L\|_F^2 + \lambda_v \|R\|_F^2$

What you need to know
- Matrix completion problem for collaborative filtering
- Overdetermined -> low-rank approximation
- Rank minimization is NP-hard
- Minimize the least-squares prediction error on the known values, for a given rank of the matrix
- Must use regularization
- Coordinate descent algorithm = alternating least squares

Case Study 4: Collaborative Filtering
SGD for Matrix Completion; Matrix-Norm Minimization
Carlos Guestrin, March 7th, 2013

Stochastic Gradient Descent
Objective:

$\min_{L,R} \sum_{(u,v):\, r_{uv} \neq ?} \frac{1}{2}(L_u \cdot R_v - r_{uv})^2 + \frac{\lambda_u}{2}\|L\|_F^2 + \frac{\lambda_v}{2}\|R\|_F^2$

Observe one rating $r_{uv}$ at a time. With the prediction error $\epsilon_t = L_u^{(t)} \cdot R_v^{(t)} - r_{uv}$, the updates are:

$L_u^{(t+1)} \leftarrow (1 - \eta_t \lambda_u) L_u^{(t)} - \eta_t \epsilon_t R_v^{(t)}$
$R_v^{(t+1)} \leftarrow (1 - \eta_t \lambda_v) R_v^{(t)} - \eta_t \epsilon_t L_u^{(t)}$
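The single-rating SGD update above is easy to write out directly. A minimal sketch, assuming factor matrices stored as NumPy arrays; the function name `sgd_step` and the default step size and regularization values are my own:

```python
import numpy as np

def sgd_step(L, R, u, v, r_uv, eta=0.05, lam_u=0.01, lam_v=0.01):
    """One stochastic gradient step on 1/2 (L_u . R_v - r_uv)^2 plus the
    Frobenius regularizers, updating row u of L and row v of R in place."""
    eps = L[u] @ R[v] - r_uv            # prediction error on this rating
    Lu_new = (1 - eta * lam_u) * L[u] - eta * eps * R[v]
    Rv_new = (1 - eta * lam_v) * R[v] - eta * eps * L[u]
    L[u], R[v] = Lu_new, Rv_new
```

Note that both new rows are computed from the old values before either is written back, matching the simultaneous update on the slide.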

Local Optima v. Global Optima
We are solving:

$\min_{L,R} \sum_{(u,v):\, r_{uv} \neq ?} (L_u \cdot R_v - r_{uv})^2 + \lambda_u \|L\|_F^2 + \lambda_v \|R\|_F^2$

We (kind of) wanted to solve rank minimization, which is NP-hard. How do these problems relate?

Eigenvalue Decompositions for PSD Matrices
Given a (square) symmetric positive semidefinite matrix $X$: its eigenvalues satisfy $\lambda_i \ge 0$, so its rank is the number of nonzero eigenvalues. Property of the trace: $\mathrm{tr}(X) = \sum_i \lambda_i$. Thus, approximate rank minimization by trace minimization, which is convex.
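The three facts used in the trace trick can be checked numerically. A small illustration (the matrix, seed, and tolerances are arbitrary choices of mine):

```python
import numpy as np

# For a symmetric PSD matrix X: eigenvalues are >= 0, the rank is the
# number of nonzero eigenvalues, and trace(X) equals their sum. That is
# why minimizing the trace is a convex surrogate for minimizing rank.
rng = np.random.default_rng(0)
A = rng.normal(size=(4, 2))
X = A @ A.T                             # symmetric PSD, rank 2

evals = np.linalg.eigvalsh(X)
assert np.all(evals > -1e-10)           # PSD: no negative eigenvalues
assert np.sum(evals > 1e-10) == 2       # rank = #nonzero eigenvalues
assert np.isclose(np.trace(X), evals.sum())  # trace = sum of eigenvalues
```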

Generalizing the Trace Trick
Non-square matrices ain't got no trace. For (square) positive semidefinite matrices, use the matrix factorization $X = L L'$, for which $\mathrm{tr}(X) = \|L\|_F^2$. For rectangular matrices, use the singular value decomposition. Nuclear norm: $\|X\|_* = \sum_i \sigma_i(X)$, the sum of the singular values.

Nuclear Norm Minimization
Optimization problem:

$\min_\Theta \|\Theta\|_* \ \text{ s.t. } \ \Theta_{uv} = r_{uv} \ \forall \text{ observed } r_{uv}$

It is possible to relax the equality constraints, e.g. to $\sum (\Theta_{uv} - r_{uv})^2 \le \epsilon$. Both are convex problems (solvable by semidefinite programming).

Analysis of Nuclear Norm
Nuclear norm minimization is a convex relaxation of the rank minimization problem:

$\min_\Theta \mathrm{rank}(\Theta) \ \text{ s.t. } \ \Theta_{uv} = r_{uv}, \ \forall r_{uv} \neq ?$
$\min_\Theta \|\Theta\|_* \ \text{ s.t. } \ \Theta_{uv} = r_{uv}, \ \forall r_{uv} \neq ?$

Theorem [Candès, Recht '08]: if there is a true matrix of rank $k$, and we observe at least $C\,k\,n^{1.2} \log n$ random entries of the true matrix, then the true matrix is recovered exactly, with high probability, by convex nuclear norm minimization (under certain conditions).

Nuclear Norm Minimization versus Direct (Bilinear) Low-Rank Solutions
Nuclear norm minimization:

$\min_\Theta \sum_{(u,v):\, r_{uv} \neq ?} (\Theta_{uv} - r_{uv})^2 + \lambda \|\Theta\|_*$

Annoying because it optimizes over the full matrix $\Theta$. Instead:

$\min_{L,R} \sum_{(u,v):\, r_{uv} \neq ?} (L_u \cdot R_v - r_{uv})^2 + \lambda_u \|L\|_F^2 + \lambda_v \|R\|_F^2$

Annoying because it is non-convex. But

$\|\Theta\|_* = \inf_{L,R:\, \Theta = L R'} \frac{1}{2}\|L\|_F^2 + \frac{1}{2}\|R\|_F^2$

so the two formulations are closely related, and under certain conditions their solutions coincide [Burer, Monteiro '04].
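The variational identity relating the nuclear norm to the Frobenius norms of the factors can be verified numerically: the balanced factorization built from the SVD attains the infimum. A short check (matrix and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
Theta = rng.normal(size=(5, 3))

# Nuclear norm = sum of singular values.
U, s, Vt = np.linalg.svd(Theta, full_matrices=False)
nuc = s.sum()

# The balanced factorization L = U sqrt(S), R = V sqrt(S) satisfies
# Theta = L R' and attains 1/2 (||L||_F^2 + ||R||_F^2) = ||Theta||_*.
L = U * np.sqrt(s)                      # scale columns of U
R = Vt.T * np.sqrt(s)                   # scale columns of V
bound = 0.5 * (np.sum(L**2) + np.sum(R**2))

assert np.allclose(Theta, L @ R.T)
assert np.isclose(nuc, bound)
```

Any other factorization $\Theta = L R'$ can only give a larger value of $\frac{1}{2}(\|L\|_F^2 + \|R\|_F^2)$, which is the content of the infimum characterization.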

What you need to know
- Stochastic gradient descent for matrix factorization
- Norm minimization as a convex relaxation of rank minimization
- Trace norm for PSD matrices; nuclear norm in general
- Intuitive relationship between nuclear norm minimization and direct (bilinear) minimization

Case Study 4: Collaborative Filtering
Nonnegative Matrix Factorization; Projected Gradient
Carlos Guestrin, March 7th, 2013

Matrix Factorization Solutions Can Be Unintuitive
There are many, many applications of matrix factorization. E.g., on text data it can be used for topic modeling (an alternative to LDA): $X = L R$. We would like the factors to be interpretable, but unconstrained solutions can contain negative entries.

Nonnegative Matrix Factorization
Just like before, but with $L \ge 0$, $R \ge 0$:

$\min_{L \ge 0,\, R \ge 0} \sum_{(u,v):\, r_{uv} \neq ?} (L_u \cdot R_v - r_{uv})^2 + \lambda_u \|L\|_F^2 + \lambda_v \|R\|_F^2$

This is a constrained optimization problem. There are many, many solution methods; we'll check out a simple one.

Projected Gradient
Standard optimization: want to minimize $f(\theta)$; use gradient updates

$\theta^{(t+1)} \leftarrow \theta^{(t)} - \eta_t \nabla f(\theta^{(t)})$

Constrained optimization: given a convex set $C$ of feasible solutions, want to find the minimum of $f(\theta)$ over $\theta \in C$. Projected gradient: take a gradient step ignoring the constraints, then project the result onto the feasible set.

Projected Stochastic Gradient Descent for Nonnegative Matrix Factorization

$\min_{L \ge 0,\, R \ge 0} \sum_{(u,v):\, r_{uv} \neq ?} \frac{1}{2}(L_u \cdot R_v - r_{uv})^2 + \frac{\lambda_u}{2}\|L\|_F^2 + \frac{\lambda_v}{2}\|R\|_F^2$

Gradient step observing $r_{uv}$, ignoring the constraints (with $\epsilon_t = L_u^{(t)} \cdot R_v^{(t)} - r_{uv}$):

$\tilde{L}_u^{(t+1)} \leftarrow (1 - \eta_t \lambda_u) L_u^{(t)} - \eta_t \epsilon_t R_v^{(t)}$
$\tilde{R}_v^{(t+1)} \leftarrow (1 - \eta_t \lambda_v) R_v^{(t)} - \eta_t \epsilon_t L_u^{(t)}$

Convex set: the nonnegative orthant. Projection step: clip negative entries to zero,

$L_u^{(t+1)} = \max(0, \tilde{L}_u^{(t+1)}), \quad R_v^{(t+1)} = \max(0, \tilde{R}_v^{(t+1)})$
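The projected stochastic gradient step for NMF is the plain SGD step followed by elementwise clipping. A minimal sketch (the function name and default hyperparameters are my own choices):

```python
import numpy as np

def projected_sgd_step(L, R, u, v, r_uv, eta=0.05, lam=0.01):
    """One projected stochastic gradient step for nonnegative matrix
    factorization: SGD update ignoring the constraints, then projection
    onto L, R >= 0 by clipping negative entries to zero."""
    eps = L[u] @ R[v] - r_uv            # prediction error on this rating
    Lu_old = L[u].copy()                # keep old value for R's update
    L[u] = np.maximum(0.0, (1 - eta * lam) * L[u] - eta * eps * R[v])
    R[v] = np.maximum(0.0, (1 - eta * lam) * R[v] - eta * eps * Lu_old)
```

Clipping is the exact Euclidean projection onto the nonnegative orthant, which is what makes this simple scheme a valid projected gradient method.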

What you need to know
- In many applications, we want the factors to be nonnegative
- This corresponds to a constrained optimization problem
- Many possible approaches to solve it, e.g., projected gradient