Matrix Factorization and Collaborative Filtering

Similar documents
Approximate SDP solvers, Matrix Factorizations, the Netflix Prize, and PageRank. Mittagseminar Martin Jaggi, Oct

Data Mining and Matrices

Collaborative Filtering. Radek Pelánek

Techniques for Dimensionality Reduction. PCA and Other Matrix Factorization Methods

Collaborative Filtering

Perceptron (Theory) + Linear Regression

Recommender Systems. Dipanjan Das Language Technologies Institute Carnegie Mellon University. 20 November, 2007

Matrix Factorization Techniques for Recommender Systems

* Matrix Factorization and Recommendation Systems

Data Science Mastery Program

Preliminaries. Data Mining. The art of extracting knowledge from large bodies of structured data. Let s put it to use!

Matrix Factorization Techniques For Recommender Systems. Collaborative Filtering

Andriy Mnih and Ruslan Salakhutdinov

Matrix Factorization Techniques for Recommender Systems

Large-Scale Matrix Factorization with Distributed Stochastic Gradient Descent

Recommendation Systems

Binary Principal Component Analysis in the Netflix Collaborative Filtering Task

Decoupled Collaborative Ranking

Collaborative Filtering Matrix Completion Alternating Least Squares

Recommendation Systems

Logistic Regression Introduction to Machine Learning. Matt Gormley Lecture 9 Sep. 26, 2018

Using SVD to Recommend Movies

Recommendation Systems

Logistic Regression Introduction to Machine Learning. Matt Gormley Lecture 8 Feb. 12, 2018

Scaling Neighbourhood Methods

Recommender Systems EE448, Big Data Mining, Lecture 10. Weinan Zhang Shanghai Jiao Tong University

ELEC6910Q Analytics and Systems for Social Media and Big Data Applications Lecture 3 Centrality, Similarity, and Strength Ties

CS425: Algorithms for Web Scale Data

Clustering based tensor decomposition

Collaborative Filtering

CS425: Algorithms for Web Scale Data

Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Recommendation. Tobias Scheffer

SQL-Rank: A Listwise Approach to Collaborative Ranking

Generative Models for Discrete Data

CS 175: Project in Artificial Intelligence. Slides 4: Collaborative Filtering

Deriving Principal Component Analysis (PCA)

Matrix Factorization In Recommender Systems. Yong Zheng, PhDc Center for Web Intelligence, DePaul University, USA March 4, 2015

Restricted Boltzmann Machines for Collaborative Filtering

CS249: ADVANCED DATA MINING

Stat 406: Algorithms for classification and prediction. Lecture 1: Introduction. Kevin Murphy. Mon 7 January,

MLE/MAP + Naïve Bayes

Lecture Notes 10: Matrix Factorization

Collaborative topic models: motivations cont

A Modified PMF Model Incorporating Implicit Item Associations

Mining of Massive Datasets Jure Leskovec, AnandRajaraman, Jeff Ullman Stanford University

Algorithms for Collaborative Filtering

Deep Learning (CNNs)

ECS289: Scalable Machine Learning

A Gradient-based Adaptive Learning Framework for Efficient Personal Recommendation

Probabilistic Matrix Factorization

Relational Stacked Denoising Autoencoder for Tag Recommendation. Hao Wang

Data Mining Techniques

6.034 Introduction to Artificial Intelligence

Structured matrix factorizations. Example: Eigenfaces

Large-scale Information Processing, Summer Recommender Systems (part 2)

Matrix and Tensor Factorization from a Machine Learning Perspective

DATA MINING LECTURE 8. Dimensionality Reduction PCA -- SVD

Introduction to Logistic Regression

Collaborative Filtering on Ordinal User Feedback

Deep Learning Basics Lecture 7: Factor Analysis. Princeton University COS 495 Instructor: Yingyu Liang

Domokos Miklós Kelen. Online Recommendation Systems. Eötvös Loránd University. Faculty of Natural Sciences. Advisor:

MLE/MAP + Naïve Bayes

Ranking and Filtering

Bayesian Networks (Part II)

Hidden Markov Models

Matrix Factorization and Factorization Machines for Recommender Systems

Regularization Introduction to Machine Learning. Matt Gormley Lecture 10 Feb. 19, 2018

APPLICATIONS OF MINING HETEROGENEOUS INFORMATION NETWORKS

Learning representations

Mixed Membership Matrix Factorization

INFO 4300 / CS4300 Information Retrieval. slides adapted from Hinrich Schütze s, linked from

CS246 Final Exam, Winter 2011

Jeffrey D. Ullman Stanford University

a Short Introduction

Backpropagation Introduction to Machine Learning. Matt Gormley Lecture 13 Mar 1, 2018

Collabora've Filtering

Collaborative Filtering Applied to Educational Data Mining

Backpropagation Introduction to Machine Learning. Matt Gormley Lecture 12 Feb 23, 2018

Machine Learning Techniques

Collaborative Filtering: A Machine Learning Perspective

Bayesian Networks (Part I)

Large-scale Collaborative Ranking in Near-Linear Time

Principal Component Analysis

arxiv: v2 [cs.ir] 14 May 2018

Dimensionality Reduction and Principle Components Analysis

Machine Learning. Principal Components Analysis. Le Song. CSE6740/CS7641/ISYE6740, Fall 2012

Department of Computer Science, Guiyang University, Guiyang , GuiZhou, China

CS 572: Information Retrieval

From Non-Negative Matrix Factorization to Deep Learning

Mixed Membership Matrix Factorization

Collaborative Recommendation with Multiclass Preference Context

Recommender systems, matrix factorization, variable selection and social graph data

Matrix Factorization & Latent Semantic Analysis Review. Yize Li, Lanbo Zhang

EE 381V: Large Scale Learning Spring Lecture 16 March 7

Impact of Data Characteristics on Recommender Systems Performance

Linear Regression (continued)

Topics we covered. Machine Learning. Statistics. Optimization. Systems! Basics of probability Tail bounds Density Estimation Exponential Families

Nonnegative Matrix Factorization

Naïve Bayes Introduction to Machine Learning. Matt Gormley Lecture 18 Oct. 31, 2018

Transcription:

10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Matrix Factorization and Collaborative Filtering MF Readings: (Koren et al., 2009) Matt Gormley Lecture 25 April 19, 2017 1

Reminders Homework 8: Graphical Models Release: Mon, Apr. 17 Due: Mon, Apr. 24 at 11:59pm Homework 9: Applications of ML Release: Mon, Apr. 24 Due: Wed, May 3 at 11:59pm 2

Outline Recommender Systems Content Filtering Collaborative Filtering (CF) CF: Neighborhood Methods CF: Latent Factor Methods Matrix Factorization Background: Low- rank Factorizations Residual matrix Unconstrained Matrix Factorization Optimization problem Gradient Descent, SGD, Alternating Least Squares User/item bias terms (matrix trick) Singular Value Decomposition (SVD) Non- negative Matrix Factorization Extra: Matrix Multiplication in ML Matrix Factorization Linear Regression PCA (Autoencoders) K- means 7

RECOMMENDER SYSTEMS 8

Recommender Systems A Common Challenge: Assume you re a company selling items of some sort: movies, songs, products, etc. Company collects millions of ratings from users of their items To maximize profit / user happiness, you want to recommend items that users are likely to want 9

Recommender Systems 10

Recommender Systems 11

Recommender Systems 12

Recommender Systems Problem Setup 500,000 users 20,000 movies 100 million ratings Goal: To obtain lower root mean squared error (RMSE) than Netflix s existing system on 3 million held out ratings 13

Recommender Systems 14

Recommender Systems Setup: Items: movies, songs, products, etc. (often many thousands) Users: watchers, listeners, purchasers, etc. (often many millions) Feedback: 5- star ratings, not- clicking next, purchases, etc. Key Assumptions: Can represent ratings numerically as a user/item matrix Users only rate a small number of items (the matrix is sparse) Doctor Strange Star Trek: Beyond Zootopia Alice 1 5 Bob 3 4 Charlie 3 5 2 15

Recommender Systems 16

Two Types of Recommender Systems Content Filtering Example: Pandora.com music recommendations (Music Genome Project) Con: Assumes access to side information about items (e.g. properties of a song) Pro: Got a new item to add? No problem, just be sure to include the side information Collaborative Filtering Example: Netflix movie recommendations Pro: Does not assume access to side information about items (e.g. does not need to know about movie genres) Con: Does not work on new items that have no ratings 17

COLLABORATIVE FILTERING 19

Collaborative Filtering Everyday Examples of Collaborative Filtering... Bestseller lists Top 40 music lists The recent returns shelf at the library Unmarked but well- used paths thru the woods The printer room at work Read any good books lately? Common insight: personal tastes are correlated If Alice and Bob both like X and Alice likes Y then Bob is more likely to like Y especially (perhaps) if Bob knows Alice Slide from William Cohen 20

Two Types of Collaborative Filtering 1. Neighborhood Methods 2. Latent Factor Methods Figures from Koren et al. (2009) 21

Two Types of Collaborative Filtering 1. Neighborhood Methods In the figure, assume that a green line indicates the movie was watched Algorithm: 1. Find neighbors based on similarity of movie preferences 2. Recommend movies that those neighbors watched Figures from Koren et al. (2009) 22

Two Types of Collaborative Filtering Assume that both movies and users live in some low- dimensional space describing their properties Recommend a movie based on its proximity to the user in the latent space 2. Latent Factor Methods Figures from Koren et al. (2009) 23

MATRIX FACTORIZATION 24

Matrix Factorization Many different ways of factorizing a matrix We ll consider three: 1. Unconstrained Matrix Factorization 2. Singular Value Decomposition 3. Non- negative Matrix Factorization MF is just another example of a common recipe: 1. define a model 2. define an objective function 3. optimize with SGD 25

Whiteboard Matrix Factorization Background: Low- rank Factorizations Residual matrix 27

Example: MF for Netflix Problem HISTORY BOTH ROMANCE 1 2 3 4 5 6 7 NERO 1 1 1 1 1 1 1 1 1-1 - 1 JULIUS CAESAR - 1-1 CLEOPATRA - 1-1 - 1-1 SLEEPLESS IN SEATTLE PRETTY WOMAN CASABLANCA 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 3 4 5 6 7 HISTORY ROMANCE X HISTORY ROMANCE R U (a) Example of rank-2 matrix factorization 1 1 1 1-1 - 1-1 0 0 0 1 1 1 1 NERO JULIUS CAESAR CLEOPATRA 1 1 1 0 0 SLEEPLESS IN SEATTLE V T PRETTY WOMAN CASABLANCA 0 0 0 1 1 1 1 (a) Example of rank-2 matrix factorization NERO JULIU US CAESAR CLEO OPATRA SLEEP PLESS IN SEA ATTLE PRET TTY WOMAN CASA ABLANCA 1 0 0 0 0 0 0 HISTORY 2 0 0 0 0 0 0 3 0 0 0 0 0 0 BOTH 4 0 0 1 0 0 0 5 0 0 1 0 0 0 ROMANCE 6 0 0 1 0 0 0 7 0 0 1 0 0 0 R (b) Residual matrix Figures from Aggarwal (2016) 28

Regression vs. Collaborative Filtering Regression Collaborative Filtering TRAINING ROWS NO DEMARCATION BETWEEN TRAINING AND TEST ROWS TEST ROWS INDEPENDENT VARIABLES (a) Classification Figures from Aggarwal (2016) DEPENDENT VARIABLE NO DEMARCATION BETWEEN DEPENDENT AND INDEPENDENT VARIABLES (b) Collaborative filtering 29

UNCONSTRAINED MATRIX FACTORIZATION 30

Unconstrained Matrix Factorization Whiteboard Optimization problem SGD SGD with Regularization Alternating Least Squares User/item bias terms (matrix trick) 31

Unconstrained Matrix Factorization In- Class Exercise Derive a block coordinate descent algorithm for the Unconstrained Matrix Factorization problem. User vectors: u R r Set of non- missing entries Item vectors: i R r Rating prediction: Objective:, (u,i) Z (v ui T u i ) 2 v ui = T u i 32

Matrix Factorization (with matrices) User vectors: (W u ) T R r Item vectors: H i R r Rating prediction: Figures from Koren et al. (2009) V ui = W u H i =[WH] ui Figures from Gemulla et al. (2011) 33

Matrix Factorization (with vectors) User vectors: u Item vectors: i R r R r Rating prediction: Figures from Koren et al. (2009) v ui = T u i 34

Matrix Factorization (with vectors) Set of non- missing entries: Objective:, (u,i) Z (v ui T u i ) 2 Figures from Koren et al. (2009) 35

Matrix Factorization (with vectors) Regularized Objective:, (u,i) Z (v ui T u i ) 2 Figures from Koren et al. (2009) + ( i 2 + u 2 ) i u SGD update for random (u,i): 36

Matrix Factorization (with vectors) Regularized Objective:, (u,i) Z (v ui T u i ) 2 Figures from Koren et al. (2009) + ( i i 2 + SGD update for random (u,i): e ui v ui T u i u u + (e ui i u ) i i + (e ui u i ) u u 2 ) 37

Matrix Factorization (with matrices) User vectors: (W u ) T R r Item vectors: H i R r Rating prediction: Figures from Koren et al. (2009) V ui = W u H i =[WH] ui Figures from Gemulla et al. (2011) 38

Matrix Factorization SGD (with matrices) Figures from Koren et al. (2009) step size Figure from Gemulla et al. (2011) Figure from Gemulla et al. (2011) 39

Example Factors Factor vector 2 1.5 1.0 0.5 0.0 0.5 1.0 1.5 Matrix Factorization Freddy Got Fingered Half Baked Julien Donkey-Boy Kill Bill: Vol. 1 Freddy vs. Jason Natural Born Killers Road Trip I Heart Huckabees Scarface Punch-Drunk Love The Royal Tenenbaums The Longest Yard Being John Malkovich The Fast and the Furious Lost in Translation Belle de Jour Armageddon Catwoman The Wizard of Oz Citizen Kane Annie Hall Coyote Ugly Sophie s Choice Maid in Manhattan Runaway Bride Moonstruck Stepmom Sister Act The Way We Were The Sound of Music The Waltons: Season 1 1.5 1.0 0.5 0.0 0.5 1.0 Factor vector 1 Figure 3. The first two vectors from a matrix decomposition of the Netflix Prize data. Selected movies are placed at the appropriate spot based on their factor vectors in two dimensions. The plot reveals distinct genres, including clusters of movies with strong female leads, fraternity humor, and quirky independent films. Figure from Koren et al. (2009) 40

Matrix Factorization Comparison of Optimization Algorithms ALS = alternating least squares Figure from Gemulla et al. (2011) 41

SVD FOR COLLABORATIVE FILTERING 42

Singular Value Decomposition for Collaborative Filtering Whiteboard Optimization problem Equivalence to Unconstrained Matrix Factorization (fully specified, no regularization) 43

NON- NEGATIVE MATRIX FACTORIZATION 44

Implicit Feedback Datasets What information does a five- star rating contain? Implicit Feedback Datasets: In many settings, users don t have a way of expressing dislike for an item (e.g. can t provide negative ratings) The only mechanism for feedback is to like something Examples: Facebook has a Like button, but no Dislike button Google s +1 button Pinterest pins Purchasing an item on Amazon indicates a preference for it, but there are many reasons you might not purchase an item (besides dislike) Search engines collect click data but don t have a clear mechanism for observing dislike of a webpage Examples from Aggarwal (2016) 45

Non- negative Matrix Factorization Whiteboard Optimization problem Multiplicative updates 46

Summary Recommender systems solve many real- world (*large- scale) problems Collaborative filtering by Matrix Factorization (MF) is an efficient and effective approach MF is just another example of a common recipe: 1. define a model 2. define an objective function 3. optimize with SGD 55