Using SVD to Recommend Movies


Michael Percy, University of California, Santa Cruz. Last update: December 12, 2009.

Outline
1. Introduction
2. Singular Value Decomposition
3. Experiments
4. Conclusion

Introduction: Collaborative Filtering. Goal: predict the preferences of a user based on data taken from similar users.

Collaborative Filtering (cont.) Idea: distill the essence of the data into features that can be used for prediction. The Singular Value Decomposition (SVD) can be used to factor the ratings matrix and give an approximation of the missing values.

The Netflix Competition. The Netflix competition pitted teams from around the world against one another to better predict users' movie ratings. The Netflix data set includes integer ratings from 1 to 5 for 17,000 movies from 480,000 users, totalling over 100 million ratings. This is essentially a sparse 480,000 × 17,000 matrix with roughly 8.5 billion total entries, so the matrix is close to 99% empty.

Singular Value Decomposition (SVD). Idea: compress each dimension of the matrix (users × movies) to approximate the known values. Each dimension is then represented by a small number of features, which are combined to estimate the missing values.

SVD Definition. The SVD is a factorization of the ratings matrix M such that

M = U \Sigma V^T

where \Sigma is a diagonal matrix of singular values and the columns of U and V are the eigenvectors of M M^T and M^T M, respectively. Intuitively, the singular values in \Sigma are weighting factors for U and V; they are the square roots of the eigenvalues of M M^T corresponding to the eigenvectors in U and V.
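As a small illustration of this definition (not part of the original slides), the following NumPy sketch factors a toy matrix and checks that the singular values are indeed the square roots of the eigenvalues of M^T M:

```python
import numpy as np

# Toy "ratings" matrix: 4 users x 3 movies (dense here, unlike the real Netflix data)
M = np.array([[5., 3., 1.],
              [4., 1., 2.],
              [1., 1., 5.],
              [2., 2., 4.]])

# Full SVD: M = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(M, full_matrices=False)

# Singular values are the square roots of the eigenvalues of M^T M (and of M M^T)
eigvals = np.linalg.eigvalsh(M.T @ M)            # ascending order
print(np.allclose(np.sort(s**2), eigvals))       # True (up to floating-point error)
print(np.allclose(M, U @ np.diag(s) @ Vt))       # True: exact reconstruction
```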

Using SVD for Matrix Approximation. If approximation or compression is desired, a lower-rank approximation \hat{M} can be computed from M: a rank r is chosen and only the r singular values of greatest magnitude are kept, so the ranks of U and V are reduced as well; U, \Sigma, and V are reduced to dimensions m × r, r × r, and r × n respectively. Among rank-r matrices, this approximation minimizes the Frobenius norm of the residual A = M − \hat{M}:

\|A\|_F = \sqrt{\sum_{i=1}^{\min\{m,n\}} \sigma_i^2} = \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij}^2}

where the \sigma_i are the singular values of A.
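A short sketch of this truncated approximation (illustrative only; the function and variable names are mine): keep the top r singular triplets and check that the Frobenius norm of the residual equals the root sum of the discarded \sigma_i^2.

```python
import numpy as np

def truncated_svd(M, r):
    """Rank-r approximation of M: keep only the r largest singular values."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    M_hat = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]
    return M_hat, s

rng = np.random.default_rng(0)
M = rng.random((20, 10))
M_hat, s = truncated_svd(M, r=3)

# Frobenius norm of the residual equals the root sum of the discarded sigma_i^2
print(np.linalg.norm(M - M_hat, 'fro'))
print(np.sqrt(np.sum(s[3:] ** 2)))
```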

Incremental SVD. An incremental algorithm for SVD was developed by Simon Funk, based on a paper by Genevieve Gorrell, in which the SVD is approximated numerically via gradient descent. An approximation rank r is chosen beforehand, giving r feature vectors for movies and users. The \Sigma matrix is left blended into the feature vectors, so only U and V remain. Initial values for these feature vectors are set arbitrarily (e.g. 0.1). Gradient descent is then used to minimize the reconstruction error with respect to each user and movie feature vector.

Incremental SVD (cont.) Based on the gradients, the incremental SVD update for gradient descent is:

err = \eta (M_{ij} - \hat{M}_{ij})
userval = U_{if}
U_{if} \leftarrow U_{if} + err \cdot V_{fj}
V_{fj} \leftarrow V_{fj} + err \cdot userval

for learning rate \eta, user i, movie j, and feature f \in \{1, \dots, r\}.
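A minimal sketch of this incremental (Funk-style) training loop, assuming the ratings arrive as (user, movie, rating) triples; the names and loop structure are my own simplification, not the original implementation, and all r features are updated at once here whereas Funk's original code trains one feature at a time:

```python
import numpy as np

def train_funk_svd(ratings, n_users, n_movies, r=40, eta=0.001, n_epochs=20):
    """SGD approximation of a rank-r factorization M ~= U @ V from observed ratings.

    ratings: list of (user i, movie j, rating M_ij) triples for observed entries only.
    """
    U = np.full((n_users, r), 0.1)    # arbitrary initial value, as on the slide
    V = np.full((r, n_movies), 0.1)
    for _ in range(n_epochs):
        for i, j, m_ij in ratings:
            pred = U[i, :] @ V[:, j]              # current estimate of M_ij
            err = eta * (m_ij - pred)             # scaled prediction error
            userval = U[i, :].copy()              # keep old user features
            U[i, :] += err * V[:, j]              # U_if <- U_if + err * V_fj
            V[:, j] += err * userval              # V_fj <- V_fj + err * userval
    return U, V

# Toy usage: 3 users, 2 movies, a few observed ratings
U, V = train_funk_svd([(0, 0, 5.0), (1, 1, 3.0), (2, 0, 1.0)],
                      n_users=3, n_movies=2, r=4)
```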

Experiments. The data set was reduced so that experimentation and testing were feasible. The set was randomly reduced to 2,000 movies, 1,000,000 users, and 3,867,371 total ratings. The data was further split into an 80% training set and a 20% validation set.

Learning Rate Experiments. Several learning rates were tested: \eta = 0.1, 0.01, and 0.001.

Number of Features with No Regularization. The rank r, i.e. the number of features, was first tested without regularization. The learning rate \eta = 0.001 was used based on the above experiments.

Number of Features with Regularization. Regularization proved to be key in being able to use more features without overfitting. (Graph: validation set RMSE vs. number of features.)

Number of Features with Regularization (cont.) (Graph: training set RMSE vs. number of features.)
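The slides report only the RMSE curves for the regularized runs, not the regularized update itself. A common form of the regularized step, and an assumption on my part about what was used here, shrinks each feature vector by \lambda times its current value on every update:

```python
import numpy as np

def sgd_step_regularized(U_i, V_j, m_ij, eta=0.001, lam=0.015):
    """One regularized SGD step for a single observed rating m_ij (assumed form).

    The penalty term lam * feature shrinks the feature vectors each step,
    which is what allows more features to be used without overfitting.
    """
    err = m_ij - U_i @ V_j                        # prediction error
    U_i_new = U_i + eta * (err * V_j - lam * U_i)
    V_j_new = V_j + eta * (err * U_i - lam * V_j)
    return U_i_new, V_j_new

# Toy usage with r = 3 features
U_i = np.full(3, 0.1)
V_j = np.full(3, 0.1)
U_i, V_j = sgd_step_regularized(U_i, V_j, m_ij=4.0)
```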

Conclusion. The incremental SVD does a good job of approximating the missing values in the matrix and predicting expected ratings. Learning rate \eta = 0.001 performed best among the learning rates tested. The ideal number of features for this data set is around 70, with regularization parameter 0.015. Of the regularization values tested, the most effective coefficient was 0.020 with 90 features, which achieved a validation RMSE of 0.9101.