Preliminaries. Data Mining. The art of extracting knowledge from large bodies of structured data. Let's put it to use!

Similar documents
Andriy Mnih and Ruslan Salakhutdinov

Recommendation Systems

Collaborative Filtering. Radek Pelánek

Recommendation Systems

Recommender Systems. Dipanjan Das Language Technologies Institute Carnegie Mellon University. 20 November, 2007

Large-scale Collaborative Ranking in Near-Linear Time

CS425: Algorithms for Web Scale Data

Recommender Systems EE448, Big Data Mining, Lecture 10. Weinan Zhang Shanghai Jiao Tong University

* Matrix Factorization and Recommendation Systems

Matrix Factorization and Collaborative Filtering

Collaborative Filtering on Ordinal User Feedback

Binary Principal Component Analysis in the Netflix Collaborative Filtering Task

Matrix Factorization Techniques for Recommender Systems

Collaborative Filtering

Algorithms for Collaborative Filtering

Mining of Massive Datasets. Jure Leskovec, Anand Rajaraman, Jeff Ullman. Stanford University

Large-Scale Matrix Factorization with Distributed Stochastic Gradient Descent

The Pragmatic Theory solution to the Netflix Grand Prize

Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Recommendation. Tobias Scheffer

CS249: ADVANCED DATA MINING

Probabilistic Partial User Model Similarity for Collaborative Filtering

Collaborative Filtering Applied to Educational Data Mining

Collaborative Filtering via Ensembles of Matrix Factorizations

Restricted Boltzmann Machines for Collaborative Filtering

Generative Models for Discrete Data

Lessons Learned from the Netflix Contest. Arthur Dunbar

Matrix Factorization Techniques For Recommender Systems. Collaborative Filtering

CS 175: Project in Artificial Intelligence. Slides 4: Collaborative Filtering

CS425: Algorithms for Web Scale Data

Mixed Membership Matrix Factorization

Predicting the Performance of Collaborative Filtering Algorithms

Data Mining Techniques

Ranking and Filtering

Impact of Data Characteristics on Recommender Systems Performance

Large-scale Ordinal Collaborative Filtering

A Modified PMF Model Incorporating Implicit Item Associations

Collaborative Filtering with Temporal Dynamics with Using Singular Value Decomposition

Decoupled Collaborative Ranking

Bayesian Matrix Factorization with Side Information and Dirichlet Process Mixtures

Techniques for Dimensionality Reduction. PCA and Other Matrix Factorization Methods

Context-aware Ensemble of Multifaceted Factorization Models for Recommendation Prediction in Social Networks

Mixed Membership Matrix Factorization

ELEC6910Q Analytics and Systems for Social Media and Big Data Applications Lecture 3 Centrality, Similarity, and Strength Ties

Collaborative Recommendation with Multiclass Preference Context

Matrix Factorization with Content Relationships for Media Personalization

The BellKor Solution to the Netflix Grand Prize

Recommender Systems: Overview and Package rectools. Norm Matloff. Dept. of Computer Science, University of California at Davis.

Matrix Factorization Techniques for Recommender Systems

Learning to Recommend Point-of-Interest with the Weighted Bayesian Personalized Ranking Method in LBSNs

The BigChaos Solution to the Netflix Prize 2008

Information Retrieval and Organisation

Matrix Factorization and Neighbor Based Algorithms for the Netflix Prize Problem

Collaborative Filtering

Similarity and recommender systems

2.6 Complexity Theory for Map-Reduce. Star Joins

Department of Computer Science, Guiyang University, Guiyang, Guizhou, China

SQL-Rank: A Listwise Approach to Collaborative Ranking

APPLICATIONS OF MINING HETEROGENEOUS INFORMATION NETWORKS

Matrix Factorization and Factorization Machines for Recommender Systems

Scaling Neighbourhood Methods

NetBox: A Probabilistic Method for Analyzing Market Basket Data

Introduction to Computational Advertising

Click-Through Rate prediction: TOP-5 solution for the Avazu contest

The Normal Distribution. Chapter 6

Using SVD to Recommend Movies

Predicting Neighbor Goodness in Collaborative Filtering

Matrix Factorization In Recommender Systems. Yong Zheng, PhDc Center for Web Intelligence, DePaul University, USA March 4, 2015

Collaborative Topic Modeling for Recommending Scientific Articles

Database Privacy: k-anonymity and de-anonymization attacks

Incremental Matrix Factorization for Collaborative Filtering

CS246 Final Exam, Winter 2011

Introduction PCA classic Generative models Beyond and summary. PCA, ICA and beyond

CS276A Text Information Retrieval, Mining, and Exploitation. Lecture 4 15 Oct 2002

Data Science Mastery Program

Learning in Probabilistic Graphs exploiting Language-Constrained Patterns

Facing the information flood in our daily lives, search engines mainly respond

A Gradient-based Adaptive Learning Framework for Efficient Personal Recommendation

The Equivalence between Row and Column Linear Regression: A Surprising Feature of Linear Regression Updated Version 2.

Principles of Pattern Recognition. C. A. Murthy Machine Intelligence Unit Indian Statistical Institute Kolkata

COMS 4721: Machine Learning for Data Science Lecture 20, 4/11/2017

LOCAL APPROACHES FOR COLLABORATIVE FILTERING

Introduction of Recruit

Computational Cognitive Science

Classification: Naïve Bayes. Nathan Schneider (slides adapted from Chris Dyer, Noah Smith, et al.) ENLP 19 September 2016

Collaborative Filtering via Different Preference Structures

CSE 258, Winter 2017: Midterm

arXiv: v2 [cs.IR] 14 May 2018

Ordinal Boltzmann Machines for Collaborative Filtering

Sequential Recommender Systems

Rating Prediction with Topic Gradient Descent Method for Matrix Factorization in Recommendation

Relational Stacked Denoising Autoencoder for Tag Recommendation. Hao Wang

Collaborative Filtering Matrix Completion Alternating Least Squares

Collective Intelligence

Scalable Hierarchical Recommendations Using Spatial Autocorrelation

Content-based Recommendation

MIDTERM: CS 6375 INSTRUCTOR: VIBHAV GOGATE October,

Be able to define the following terms and answer basic questions about them:

Review: Probabilistic Matrix Factorization. Probabilistic Matrix Factorization (PMF)

Item Recommendation for Emerging Online Businesses

Uwe Aickelin and Qi Chen, School of Computer Science and IT, University of Nottingham, NG8 1BB, UK {uxa,

Transcription:

Data Mining. The art of extracting knowledge from large bodies of structured data. Let's put it to use!

Recommendations

Basic Recommendations with Collaborative Filtering

Making Recommendations

The Netflix Prize (2006-2009)

What was the Netflix Prize? In October 2006, Netflix released a dataset containing 100 million anonymous movie ratings and challenged the data mining, machine learning, and computer science communities to develop systems that could beat the accuracy of its recommendation system, Cinematch. Thus began the Netflix Prize, an open competition for the best collaborative filtering algorithm to predict user ratings for films, based solely on previous ratings and without any other information about the users or films.

The Netflix Prize Datasets Netflix provided a training dataset of 100,480,507 ratings that 480,189 users gave to 17,770 movies. Each training rating (or instance) is of the form (user, movie, date of rating, rating). The user and movie fields are integer IDs, while ratings are integral, from 1 to 5 stars.

The Netflix Prize Datasets The qualifying dataset contained 2,817,131 instances of the form (user, movie, date of rating), with the ratings known only to the jury. A participating team's algorithm had to predict grades on the entire qualifying set, which consisted of a validation set and a test set. During the competition, teams were only informed of the score on the validation (or "quiz") set of 1,408,342 ratings. The jury used the test set of 1,408,789 ratings to determine potential prize winners.

The Netflix Prize Data The data form an n × m user-by-movie ratings matrix: the rows are the users (the instances, also called samples, examples, or observations), the columns are the movies (the features, also called attributes or dimensions), and each filled cell holds a 1-5 star rating. Most cells are empty, since each user has rated only a small fraction of the movies.
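At Netflix scale that matrix has 480,189 × 17,770 ≈ 8.5 billion cells, of which only about 100 million (roughly 1%) are filled, so it would never be stored densely. A minimal sketch of holding rating triples sparsely; the toy triples and the use of scipy here are illustrative assumptions, not the Prize's actual distribution format:

```python
import numpy as np
from scipy.sparse import csr_matrix

# Illustrative (user, movie, rating) triples; the real Netflix data
# shipped as per-movie text files that would be parsed into the same
# kind of triple lists.
users   = np.array([0, 0, 1, 2, 2, 3])
movies  = np.array([0, 3, 1, 0, 2, 3])
ratings = np.array([5, 4, 5, 2, 4, 1])

# Rows are users (instances), columns are movies (features);
# unrated cells are implicit zeros rather than stored values.
R = csr_matrix((ratings, (users, movies)), shape=(4, 4))
print(R.toarray())
```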

The Netflix Prize Goal Goal: predict ? (a movie rating) for a user. Blank cells are unrated:

              Star Wars   Hoop Dreams   Contact   Titanic
  Joe             5            2           5         4
  John            2            5           3
  Al              2            2           4         2
  Everaldo        5            1           5         ?

The Netflix Prize Methods Bennett, James, and Stan Lanning. "The Netflix Prize." Proceedings of KDD Cup and Workshop, 2007.

The Netflix Prize Methods Some of these methods we will discuss now; the others we will discuss by the end of the course.

Raw Averages User average: simply assign the average rating given by user $u$,
$$\bar r_u = \frac{1}{|I_u|} \sum_{i \in I_u} r_{u,i}, \qquad u \in U,$$
where $U$ is the set of all users and $I_u$ is the set of items rated by $u$. Item average: simply assign the average rating received by item $i$,
$$\bar r_i = \frac{1}{|U_i|} \sum_{u \in U_i} r_{u,i}, \qquad i \in I,$$
where $I$ is the set of all items, $U_i$ is the set of users who rated $i$, and $r_{u,i}$ is the rating given to item $i$ by user $u$. But what about universally good or bad movies? Or skewed rating systems?
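A minimal sketch of both baselines on a toy nested dictionary of ratings (the data and its layout are assumptions for illustration):

```python
# Toy ratings: {user: {item: rating}}
ratings = {
    "Joe":      {"Star Wars": 5, "Hoop Dreams": 2, "Contact": 5, "Titanic": 4},
    "Al":       {"Star Wars": 2, "Hoop Dreams": 2, "Contact": 4, "Titanic": 2},
    "Everaldo": {"Star Wars": 5, "Hoop Dreams": 1, "Contact": 5},
}

def user_average(user):
    """Predict every unseen item as the user's own mean rating."""
    rs = ratings[user].values()
    return sum(rs) / len(rs)

def item_average(item):
    """Predict an item's rating as its mean rating across all users."""
    rs = [r[item] for r in ratings.values() if item in r]
    return sum(rs) / len(rs)

print(user_average("Everaldo"))   # (5 + 1 + 5) / 3 ≈ 3.67
print(item_average("Titanic"))    # (4 + 2) / 2 = 3.0
```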

Bayesian Method Apply Bayes' Theorem: of the ratings $r \in R$ a user could give for a movie, assign the one with the highest value of
$$P(r \mid i) = \frac{P(i \mid r)\, P(r)}{P(i)}$$
where $P(r \mid i)$ is the (conditional) probability of rating $r$ given item $i$, $P(i \mid r)$ is the (conditional) probability of item $i$ given rating $r$, $P(r)$ is the (prior) probability of rating $r$, and $P(i)$ is the (prior) probability of item $i$. But this method still doesn't account for the similarity between users.
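Since $P(i)$ is the same for every candidate rating, maximizing $P(r \mid i)$ is the same as maximizing $P(i \mid r)\,P(r)$. A sketch of the idea on toy count data (the observation list is assumed for illustration):

```python
from collections import Counter

# Toy (item, rating) observations standing in for the training set.
observations = [("Titanic", 4), ("Titanic", 4), ("Titanic", 2),
                ("Contact", 5), ("Contact", 5), ("Contact", 4)]

n = len(observations)
rating_counts = Counter(r for _, r in observations)
pair_counts = Counter(observations)

def predict(item):
    """Return argmax_r P(r|i); P(i) cancels, so maximize P(i|r) * P(r)."""
    def score(r):
        p_r = rating_counts[r] / n                               # prior P(r)
        p_i_given_r = pair_counts[(item, r)] / rating_counts[r]  # P(i|r)
        return p_i_given_r * p_r
    return max(rating_counts, key=score)

print(predict("Titanic"))  # 4
```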

Cute Kitten Picture Intermission

Key to Collaborative Filtering Common insight: personal tastes are correlated. If Alice and Bob both like X and Alice likes Y, then Bob is more likely to like Y, especially (perhaps) if Bob knows Alice.

Collaborative Filtering Collaborative filtering (CF) systems work by collecting user feedback in the form of ratings for items in a given domain and exploiting similarities in rating behavior amongst several users in determining how to recommend an item.

Collaborative Filtering Dataset The same n × m ratings matrix as before, now with arbitrary items as the columns rather than movies. Goal: predict ? (a missing item rating) for user n.

Types of Collaborative Filtering 1. Neighborhood- or Memory-based 2. Model-based 3. Hybrid We'll talk about the first type, neighborhood-based CF, now.

Neighborhood-based CF A subset of users is chosen based on their similarity to the active user, and a weighted combination of their ratings is used to produce predictions for this user.

Neighborhood-based CF It has three steps: 1. Assign a weight to all users with respect to their similarity to the active user. 2. Select the k users that have the highest similarity with the active user, commonly called the neighborhood. 3. Compute a prediction from a weighted combination of the selected neighbors' ratings.

Neighborhood-based CF Step 1 In step 1, the weight $w_{a,u}$ is a measure of similarity between the user $u$ and the active user $a$. The most commonly used measure of similarity is the Pearson correlation coefficient between the ratings of the two users:
$$w_{a,u} = \frac{\sum_{i \in I} (r_{a,i} - \bar r_a)(r_{u,i} - \bar r_u)}{\sqrt{\sum_{i \in I} (r_{a,i} - \bar r_a)^2}\, \sqrt{\sum_{i \in I} (r_{u,i} - \bar r_u)^2}}$$
where $I$ is the set of items rated by both users, $r_{u,i}$ is the rating given to item $i$ by user $u$, and $\bar r_u$ is the mean rating given by user $u$.
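A direct transcription of the weight formula into code (a sketch; {item: rating} dicts like the toy data above are assumed):

```python
from math import sqrt

def pearson(ra, ru):
    """Similarity w_{a,u}: Pearson correlation over co-rated items.

    ra, ru: {item: rating} dicts for the active user a and user u;
    means are each user's overall mean rating, as on the slide.
    """
    common = set(ra) & set(ru)          # I: items rated by both users
    if not common:
        return 0.0
    mean_a = sum(ra.values()) / len(ra)
    mean_u = sum(ru.values()) / len(ru)
    num = sum((ra[i] - mean_a) * (ru[i] - mean_u) for i in common)
    den = sqrt(sum((ra[i] - mean_a) ** 2 for i in common)) * \
          sqrt(sum((ru[i] - mean_u) ** 2 for i in common))
    return num / den if den else 0.0

print(pearson({"A": 5, "B": 3, "C": 4}, {"A": 4, "B": 2, "C": 4}))  # ≈ 0.87
```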

Neighborhood-based CF Step 2 In step 2, a threshold on the similarity score (or simply the top k scores) is used to determine the neighborhood.

Neighborhood-based CF Step 3 In step 3, predictions are generally computed as the weighted average of deviations from the neighbors' means, as in:
$$p_{a,i} = \bar r_a + \frac{\sum_{u \in K} (r_{u,i} - \bar r_u)\, w_{a,u}}{\sum_{u \in K} |w_{a,u}|}$$
where $p_{a,i}$ is the prediction for the active user $a$ for item $i$, $w_{a,u}$ is the similarity between users $a$ and $u$, and $K$ is the neighborhood, or set of most similar users.
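A sketch of the full three-step loop, reusing the pearson function and the toy nested ratings dict from the earlier sketches:

```python
def predict(ratings, active, item, k=2):
    """p_{a,i}: active user's mean plus weighted neighbor deviations."""
    mean_a = sum(ratings[active].values()) / len(ratings[active])
    # Step 1: weight all other users who have rated the item.
    weighted = [(pearson(ratings[active], ratings[u]), u)
                for u in ratings if u != active and item in ratings[u]]
    # Step 2: the k most similar users form the neighborhood K.
    neighborhood = sorted(weighted, reverse=True)[:k]
    # Step 3: weighted average of deviations from each neighbor's mean.
    num = sum(w * (ratings[u][item] -
                   sum(ratings[u].values()) / len(ratings[u]))
              for w, u in neighborhood)
    den = sum(abs(w) for w, _ in neighborhood)
    return mean_a + (num / den if den else 0.0)

# e.g. predict(ratings, "Everaldo", "Titanic") -> ≈ 3.5 on the toy data
```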

Neighborhood-based CF Common Problems: 1. The search for similar users has high computational complexity, so conventional neighborhood-based CF algorithms do not scale well. 2. It is common for the active user to have highly correlated neighbors whose similarity is based on very few co-rated (overlapping) items, which often results in bad predictors. 3. When measuring the similarity between users, items that have been rated by everybody (and universally liked or disliked) are not as useful as less common items.

Item-to-Item Matching An extension to neighborhood-based CF. Addresses the problem of the high computational complexity of searching for similar users. The idea: rather than matching similar users, match a user's rated items to similar items.

Item-to-Item Matching In this approach, similarities between pairs of items $i$ and $j$ are computed off-line using Pearson correlation, given by:
$$w_{i,j} = \frac{\sum_{u \in U} (r_{u,i} - \bar r_i)(r_{u,j} - \bar r_j)}{\sqrt{\sum_{u \in U} (r_{u,i} - \bar r_i)^2}\, \sqrt{\sum_{u \in U} (r_{u,j} - \bar r_j)^2}}$$
where $U$ is the set of all users who have rated both items $i$ and $j$, $r_{u,i}$ is the rating of user $u$ on item $i$, and $\bar r_i$ is the average rating of the $i$th item across users.

Item-to-Item Matching Now the rating of item $i$ for user $a$ can be predicted using a simple weighted average, as in:
$$p_{a,i} = \frac{\sum_{j \in K} r_{a,j}\, w_{i,j}}{\sum_{j \in K} |w_{i,j}|}$$
where $K$ is the neighborhood set of the $k$ items rated by $a$ that are most similar to $i$.
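Both formulas in code, again over the nested ratings dict (a sketch; in practice the item-item weights would be precomputed off-line for all pairs, which is the point of the approach):

```python
from math import sqrt

def item_similarity(ratings, i, j):
    """w_{i,j}: Pearson correlation between two items' ratings."""
    both = [u for u in ratings if i in ratings[u] and j in ratings[u]]
    if not both:
        return 0.0
    # For brevity the item means here are taken over co-rating users
    # only; the slide's formula averages each item over all its raters.
    mean_i = sum(ratings[u][i] for u in both) / len(both)
    mean_j = sum(ratings[u][j] for u in both) / len(both)
    num = sum((ratings[u][i] - mean_i) * (ratings[u][j] - mean_j)
              for u in both)
    den = sqrt(sum((ratings[u][i] - mean_i) ** 2 for u in both)) * \
          sqrt(sum((ratings[u][j] - mean_j) ** 2 for u in both))
    return num / den if den else 0.0

def predict_item_based(ratings, a, i, k=2):
    """p_{a,i}: weighted average of a's ratings on items similar to i."""
    sims = sorted(((item_similarity(ratings, i, j), j)
                   for j in ratings[a] if j != i), reverse=True)[:k]
    den = sum(abs(w) for w, _ in sims)
    return sum(w * ratings[a][j] for w, j in sims) / den if den else 0.0
```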

Significance Weighting Another extension to neighborhood-based CF. Addresses the problem of bad predictors arising when the active user's highly correlated neighbors are based on very few co-rated (overlapping) items. The idea: multiply the similarity weight by a significance weighting factor, which devalues correlations based on few co-rated items.
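The slides leave the factor unspecified; one common concrete choice in the literature (due to Herlocker et al.) scales similarities linearly below a cutoff of 50 co-rated items, which is the assumption in this sketch:

```python
def significance_weight(w, n_common, cutoff=50):
    """Devalue a similarity w that rests on few co-rated items.

    With fewer than `cutoff` co-rated items the weight is scaled
    down proportionally; larger overlaps are left untouched.
    """
    return w * min(n_common, cutoff) / cutoff
```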

Inverse User Frequency Yet another extension to neighborhood-based CF. Addresses the problem of the dominance of items that have been rated by everybody (and universally liked or disliked), yet are not as useful as less common items. The idea: weight an item's rating by the inverse of the frequency with which that item is rated.

Inverse User Frequency When measuring the similarity between users, items that have been rated by everybody (and universally liked or disliked) are not as useful as less common items. To account for this, compute $f_i = \log \frac{n}{n_i}$, where $n_i$ is the number of users who have rated item $i$ out of the total number of users $n$. To apply inverse user frequency in similarity-based CF, the original rating of item $i$ is transformed by multiplying it by the factor $f_i$.
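A sketch of the transform over the nested ratings dict (natural log assumed). Note that an item rated by every user gets $f_i = \log 1 = 0$, which removes it from the similarity computation entirely:

```python
from math import log

def apply_inverse_user_frequency(ratings):
    """Scale each rating of item i by f_i = log(n / n_i)."""
    n = len(ratings)                       # total number of users
    n_i = {}                               # number of raters per item
    for user_ratings in ratings.values():
        for item in user_ratings:
            n_i[item] = n_i.get(item, 0) + 1
    return {user: {item: r * log(n / n_i[item])
                   for item, r in user_ratings.items()}
            for user, user_ratings in ratings.items()}
```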

And Now Let's run the data mining on some data!

References Prem Melville and Vikas Sindhwani. "Recommender Systems." In Encyclopedia of Machine Learning, Claude Sammut and Geoffrey Webb (Eds.), Springer, 2010.