Recommender Systems: Overview and Package rectools. Norm Matloff. Dept. of Computer Science. University of California at Davis.

Similar documents
a Short Introduction

Generative Models for Discrete Data

Collaborative Filtering

Recommendation Systems

Recommendation Systems

Combining Non-probability and Probability Survey Samples Through Mass Imputation

Preliminaries. Data Mining. The art of extracting knowledge from large bodies of structured data. Let's put it to use!

Similarity and recommender systems

9/12/17. Types of learning. Modeling data. Supervised learning: Classification. Supervised learning: Regression. Unsupervised learning: Clustering

PolyanNA, a Novel, Prediction-Oriented R

* Matrix Factorization and Recommendation Systems

Algorithms for Collaborative Filtering

Stat 406: Algorithms for classification and prediction. Lecture 1: Introduction. Kevin Murphy. Mon 7 January,

Recommender Systems. Dipanjan Das Language Technologies Institute Carnegie Mellon University. 20 November, 2007

Binary Principal Component Analysis in the Netflix Collaborative Filtering Task

Large-scale Collaborative Ranking in Near-Linear Time

Impact of Data Characteristics on Recommender Systems Performance

ELEC6910Q Analytics and Systems for Social Media and Big Data Applications Lecture 3 Centrality, Similarity, and Strength Ties

SQL-Rank: A Listwise Approach to Collaborative Ranking

Regression. Goal: Learn a mapping from observations (features) to continuous labels given a training set (supervised learning)

Stat 315c: Introduction

Regression. Goal: Learn a mapping from observations (features) to continuous labels given a training set (supervised learning)

Machine Learning CSE546

Improving Response Prediction for Dyadic Data

Final Overview. Introduction to ML. Marek Petrik 4/25/2017

Recommendation Systems

Classification. Classification is similar to regression in that the goal is to use covariates to predict on outcome.

A Tour of Recommender Systems

CS425: Algorithms for Web Scale Data

Nonlinear Dimensionality Reduction

Collaborative topic models: motivations cont

Matrix Factorization and Factorization Machines for Recommender Systems

Scaling Neighbourhood Methods

Andriy Mnih and Ruslan Salakhutdinov

CENG 793. On Machine Learning and Optimization. Sinan Kalkan

Collaborative Topic Modeling for Recommending Scientific Articles

Collaborative Filtering. Radek Pelánek

Data Science Mastery Program

Matrix Factorization Techniques For Recommender Systems. Collaborative Filtering

Lecture 13 Fundamentals of Bayesian Inference

Recommender Systems EE448, Big Data Mining, Lecture 10. Weinan Zhang Shanghai Jiao Tong University

Matrix Factorization Techniques for Recommender Systems

A Bayesian Perspective on Residential Demand Response Using Smart Meter Data

Lecture #11: Classification & Logistic Regression

Mining of Massive Datasets Jure Leskovec, Anand Rajaraman, Jeff Ullman Stanford University

Review: Probabilistic Matrix Factorization. Probabilistic Matrix Factorization (PMF)

K-means. Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University. November 19th, Carlos Guestrin 1

Introduction to Logistic Regression

A Tour of Recommender Systems

Chapter 1 - Lecture 3 Measures of Location

Probabilistic Neighborhood Selection in Collaborative Filtering Systems

Homework 1 Solutions Probability, Maximum Likelihood Estimation (MLE), Bayes Rule, knn

Lesson 5: Measuring Variability for Symmetrical Distributions

Data Analysis 1 LINEAR REGRESSION. Chapter 03

Decoupled Collaborative Ranking

Targeted Maximum Likelihood Estimation in Safety Analysis

CS-E4830 Kernel Methods in Machine Learning

Matrix Factorization Techniques for Recommender Systems

Using SVD to Recommend Movies

Matrix Factorization In Recommender Systems. Yong Zheng, PhDc Center for Web Intelligence, DePaul University, USA March 4, 2015

Click Prediction and Preference Ranking of RSS Feeds

Lecture 6. Regression

Article from. Predictive Analytics and Futurism. July 2016 Issue 13

Structured matrix factorizations. Example: Eigenfaces

Purnamrita Sarkar (Carnegie Mellon) Deepayan Chakrabarti (Yahoo! Research) Andrew W. Moore (Google, Inc.)

Linear Models for Regression CS534

Jointly Clustering Rows and Columns of Binary Matrices: Algorithms and Trade-offs

Consistency of Nearest Neighbor Methods

Probabilistic Matrix Factorization

Collaborative Filtering Matrix Completion Alternating Least Squares

Quantum Recommendation Systems

CS425: Algorithms for Web Scale Data

Matrix Factorization and Collaborative Filtering

Maximum Margin Matrix Factorization for Collaborative Ranking

Computer Vision Group Prof. Daniel Cremers. 6. Mixture Models and Expectation-Maximization

Large-Scale Matrix Factorization with Distributed Stochastic Gradient Descent

Content-based Recommendation

Modelling heterogeneous variance-covariance components in two-level multilevel models with application to school effects educational research

Generative Clustering, Topic Modeling, & Bayesian Inference

Review: Support vector machines. Machine learning techniques and image analysis

Sample questions for Fundamentals of Machine Learning 2018

Techniques for Dimensionality Reduction. PCA and Other Matrix Factorization Methods

Collaborative Filtering via Ensembles of Matrix Factorizations

Discrete Choice Modeling

Mining of Massive Datasets Jure Leskovec, Anand Rajaraman, Jeff Ullman Stanford University

Lecture 8: Linear Algebra Background

1 Overview. 2 Learning from Experts. 2.1 Defining a meaningful benchmark. AM 221: Advanced Optimization Spring 2016

Cost and Preference in Recommender Systems Junhua Chen LESS IS MORE

Divide-and-conquer. Course 2015

Bindel, Fall 2011 Intro to Scientific Computing (CS 3220) Week 3: Wednesday, Jan 9

Bayesian Methods for Machine Learning

DATA MINING LECTURE 8. Dimensionality Reduction PCA -- SVD

Linear Regression (continued)

Machine Learning (CS 567) Lecture 2

Regression - Modeling a response

STA 4273H: Statistical Machine Learning

Latent classes for preference data

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review

Large-scale Collaborative Prediction Using a Nonparametric Random Effects Model

Predicting the Treatment Status

Transcription:

Recommender Systems: Overview and Package rectools. December 13, 2016

What Are Recommender Systems?
Various forms, but here is a common one, say for data on movie ratings:
Input data format: userID, filmID, rating
Implies a matrix with entries $Y_{ij}$ = the rating user i gives to film j (a toy example appears below).
Most of the $Y_{ij}$ are NAs; most users haven't rated most films.
Want to predict the unknown $Y_{ij}$.
Also called the matrix completion problem, but we also want to predict for new users as they come in.
The above is referred to as collaborative filtering. Various other approaches are popular as well.
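
For concreteness, a tiny made-up example of how the input triples imply the matrix (all data here is hypothetical):

# Toy, hypothetical data: four (userID, filmID, rating) triples imply a
# 3 x 3 matrix Y, with NAs for the unrated (user, film) pairs.
d <- data.frame(userID = c(1, 1, 2, 3),
                filmID = c(1, 3, 2, 3),
                rating = c(5, 2, 4, 1))
Y <- matrix(NA, nrow = max(d$userID), ncol = max(d$filmID))
Y[cbind(d$userID, d$filmID)] <- d$rating
Y   # rows = users, columns = films; most entries NA in real data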

Applications
Why the interest?
Amazon book RS; no bricks-and-mortar stores to browse in these days.
Minnesota class finder: A student considering taking Stat 101 would have estimates of (a) how well she would like the class and (b) what grade she would get.
Prediction of adverse reactions to prescription drugs.
Etc.

Statistical Issues (Part I)
RS research is done almost entirely in the machine learning (ML) community.
Big gap between ML and stat people in general.
Stat: Sample from a population. Properties of estimators wrt that population. Derive MSE, asymptotics.
ML: There is only the given data, no population. Derive inequalities relating various data quantities.
Discussed in (Breiman, 2001) and (Matloff, 2014).
ML people in denial: they talk about predicting new cases, overfitting etc., all implicitly assuming sampling from a population.

3 Broad Categories of Methods
k-Nearest Neighbor (knn) methods:
One of the earliest RS methods.
To predict how I might rate Movie X, average the ratings of the k people in the dataset who are most similar to me and who have rated Movie X.
Matrix Factorization (MF) methods:
Currently the most popular approach.
Find factors $U$, $V$ such that $M = (Y_{ij}) \approx UV$.
Random Effects (RE) models:
I believe these are little known in the ML RS community. The basic model is $Y_{ij} = \mu + \alpha_i + \beta_j + \epsilon_{ij}$, with $\alpha_i$ and $\beta_j$ being random, latent effects of mean 0.
Then apply MLE (e.g. the lme4 package in R; a sketch follows this slide) or, better, the Method of Moments (MM).
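
As a minimal sketch of the MLE route via lme4 (the data frame ratingsDF and its column names are assumptions, not from the talk):

# Fit Y_ij = mu + alpha_i + beta_j + eps_ij by MLE; userID and filmID
# are treated as grouping factors for the two random effects.
library(lme4)
fit <- lmer(rating ~ (1 | userID) + (1 | filmID), data = ratingsDF)
fixef(fit)   # estimate of mu
ranef(fit)   # predicted alpha_i (by userID) and beta_j (by filmID)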

Predicted Values
knn: $\hat{Y}_{ij}$ = average of the neighbors' ratings of item j (sketched below).
MF: $\hat{Y}_{ij} = (UV)_{ij}$
RE: $\hat{Y}_{ij} = \mu + \alpha_i + \beta_j$, with the effects replaced by their estimates.
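
A minimal, unoptimized sketch of the knn predicted value (the NA-filled matrix Y and the Euclidean distance over co-rated items are assumptions):

# Predict user i's rating of item j as the mean rating of item j among
# the k users closest to i, where distance is over co-rated items.
knnPred <- function(Y, i, j, k = 5) {
  rated <- setdiff(which(!is.na(Y[, j])), i)   # others who rated item j
  if (length(rated) == 0) return(NA)
  dists <- sapply(rated, function(u) {
    common <- !is.na(Y[i, ]) & !is.na(Y[u, ])
    if (!any(common)) return(Inf)              # no co-rated items
    sqrt(mean((Y[i, common] - Y[u, common])^2))
  })
  nbrs <- rated[order(dists)][1:min(k, length(rated))]
  mean(Y[nbrs, j])
}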

Some Proposals for Novel Variations
Hybrid knn-MF model:
Find $U$, $V$ for $M \approx UV$.
The rows of $V$ are (or should be) interpreted as forming a kind of basis, consisting of synthetic representative users.
For each new case C to predict for item j, find the closest row of $V$ to (the known elements of) C; denote that row by R.
Predict the rating to be element j of R, i.e. $R_j$.
Should be much, much faster, and maybe more accurate. (A sketch follows.)

Multiplicative Model for Binary Ratings
The case Y = 0, 1.
Use of knn is straightforward.
MF is not designed for binary Y, but could still be used; MF methods are only heuristics anyway.
For RE, the lme4 package offers GLM analysis for binary Y (sketched below), but MLE is too slow, especially with large data.
Desirable: an alternative using MM.
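
For reference, the lme4 route just mentioned might look like this (ratingsDF and its column names are assumptions, as before):

# Binary-response random effects via lme4's glmer: exact but slow on
# large data, which is what motivates the MM alternative below.
library(lme4)
fitB <- glmer(rating ~ (1 | userID) + (1 | filmID),
              data = ratingsDF, family = binomial)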

Multiplicative Model (cont'd.)
Instead of the independent additive model $Y_{ij} = \mu + \alpha_i + \beta_j + \epsilon_{ij}$, use an independent multiplicative model, $P(Y_{ij} = 1) = \nu \alpha_i \beta_j$.
For identifiability, set $E\alpha_i = E\beta_j = 1$.
Apply MM; get very simple closed-form expressions (one plausible version is sketched below).
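
The slides don't display the expressions themselves; here is one natural MM scheme consistent with the constraints above (an assumption, possibly differing from the talk's exact formulas):

# Matching first moments under E alpha_i = E beta_j = 1 suggests:
#   nu      = overall mean of the observed Y
#   alpha_i = user i's observed mean, divided by nu
#   beta_j  = item j's observed mean, divided by nu
mmBinary <- function(Y) {   # Y: 0/1 matrix, NA for unobserved entries
  nu    <- mean(Y, na.rm = TRUE)
  alpha <- rowMeans(Y, na.rm = TRUE) / nu
  beta  <- colMeans(Y, na.rm = TRUE) / nu
  pred  <- function(i, j) min(1, nu * alpha[i] * beta[j])  # clamp at 1
  list(nu = nu, alpha = alpha, beta = beta, pred = pred)
}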

Our Package
At https://github.com/pooja-rajkumar/, soon to be submitted to CRAN.
knn, MF, RE (MLE/MM).
Implements (or will implement) the above proposed methods and others.
Some parallel computation capability, using Software Alchemy (Matloff, 2016).

Statistical Issues (Part II)
After fitting a model, a statistician's natural inclination is to make plots, to assess the fit.
But to my knowledge, no RS researcher/package has done this.
So, rectools does various plots.

Covariate Data
A few months ago, out of the blue, Netflix made an announcement about their RS:
They found that demographics (age, gender, income etc.) do NOT enhance predictive ability. (Already incorporated in the ratings.)
They stated that they found other data that DOES help, but said they won't tell us what kind of data!
(So why did they bother with their announcement?)
But covariate data should be helpful in some settings, e.g. knn with a new case that has covariate data but not much of a ratings history; one possible scheme is sketched below.
The package does allow covariates in most of its functions.
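
To illustrate the knn point, a minimal sketch of one possible scheme (an assumption, not necessarily what the package does): blend a covariate distance into the knn distance, so a user with a thin ratings history can still be matched.

# Y: ratings matrix with NAs; X: scaled covariate matrix, no NAs.
# Hypothetical helper, not a rectools function.
distWithCovs <- function(Y, X, i, u, w = 1) {
  common <- !is.na(Y[i, ]) & !is.na(Y[u, ])
  rdist <- if (any(common)) mean((Y[i, common] - Y[u, common])^2) else 0
  cdist <- mean((X[i, ] - X[u, ])^2)
  sqrt(rdist + w * cdist)   # w tunes the covariate contribution
}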

Example: Czech Dating Site

Ratings 1:10.

> dat <- read.csv('ratings.dat', header=FALSE)
> idxs <- sample(1:nrow(dat), 100000)
> dat100k <- dat[idxs, ]
> mmout <- trainMM(dat100k)
> dat100k[1, ]
             V1    V2 V3
14857272 115894 14800  4
# how would user 115894 like member 2?
> z <- dat100k[1, ]
> z$V2 <- 2
> predict(mmout, z)
[1] 8.45566

Plotting Example
The MLE model assumes the $\alpha_i$, $\beta_j$ independent, normal.
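
A minimal sketch of the kind of diagnostic plot this suggests, reusing the hypothetical lmer fit from the earlier sketch (the talk's own plots come from the package):

# Under the MLE model the alpha_i are i.i.d. normal, so a Q-Q plot of
# their predicted values gives a quick check of that assumption.
alphaHat <- ranef(fit)$userID[, 1]   # predicted alpha_i from the lmer fit
qqnorm(alphaHat); qqline(alphaHat)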

Plot Example, cont'd.