Collaborative topic models: motivations (cont.)


Collaborative topic models: motivations (cont.)
Two topics: machine learning, social network analysis.
Two people: a boy and a girl.
Two articles: article A and article B.
Preferences: The boy likes A and B: no problem. The girl likes A and B: problem?
Wang and Blei (Princeton), Recommending Scientific Articles, December 1, 2011

Collaborative topic models: motivations (cont.)
What the article is about: topic proportions $\theta$.
What the users think of it: item latent vector $v$.
GAP! We proposed an approach to fill the gap.

The basic idea
1. What the users think of an article might differ from what the article is actually about, but the two are unlikely to be entirely unrelated.
2. We assume the item latent vector $v$ is close to the topic proportions $\theta$, but can diverge from $\theta$ if it has to.
For an article $j$:
- When there are few ratings, $v_j$ is unlikely to be far from $\theta_j$.
- When there are lots of ratings, $v_j$ is likely to diverge from $\theta_j$: it effectively generates or removes some topics to cater to the users.

The proposed model
For each article $j$:
1. Draw topic proportions $\theta_j \sim \mathrm{Dirichlet}(\alpha)$.
2. Draw the item latent offset $\epsilon_j \sim \mathcal{N}(0, \lambda_v^{-1} I_K)$ and set the item latent vector as $v_j = \theta_j + \epsilon_j$.
3. Everything else is the same; the rating becomes $E[r_{ij}] = u_i^T v_j = u_i^T (\theta_j + \epsilon_j)$.
We call the model Collaborative Topic Regression (CTR).
The offset $\epsilon_j$ corrects $\theta_j$ for the popularity (if it has to).
The precision parameter $\lambda_v$ penalizes how much $v_j$ can diverge from $\theta_j$.
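The generative steps above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `sample_ctr`, the default hyperparameter values, and the Gaussian prior on the user vectors (taken to be the usual matrix factorization prior with precision $\lambda_u$) are not given on the slide.

```python
import numpy as np

def sample_ctr(n_users, n_items, K, alpha=1.0, lam_u=0.01, lam_v=100.0, seed=0):
    """Draw latent variables from the CTR generative process (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    # Step 1: topic proportions theta_j ~ Dirichlet(alpha), one row per article.
    theta = rng.dirichlet(np.full(K, alpha), size=n_items)
    # Step 2: offset eps_j ~ N(0, lam_v^{-1} I_K), then v_j = theta_j + eps_j.
    eps = rng.normal(0.0, np.sqrt(1.0 / lam_v), size=(n_items, K))
    v = theta + eps
    # Assumed user prior (as in matrix factorization): u_i ~ N(0, lam_u^{-1} I_K).
    u = rng.normal(0.0, np.sqrt(1.0 / lam_u), size=(n_users, K))
    # Step 3: expected rating E[r_ij] = u_i^T v_j = u_i^T (theta_j + eps_j).
    expected_r = u @ v.T
    return theta, eps, v, u, expected_r
```

A larger `lam_v` shrinks the sampled offsets, so the item vectors stay closer to the topic proportions.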

The graphical model
[Figure: graphical model of CTR. The item latent vector is drawn as $v \sim \mathcal{N}(\theta, \lambda_v^{-1} I_K)$, where $\theta$ denotes the topic proportions.]

Learning the model
We develop a standard EM-style algorithm to learn the maximum a posteriori (MAP) estimates.
The user latent vector update is the same as in matrix factorization:
$u_i \leftarrow (V C_i V^T + \lambda_u I_K)^{-1} V C_i R_i$
The item latent vector update is:
$v_j \leftarrow (U C_j U^T + \lambda_v I_K)^{-1} (U C_j R_j + \lambda_v \theta_j)$
Here $U C_j R_j$ carries the user rating information, $\theta_j$ is the topic proportions, and $\lambda_v$ sets their relative "weight".
If $U = 0$ (no user ratings), $v_j = \theta_j$.
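A minimal NumPy sketch of the two closed-form updates follows. It assumes that $U$ and $V$ store one latent vector per column (shapes K x I and K x J), that $C_i$ and $C_j$ are diagonal weight matrices passed as 1-D arrays, and that $R_i$ and $R_j$ are rating vectors; the slide does not define these shapes, and the function names are hypothetical.

```python
import numpy as np

def update_user(V, c_i, r_i, lam_u):
    """u_i <- (V C_i V^T + lam_u I_K)^{-1} V C_i R_i, with V of shape (K, J)."""
    K = V.shape[0]
    VC = V * c_i  # scales column j by c_i[j]; same as V @ diag(c_i)
    return np.linalg.solve(VC @ V.T + lam_u * np.eye(K), VC @ r_i)

def update_item(U, c_j, r_j, theta_j, lam_v):
    """v_j <- (U C_j U^T + lam_v I_K)^{-1} (U C_j R_j + lam_v theta_j), U of shape (K, I)."""
    K = U.shape[0]
    UC = U * c_j  # same as U @ diag(c_j)
    return np.linalg.solve(UC @ U.T + lam_v * np.eye(K), UC @ r_j + lam_v * theta_j)
```

Using `np.linalg.solve` avoids forming the matrix inverse explicitly. With `U` set to all zeros, `update_item` returns `theta_j`, which matches the no-ratings case on the slide.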

Make predictions
We consider two scenarios:
- In-matrix prediction: items have been rated before.
- Out-of-matrix prediction: items have never been rated.
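One way to read the two scenarios in code: score a rated item with its learned vector $v_j$, and fall back to the topic proportions $\theta_j$ for an item that has never been rated, since without ratings the item update reduces to $v_j = \theta_j$. The sketch below assumes exactly that; the function name and the `None` convention for unrated items are illustrative choices, not something the slides specify.

```python
import numpy as np

def predict_rating(u_i, theta_j, v_j=None):
    """Score u_i^T v_j for a rated item, or u_i^T theta_j when no v_j was learned."""
    item_vec = theta_j if v_j is None else v_j
    return float(np.dot(u_i, item_vec))

u_i = np.array([1.0, 2.0, 0.0])         # hypothetical user vector
theta_j = np.array([0.5, 0.25, 0.25])   # hypothetical topic proportions
v_j = np.array([0.5, 0.5, 0.0])         # hypothetical learned item vector

in_matrix_score = predict_rating(u_i, theta_j, v_j)
out_of_matrix_score = predict_rating(u_i, theta_j)
```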

Outline
1. Overview of Recommender Systems
2. Matrix factorization for recommendation
3. Topic modeling
4. Collaborative topic models
5. Empirical Results

Experimental settings
1. Data from CiteULike: 5,551 users, 16,980 articles, and 204,986 bibliography entries (sparsity = 99.8%). For each article, we concatenate its title and abstract as its content. These articles were added to CiteULike between 2004 and 2010.
2. Evaluation: five-fold cross-validation with recall,
$\text{recall@}M = \frac{\text{number of articles the user likes in top } M}{\text{total number of articles the user likes}}$
3. Comparison: matrix factorization for collaborative filtering (CF) and a text-based method (LDA).
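The recall@M metric can be computed per user as in the sketch below. It assumes the recommendations arrive as a ranked list without duplicates and that the user's liked articles are known as a collection of ids; the function and argument names are hypothetical.

```python
def recall_at_m(ranked_articles, liked_articles, m):
    """Fraction of the user's liked articles that appear in the top-m recommendations."""
    liked = set(liked_articles)
    if not liked:
        raise ValueError("recall is undefined for a user with no liked articles")
    # Count liked articles among the first m recommendations.
    hits = sum(1 for article in ranked_articles[:m] if article in liked)
    return hits / len(liked)

ranked = ["a", "b", "c", "d"]   # hypothetical ranked recommendations
liked = {"b", "d", "e"}         # hypothetical held-out likes
recall_top2 = recall_at_m(ranked, liked, 2)
recall_top4 = recall_at_m(ranked, liked, 4)
```

Averaging this value over users gives a single score per method and per value of M.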

Data statistics
[Figure: panels (a) and (b); axes: #articles and #users.]

Results
1. In-matrix prediction: CTR improves more when the number of recommendations gets larger.
2. Out-of-matrix prediction: about the same as LDA.
[Figure: recall versus number of recommended articles; panels: in matrix, out of matrix; methods: CF, CTR, LDA.]

When precision parameter $\lambda_v$ varies
Recall that $\lambda_v$ penalizes how much $v$ can diverge from $\theta$.
1. When $\lambda_v$ is small, CTR behaves more like CF.
2. When $\lambda_v$ increases, CTR brings in both ratings and content.
3. When $\lambda_v$ is large, CTR behaves more like LDA.
[Figure: recall versus $\lambda_v$; panels: in matrix, out of matrix; methods: CF, CTR, LDA.]
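The two limits can be checked numerically with the item update $v_j \leftarrow (U C_j U^T + \lambda_v I_K)^{-1}(U C_j R_j + \lambda_v \theta_j)$. The sketch below uses random, purely hypothetical data and assumes unit confidence weights ($C_j = I$); it only illustrates the trend, not the experiment on the slide.

```python
import numpy as np

def item_vector(U, r_j, theta_j, lam_v):
    """Item update with unit confidence weights (C_j = I is an assumption)."""
    K = U.shape[0]
    return np.linalg.solve(U @ U.T + lam_v * np.eye(K), U @ r_j + lam_v * theta_j)

rng = np.random.default_rng(0)
K, n_users = 4, 30
U = rng.normal(size=(K, n_users))                      # hypothetical user vectors
r_j = rng.integers(0, 2, size=n_users).astype(float)   # hypothetical 0/1 ratings
theta_j = rng.dirichlet(np.ones(K))                    # hypothetical topic proportions

# Ratings-only least-squares solution, i.e. the update without the content term.
cf_solution = np.linalg.solve(U @ U.T, U @ r_j)

for lam_v in (1e-4, 1.0, 1e4):
    v_j = item_vector(U, r_j, theta_j, lam_v)
    print(f"lam_v={lam_v:g}  "
          f"dist to theta={np.linalg.norm(v_j - theta_j):.4f}  "
          f"dist to CF={np.linalg.norm(v_j - cf_solution):.4f}")
```

As `lam_v` grows, the distance to `theta_j` shrinks (content dominates); as it goes to zero, the update approaches the ratings-only solution.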

Recall against #articles a user has
1. Users with few articles tend to show more variation in their predictions.
2. Recall for users with more articles shows a decreasing trend: these users have more infrequent articles.
[Figure: recall versus number of articles a user has; panels: CF in matrix, CTR in matrix, LDA in matrix, CTR out of matrix, LDA out of matrix.]

Recall against #users an article appears in
1. In in-matrix prediction, articles with high frequencies tend to have high recall and less variance.
2. In out-of-matrix prediction, these frequencies do not have an effect (they are not used in training).
[Figure: recall versus number of users an article appears in; panels: CF in matrix, CTR in matrix, LDA in matrix, CTR out of matrix, LDA out of matrix.]

Interpretation: example user profile I
Top topics:
1. image, measure, measures, images, motion, matching
2. learning, machine, training, vector, learn, machines
3. sets, objects, defined, categories, representations
Top articles:
1. Information theory inference learning algorithms
2. Machine learning in automated text categorization
3. Artificial intelligence a modern approach
4. Data mining: practical machine learning tools...
5. Statistical learning theory
6. Modern information retrieval
7. Pattern recognition and machine learning
8. Recognition by components: a theory of human...
9. Data clustering a review
10. Indexing by latent semantic analysis

Interpretation: example user profile II
Top topics:
1. users, user, interface, interfaces, needs, explicit, implicit
2. based, world, real, characteristics, actual, exploring
3. evaluation, collaborative, products, filtering, product
Top articles:
1. Combining collaborative filtering with personal...
2. An adaptive system for the personalized access...
3. Implicit interest indicators
4. Footprints history-rich tools for information foraging
5. Using social tagging to improve social navigation
6. User models for adaptive hypermedia and...
7. Collaborative filtering recommender systems
8. Knowledge tree: a distributed architecture...
9. Evaluating collaborative filtering recommender...
10. Personalizing search via automated analysis...

Interpretation: example article profile I
Article: Maximum likelihood from incomplete data via the EM algorithm, Dempster et al. 1977.

Interpretation: another example article profile II
Article: Phase-of-firing coding of natural visual stimuli in primary visual cortex.

Flexible recommendation design
My current simple design on the demo: http://www.cs.princeton.edu/~chongw/citeulike/users/user2832.html

Flexible recommendation design
Adaptive design I: [Figure not recovered; the final build of the slide adds "a new topic".]

See the full demo
http://www.cs.princeton.edu/~chongw/citeulike/

The demo
The entry point of the demo gives three links: Users, Topics, and Articles (ranked by offset and frequency).

User list page

Topic list page
These topics give an overview of what this entire collection is about.

Article list page, ranked by the offset
These articles are sorted according to their offset: the divergence of the users' view from the word content.
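A ranking of this kind could be computed from the learned vectors as sketched below. The slide does not say how the offset is turned into a single score, so the Euclidean norm of $\epsilon_j = v_j - \theta_j$ is an assumption here, as are the function name and the row-per-article layout.

```python
import numpy as np

def rank_by_offset(V, Theta):
    """Sort articles by decreasing size of their offset eps_j = v_j - theta_j.

    V and Theta have shape (J, K), one row per article. The L2 norm is an
    assumed way of scoring the offset.
    """
    offsets = V - Theta
    magnitude = np.linalg.norm(offsets, axis=1)
    order = np.argsort(-magnitude)  # largest offset first
    return order, magnitude

Theta = np.array([[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]])  # hypothetical topic proportions
V = np.array([[0.5, 0.5], [1.5, 0.5], [0.5, 0.7]])      # hypothetical item vectors
order, magnitude = rank_by_offset(V, Theta)
```

Articles at the top of this ordering are those whose user-driven representation departs most from their text content.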

User can browse his/her interests
A user's interests are summarized using the top topics he/she is interested in, as we saw in the previous slides.
http://www.cs.princeton.edu/~chongw/citeulike/users/user2832.html

User can read the recommendations

When a user clicks on one recommendation: the article itself

When a user clicks on one recommendation: the topics
How the word content differs from the people's view.

When a user clicks on one topic: related users
This gives the top users who like this topic.

When a user clicks on one topic: related documents
Related documents based on word content versus based on people's view.

Future work
We would like to work in the following directions:
- incorporating other ways of capturing the popularity of articles, such as metadata (e.g., authors);
- modeling user and item profiles over time;
- finding new ways of using the user/item profiles and improving the user experience, for example, letting users choose the topics on which to get recommendations.

The end
Thanks a lot!