Collaborative topic models: motivations (cont.)

Two topics: machine learning; social network analysis.
Two people: a boy and a girl.
Two articles: article A and article B.
Preferences:
The boy likes A and B --- no problem.
The girl likes A and B --- problem?

Wang and Blei (Princeton) Recommending Scientific Articles December 1, 211 35 / 68


Collaborative topic models: motivations (cont.)

What the article is about: topic proportions $\theta$.
What the users think of it: item latent vector $v$.
GAP! We proposed an approach to fill the gap.

The basic idea

1. What the users think of an article might differ from what the article is actually about, but it is unlikely to be entirely irrelevant.
2. We assume the item latent vector $v$ is close to the topic proportions $\theta$, but can diverge from $\theta$ if it has to.

For an article $j$:
When there are few ratings, $v_j$ is unlikely to be far from $\theta_j$.
When there are lots of ratings, $v_j$ is likely to diverge from $\theta_j$: it effectively adds or removes some topics to cater to the users.

The proposed model

For each article $j$:
1. Draw topic proportions $\theta_j \sim \mathrm{Dirichlet}(\alpha)$.
2. Draw the item latent offset $\epsilon_j \sim \mathcal{N}(0, \lambda_v^{-1} I_K)$ and set the item latent vector as $v_j = \theta_j + \epsilon_j$.
3. Everything else is the same; the expected rating becomes $E[r_{ij}] = u_i^T v_j = u_i^T (\theta_j + \epsilon_j)$.

We call the model Collaborative Topic Regression (CTR).
The offset $\epsilon_j$ corrects $\theta_j$ for the popularity (if it has to).
The precision parameter $\lambda_v$ penalizes how much $v_j$ can diverge from $\theta_j$.
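The generative steps for one article can be sketched in a few lines of NumPy. This is only an illustration: the function name `sample_ctr_item`, the values of `alpha` and `lambda_v`, and the standard-normal stand-in for the user vector are assumptions, not settings from the talk.

```python
import numpy as np

def sample_ctr_item(alpha, lambda_v, rng):
    """Draw (theta_j, eps_j, v_j) for one article (hypothetical helper)."""
    K = len(alpha)
    theta = rng.dirichlet(alpha)                        # topic proportions
    # Offset with covariance (1 / lambda_v) * I_K, i.e. std = lambda_v ** -0.5.
    eps = rng.normal(0.0, lambda_v ** -0.5, size=K)
    return theta, eps, theta + eps                      # v_j = theta_j + eps_j

rng = np.random.default_rng(0)
K = 5
alpha = np.full(K, 0.5)                                 # assumed Dirichlet parameter
theta, eps, v = sample_ctr_item(alpha, lambda_v=100.0, rng=rng)

u = rng.normal(0.0, 1.0, size=K)                        # stand-in user latent vector
expected_rating = u @ v                                 # E[r_ij] = u_i^T (theta_j + eps_j)
```

A larger `lambda_v` shrinks the offset, so the sampled `v` stays closer to `theta`.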

The graphical model

[Figure: graphical model of CTR, annotated with the item latent vector $v \sim \mathcal{N}(\theta, \lambda_v^{-1} I_K)$ and the topic proportions $\theta$.]

Learning the model

We develop a standard EM-style algorithm to learn the maximum a posteriori (MAP) estimates.

The user latent vector update is the same as in matrix factorization:
$$u_i \leftarrow (V C_i V^T + \lambda_u I_K)^{-1} V C_i R_i$$
The item latent vector update combines the user rating information $U C_j R_j$ with the topic proportions $\theta_j$, with $\lambda_v$ as the relative "weight":
$$v_j \leftarrow (U C_j U^T + \lambda_v I_K)^{-1} (U C_j R_j + \lambda_v \theta_j)$$
If $U = 0$ (no user ratings), $v_j = \theta_j$.
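The two closed-form updates translate directly into NumPy. The sketch below assumes $U$ is $K \times$ users, $V$ is $K \times$ items, and $C_i$, $C_j$ are diagonal confidence matrices stored as vectors; the confidence values (1.0 for observed entries, 0.01 otherwise) and the toy sizes are illustrative assumptions.

```python
import numpy as np

def update_user(V, c_i, r_i, lambda_u):
    """u_i <- (V C_i V^T + lambda_u I_K)^{-1} V C_i R_i, with C_i = diag(c_i)."""
    K = V.shape[0]
    A = (V * c_i) @ V.T + lambda_u * np.eye(K)   # V * c_i scales column j by c_i[j]
    return np.linalg.solve(A, V @ (c_i * r_i))

def update_item(U, c_j, r_j, theta_j, lambda_v):
    """v_j <- (U C_j U^T + lambda_v I_K)^{-1} (U C_j R_j + lambda_v theta_j)."""
    K = U.shape[0]
    A = (U * c_j) @ U.T + lambda_v * np.eye(K)
    return np.linalg.solve(A, U @ (c_j * r_j) + lambda_v * theta_j)

rng = np.random.default_rng(0)
K, n_users, n_items = 3, 4, 5
U = rng.normal(size=(K, n_users))
V = rng.normal(size=(K, n_items))
R = (rng.random((n_users, n_items)) < 0.4).astype(float)   # toy binary ratings
C = np.where(R > 0, 1.0, 0.01)                             # assumed confidence weights
theta = rng.dirichlet(np.ones(K), size=n_items).T          # K x n_items

u0 = update_user(V, C[0], R[0], lambda_u=0.01)
v0 = update_item(U, C[:, 0], R[:, 0], theta[:, 0], lambda_v=100.0)

# With U = 0 the rating term vanishes and the update returns theta_j.
v_unrated = update_item(np.zeros((K, n_users)), C[:, 0], R[:, 0],
                        theta[:, 0], lambda_v=100.0)
```

Using `np.linalg.solve` avoids forming the matrix inverse explicitly, which is the usual choice for systems of this shape.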

Make predictions

We consider two scenarios:
In-matrix prediction: items have been rated before.
Out-of-matrix prediction: items have never been rated.
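One way to read the two scenarios, given that $E[r_{ij}] = u_i^T v_j$ and that $v_j$ reduces to $\theta_j$ when an article has no ratings, is sketched below. The function names and the toy vectors are hypothetical.

```python
import numpy as np

def predict_in_matrix(u_i, v_j):
    """Article has ratings: score with the learned item latent vector v_j."""
    return float(u_i @ v_j)

def predict_out_of_matrix(u_i, theta_j):
    """Article has no ratings: fall back to its topic proportions theta_j."""
    return float(u_i @ theta_j)

u = np.array([0.2, 1.0, -0.1])        # toy user latent vector
theta = np.array([0.7, 0.2, 0.1])     # toy topic proportions
eps = np.array([-0.1, 0.3, 0.0])      # toy offset learned from ratings

score_in = predict_in_matrix(u, theta + eps)
score_out = predict_out_of_matrix(u, theta)
```

The difference between the two scores is exactly $u_i^T \epsilon_j$, the part of the prediction that only ratings can supply.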

Outline

1. Overview for Recommender Systems
2. Matrix factorization for recommendation
3. Topic modeling
4. Collaborative topic models
5. Empirical Results

Experimental settings

1. Data from CiteULike: 5,551 users, 16,980 articles, and 204,986 bibliography entries (sparsity = 99.8%). For each article, we concatenate its title and abstract as its content. These articles were added to CiteULike between 2004 and 2010.
2. Evaluation: five-fold cross-validation with recall,
   $$\text{recall@}M = \frac{\text{number of articles the user likes in the top } M}{\text{total number of articles the user likes}}$$
3. Comparison: matrix factorization for collaborative filtering (CF) and a text-based method (LDA).
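The recall@M metric is simple enough to state as code. The function name `recall_at_m` and the toy article identifiers are illustrative; raising an error for a user with no liked articles is a design choice of this sketch, not something the talk specifies.

```python
def recall_at_m(ranked_articles, liked_articles, m):
    """Fraction of the user's liked articles that appear in the top m of the ranking."""
    liked = set(liked_articles)
    if not liked:
        raise ValueError("recall is undefined for a user with no liked articles")
    hits = sum(1 for article in ranked_articles[:m] if article in liked)
    return hits / len(liked)

ranked = ["a", "b", "c", "d", "e"]    # recommendations, best first
liked = {"b", "e", "z"}               # held-out articles the user likes

recall_3 = recall_at_m(ranked, liked, 3)
recall_5 = recall_at_m(ranked, liked, 5)
```

Because the denominator is the number of liked articles rather than M, recall can only stay flat or grow as M increases.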

Data statistics

[Figure: two panels, (a) and (b), with axes #users and #articles.]

Results

1. In-matrix prediction: CTR improves more as the number of recommendations gets larger.
2. Out-of-matrix prediction: about the same as LDA.

[Figure: recall against the number of recommended articles, in two panels (in matrix, out of matrix), for the methods CF, CTR, and LDA.]

When precision parameter $\lambda_v$ varies

Recall that $\lambda_v$ penalizes how much $v$ can diverge from $\theta$.
1. When $\lambda_v$ is small, CTR behaves more like CF.
2. When $\lambda_v$ increases, CTR brings in both ratings and content.
3. When $\lambda_v$ is large, CTR behaves more like LDA.

[Figure: recall against $\lambda_v$, in two panels (in matrix, out of matrix), for the methods CF, CTR, and LDA.]
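The effect of $\lambda_v$ can be seen on a single item update $v_j \leftarrow (U C_j U^T + \lambda_v I_K)^{-1} (U C_j R_j + \lambda_v \theta_j)$. The sketch below sweeps $\lambda_v$ on random toy data and tracks the distance between $v_j$ and $\theta_j$; the data, the confidence weights, and the grid of $\lambda_v$ values are all assumptions made for illustration.

```python
import numpy as np

def item_update(U, c_j, r_j, theta_j, lambda_v):
    """MAP update for one item latent vector, with C_j = diag(c_j)."""
    K = U.shape[0]
    A = (U * c_j) @ U.T + lambda_v * np.eye(K)
    return np.linalg.solve(A, U @ (c_j * r_j) + lambda_v * theta_j)

rng = np.random.default_rng(1)
K, n_users = 4, 30
U = rng.normal(size=(K, n_users))                    # toy user latent vectors
r_j = (rng.random(n_users) < 0.3).astype(float)      # toy ratings for one article
c_j = np.where(r_j > 0, 1.0, 0.01)                   # assumed confidence weights
theta_j = rng.dirichlet(np.ones(K))                  # toy topic proportions

lambdas = [0.01, 1.0, 100.0, 10000.0]
gaps = [np.linalg.norm(item_update(U, c_j, r_j, theta_j, lam) - theta_j)
        for lam in lambdas]
```

The gap shrinks as $\lambda_v$ grows: for small $\lambda_v$ the ratings dominate (the CF-like regime), and for large $\lambda_v$ the vector is pinned to the topic proportions (the LDA-like regime).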

Recall against #articles a user has

1. Users with few articles tend to show more diversity in the predictions.
2. Recall for users with more articles shows a decreasing trend: these users have more infrequent articles.

[Figure: recall against the number of articles a user has, in panels for CF, CTR, and LDA (in matrix) and for CTR and LDA (out of matrix).]

Recall against #users an article appears in

1. In in-matrix prediction, articles with high frequencies tend to have high recall and less variance.
2. In out-of-matrix prediction, these frequencies do not have an effect (they are not used in training).

[Figure: recall against the number of users an article appears in, in panels for CF, CTR, and LDA (in matrix) and for CTR and LDA (out of matrix).]

Interpretation: example user profile I

Top topics:
1. image, measure, measures, images, motion, matching
2. learning, machine, training, vector, learn, machines
3. sets, objects, defined, categories, representations

Top articles:
1. Information theory inference learning algorithms
2. Machine learning in automated text categorization
3. Artificial intelligence a modern approach
4. Data mining: practical machine learning tools...
5. Statistical learning theory
6. Modern information retrieval
7. Pattern recognition and machine learning
8. Recognition by components: a theory of human...
9. Data clustering a review
10. Indexing by latent semantic analysis

Interpretation: example user profile II

Top topics:
1. users, user, interface, interfaces, needs, explicit, implicit
2. based, world, real, characteristics, actual, exploring
3. evaluation, collaborative, products, filtering, product

Top articles:
1. Combining collaborative filtering with personal...
2. An adaptive system for the personalized access...
3. Implicit interest indicators
4. Footprints history-rich tools for information foraging
5. Using social tagging to improve social navigation
6. User models for adaptive hypermedia and...
7. Collaborative filtering recommender systems
8. Knowledge tree: a distributed architecture...
9. Evaluating collaborative filtering recommender...
10. Personalizing search via automated analysis...

Interpretation: example article profile I

Article: Maximum likelihood from incomplete data via the EM algorithm, Dempster et al. 1977.

Interpretation: another example article profile II

Article: Phase-of-firing coding of natural visual stimuli in primary visual cortex.

Flexible recommendation design

My current simple design on the demo: http://www.cs.princeton.edu/~chongw/citeulike/users/user2832.html


Flexible recommendation design

Adaptive design I: a new topic.

See the full demo

http://www.cs.princeton.edu/~chongw/citeulike/

The demo

The entry point of the demo gives three links: Users, Topics, and Articles (ranked by offset and frequency).

User list page

Topic list page

These topics give an overview of what this entire collection is about.

Article list page ranked by the offset

These articles are sorted by their offset: the divergence of the users' view from the word content.
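A ranking of this kind could be produced as follows. Measuring the offset by the Euclidean norm of $\epsilon_j = v_j - \theta_j$ is an assumption of this sketch (the demo may use a different measure), and `rank_by_offset` is a hypothetical helper name.

```python
import numpy as np

def rank_by_offset(V, Theta):
    """Sort article indices by ||v_j - theta_j||, largest offset first.

    V and Theta are K x n_articles arrays (item latent vectors, topic proportions).
    """
    offsets = np.linalg.norm(V - Theta, axis=0)   # one norm per article (column)
    order = np.argsort(-offsets)                  # descending by offset size
    return order, offsets

Theta = np.array([[0.5, 0.9, 0.2],
                  [0.5, 0.1, 0.8]])               # toy topic proportions, K = 2
Eps = np.array([[0.0, 0.3, -0.1],
                [0.0, -0.4, 0.1]])                # toy offsets
order, offsets = rank_by_offset(Theta + Eps, Theta)
```

Articles at the top of this ordering are the ones whose readership departs most from what their text alone would suggest.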

User can browse his/her interests

A user's interests are summarized by the top topics he/she is interested in, as we saw in the previous slides.
http://www.cs.princeton.edu/~chongw/citeulike/users/user2832.html

User can read the recommendations

When a user clicks on one recommendation: the article itself

When a user clicks on one recommendation: the topics

How the word content differs from the people's view.

When a user clicks on one topic: related users

This gives the top users who like this topic.

When a user clicks on one topic: related documents

Related documents based on word content versus based on the people's view.

Future work

We would like to work in the following directions:
incorporating other ways of capturing the popularity of articles, such as metadata (e.g., authors);
modeling user and item profiles over time;
finding new ways of using the user/item profiles and improving the user experience. For example, let users choose which topics to get recommendations on.

The end

Thanks a lot!