Collaborative topic models: motivations cont

1 Collaborative topic models: motivations (cont.)
Two topics: machine learning, social network analysis.
Two people: a boy and a girl.
Two articles: article A and article B.
Preferences: The boy likes A and B: no problem. The girl likes A and B: problem?
Wang and Blei (Princeton) Recommending Scientific Articles December 1, / 68

4 Collaborative topic models: motivations (cont.)
What the article is about: topic proportions $\theta$.
What the users think of it: item latent vector $v$.
There is a gap between the two. We propose an approach to fill the gap.

5 The basic idea
1. What the users think of an article might differ from what the article is actually about, but the two are unlikely to be entirely unrelated.
2. We assume the item latent vector $v$ is close to the topic proportions $\theta$, but it can diverge from $\theta$ if it has to.
For an article $j$:
When there are few ratings, $v_j$ is unlikely to be far from $\theta_j$.
When there are lots of ratings, $v_j$ is likely to diverge from $\theta_j$: it effectively adds or removes some topics to cater to the users.

6 The proposed model
For each article $j$:
1. Draw topic proportions $\theta_j \sim \mathrm{Dirichlet}(\alpha)$.
2. Draw the item latent offset $\epsilon_j \sim \mathcal{N}(0, \lambda_v^{-1} I_K)$ and set the item latent vector as $v_j = \theta_j + \epsilon_j$.
3. Everything else is the same; the rating becomes $\mathbb{E}[r_{ij}] = u_i^T v_j = u_i^T (\theta_j + \epsilon_j)$.
We call the model Collaborative Topic Regression (CTR).
The offset $\epsilon_j$ corrects $\theta_j$ for popularity (if it has to).
The precision parameter $\lambda_v$ penalizes how much $v_j$ can diverge from $\theta_j$.
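The generative steps for the item side can be sketched in NumPy. This is only an illustration under stated assumptions: the function names, the symmetric Dirichlet parameter `alpha`, and the default `lambda_v` are hypothetical choices, not values from the slides.

```python
import numpy as np

def sample_item_vectors(num_items, K, alpha=1.0, lambda_v=100.0, seed=0):
    """Draw theta_j ~ Dirichlet(alpha) and v_j = theta_j + eps_j for each item.

    alpha and lambda_v are illustrative defaults (assumptions).
    """
    rng = np.random.default_rng(seed)
    # Topic proportions: one row per item, each row sums to 1.
    theta = rng.dirichlet(np.full(K, alpha), size=num_items)
    # Offset eps_j ~ N(0, lambda_v^{-1} I_K): standard deviation is lambda_v^{-1/2}.
    eps = rng.normal(0.0, np.sqrt(1.0 / lambda_v), size=(num_items, K))
    v = theta + eps
    return theta, eps, v

def expected_rating(u_i, v_j):
    """E[r_ij] = u_i^T v_j."""
    return float(u_i @ v_j)

theta, eps, v = sample_item_vectors(num_items=5, K=4)
u_i = np.array([1.0, 0.0, 2.0, 0.5])  # a made-up user latent vector
print(expected_rating(u_i, v[0]))
```

A larger `lambda_v` shrinks the offsets, so the sampled `v` stays closer to `theta`.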

7 The graphical model
[Figure: graphical model of CTR.]
Item latent vector: $v \sim \mathcal{N}(\theta, \lambda_v^{-1} I_K)$, where $\theta$ is the topic proportions.

8 Learning the model
We develop a standard EM-style algorithm to learn the maximum a posteriori (MAP) estimates.
The user latent vector update is the same as in matrix factorization:
$u_i \leftarrow (V C_i V^T + \lambda_u I_K)^{-1} V C_i R_i$
The item latent vector update is:
$v_j \leftarrow (U C_j U^T + \lambda_v I_K)^{-1} (U C_j R_j + \lambda_v \theta_j)$
Here $U C_j R_j$ carries the user rating information, $\lambda_v$ acts as the relative "weight", and $\theta_j$ is the topic proportions.
If $U = 0$ (no user ratings), $v_j = \theta_j$.
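The two closed-form updates can be written directly in NumPy. This is a minimal sketch, not the authors' code: it assumes $U$ is $K \times I$, $V$ is $K \times J$, and that $C_i$ and $C_j$ are diagonal, so they are passed as vectors of diagonal entries (`c_i`, `c_j`). The function names are hypothetical.

```python
import numpy as np

def update_user(V, c_i, r_i, lambda_u):
    """u_i <- (V C_i V^T + lambda_u I_K)^{-1} V C_i R_i.

    V: (K, J) item vectors; c_i, r_i: length-J weights and ratings for user i.
    """
    K = V.shape[0]
    VC = V * c_i                      # scales column j by c_i[j], i.e. V @ diag(c_i)
    A = VC @ V.T + lambda_u * np.eye(K)
    b = VC @ r_i
    return np.linalg.solve(A, b)      # solve instead of forming the inverse

def update_item(U, c_j, r_j, theta_j, lambda_v):
    """v_j <- (U C_j U^T + lambda_v I_K)^{-1} (U C_j R_j + lambda_v theta_j)."""
    K = U.shape[0]
    UC = U * c_j
    A = UC @ U.T + lambda_v * np.eye(K)
    b = UC @ r_j + lambda_v * theta_j
    return np.linalg.solve(A, b)

# With U = 0 the item update collapses to theta_j, as stated on the slide.
theta_j = np.array([0.5, 0.3, 0.2])
print(update_item(np.zeros((3, 5)), np.ones(5), np.ones(5), theta_j, lambda_v=10.0))
```

Using `np.linalg.solve` avoids an explicit matrix inverse; the result is the same as the formula above.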

9 Make predictions
We consider two scenarios:
In-matrix prediction: items have been rated before.
Out-of-matrix prediction: items have never been rated.
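One way to read the two scenarios in code: score rated items with $u_i^T v_j$, and, since an item with no ratings has $v_j = \theta_j$, score never-rated items with $u_i^T \theta_j$. The sketch below assumes item vectors are stored one per row; the function names and the toy numbers are hypothetical.

```python
import numpy as np

def score_items(u_i, item_vectors):
    """Return u_i^T x_j for every row x_j of item_vectors (shape (J, K))."""
    return item_vectors @ u_i

def recommend(u_i, item_vectors, m):
    """Indices of the m highest-scoring items, best first."""
    scores = score_items(u_i, item_vectors)
    return np.argsort(-scores)[:m]

u_i = np.array([1.0, 0.0])

# In-matrix: items have ratings, so use their learned vectors v_j.
V = np.array([[0.2, 0.8], [0.9, 0.1], [0.5, 0.5]])
print(recommend(u_i, V, m=2))

# Out-of-matrix: items were never rated, so fall back to topic proportions theta_j.
Theta_new = np.array([[0.6, 0.4], [0.1, 0.9]])
print(recommend(u_i, Theta_new, m=1))
```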

10 Outline
1. Overview for Recommender Systems
2. Matrix factorization for recommendation
3. Topic modeling
4. Collaborative topic models
5. Empirical Results

11 Experimental settings
1. Data from CiteULike: 5,551 users, 16,980 articles, and 204,986 bibliography entries (sparsity = 99.8%). For each article, we concatenate its title and abstract as its content. These articles were added to CiteULike between 2004 and 2010.
2. Evaluation: five-fold cross-validation with recall,
$\text{recall@}M = \frac{\text{number of articles the user likes in top } M}{\text{total number of articles the user likes}}$
3. Comparison: matrix factorization for collaborative filtering (CF) and a text-based method (LDA).
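The recall@M definition translates into a few lines of Python. This is a sketch for a single user; the function name and the toy item IDs are illustrative.

```python
def recall_at_m(ranked_items, liked_items, m):
    """recall@M = (# liked articles in the top M) / (total # liked articles)."""
    liked = set(liked_items)
    if not liked:
        raise ValueError("recall is undefined for a user with no liked articles")
    hits = sum(1 for item in ranked_items[:m] if item in liked)
    return hits / len(liked)

ranked = [3, 1, 4, 2]   # recommendation order, best first
liked = {1, 2, 5}       # held-out articles the user likes
print(recall_at_m(ranked, liked, m=2))  # one hit (article 1) out of three liked
```

In a cross-validation setup, this value would be averaged over users for each fold.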

12 Data statistics
[Figure: two panels, (a) and (b), with axes labeled #users and #articles.]

13 Results
1. In-matrix prediction: CTR improves more as the number of recommendations gets larger.
2. Out-of-matrix prediction: about the same as LDA.
[Figure: recall vs. number of recommended articles; panels: in-matrix, out-of-matrix; methods: CF, CTR, LDA.]

14 When precision parameter $\lambda_v$ varies
Recall that $\lambda_v$ penalizes how far $v$ can diverge from $\theta$.
1. When $\lambda_v$ is small, CTR behaves more like CF.
2. When $\lambda_v$ increases, CTR brings in both ratings and content.
3. When $\lambda_v$ is large, CTR behaves more like LDA.
[Figure: recall vs. $\lambda_v$; panels: in-matrix, out-of-matrix; methods: CF, CTR, LDA.]
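The effect of $\lambda_v$ can be seen numerically from the item update $v_j \leftarrow (U C_j U^T + \lambda_v I_K)^{-1} (U C_j R_j + \lambda_v \theta_j)$. The sketch below uses made-up random data and unit confidence weights (both assumptions), and takes the unregularized least-squares solution as a stand-in for the ratings-only (CF-like) limit.

```python
import numpy as np

def update_item(U, c_j, r_j, theta_j, lambda_v):
    """v_j <- (U C_j U^T + lambda_v I_K)^{-1} (U C_j R_j + lambda_v theta_j)."""
    K = U.shape[0]
    UC = U * c_j                      # same as U @ diag(c_j)
    A = UC @ U.T + lambda_v * np.eye(K)
    b = UC @ r_j + lambda_v * theta_j
    return np.linalg.solve(A, b)

rng = np.random.default_rng(0)
K, num_users = 3, 20
U = rng.normal(size=(K, num_users))                 # made-up user vectors
c_j = np.ones(num_users)                            # assumed unit confidence weights
r_j = (rng.random(num_users) < 0.3).astype(float)   # made-up binary ratings
theta_j = np.array([0.7, 0.2, 0.1])

# Ratings-only solution (no pull toward theta_j), used as the CF-like reference.
v_cf = np.linalg.solve(U @ U.T, U @ r_j)

gaps = {}
for lambda_v in (1e-8, 1.0, 100.0, 1e8):
    v_j = update_item(U, c_j, r_j, theta_j, lambda_v)
    # (distance to the ratings-only solution, distance to theta_j)
    gaps[lambda_v] = (np.linalg.norm(v_j - v_cf), np.linalg.norm(v_j - theta_j))
    print(lambda_v, gaps[lambda_v])
```

As `lambda_v` grows, the distance to `theta_j` shrinks toward zero, while a tiny `lambda_v` leaves `v_j` essentially at the ratings-only solution.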

15 Recall against #articles a user has
1. Users with few articles tend to show high diversity in their predictions.
2. Recall for users with more articles shows a decreasing trend (they have more infrequent articles).
[Figure: recall vs. number of articles a user has; series: CF, CTR, and LDA in-matrix; CTR and LDA out-of-matrix.]

16 Recall against #users an article appears in
1. In in-matrix prediction, articles with high frequencies tend to have high recall and less variance.
2. In out-of-matrix prediction, these frequencies do not have an effect (they are not used in training).
[Figure: recall vs. number of users an article appears in; series: CF, CTR, and LDA in-matrix; CTR and LDA out-of-matrix.]

17 Interpretation: example user profile I
Top topics:
1. image, measure, measures, images, motion, matching
2. learning, machine, training, vector, learn, machines
3. sets, objects, defined, categories, representations
Top articles:
1. Information theory inference learning algorithms
2. Machine learning in automated text categorization
3. Artificial intelligence a modern approach
4. Data mining: practical machine learning tools...
5. Statistical learning theory
6. Modern information retrieval
7. Pattern recognition and machine learning
8. Recognition by components: a theory of human...
9. Data clustering a review
10. Indexing by latent semantic analysis

18 Interpretation: example user profile II
Top topics:
1. users, user, interface, interfaces, needs, explicit, implicit
2. based, world, real, characteristics, actual, exploring
3. evaluation, collaborative, products, filtering, product
Top articles:
1. Combining collaborative filtering with personal...
2. An adaptive system for the personalized access...
3. Implicit interest indicators
4. Footprints history-rich tools for information foraging
5. Using social tagging to improve social navigation
6. User models for adaptive hypermedia and...
7. Collaborative filtering recommender systems
8. Knowledge tree: a distributed architecture...
9. Evaluating collaborative filtering recommender...
10. Personalizing search via automated analysis...

19 Interpretation: example article profile I
Article: Maximum likelihood from incomplete data via the EM algorithm, Dempster et al

20 Interpretation: another example article profile II
Article: Phase-of-firing coding of natural visual stimuli in primary visual cortex.

21 Flexible recommendation design
My current simple design on the demo:

22 Flexible recommendation design
Adaptive design I:

23 Flexible recommendation design
Adaptive design I: a new topic

24 See the full demo
chongw/citeulike/

25 The demo
The entry point of the demo gives three links: Users, Topics, and Articles (ranked by offset and frequency).

26 User list page

27 Topic list page
These topics give an overview of what this entire collection is about.

28 Article list page ranked by the offset
These articles are sorted according to their offset: the divergence of the users' view from the word content.
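A ranking by offset could look like the sketch below. The slides do not say how the size of the offset $\epsilon_j = v_j - \theta_j$ is measured, so the Euclidean norm used here is an assumption, as are the function name and the toy numbers.

```python
import numpy as np

def rank_by_offset(V, Theta):
    """Sort items by the size of their offset eps_j = v_j - theta_j, largest first.

    V, Theta: (J, K) arrays with one item per row.
    The Euclidean norm is an assumed measure of offset size.
    """
    magnitude = np.linalg.norm(V - Theta, axis=1)
    order = np.argsort(-magnitude)
    return order, magnitude

V = np.array([[0.5, 0.5], [0.9, 0.1], [0.2, 0.8]])
Theta = np.array([[0.5, 0.5], [0.5, 0.5], [0.3, 0.7]])
order, magnitude = rank_by_offset(V, Theta)
print(order)      # item 1 has the largest offset, item 0 has none
print(magnitude)
```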

29 User can browse his/her interests
A user's interests are summarized using the top topics he/she is interested in, as we saw in the previous slides.
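A simple way to produce such a summary is to pick the topics with the largest weights in the user latent vector. That ranking rule, the function name, and the example vector are assumptions for illustration; the slides only say that top topics are shown.

```python
import numpy as np

def top_topics(u_i, k):
    """Indices of the k topics with the largest entries in u_i (assumed ranking rule)."""
    return np.argsort(-u_i)[:k]

u_i = np.array([0.1, 0.7, 0.3, 0.05])  # made-up user latent vector
print(top_topics(u_i, k=2))            # topics 1 and 2 carry the most weight
```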

30 User can read the recommendations

31 When a user clicks on one recommendation: the article itself

32 When a user clicks on one recommendation: the topics
How the word content differs from the people's view.

33 When a user clicks on one topic: related users
This gives the top users who like this topic.

34 When a user clicks on one topic: related documents
Related documents based on word content versus based on people's view.

35 Future work
We would like to work in the following directions:
incorporating other ways of capturing the popularity of articles, such as metadata (e.g., authors);
modeling user and item profiles over time;
finding new ways of using the user/item profiles and improving the user experience, for example, letting users choose the topics on which to get recommendations.

36 The end
Thanks a lot!

Collaborative Topic Modeling for Recommending Scientific Articles

Chong Wang and David M. Blei, Computer Science Department, Princeton University. Best student paper award at KDD 2011. Presented by Tian Cao.