Content-based Recommendation

Similar documents
Collaborative Topic Modeling for Recommending Scientific Articles

Collaborative topic models: motivations cont

Matrix Factorization Techniques for Recommender Systems


CS Lecture 18. Topic Models and LDA

Matrix Factorization and Recommendation Systems

Matrix Factorization Techniques For Recommender Systems. Collaborative Filtering

Dynamic Poisson Factorization

Generative Clustering, Topic Modeling, & Bayesian Inference

Applying hlda to Practical Topic Modeling

Study Notes on the Latent Dirichlet Allocation

Summarizing Creative Content

Relational Stacked Denoising Autoencoder for Tag Recommendation. Hao Wang

Topic Modelling and Latent Dirichlet Allocation

Latent Dirichlet Allocation Introduction/Overview

Collaborative Filtering. Radek Pelánek

Rating Prediction with Topic Gradient Descent Method for Matrix Factorization in Recommendation

Sparse Stochastic Inference for Latent Dirichlet Allocation

Content-Based Social Recommendation with Poisson Matrix Factorization

Scaling Neighbourhood Methods

Recommender Systems EE448, Big Data Mining, Lecture 10. Weinan Zhang Shanghai Jiao Tong University

Large-Scale Social Network Data Mining with Multi-View Information. Hao Wang

COMS 4721: Machine Learning for Data Science Lecture 18, 4/4/2017

Latent Dirichlet Allocation (LDA)

LDA with Amortized Inference

Latent variable models for discrete data

TopicMF: Simultaneously Exploiting Ratings and Reviews for Recommendation

Distributed ML for DOSNs: giving power back to users

Gaussian Mixture Model

Scalable Bayesian Matrix Factorization

Topic Models. Advanced Machine Learning for NLP. Jordan Boyd-Graber

Bayesian Contextual Multi-armed Bandits

Text Mining for Economics and Finance Latent Dirichlet Allocation

Deep Poisson Factorization Machines: a factor analysis model for mapping behaviors in journalist ecosystem

Probabilistic Topic Modeling, Reinforcement Learning, and Crowdsourcing for Personalized Recommendations

Mixed Membership Matrix Factorization

Side Information Aware Bayesian Affinity Estimation

Matrix Factorization In Recommender Systems. Yong Zheng, PhDc Center for Web Intelligence, DePaul University, USA March 4, 2015

Andriy Mnih and Ruslan Salakhutdinov

SCMF: Sparse Covariance Matrix Factorization for Collaborative Filtering

Divide and Transfer: Understanding Latent Factors for Recommendation Tasks

Latent Dirichlet Allocation (LDA)

Generative Models for Discrete Data

Recommendation Systems

Predictive Discrete Latent Factor Models for large incomplete dyadic data

Topic Models and Applications to Short Documents

Language Information Processing, Advanced. Topic Models

Mixed Membership Matrix Factorization

Recommender Systems. Dipanjan Das Language Technologies Institute Carnegie Mellon University. 20 November, 2007

Generalized Linear Models in Collaborative Filtering

Matrix and Tensor Factorization from a Machine Learning Perspective

Document and Topic Models: plsa and LDA

Decoupled Collaborative Ranking

Information retrieval LSI, plsi and LDA. Jian-Yun Nie

Data Mining Techniques

Bayesian Matrix Factorization with Side Information and Dirichlet Process Mixtures

Bayesian nonparametric models for bipartite graphs

Lecture 13 : Variational Inference: Mean Field Approximation


Probabilistic Local Matrix Factorization based on User Reviews

Learning the Semantic Correlation: An Alternative Way to Gain from Unlabeled Text

Bayesian Nonparametric Poisson Factorization for Recommendation Systems

A Modified PMF Model Incorporating Implicit Item Associations

Time-aware Collaborative Topic Regression: Towards Higher Relevance in Textual Item Recommendation

Context-aware Ensemble of Multifaceted Factorization Models for Recommendation Prediction in Social Networks

Latent Dirichlet Conditional Naive-Bayes Models


20: Gaussian Processes

RaRE: Social Rank Regulated Large-scale Network Embedding

Review: Probabilistic Matrix Factorization. Probabilistic Matrix Factorization (PMF)

Large-scale Ordinal Collaborative Filtering

Applying LDA topic model to a corpus of Italian Supreme Court decisions

Probabilistic Matrix Factorization

Introduction. Chapter 1

Recommendation Systems

CS145: INTRODUCTION TO DATA MINING

Matrix Factorization & Latent Semantic Analysis Review. Yize Li, Lanbo Zhang

Modeling User Exposure in Recommendation

Unifying Topic, Sentiment & Preference in an HDP-Based Rating Regression Model for Online Reviews

Matrix Factorization Techniques for Recommender Systems

A graph contains a set of nodes (vertices) connected by links (edges or arcs)

Bayesian Machine Learning

Fast Inference and Learning for Modeling Documents with a Deep Boltzmann Machine

Online Bayesian Passive-Aggressive Learning

METHODS FOR IDENTIFYING PUBLIC HEALTH TRENDS. Mark Dredze Department of Computer Science Johns Hopkins University

Service Recommendation for Mashup Composition with Implicit Correlation Regularization

Latent Dirichlet Allocation

Fast Supervised LDA for Discovering Micro-Events in Large-Scale Video Datasets

Latent Dirichlet Bayesian Co-Clustering

Introduction to Probabilistic Machine Learning

Latent Dirichlet Allocation

Using Both Latent and Supervised Shared Topics for Multitask Learning

Dimension Reduction Methods

Bayesian Learning. HT2015: SC4 Statistical Data Mining and Machine Learning. Maximum Likelihood Principle. The Bayesian Learning Framework

Non-parametric Clustering with Dirichlet Processes

Mixed Membership Matrix Factorization


9/12/17. Types of learning. Modeling data. Supervised learning: Classification. Supervised learning: Regression. Unsupervised learning: Clustering

Transferring User Interests Across Websites with Unstructured Text for Cold-Start Recommendation

19 : Bayesian Nonparametrics: The Indian Buffet Process. 1 Latent Variable Models and the Indian Buffet Process


Content-based Recommendation

Suthee Chaidaroon

June 13, 2016

Contents

1 Introduction
  1.1 Matrix Factorization
2 sLDA
  2.1 Model
3 fLDA
4 Collaborative Topic Regression (CTR)
  4.1 Model
5 CRPF
  5.1 Model
6 Revision

1 Introduction

Much of the matrix factorization literature predicts a rating with the dot product between user and item latent factors: $r_{ui} = q_i^T p_u$. This is the basic idea of matrix factorization. We can enhance the user and item representations by incorporating additional user or item information; the more natural the integration of these new input sources into the prediction model, the better the prediction accuracy, and such integration also helps solve the cold-start problem. This writeup summarizes selected content-based recommendation techniques that use Bayesian methods. We start with the classic matrix factorization of Koren et al., then turn to topic-model methods that derive better item representations, examining how each model integrates extra information. These integration methods are the theme of this writeup.
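To make the dot-product prediction concrete, here is a minimal NumPy sketch. The toy dimensions, the randomly initialized factor matrices P and Q, and the predict helper are illustrative assumptions, not taken from any of the papers discussed below.

```python
import numpy as np

rng = np.random.default_rng(0)

n_users, n_items, K = 5, 7, 3         # toy sizes (assumed)
P = rng.normal(size=(n_users, K))     # user latent factors p_u
Q = rng.normal(size=(n_items, K))     # item latent factors q_i

def predict(u, i):
    """Predicted rating: r_ui = q_i^T p_u."""
    return Q[i] @ P[u]

print(predict(0, 3))                  # predicted rating for user 0, item 3
```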

1.1 Matrix Factorization

The classic matrix factorization model [1] adds further observed variables, such as implicit feedback and user attributes, to enhance the user representation. The model is described as:

$$\hat{r}_{ui} = \mu + b_i + b_u + q_i^T \Big[ p_u + |N(u)|^{-0.5} \sum_{i \in N(u)} x_i + \sum_{a \in A(u)} y_a \Big] \qquad (1)$$

This model adds fixed additional terms to the user latent factor: the user's implicit feedback $x_i$ and user attributes $y_a$. The authors need to explicitly tune and normalize the implicit feedback and attributes to achieve an improvement. For example, the implicit feedback can be adjusted by $|N(u)|^{-0.5} \sum_{i \in N(u)} x_i^{4.5}$. (Why does $x_i$ have to be raised to the power 4.5? It appears to be tuned to the dataset.)
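As a rough illustration of Eq. (1), the sketch below computes the prediction for one user-item pair with NumPy. All names (X for the implicit-feedback factors, Y for the attribute factors) and the toy sizes are hypothetical; in the actual model [1] these parameters are learned from data.

```python
import numpy as np

rng = np.random.default_rng(1)
n_users, n_items, n_attrs, K = 5, 10, 4, 3    # toy sizes (assumed)

mu = 3.5                                      # global mean rating (assumed)
b_item = rng.normal(scale=0.1, size=n_items)  # item biases b_i
b_user = rng.normal(scale=0.1, size=n_users)  # user biases b_u
Q = rng.normal(size=(n_items, K))             # item factors q_i
P = rng.normal(size=(n_users, K))             # user factors p_u
X = rng.normal(size=(n_items, K))             # implicit-feedback factors x_i
Y = rng.normal(size=(n_attrs, K))             # user-attribute factors y_a

def predict(u, i, N_u, A_u):
    """Eq. (1): N_u = items with implicit feedback from u, A_u = attributes of u."""
    p_hat = (P[u]
             + max(len(N_u), 1) ** -0.5 * X[N_u].sum(axis=0)  # |N(u)|^-0.5 scaling
             + Y[A_u].sum(axis=0))                            # attribute offsets
    return mu + b_item[i] + b_user[u] + Q[i] @ p_hat

print(predict(u=0, i=2, N_u=[1, 3, 7], A_u=[0, 2]))
```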

2 sLDA

Supervised LDA (sLDA) [2] is an extension of LDA that ties an observed label, category, or real value to the item latent space. Its PGM is shown in Figure 1.

Figure 1: PGM for the sLDA model

The connection between the topic assignment variables $z_{d,n}$ and the response variable $y_d$ enforces relatedness between the topic distribution and the response variable. Without this connection, the topic distribution learned from the data might not be related to the response variable. For instance, movie reviews with words of praise should be grouped together given the movie rating, but plain LDA simply groups words by co-occurrence or by the theme of the movies, which may not relate to the movie rating at all. Figure 2 demonstrates that positive words are mapped to high ratings while negative words are mapped to lower ratings.

Figure 2: The word clusters after fitting movie reviews to the sLDA model

2.1 Model

The generative process of sLDA is as follows:

1. Draw topic proportions $\theta \mid \alpha \sim \mathrm{Dir}(\alpha)$.
2. For each word:
   (a) Draw topic assignment $z_n \mid \theta \sim \mathrm{Mult}(\theta)$.
   (b) Draw word $w_n \mid z_n, \beta_{1:K} \sim \mathrm{Mult}(\beta_{z_n})$.
3. Draw response variable $y \mid z_{1:N}, \eta, \sigma^2 \sim \mathcal{N}(\eta^T \bar{z}, \sigma^2)$.

The model assumes that the response variable is drawn from a Gaussian distribution with mean $\eta^T \bar{z}$, where $\bar{z}$ is the average of the topic assignments for a particular item. This implies that if two items have similar average topic assignments, they should have very similar response values. For example, movie reviews that contain many negative words should have similar topic assignments and will end up with low ratings. We can also think of this model as attempting to find a weight vector $\eta$ such that its prediction $\eta^T \bar{z}$ is as close to the actual response value as possible; this is essentially a linear regression problem.
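The generative process above translates directly into ancestral sampling. The following sketch draws one document and its response with NumPy; the hyperparameter values and toy sizes are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)

K, V, N = 4, 20, 50                          # topics, vocabulary, words (assumed)
alpha = np.full(K, 0.5)                      # Dirichlet prior (assumed value)
beta = rng.dirichlet(np.full(V, 0.1), K)     # per-topic word distributions beta_{1:K}
eta = rng.normal(size=K)                     # regression weights
sigma2 = 0.25                                # response noise variance (assumed)

theta = rng.dirichlet(alpha)                 # 1. theta ~ Dir(alpha)
z = rng.choice(K, size=N, p=theta)           # 2(a). z_n ~ Mult(theta)
w = np.array([rng.choice(V, p=beta[k]) for k in z])  # 2(b). w_n ~ Mult(beta_{z_n})

z_bar = np.bincount(z, minlength=K) / N      # empirical average of topic assignments
y = rng.normal(eta @ z_bar, np.sqrt(sigma2)) # 3. y ~ N(eta^T z_bar, sigma^2)
print(y)
```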

3 fLDA

While sLDA constructs a single regression model for all users, fLDA [3] generalizes this by fitting a separate regression for each user. fLDA is built specifically for the recommendation problem, whereas sLDA is a more general idea. Looking at its PGM in Figure 3, there are two main components: the user and item plates. A standard probabilistic recommendation model draws user and item latent factors and then draws a rating from the dot product of the two. fLDA follows the same idea but adds a user bias $\alpha_i$, an item popularity $\beta_j$, user factors $s_i$, and an average latent topic $\bar{z}_j$. The rating is drawn from a normal distribution, $y_{ij} \sim \mathcal{N}(u_{ij}, \sigma^2)$, where

$$u_{ij} = x_{ij}^T b + \alpha_i + \beta_j + s_i^T \bar{z}_j.$$

The term $\bar{z}_j$ is the average topic assignment vector, $\bar{z}_j = \frac{1}{W_j} \sum_{n=1}^{W_j} z_{jn}$, where $W_j$ is the number of words in item $j$. This idea is similar to sLDA. The authors claim that $\bar{z}_j$ has more variability than the topic distribution $\theta_j$, which leads to faster convergence [3]. The model is prone to overfitting because of its large number of latent variables, so regularization is crucial for it to perform well. (A toy sketch of the rating model follows the figures below.)

Figure 3: PGM for the fLDA model

Figure 4: Generative process for the fLDA model
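Here is a minimal sketch of the fLDA rating equation above, with all parameters drawn at random rather than inferred. The names and toy sizes are assumptions, and $\bar{z}_j$ is approximated by a Dirichlet draw instead of averaging sampled topic assignments.

```python
import numpy as np

rng = np.random.default_rng(3)
n_users, n_items, K, F = 5, 8, 4, 6             # toy sizes (assumed)

b = rng.normal(size=F)                          # global regression weights b
alpha_u = rng.normal(scale=0.1, size=n_users)   # user biases alpha_i
beta_it = rng.normal(scale=0.1, size=n_items)   # item popularities beta_j
S = rng.normal(size=(n_users, K))               # per-user topic weights s_i
Zbar = rng.dirichlet(np.full(K, 0.5), n_items)  # stand-in for z_bar_j

def rating_mean(i, j, x_ij):
    """u_ij = x_ij^T b + alpha_i + beta_j + s_i^T z_bar_j."""
    return x_ij @ b + alpha_u[i] + beta_it[j] + S[i] @ Zbar[j]

x_ij = rng.normal(size=F)                       # features of the (i, j) pair
y_ij = rng.normal(rating_mean(0, 2, x_ij), 1.0) # y_ij ~ N(u_ij, sigma^2), sigma = 1
print(y_ij)
```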

4 Collaborative Topic Regression (CTR)

The previous model, fLDA, uses linear regression to address the cold-start problem: a user with little rating information receives recommendations from a regression learned on content. The smooth transition between cold-start and warm-start situations makes the model attractive. However, the problem with using latent topics directly lies in their inability to distinguish topics useful for explaining recommendations from topics important for explaining content [4]. The CTR model proposed by Wang and Blei handles this situation seamlessly by adding uncertainty to the item latent feature.

Figure 5: The graphical model for the CTR model

4.1 Model

CTR models the item latent vector as a topic distribution with additive Gaussian noise. Its generative process is summarized as follows:

1. For each user $i$, draw the user latent vector $u_i \sim \mathcal{N}(0, \lambda_u^{-1} I_K)$.
2. For each item $j$:
   (a) Draw topic proportions $\theta_j \sim \mathrm{Dirichlet}(\alpha)$.
   (b) Draw the item latent offset $\epsilon_j \sim \mathcal{N}(0, \lambda_v^{-1} I_K)$ and set the item latent vector $v_j = \epsilon_j + \theta_j$.
   (c) For each word $w_{jn}$:
      i. Draw topic assignment $z_{jn} \sim \mathrm{Mult}(\theta_j)$.
      ii. Draw word $w_{jn} \sim \mathrm{Mult}(\beta_{z_{jn}})$.
3. For each user-item pair $(i, j)$, draw the rating $r_{ij} \sim \mathcal{N}(u_i^T v_j, c_{ij}^{-1})$.

The PGM in Figure 5 shows that each item effectively performs a linear regression to fit its latent vector to the ratings. This is a major difference from fLDA, which roughly sets the item latent vector to the inferred per-item topic distribution. By allowing each item's latent vector to diverge from its topic distribution, CTR can construct different latent vectors for items that have similar topic distributions. For example, if two research papers mention one particular algorithm, but the first paper is more popular among computer science researchers while the second suits social-behavior researchers better, then the two latent vectors will differ because of the offset term $\epsilon_j$. The more users have rated an article (item), the higher the precision of the offset term.
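A minimal ancestral-sampling sketch of the CTR rating part follows; the word-generation step mirrors the sLDA sketch above and is omitted. The precision values, the constant confidence c, and the toy sizes are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
n_users, n_items, K = 5, 6, 4                 # toy sizes (assumed)
lam_u, lam_v = 1.0, 10.0                      # precision hyperparameters (assumed)

U = rng.normal(scale=lam_u ** -0.5, size=(n_users, K))    # u_i ~ N(0, lam_u^-1 I_K)
Theta = rng.dirichlet(np.full(K, 0.5), n_items)           # theta_j ~ Dirichlet(alpha)
Eps = rng.normal(scale=lam_v ** -0.5, size=(n_items, K))  # eps_j ~ N(0, lam_v^-1 I_K)
V = Theta + Eps                                           # v_j = theta_j + eps_j

c = 1.0                                       # confidence c_ij, constant for brevity
R = rng.normal(U @ V.T, c ** -0.5)            # r_ij ~ N(u_i^T v_j, c_ij^-1)
print(R.shape)                                # (n_users, n_items) sampled ratings
```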

5 CRPF

Content-based recommendation with Poisson factorization (CRPF) [5] was proposed to exploit the properties of Poisson factorization [6]. Two important properties are: (1) the Gamma distribution is well suited to modeling sparse information; (2) Poisson factorization can be viewed as a resource-allocation task in which a user allocates his or her attention across particular movies. The first property works well on recommendation data because 99% of the entries are zeros. The second property treats an unrated item as unobserved rather than as negatively rated by the user. This matters because Gaussian-based MF assumes that a rating of zero implies a negative response from the user. Thus, modeling user attention with Gamma distributions is a step forward for the recommendation problem.

5.1 Model

The generative process is described below (a sampling sketch follows the list):

1. Document model:
   (a) Draw topics $\beta_{vk} \sim \mathrm{Gamma}(a, b)$.
   (b) Draw document topic intensities $\theta_{dk} \sim \mathrm{Gamma}(c, d)$.
   (c) Draw word counts $w_{dv} \sim \mathrm{Poisson}(\theta_d^T \beta_v)$.
2. Recommendation model:
   (a) Draw user preferences $\eta_{uk} \sim \mathrm{Gamma}(e, f)$.
   (b) Draw document topic offsets $\epsilon_{dk} \sim \mathrm{Gamma}(g, h)$.
   (c) Draw ratings $r_{ud} \sim \mathrm{Poisson}(\eta_u^T (\theta_d + \epsilon_d))$.
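The CRPF generative process is equally direct to sample. In the sketch below, note that NumPy's Gamma sampler is parameterized by shape and scale, so $\mathrm{Gamma}(a, b)$ with rate $b$ becomes gamma(a, 1/b). All hyperparameter values and sizes are toy assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
K, V, D, n_users = 4, 30, 8, 5                  # topics, vocab, docs, users (assumed)
a = b = c = d = e = f = g = h = 1.0             # Gamma hyperparameters (toy values)

# Document model
Beta = rng.gamma(a, 1.0 / b, size=(V, K))       # beta_vk ~ Gamma(a, b)
Theta = rng.gamma(c, 1.0 / d, size=(D, K))      # theta_dk ~ Gamma(c, d)
W = rng.poisson(Theta @ Beta.T)                 # w_dv ~ Poisson(theta_d^T beta_v)

# Recommendation model
Eta = rng.gamma(e, 1.0 / f, size=(n_users, K))  # eta_uk ~ Gamma(e, f)
Eps = rng.gamma(g, 1.0 / h, size=(D, K))        # eps_dk ~ Gamma(g, h)
R = rng.poisson(Eta @ (Theta + Eps).T)          # r_ud ~ Poisson(eta_u^T (theta_d + eps_d))

print(W.shape, R.shape)                         # word counts (D, V); ratings (n_users, D)
```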

Another interesting contribution is the connection between LDA and Poisson matrix factorization: it turns out that Poisson factorization is a generalization of LDA. In the recommendation framing, we treat each document as a user whose latent vector is its topic preference, and each word in the vocabulary as an item whose latent vector is its attribute. The word count observed in a document is then the rating that the document gives to that word. CRPF can model a user's preferences, and it can identify the group of users interested in a given document. Figure 7 shows how an article on the EM algorithm is popular not only with machine learning researchers but with computer vision and statistical network analysis researchers as well.

Figure 7: The topic distribution and user preferences estimated by CRPF

6 Revision

Jun 14 - Draft - Summarized all models and their generative processes.

References

[1] Yehuda Koren, Robert Bell, and Chris Volinsky. Matrix factorization techniques for recommender systems. Computer, 42(8):30-37, 2009.

[2] Jon D. McAuliffe and David M. Blei. Supervised topic models. In Advances in Neural Information Processing Systems, pages 121-128, 2008.

[3] Deepak Agarwal and Bee-Chung Chen. fLDA: Matrix factorization through latent Dirichlet allocation. In Proceedings of the Third ACM International Conference on Web Search and Data Mining, pages 91-100. ACM, 2010.

[4] Chong Wang and David M. Blei. Collaborative topic modeling for recommending scientific articles. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 448-456. ACM, 2011.

[5] Prem K. Gopalan, Laurent Charlin, and David M. Blei. Content-based recommendations with Poisson factorization. In Advances in Neural Information Processing Systems, pages 3176-3184, 2014.

[6] Prem Gopalan, Jake M. Hofman, and David M. Blei. Scalable recommendation with Poisson factorization. arXiv preprint arXiv:1311.1704, 2013.