Mixed Membership Matrix Factorization

Similar documents

Bayesian Matrix Factorization with Side Information and Dirichlet Process Mixtures

Scaling Neighbourhood Methods

Review: Probabilistic Matrix Factorization (PMF)

Andriy Mnih and Ruslan Salakhutdinov

Scalable Bayesian Matrix Factorization

Large-scale Ordinal Collaborative Filtering

Large-scale Collaborative Prediction Using a Nonparametric Random Effects Model

Matrix Factorization and Recommendation Systems

Temporal Collaborative Filtering with Bayesian Probabilistic Tensor Factorization

Matrix Factorization Techniques for Recommender Systems

Algorithms for Collaborative Filtering

A Modified PMF Model Incorporating Implicit Item Associations

Restricted Boltzmann Machines for Collaborative Filtering

Mixture-Rank Matrix Approximation for Collaborative Filtering

Beyond link prediction and collaborative filtering: Learning to predict affinity

Social Relations Model for Collaborative Filtering

Large-scale Collaborative Ranking in Near-Linear Time

A Bayesian Treatment of Social Links in Recommender Systems ; CU-CS

Content-based Recommendation

SCMF: Sparse Covariance Matrix Factorization for Collaborative Filtering

Bayesian Nonparametric Poisson Factorization for Recommendation Systems

Preliminaries. Data Mining. The art of extracting knowledge from large bodies of structured data. Let's put it to use!

Unified Modeling of User Activities on Social Networking Sites

Learning to Learn and Collaborative Filtering

LLORMA: Local Low-Rank Matrix Approximation

Bayesian Methods for Machine Learning

Dynamic Poisson Factorization

Collaborative Filtering via Rating Concentration

Hierarchical Bayesian Matrix Factorization with Side Information

GLAD: Group Anomaly Detection in Social Media Analysis

Probabilistic Matrix Factorization

Matrix and Tensor Factorization from a Machine Learning Perspective

Random function priors for exchangeable arrays with applications to graphs and relational data

Decoupled Collaborative Ranking

Fast Inference and Learning for Modeling Documents with a Deep Boltzmann Machine

A graph contains a set of nodes (vertices) connected by links (edges or arcs)

Collaborative Filtering with Aspect-based Opinion Mining: A Tensor Factorization Approach

Non-Parametric Bayes

Bayesian Nonparametrics for Speech and Signal Processing

Collaborative topic models: motivations cont

STA 4273H: Statistical Machine Learning

Generative Models for Discrete Data

Transfer Learning for Collective Link Prediction in Multiple Heterogenous Domains

Kernel methods, kernel SVM and ridge regression

Stochastic Backpropagation, Variational Inference, and Semi-Supervised Learning

13: Variational inference II

Latent Dirichlet Bayesian Co-Clustering

Mixed Membership Stochastic Blockmodels

Gaussian Process Factorization Machines for Context-aware Recommendations

Binary Principal Component Analysis in the Netflix Collaborative Filtering Task

Lecture 13 : Variational Inference: Mean Field Approximation

Modeling User Rating Profiles For Collaborative Filtering

Tutorial on Gaussian Processes and the Gaussian Process Latent Variable Model

Afternoon Meeting on Bayesian Computation 2018 University of Reading

Recent Advances in Bayesian Inference Techniques

Cross-domain recommendation without shared users or items by sharing latent vector distributions

Online Bayesian Passive-Aggressive Learning

Dynamic Probabilistic Models for Latent Feature Propagation in Social Networks

Latent Dirichlet Allocation (LDA)

Factor Modeling for Advertisement Targeting

Recommendation Systems

A Gradient-based Adaptive Learning Framework for Efficient Personal Recommendation

Matrix Factorization and Collaborative Filtering

Graphical Models for Collaborative Filtering

TAPER: A Contextual Tensor- Based Approach for Personalized Expert Recommendation

Click Prediction and Preference Ranking of RSS Feeds

Graphical Models for Query-driven Analysis of Multimodal Data

Latent Variable Models and Expectation Maximization

Circle-based Recommendation in Online Social Networks

Kalman filtering and friends: Inference in time series models. Herke van Hoof slides mostly by Michael Rubinstein

Topic Models. Advanced Machine Learning for NLP, Jordan Boyd-Graber

Gaussian Mixture Model

Machine Learning. Probabilistic KNN.

Approximate Inference using MCMC

Spatial Bayesian Nonparametrics for Natural Image Segmentation

Offline Components: Collaborative Filtering in Cold-start Situations

Data Mining Techniques

UNIVERSITY OF CALIFORNIA, IRVINE. Networks of Mixture Blocks for Non Parametric Bayesian Models with Applications DISSERTATION

Generative Clustering, Topic Modeling, & Bayesian Inference

Local Expectation Gradients for Doubly Stochastic Variational Inference

1-bit Matrix Completion. PAC-Bayes and Variational Approximation

Collaborative Filtering

Relational Stacked Denoising Autoencoder for Tag Recommendation. Hao Wang

Collaborative Filtering on Ordinal User Feedback

Cost-Aware Collaborative Filtering for Travel Tour Recommendations

Probabilistic Local Matrix Factorization based on User Reviews

Bayesian Contextual Multi-armed Bandits

Transcription:

Mixed Membership Matrix Factorization
Lester Mackey (1), David Weiss (2), Michael I. Jordan (1)
(1) University of California, Berkeley   (2) University of Pennsylvania
International Conference on Machine Learning (ICML), 2010

Background: Dyadic Data Prediction (DDP)

Learning from pairs:
- Given two sets of objects, e.g. a set of users and a set of items
- Observe labeled object pairs, e.g. r_uj = 5: user u gave item j a rating of 5
- Predict labels of unobserved pairs: how will user u rate item k?

Examples:
- Rating prediction in collaborative filtering: how will user u rate movie j?
- Click prediction in web search: will user u click on URL j?
- Link prediction in a social network: is user u friends with user j?
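
As a tiny illustration (container and function names are assumptions, not from the slides), dyadic data is just a sparse set of labeled pairs, and DDP asks for the labels of the pairs that are missing:

```python
# Observed labeled pairs r_uj, keyed by (user, item).
observed = {("u1", "j1"): 5, ("u1", "j2"): 2, ("u2", "j1"): 4}

def is_observed(user, item):
    """True if the pair's label was observed; DDP predicts the rest."""
    return (user, item) in observed
```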

Background: Prior Models for Dyadic Data
Latent factor modeling / matrix factorization
(Rennie & Srebro, 2005; DeCoste, 2006; Salakhutdinov & Mnih, 2008; Takács et al., 2009; Lawrence & Urtasun, 2009)

- Associate a latent factor vector a_u ∈ R^D with each user u
- Associate a latent factor vector b_j ∈ R^D with each item j
- Generate the expected rating via an inner product: r_uj = a_u · b_j

Pro: state-of-the-art predictive performance
Con: fundamentally static rating mechanism
- Assumes user u rates according to a_u, regardless of context
- In reality, dyadic interactions are heterogeneous:
  - A user's ratings may be influenced by instantaneous mood
  - Distinct users may share a single account or web browser
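
To make the static rating mechanism concrete, here is a minimal NumPy sketch (array names and sizes are assumptions, not the authors' code); note that the prediction for a pair (u, j) never changes with context, which is exactly the limitation M3F targets:

```python
import numpy as np

# Minimal sketch of the static latent factor predictor.
rng = np.random.default_rng(0)
n_users, n_items, D = 100, 50, 10
A = rng.normal(scale=0.1, size=(n_users, D))  # user factor vectors a_u
B = rng.normal(scale=0.1, size=(n_items, D))  # item factor vectors b_j

def predict_static(u, j):
    """Expected rating r_uj = a_u . b_j, identical in every context."""
    return float(A[u] @ B[j])
```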

Background: Prior Models for Dyadic Data
Mixed membership modeling
(Airoldi et al., 2008; Porteous et al., 2008)

- Each user u maintains a distribution over topics, θ_u^U ∈ R^{K^U}
- Each item j maintains a distribution over topics, θ_j^M ∈ R^{K^M}
- The expected rating r_uj is determined by interaction-specific topics sampled from the user and item topic distributions

Pro: context-sensitive clustering
- User moods: in the mood for comedy vs. romance
- Item contexts: opening night vs. in a high school classroom
Con: purely groupwise interactions
- Assumes user and item interact only through their topics
- Relatively poor predictive performance
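
A matching sketch of the mixed membership side (the names theta_U, theta_M, and draw_interaction_topics are assumptions): every single interaction samples its own topic pair, which is what makes the model context-sensitive:

```python
import numpy as np

# Per-user and per-item topic distributions, plus an interaction-specific
# topic draw for each rating event.
rng = np.random.default_rng(1)
K_U, K_M = 2, 2                                   # number of user / item topics
theta_U = rng.dirichlet(np.ones(K_U), size=100)   # theta_u^U, one row per user
theta_M = rng.dirichlet(np.ones(K_M), size=50)    # theta_j^M, one row per item

def draw_interaction_topics(u, j):
    """Sample the context governing a single (u, j) interaction."""
    i = rng.choice(K_U, p=theta_U[u])  # user topic i ~ theta_u^U
    k = rng.choice(K_M, p=theta_M[j])  # item topic k ~ theta_j^M
    return i, k
```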

M3F Framework: Mixed Membership Matrix Factorization (M3F)

Goal: leverage the complementary strengths of latent factor models and mixed membership models for improved dyadic data prediction.

General M3F framework:
- Users and items are endowed both with latent factor vectors (a_u and b_j) and with topic distribution parameters (θ_u^U and θ_j^M)
- To rate an item:
  - User u draws topic i from θ_u^U
  - Item j draws topic k from θ_j^M
  - Expected rating: r_uj = a_u · b_j (static base rating) + β_uj^{ik} (context-sensitive bias)
- M3F models differ in their specification of β_uj^{ik}
- Fully Bayesian framework
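
The rating rule can be sketched as one line of code; this sketch assumes the factor vectors are passed in along with a callable `beta` giving the context bias for the sampled topic pair, with the concrete choices of `beta` (TIB, TIF) sketched after the next two slides:

```python
# General M3F rating rule (sketch): static base rating plus a context bias
# whose form is what distinguishes the concrete M3F models.
def m3f_expected_rating(u, j, i, k, a_u, b_j, beta):
    """r_uj = a_u . b_j  +  beta_uj^{ik}."""
    return float(a_u @ b_j) + beta(u, j, i, k)
```

Plugging in the TIB or TIF bias below recovers the two concrete models.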

M3F Framework: Mixed Membership Matrix Factorization (M3F)

Goal: leverage the complementary strengths of latent factor models and mixed membership models for improved dyadic data prediction.

General M3F framework: M3F models differ in their specification of β_uj^{ik}

Specific M3F models:
- M3F Topic-Indexed Bias model (M3F-TIB)
- M3F Topic-Indexed Factor model (M3F-TIF)

M3F Models: M3F Topic-Indexed Bias Model (M3F-TIB)

The contextual bias decomposes into a latent user bias and a latent item bias:

    β_uj^{ik} = c_u^k + d_j^i

- Item bias d_j^i is influenced by user topic i
  - Group predisposition toward liking/disliking item j
  - Captures the polarizing "Napoleon Dynamite" effect: certain movies provoke strongly differing reactions from otherwise similar users
- User bias c_u^k is influenced by item topic k
  - Predisposition of u toward liking/disliking an item group
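
A sketch of the TIB bias under assumed array names (c, d): each side's bias is indexed by the other side's topic, and the two simply add:

```python
import numpy as np

# Topic-indexed biases (assumed shapes): c[u, k] is user u's bias under item
# topic k (c_u^k); d[j, i] is item j's bias under user topic i (d_j^i).
rng = np.random.default_rng(2)
n_users, n_items, K_U, K_M = 100, 50, 2, 2
c = rng.normal(scale=0.1, size=(n_users, K_M))  # c_u^k
d = rng.normal(scale=0.1, size=(n_items, K_U))  # d_j^i

def beta_tib(u, j, i, k):
    """Context bias beta_uj^{ik} = c_u^k + d_j^i."""
    return c[u, k] + d[j, i]
```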

M3F Models: M3F Topic-Indexed Factor Model (M3F-TIF)

The contextual bias is an inner product of topic-indexed factor vectors:

    β_uj^{ik} = c_u^k · d_j^i

- User u maintains a latent vector c_u^k ∈ R^D̃ for each item topic k
- Item j maintains a latent vector d_j^i ∈ R^D̃ for each user topic i
- Extends the globally predictive factor vectors (a_u, b_j) with context-specific factors
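
A corresponding sketch of the TIF bias (the array names C, E and the small dimensionality D_tif are assumptions): each topic now indexes a short factor vector rather than a scalar, and the bias is their inner product:

```python
import numpy as np

# Topic-indexed factor vectors (assumed shapes): one short vector per
# (user, item topic) pair and per (item, user topic) pair.
rng = np.random.default_rng(3)
n_users, n_items, K_U, K_M, D_tif = 100, 50, 2, 2, 2
C = rng.normal(scale=0.1, size=(n_users, K_M, D_tif))  # c_u^k in R^{D_tif}
E = rng.normal(scale=0.1, size=(n_items, K_U, D_tif))  # d_j^i in R^{D_tif}

def beta_tif(u, j, i, k):
    """Context bias beta_uj^{ik} = c_u^k . d_j^i."""
    return float(C[u, k] @ E[j, i])
```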

M3F Inference and Prediction

Goal: predict unobserved labels given labeled pairs
- Posterior inference over latent topics and parameters is intractable
- Use block Gibbs sampling with closed-form conditionals
  - User parameters sampled in parallel (same for items)
  - Interaction-specific topics sampled in parallel
- Bayes-optimal prediction under root mean squared error (RMSE), averaging over T Gibbs samples:

M3F-TIB:
    r̂_uj = (1/T) Σ_{t=1}^{T} [ a_u^(t) · b_j^(t) + Σ_{k=1}^{K^M} c_u^{k(t)} θ_jk^{M(t)} + Σ_{i=1}^{K^U} d_j^{i(t)} θ_ui^{U(t)} ]

M3F-TIF:
    r̂_uj = (1/T) Σ_{t=1}^{T} [ a_u^(t) · b_j^(t) + Σ_{i=1}^{K^U} Σ_{k=1}^{K^M} θ_ui^{U(t)} θ_jk^{M(t)} (c_u^{k(t)} · d_j^{i(t)}) ]
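
A sketch of the M3F-TIB predictive mean above, assuming the T Gibbs draws are stored as a list of dicts with keys "A", "B", "c", "d", "theta_U", "theta_M" (one draw of each parameter per entry; this storage layout is an assumption, not the reference sampler). The interaction topics are marginalized analytically under theta_u^U and theta_j^M:

```python
def predict_tib(u, j, samples):
    """Average the TIB rating rule over stored Gibbs samples (NumPy arrays)."""
    total = 0.0
    for s in samples:
        static = s["A"][u] @ s["B"][j]           # a_u^(t) . b_j^(t)
        user_bias = s["c"][u] @ s["theta_M"][j]  # sum_k c_u^{k(t)} theta_jk^{M(t)}
        item_bias = s["d"][j] @ s["theta_U"][u]  # sum_i d_j^{i(t)} theta_ui^{U(t)}
        total += static + user_bias + item_bias
    return total / len(samples)
```

The TIF predictor follows the same pattern, with the double sum over topic pairs weighting the topic-indexed inner products.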

Experiments: The Data

Real-world movie rating collaborative filtering datasets:
- 1M MovieLens dataset [1]: 1 million ratings in {1, ..., 5}; 6,040 users, 3,952 movies
- EachMovie dataset: 2.8 million ratings in {1, ..., 6}; 1,648 movies, 74,424 users
- Netflix Prize dataset [2]: 100 million ratings in {1, ..., 5}; 17,770 movies, 480,189 users

[1] http://www.grouplens.org/
[2] http://www.netflixprize.com/

Experiments: The Setup

- Evaluate movie rating prediction performance on each dataset
- RMSE as the primary evaluation metric
- Performance averaged over standard train-test splits
- Compare to state-of-the-art latent factor models:
  - Bayesian Probabilistic Matrix Factorization (BPMF) [3]; M3F reduces to BPMF when no topics are sampled
  - Gaussian process matrix factorization model (L&U) [4]
- Matlab/MEX implementation on dual quad-core CPUs

[3] Salakhutdinov & Mnih (2008)
[4] Lawrence & Urtasun (2009)
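
For concreteness, the evaluation metric is the standard root mean squared error:

```python
import numpy as np

def rmse(predicted, observed):
    """Root mean squared error between predicted and held-out ratings."""
    predicted, observed = np.asarray(predicted), np.asarray(observed)
    return float(np.sqrt(np.mean((predicted - observed) ** 2)))
```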

Experiments: 1M MovieLens

Question: how does M3F performance vary with the number of topics and the static factor dimensionality?
- 3,000 Gibbs samples for M3F-TIB and BPMF; 512 Gibbs samples for M3F-TIF (D̃ = 2)
- Parenthesized pairs give the topic counts (K^U, K^M)

RMSE by static factor dimensionality:

  Method           D=10     D=20     D=30     D=40
  BPMF             0.8695   0.8622   0.8621   0.8609
  M3F-TIB (1,1)    0.8671   0.8614   0.8616   0.8605
  M3F-TIF (1,2)    0.8664   0.8629   0.8622   0.8616
  M3F-TIF (2,1)    0.8674   0.8605   0.8605   0.8595
  M3F-TIF (2,2)    0.8642   0.8584*  0.8584   0.8592
  M3F-TIB (1,2)    0.8669   0.8611   0.8604   0.8603
  M3F-TIB (2,1)    0.8649   0.8593   0.8581*  0.8577*
  M3F-TIB (2,2)    0.8658   0.8609   0.8605   0.8599
  L&U (2009)       0.8801 (RBF)      0.8791 (Linear)

Experiments: EachMovie

Question: how does M3F performance vary with the number of topics and the static factor dimensionality?
- 3,000 Gibbs samples for M3F-TIB and BPMF; 512 Gibbs samples for M3F-TIF (D̃ = 2)
- Parenthesized pairs give the topic counts (K^U, K^M)

RMSE by static factor dimensionality:

  Method           D=10      D=20      D=30      D=40
  BPMF             1.1229    1.1212    1.1203    1.1163
  M3F-TIB (1,1)    1.1205    1.1188    1.1183    1.1168
  M3F-TIF (1,2)    1.1351    1.1179    1.1095    1.1072
  M3F-TIF (2,1)    1.1366    1.1161    1.1088    1.1058
  M3F-TIF (2,2)    1.1211    1.1043    1.1035    1.1020
  M3F-TIB (1,2)    1.1217    1.1081    1.1016    1.0978
  M3F-TIB (2,1)    1.1186    1.1004    1.0952    1.0936
  M3F-TIB (2,2)    1.1101*   1.0961*   1.0918*   1.0905*
  L&U (2009)       1.1111 (RBF)        1.0981 (Linear)

Experiments: Netflix Prize

Question: how does performance vary with latent dimensionality?
- Contrast M3F-TIB with (K^U, K^M) = (4, 1) against BPMF
- 500 Gibbs samples for M3F-TIB and BPMF
- Method names are suffixed with the static factor dimensionality D

  Method      RMSE     Time
  BPMF/15     0.9121   27.8s
  TIB/15      0.9090   46.3s
  BPMF/30     0.9047   38.6s
  TIB/30      0.9015   56.9s
  BPMF/40     0.9027   48.3s
  TIB/40      0.8990   70.5s
  BPMF/60     0.9002   94.3s
  TIB/60      0.8962   97.0s
  BPMF/120    0.8956   273.7s
  TIB/120     0.8934   285.2s
  BPMF/240    0.8938   1152.0s
  TIB/240     0.8929   1158.2s

Experiments: Netflix Prize (Stratification)

Question: where are the improvements over BPMF being realized?

Figure: RMSE improvements over BPMF/40 on the Netflix Prize as a function of movie or user rating count. Left: each bin represents 1/6 of the movie base. Right: each bin represents 1/8 of the user base.

Experiments: Netflix Prize (The Napoleon Dynamite Effect)

Question: do M3F models capture polarization effects?

Table: Top 200 movies from the Netflix Prize dataset with the highest and lowest cross-topic variance in E(d_j^i | r^(v)).

  Movie title                         E(d_j^i | r^(v))
  Napoleon Dynamite                   -0.11 ± 0.93
  Fahrenheit 9/11                     -0.06 ± 0.90
  Chicago                             -0.12 ± 0.78
  The Village                         -0.14 ± 0.71
  Lost in Translation                 -0.02 ± 0.70
  LotR: The Fellowship of the Ring     0.15 ± 0.00
  LotR: The Two Towers                 0.18 ± 0.00
  LotR: The Return of the King         0.24 ± 0.00
  Star Wars: Episode V                 0.35 ± 0.00
  Raiders of the Lost Ark              0.29 ± 0.00
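
A sketch of how such a polarization diagnostic could be computed from stored Gibbs draws (the array layout is an assumption, not the authors' script): average the draws of d_j^i to get a posterior mean per user topic, then rank movies by the spread of those means across topics:

```python
import numpy as np

def cross_topic_spread(d_samples):
    """d_samples: array of shape (T, n_items, K_U) holding Gibbs draws of d_j^i."""
    posterior_mean = d_samples.mean(axis=0)  # E(d_j^i | ratings), shape (n_items, K_U)
    return posterior_mean.std(axis=1)        # per-movie spread across user topics
```

Polarizing titles like Napoleon Dynamite sit at the top of such a ranking, while the LotR entries, with essentially zero spread, sit at the bottom.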

Conclusions

New framework for dyadic data prediction:
- Strong predictive performance and static specificity of latent factor models
- Clustered context-sensitivity of mixed membership models
- Outperforms pure latent factor modeling while fitting fewer parameters
- Greatest improvements for high-variance, sparsely rated items

Future work:
- Modeling user choice: missingness is informative
- Nonparametric priors on topic parameters
- Alternative approaches to inference

References

Airoldi, E., Blei, D., Fienberg, S., and Xing, E. Mixed membership stochastic blockmodels. JMLR, 9:1981-2014, 2008.
DeCoste, D. Collaborative prediction using ensembles of maximum margin matrix factorizations. In ICML, 2006.
Lawrence, N. D. and Urtasun, R. Non-linear matrix factorization with Gaussian processes. In ICML, 2009.
Porteous, I., Bart, E., and Welling, M. Multi-HDP: A non-parametric Bayesian model for tensor factorization. In AAAI, 2008.
Rennie, J. and Srebro, N. Fast maximum margin matrix factorization for collaborative prediction. In ICML, 2005.
Salakhutdinov, R. and Mnih, A. Bayesian probabilistic matrix factorization using Markov chain Monte Carlo. In ICML, 2008.
Takács, G., Pilászy, I., Németh, B., and Tikk, D. Scalable collaborative filtering approaches for large recommender systems. JMLR, 10:623-656, 2009.

The End: Thanks!