Mixed Membership Matrix Factorization

Similar documents

Bayesian Matrix Factorization with Side Information and Dirichlet Process Mixtures

Scaling Neighbourhood Methods

Review: Probabilistic Matrix Factorization. Probabilistic Matrix Factorization (PMF)

Scalable Bayesian Matrix Factorization

Andriy Mnih and Ruslan Salakhutdinov

Large-scale Ordinal Collaborative Filtering

Large-scale Collaborative Prediction Using a Nonparametric Random Effects Model

Temporal Collaborative Filtering with Bayesian Probabilistic Tensor Factorization

* Matrix Factorization and Recommendation Systems

A Modified PMF Model Incorporating Implicit Item Associations

Matrix Factorization Techniques for Recommender Systems

Restricted Boltzmann Machines for Collaborative Filtering

Algorithms for Collaborative Filtering

Beyond link prediction and collaborative filtering: Learning to predict affinity

Mixture-Rank Matrix Approximation for Collaborative Filtering

Social Relations Model for Collaborative Filtering

Large-scale Collaborative Ranking in Near-Linear Time

Unified Modeling of User Activities on Social Networking Sites

A Bayesian Treatment of Social Links in Recommender Systems ; CU-CS

Probabilistic Matrix Factorization

A graph contains a set of nodes (vertices) connected by links (edges or arcs)

Collaborative Filtering via Rating Concentration

Content-based Recommendation

Matrix Factorization Techniques for Recommender Systems

Bayesian Nonparametric Poisson Factorization for Recommendation Systems

SCMF: Sparse Covariance Matrix Factorization for Collaborative Filtering

Dynamic Poisson Factorization

LLORMA: Local Low-Rank Matrix Approximation

Preliminaries. Data Mining. The art of extracting knowledge from large bodies of structured data. Let s put it to use!

STA 4273H: Statistical Machine Learning

Bayesian Methods for Machine Learning

Hierarchical Bayesian Matrix Factorization with Side Information

Learning to Learn and Collaborative Filtering

GLAD: Group Anomaly Detection in Social Media Analysis

Transfer Learning for Collective Link Prediction in Multiple Heterogenous Domains

Recommendation Systems

Collaborative Filtering with Aspect-based Opinion Mining: A Tensor Factorization Approach

Random function priors for exchangeable arrays with applications to graphs and relational data

Bayesian Nonparametrics for Speech and Signal Processing

Latent Dirichlet Bayesian Co-Clustering

Matrix and Tensor Factorization from a Machine Learning Perspective

Collaborative topic models: motivations cont

Recommendation Systems

Generative Models for Discrete Data

Large-scale Collaborative Prediction Using a Nonparametric Random Effects Model

13: Variational inference II

Latent Variable Models and Expectation Maximization

Decoupled Collaborative Ranking

UNIVERSITY OF CALIFORNIA, IRVINE. Networks of Mixture Blocks for Non Parametric Bayesian Models with Applications DISSERTATION

Fast Inference and Learning for Modeling Documents with a Deep Boltzmann Machine

Latent Variable Models and Expectation Maximization

Data Mining Techniques

Stochastic Backpropagation, Variational Inference, and Semi-Supervised Learning

Text Mining for Economics and Finance Latent Dirichlet Allocation

Tutorial on Gaussian Processes and the Gaussian Process Latent Variable Model

Cross-domain recommendation without shared users or items by sharing latent vector distributions

Non-Parametric Bayes

Mixed Membership Stochastic Blockmodels

STA 4273H: Statistical Machine Learning

Recent Advances in Bayesian Inference Techniques

Online Bayesian Passive-Aggressive Learning

Afternoon Meeting on Bayesian Computation 2018 University of Reading

Latent Dirichlet Allocation (LDA)

Gaussian Process Factorization Machines for Context-aware Recommendations

Kernel methods, kernel SVM and ridge regression

Lecture 13 : Variational Inference: Mean Field Approximation

Clustering and Gaussian Mixtures

Factor Modeling for Advertisement Targeting

Spatial Bayesian Nonparametrics for Natural Image Segmentation

Relational Stacked Denoising Autoencoder for Tag Recommendation. Hao Wang

STA 4273H: Statistical Machine Learning

Mixed Membership Stochastic Blockmodels

TAPER: A Contextual Tensor- Based Approach for Personalized Expert Recommendation

Modeling User Rating Profiles For Collaborative Filtering

Graphical Models for Collaborative Filtering

Binary Principal Component Analysis in the Netflix Collaborative Filtering Task

Circle-based Recommendation in Online Social Networks

Probabilistic Local Matrix Factorization based on User Reviews

Machine Learning. Probabilistic KNN.

Kalman filtering and friends: Inference in time series models. Herke van Hoof slides mostly by Michael Rubinstein

Offline Components: Collaborative Filtering in Cold-start Situations

PROBABILISTIC LATENT SEMANTIC ANALYSIS

STA 4273H: Statistical Machine Learning

Dynamic Probabilistic Models for Latent Feature Propagation in Social Networks

STA 4273H: Statistical Machine Learning

Matrix Factorization and Collaborative Filtering

Cost-Aware Collaborative Filtering for Travel Tour Recommendations

Graphical Models for Query-driven Analysis of Multimodal Data

Local Expectation Gradients for Doubly Stochastic. Variational Inference

2 : Directed GMs: Bayesian Networks

Bayesian Contextual Multi-armed Bandits

Collaborative Filtering

Deep Learning Srihari. Deep Belief Nets. Sargur N. Srihari

Classical Predictive Models

Transcription:

Mixed Membership Matrix Factorization
Lester Mackey, University of California, Berkeley
Collaborators: David Weiss (University of Pennsylvania) and Michael I. Jordan (University of California, Berkeley)
2011 Joint Statistical Meetings
Mackey (UC Berkeley) Mixed Membership Matrix Factorization JSM 2011

Background

Dyadic Data Prediction (DDP): Learning from Pairs
- Given two sets of objects: a set of users and a set of items
- Observe labeled object pairs: user u gave item j a rating r_uj of 5
- Predict the labels of unobserved pairs: how will user u rate item k?

Examples:
- Movie rating prediction in collaborative filtering: how will user u rate movie j?
- Click prediction in web search: will user u click on URL j?
- Link prediction in a social network: is user u friends with user j?

Prior Models for Dyadic Data: Latent Factor Modeling / Matrix Factorization
Rennie & Srebro (2005); DeCoste (2006); Salakhutdinov & Mnih (2008); Takács et al. (2009); Lawrence & Urtasun (2009)
- Associate a latent factor vector a_u ∈ R^D with each user u
- Associate a latent factor vector b_j ∈ R^D with each item j
- Generate the expected rating via the inner product: E(r_uj) = a_u · b_j

Pro: state-of-the-art predictive performance.
Con: a fundamentally static rating mechanism.
- Assumes user u rates according to a_u, regardless of context
- In reality, dyadic interactions are heterogeneous: a user's ratings may be influenced by instantaneous mood, and distinct users may share a single account or web browser
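The static rating mechanism can be sketched in a few lines of NumPy (a minimal illustration; the dimensionality and the random factor vectors here are placeholders, not values from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)
D = 5  # static factor dimensionality (illustrative)

a_u = rng.normal(size=D)  # latent factor vector for user u
b_j = rng.normal(size=D)  # latent factor vector for item j

# The expected rating is the inner product of the two factor vectors,
# and is the same regardless of the context of the interaction.
expected_rating = float(a_u @ b_j)
```

Because `a_u` and `b_j` are fixed per user and per item, every interaction between this pair yields the same expected rating, which is exactly the "static mechanism" limitation the slide describes.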

Prior Models for Dyadic Data: Mixed Membership Topic Modeling
Airoldi, Blei, Fienberg, and Xing (2008); Porteous, Bart, and Welling (2008)
- Each user u maintains a distribution over topics, θ_u^U ∈ R^{K_U}
- Each item j maintains a distribution over topics, θ_j^M ∈ R^{K_M}
- The expected rating E(r_uj) is determined by interaction-specific topics sampled from the user and item topic distributions

Pro: context-sensitive clustering.
- User moods: in the mood for comedy vs. romance
- Item contexts: opening night vs. in a high school classroom
- Multiple raters per account: parent vs. child
Con: purely groupwise interactions.
- Assumes the user and item interact only through their topics
- Relatively poor predictive performance

Mixed Membership Matrix Factorization (M3F)
Goal: leverage the complementary strengths of latent factor models and mixed membership models for improved dyadic data prediction.
General M3F framework (Mackey, Weiss, and Jordan, 2010):
- Users and items are endowed both with latent factor vectors (a_u and b_j) and with topic distribution parameters (θ_u^U and θ_j^M)
- To rate an item, user u draws a topic i from θ_u^U and item j draws a topic k from θ_j^M
- Expected rating: E(r_uj) = a_u · b_j (static base rating) + β_uj^{ik} (context-sensitive bias)
- M3F models differ in their specification of β_uj^{ik}
- Fully Bayesian framework

Specific M3F Models
- M3F Topic-Indexed Bias Model (M3F-TIB)
- M3F Topic-Indexed Factor Model (M3F-TIF)

M3F Topic-Indexed Bias Model (M3F-TIB)
The contextual bias decomposes into a latent user bias and a latent item bias: β_uj^{ik} = c_u^k + d_j^i
- The item bias d_j^i is influenced by the user topic i: a group predisposition toward liking/disliking item j
- Captures the polarizing "Napoleon Dynamite" effect: certain movies provoke strongly differing reactions from otherwise similar users
- The user bias c_u^k is influenced by the item topic k: the predisposition of u toward liking/disliking an item group
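The additive topic-indexed bias is simple to sketch (illustrative values only; the topic indices here stand in for draws from θ_u^U and θ_j^M):

```python
import numpy as np

rng = np.random.default_rng(1)
K_U, K_M = 2, 2  # number of user topics and item topics (illustrative)

c_u = rng.normal(size=K_M)  # user u's bias toward each item topic k
d_j = rng.normal(size=K_U)  # item j's bias under each user topic i

i = 0  # topic drawn by user u from theta_u^U (placeholder draw)
k = 1  # topic drawn by item j from theta_j^M (placeholder draw)

# Topic-indexed bias: additive user and item contributions,
# beta_uj^{ik} = c_u^k + d_j^i.
beta = float(c_u[k] + d_j[i])
```

Note that the same item can receive a very different bias d_j^i under different user topics i, which is how the model can express polarization.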

M3F Topic-Indexed Factor Model (M3F-TIF)
The contextual bias is an inner product of topic-indexed factor vectors: β_uj^{ik} = c_u^k · d_j^i
- User u maintains a latent vector c_u^k ∈ R^D̃ for each item topic k
- Item j maintains a latent vector d_j^i ∈ R^D̃ for each user topic i
- Extends the globally predictive factor vectors (a_u, b_j) with context-specific factors
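The only change from the TIB sketch is that the bias becomes an inner product of context-specific vectors rather than a sum of scalars (again, all values here are illustrative placeholders):

```python
import numpy as np

rng = np.random.default_rng(2)
D_tilde, K_U, K_M = 2, 2, 2  # context-factor dim and topic counts (illustrative)

c_u = rng.normal(size=(K_M, D_tilde))  # one context factor vector per item topic k
d_j = rng.normal(size=(K_U, D_tilde))  # one context factor vector per user topic i

i, k = 0, 1  # sampled user and item topics (placeholder draws)

# Topic-indexed factor bias: beta_uj^{ik} = c_u^k . d_j^i.
beta = float(c_u[k] @ d_j[i])
```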

M3F Inference and Prediction
Goal: predict unobserved labels given labeled pairs.
- Posterior inference over the latent topics and parameters is intractable
- Use block Gibbs sampling with closed-form conditionals
- User parameters are sampled in parallel (likewise for items)
- Interaction-specific topics are sampled in parallel

M3F Inference and Prediction
Goal: predict unobserved labels given labeled pairs. The Bayes-optimal prediction under root mean squared error (RMSE) is the posterior mean, approximated by averaging over T Gibbs samples:

M3F-TIB:
(1/T) Σ_{t=1}^T ( a_u^(t) · b_j^(t) + Σ_{k=1}^{K^M} θ_jk^{M(t)} c_u^{k(t)} + Σ_{i=1}^{K^U} θ_ui^{U(t)} d_j^{i(t)} )

M3F-TIF:
(1/T) Σ_{t=1}^T ( a_u^(t) · b_j^(t) + Σ_{i=1}^{K^U} Σ_{k=1}^{K^M} θ_ui^{U(t)} θ_jk^{M(t)} c_u^{k(t)} · d_j^{i(t)} )
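The M3F-TIB posterior-mean prediction averages, over Gibbs samples, the static rating plus the topic-weighted user and item biases. A minimal sketch for a single (u, j) pair, with random arrays standing in for actual Gibbs samples:

```python
import numpy as np

rng = np.random.default_rng(3)
T, D, K_U, K_M = 4, 3, 2, 2  # sample count, factor dim, topic counts (illustrative)

# Synthetic stand-ins for the sampled M3F-TIB parameters for one (u, j) pair.
a = rng.normal(size=(T, D))                    # a_u^(t)
b = rng.normal(size=(T, D))                    # b_j^(t)
c = rng.normal(size=(T, K_M))                  # c_u^{k(t)}
d = rng.normal(size=(T, K_U))                  # d_j^{i(t)}
theta_M = rng.dirichlet(np.ones(K_M), size=T)  # theta_j^{M(t)}
theta_U = rng.dirichlet(np.ones(K_U), size=T)  # theta_u^{U(t)}

# Posterior-mean (Bayes-optimal under RMSE) prediction: average the static
# rating plus topic-weighted user/item biases over the T samples.
pred = float(np.mean([a[t] @ b[t] + theta_M[t] @ c[t] + theta_U[t] @ d[t]
                      for t in range(T)]))
```

Marginalizing the biases against the topic distributions (rather than plugging in a single sampled topic) is what makes the per-sample term an expectation over the interaction-specific topics.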

Experimental Evaluation: The Data
Real-world movie rating collaborative filtering datasets:
- 1M MovieLens Dataset [1]: 1 million ratings in {1, ..., 5}; 6,040 users, 3,952 movies
- EachMovie Dataset: 2.8 million ratings in {1, ..., 6}; 1,648 movies, 74,424 users
- Netflix Prize Dataset [2]: 100 million ratings in {1, ..., 5}; 17,770 movies, 480,189 users

[1] http://www.grouplens.org/
[2] http://www.netflixprize.com/

Experimental Evaluation: The Setup
- Evaluate movie rating prediction performance on each dataset
- RMSE as the primary evaluation metric
- Performance averaged over standard train-test splits
- Compare to state-of-the-art latent factor models:
  - Bayesian Probabilistic Matrix Factorization (BPMF; Salakhutdinov and Mnih, 2008); M3F reduces to BPMF when no topics are sampled
  - Gaussian process matrix factorization model (L&U; Lawrence and Urtasun, 2009)
- Matlab/MEX implementation on dual quad-core CPUs
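For reference, the evaluation metric is straightforward to compute (a generic helper, not code from the talk):

```python
import numpy as np

def rmse(predicted, actual):
    """Root mean squared error, the primary evaluation metric used here."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return float(np.sqrt(np.mean((predicted - actual) ** 2)))
```

For example, rmse([3.8, 2.1], [4, 2]) is sqrt((0.04 + 0.01) / 2) ≈ 0.158.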

Experiments: 1M MovieLens
Question: How does M3F performance vary with the number of topics and the static factor dimensionality?
- 3,000 Gibbs samples for M3F-TIB and BPMF; 512 Gibbs samples for M3F-TIF (D̃ = 2)

Method           D=10     D=20     D=30     D=40
BPMF             0.8695   0.8622   0.8621   0.8609
M3F-TIB (1,1)    0.8671   0.8614   0.8616   0.8605
M3F-TIF (1,2)    0.8664   0.8629   0.8622   0.8616
M3F-TIF (2,1)    0.8674   0.8605   0.8605   0.8595
M3F-TIF (2,2)    0.8642   0.8584*  0.8584   0.8592
M3F-TIB (1,2)    0.8669   0.8611   0.8604   0.8603
M3F-TIB (2,1)    0.8649   0.8593   0.8581*  0.8577*
M3F-TIB (2,2)    0.8658   0.8609   0.8605   0.8599
L&U (2009)       0.8801 (RBF), 0.8791 (Linear)

Experiments: EachMovie
Question: How does M3F performance vary with the number of topics and the static factor dimensionality?
- 3,000 Gibbs samples for M3F-TIB and BPMF; 512 Gibbs samples for M3F-TIF (D̃ = 2)

Method           D=10     D=20     D=30     D=40
BPMF             1.1229   1.1212   1.1203   1.1163
M3F-TIB (1,1)    1.1205   1.1188   1.1183   1.1168
M3F-TIF (1,2)    1.1351   1.1179   1.1095   1.1072
M3F-TIF (2,1)    1.1366   1.1161   1.1088   1.1058
M3F-TIF (2,2)    1.1211   1.1043   1.1035   1.1020
M3F-TIB (1,2)    1.1217   1.1081   1.1016   1.0978
M3F-TIB (2,1)    1.1186   1.1004   1.0952   1.0936
M3F-TIB (2,2)    1.1101*  1.0961*  1.0918*  1.0905*
L&U (2009)       1.1111 (RBF), 1.0981 (Linear)

Experiments: Netflix Prize
Question: How does performance vary with latent dimensionality?
- Contrast M3F-TIB with (K_U, K_M) = (4, 1) against BPMF
- 500 Gibbs samples for M3F-TIB and BPMF

Method      RMSE     Time
BPMF/15     0.9121   27.8s
TIB/15      0.9090   46.3s
BPMF/30     0.9047   38.6s
TIB/30      0.9015   56.9s
BPMF/40     0.9027   48.3s
TIB/40      0.8990   70.5s
BPMF/60     0.9002   94.3s
TIB/60      0.8962   97.0s
BPMF/120    0.8956   273.7s
TIB/120     0.8934   285.2s
BPMF/240    0.8938   1152.0s
TIB/240     0.8929   1158.2s

Experiments: Netflix Stratification
Question: Where are the improvements over BPMF being realized?
Figure: RMSE improvements over BPMF/40 on the Netflix Prize as a function of movie or user rating count. Left: each bin represents 1/6 of the movie base. Right: each bin represents 1/8 of the user base.

Experiments: Netflix, the Napoleon Dynamite Effect
Question: Do M3F models capture polarization effects?
Table: Top 200 movies from the Netflix Prize dataset with the highest and lowest cross-topic variance in E(d_j^i | r^(v)).

Movie Title                         E(d_j^i | r^(v))
Napoleon Dynamite                   -0.11 ± 0.93
Fahrenheit 9/11                     -0.06 ± 0.90
Chicago                             -0.12 ± 0.78
The Village                         -0.14 ± 0.71
Lost in Translation                 -0.02 ± 0.70
LotR: The Fellowship of the Ring     0.15 ± 0.00
LotR: The Two Towers                 0.18 ± 0.00
LotR: The Return of the King         0.24 ± 0.00
Star Wars: Episode V                 0.35 ± 0.00
Raiders of the Lost Ark              0.29 ± 0.00

Conclusions
- A new framework for dyadic data prediction, combining:
  - the strong predictive performance and static specificity of latent factor models
  - the clustered context-sensitivity of mixed membership topic models
- Outperforms pure latent factor modeling while fitting fewer parameters
- Greatest improvements for high-variance, sparsely rated items

Future work:
- Modeling user choice: missingness is informative
- Nonparametric priors on topic parameters
- Alternative approaches to inference

References
- Airoldi, E., Blei, D., Fienberg, S., and Xing, E. Mixed membership stochastic blockmodels. JMLR, 9:1981-2014, 2008.
- DeCoste, D. Collaborative prediction using ensembles of maximum margin matrix factorizations. In ICML, 2006.
- Lawrence, N. D. and Urtasun, R. Non-linear matrix factorization with Gaussian processes. In ICML, 2009.
- Mackey, L., Weiss, D., and Jordan, M. I. Mixed membership matrix factorization. In ICML, June 2010.
- Porteous, I., Bart, E., and Welling, M. Multi-HDP: A non-parametric Bayesian model for tensor factorization. In AAAI, 2008.
- Rennie, J. and Srebro, N. Fast maximum margin matrix factorization for collaborative prediction. In ICML, 2005.
- Salakhutdinov, R. and Mnih, A. Bayesian probabilistic matrix factorization using Markov chain Monte Carlo. In ICML, 2008.
- Takács, G., Pilászy, I., Németh, B., and Tikk, D. Scalable collaborative filtering approaches for large recommender systems. JMLR, 10:623-656, 2009.

Thanks!