Scaling Neighbourhood Methods

Similar documents
Scalable Bayesian Matrix Factorization

Large-scale Ordinal Collaborative Filtering

Matrix and Tensor Factorization from a Machine Learning Perspective

Review: Probabilistic Matrix Factorization. Probabilistic Matrix Factorization (PMF)

Andriy Mnih and Ruslan Salakhutdinov

* Matrix Factorization and Recommendation Systems

Factorization Machines with libfm

Temporal Collaborative Filtering with Bayesian Probabilistic Tensor Factorization

SCMF: Sparse Covariance Matrix Factorization for Collaborative Filtering

Mixed Membership Matrix Factorization

Graphical Models for Collaborative Filtering

Restricted Boltzmann Machines for Collaborative Filtering

Generative Models for Discrete Data

Collaborative Filtering. Radek Pelánek

Bayesian Matrix Factorization with Side Information and Dirichlet Process Mixtures

Recommendation Systems

A Modified PMF Model Incorporating Implicit Item Associations

Algorithms for Collaborative Filtering

COT: Contextual Operating Tensor for Context-Aware Recommender Systems

STA 4273H: Statistical Machine Learning

Large-Scale Matrix Factorization with Distributed Stochastic Gradient Descent

Decoupled Collaborative Ranking

26 : Spectral GMs. Lecturer: Eric P. Xing Scribes: Guillermo A Cidre, Abelino Jimenez G.

NetBox: A Probabilistic Method for Analyzing Market Basket Data

Relational Stacked Denoising Autoencoder for Tag Recommendation. Hao Wang

Rating Prediction with Topic Gradient Descent Method for Matrix Factorization in Recommendation

Afternoon Meeting on Bayesian Computation 2018 University of Reading

Collaborative Recommendation with Multiclass Preference Context

Large-scale Collaborative Ranking in Near-Linear Time

ECS289: Scalable Machine Learning

Large-scale Collaborative Prediction Using a Nonparametric Random Effects Model

Approximate Inference using MCMC

SQL-Rank: A Listwise Approach to Collaborative Ranking

Topic Modelling and Latent Dirichlet Allocation

Lecture 13 : Variational Inference: Mean Field Approximation

A graph contains a set of nodes (vertices) connected by links (edges or arcs)

Click-Through Rate prediction: TOP-5 solution for the Avazu contest

Recommender Systems EE448, Big Data Mining, Lecture 10. Weinan Zhang Shanghai Jiao Tong University

Using SVD to Recommend Movies

Replicated Softmax: an Undirected Topic Model. Stephen Turner

Matrix Factorization Techniques for Recommender Systems

The Origin of Deep Learning. Lili Mou Jan, 2015

Bayesian Image Segmentation Using MRF s Combined with Hierarchical Prior Models

Content-based Recommendation

Recurrent Latent Variable Networks for Session-Based Recommendation

Matrix Factorization Techniques For Recommender Systems. Collaborative Filtering

Matrix Factorization & Latent Semantic Analysis Review. Yize Li, Lanbo Zhang

VCMC: Variational Consensus Monte Carlo

Bayesian Methods for Machine Learning

Point-of-Interest Recommendations: Learning Potential Check-ins from Friends

Low Rank Matrix Completion Formulation and Algorithm

Latent Dirichlet Allocation

Collaborative topic models: motivations cont

Pattern Recognition and Machine Learning

Sum-Product Networks. STAT946 Deep Learning Guest Lecture by Pascal Poupart University of Waterloo October 17, 2017

Probabilistic Matrix Factorization

LECTURE 15 Markov chain Monte Carlo

Matrix Factorization In Recommender Systems. Yong Zheng, PhDc Center for Web Intelligence, DePaul University, USA March 4, 2015

Language Information Processing, Advanced. Topic Models

RETRIEVAL MODELS. Dr. Gjergji Kasneci Introduction to Information Retrieval WS

Introduction to Machine Learning

Data Science Mastery Program

Social Relations Model for Collaborative Filtering

Learning Sequence Motif Models Using Expectation Maximization (EM) and Gibbs Sampling

Appendix: Modeling Approach

Bayesian Learning in Undirected Graphical Models

NCDREC: A Decomposability Inspired Framework for Top-N Recommendation

Introduction to Restricted Boltzmann Machines

Deep Learning Basics Lecture 7: Factor Analysis. Princeton University COS 495 Instructor: Yingyu Liang

Matrix Factorization and Factorization Machines for Recommender Systems

Learning to Disentangle Factors of Variation with Manifold Learning

Lecture 16 Deep Neural Generative Models

Nonnegative Matrix Factorization

Collaborative Filtering Applied to Educational Data Mining

Machine Learning. Probabilistic KNN.

Cross-domain recommendation without shared users or items by sharing latent vector distributions

A Bayesian Treatment of Social Links in Recommender Systems ; CU-CS

Stochastic Proximal Gradient Algorithm

Predicting the Next Location: A Recurrent Model with Spatial and Temporal Contexts

Deep Poisson Factorization Machines: a factor analysis model for mapping behaviors in journalist ecosystem

INFINITE MIXTURES OF MULTIVARIATE GAUSSIAN PROCESSES

Variational inference

A Matrix Factorization Technique with Trust Propagation for Recommendation in Social Networks

Click Prediction and Preference Ranking of RSS Feeds

Latent Dirichlet Allocation (LDA)

Collaborative Filtering on Ordinal User Feedback

Factorization Models for Context-/Time-Aware Movie Recommendations

Circle-based Recommendation in Online Social Networks

Techniques for Dimensionality Reduction. PCA and Other Matrix Factorization Methods

Default Priors and Efficient Posterior Computation in Bayesian

Transcription:

Quick Recap

Scaling Neighbourhood Methods

Collaborative Filtering. m = #items, n = #users. Complexity of computing all item-item similarities: O(m · m · n).

Comparative Scale of Signals (~50M users, ~25M items)
- Explicit ratings: ~O(1M) (1 per billion user-item pairs)
- Purchase: ~O(100M) (100 per billion)
- Browse: ~O(10B) (10,000 per billion)

Implicit Signals Used
- Bought history
- Browse history
- Compare history

Category-partitioned vs. category-independent

Similarity Metric for a Boolean Matrix: Cosine Similarity
A. Pair count (p): number of users with both P1 and P2
B. Individual counts (n_1, n_2): number of users with P1, P2 individually
For a boolean matrix this gives sim(P1, P2) = p / √(n_1 · n_2).

Employing Map Reduce
- Calculate p by forming pairs and counting
- Calculate n_1 by making P1 the key
- Calculate n_2 by making P2 the key
Took 2 hours on 5 years of data. Scales horizontally. (A single-machine sketch of these jobs follows the Map Reduce diagrams below.)

Map Reduce 1

Map Reduce 2

Map Reduce 3
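The three jobs can be sketched on a single machine as follows; this is an illustrative reconstruction (the input dictionary, item names, and in-memory defaultdicts stand in for the actual distributed mappers and reducers):

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt

# Input: one (user, set-of-items) record per user, from any implicit signal.
events = {
    "u1": {"p1", "p2", "p3"},
    "u2": {"p1", "p2"},
    "u3": {"p2", "p3"},
}

# Job 1: map each user's item set to item pairs, reduce by counting -> p
pair_count = defaultdict(int)
for items in events.values():
    for a, b in combinations(sorted(items), 2):
        pair_count[(a, b)] += 1

# Jobs 2 and 3: key on each item individually, reduce by counting -> n_i
item_count = defaultdict(int)
for items in events.values():
    for item in items:
        item_count[item] += 1

# Combine: cosine similarity for a boolean matrix, sim = p / sqrt(n1 * n2)
similarity = {
    (a, b): p / sqrt(item_count[a] * item_count[b])
    for (a, b), p in pair_count.items()
}
print(similarity)
```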

Latent Variable Models and Factorization Models
Akash Khandelwal, Avijit Saha, Mohit Kumar, Vivek Mehta

Outline
- Introduction
- Recommender Systems Recap
- Latent Variable Models
- Factorization Models: Matrix Factorization, Singular Value Decomposition, BPMF, Factorization Machine
- Conclusion

Recommender Systems (RSs)
- Collaborative Filtering (CF)
  - Neighborhood based: KNN
  - Model based: cluster-based CF and Bayesian classifiers; latent variable models such as LDA, pLSA, and matrix factorization (MF)
- Content based
- Knowledge based
- Hybrid

Latent Variable Models
- Supplement a set of observed variables with additional latent, or hidden, variables.
- Widely used in several domains such as machine learning, statistics, and data mining.
- Reveal hidden structure that explains the data.
- Consider a joint distribution over the hidden and observed variables; the hidden structure is found by computing the posterior.
- LDA is an example of a latent variable model.

Factorization Models
- Among the most widely used latent variable models in the RSs community.
- Preferences of a user are determined by a small number of unobserved latent factors.
- Matrix factorization: each user and item is mapped to a latent factor vector, u_i ∈ R^K and v_j ∈ R^K.
- Tensor factorization: each variable of each category type is mapped to a K-dimensional latent factor vector.
- Many problem-specific factorization models exist.

Focus
- Factorization Models
- Matrix Factorization (MF)
- Factorization Machine (FM)

Singular Value Decomposition
Singular value decomposition (SVD) is a factorization of a matrix. Formally, the SVD of R ∈ R^{I×J} is:
R = U Σ V^T,    (1)
where
- U is an I × I unitary matrix,
- Σ is an I × J rectangular diagonal matrix,
- V is a J × J unitary matrix.
The entries σ_{i,i} of Σ are known as the singular values of R. The I columns of U and the J columns of V are called the left-singular vectors and right-singular vectors of R, respectively.

Dimensionality Reduction Using SVD
Let R ∈ R^{I×J}. To estimate a rank-K approximation R̂, apply SVD:
R = U Σ V^T,    (2)
then keep only the top K singular values:
R̂ = U diag(σ_{1,1}, σ_{2,2}, ..., σ_{K,K}, 0, ..., 0) V^T.    (3)
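As a concrete illustration of Eqs. (2)-(3), here is a minimal NumPy sketch (the rating matrix R is made up):

```python
import numpy as np

# Rank-K approximation of a fully observed R via truncated SVD (Eqs. (2)-(3)).
R = np.array([[5.0, 4.0, 1.0],
              [4.0, 5.0, 1.0],
              [1.0, 1.0, 5.0],
              [1.0, 2.0, 4.0]])
K = 2

U, s, Vt = np.linalg.svd(R, full_matrices=False)  # R = U @ diag(s) @ Vt
R_hat = U[:, :K] @ np.diag(s[:K]) @ Vt[:K, :]     # keep the top K singular values
print(np.round(R_hat, 2))
```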

Example of SVD [figure with panels (a)-(d) omitted]

MF for RSs
Figure 1: Matrix Factorization [figure omitted]

MF for RSs
Consider a user-movie matrix R ∈ R^{I×J}, where cell r_ij holds the rating given to the j-th movie by the i-th user. MF decomposes R into two low-rank matrices U = [u_1, u_2, ..., u_I]^T ∈ R^{I×K} and V = [v_1, v_2, ..., v_J]^T ∈ R^{J×K}:
R ≈ U V^T.    (4)
The factors are found by minimizing the squared error over the observed cells:
Σ_{(i,j) ∈ Ω} (r_ij − u_i^T v_j)^2.    (5)

MF for RSs
SVD truncated to K singular values would have been the solution if R were fully observed. However, R is only partially observed. Solution: stochastic gradient descent on the observed entries. For each observed r_ij, compute the error and apply the rank-wise update, iterating over each rank k = 1, ..., K:
e_ij = r_ij − u_i^T v_j
u_ik = u_ik + ν (e_ij v_jk − λ u_ik)    (6)
v_jk = v_jk + ν (e_ij u_ik − λ v_jk)    (7)
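A minimal NumPy sketch of this SGD loop; the toy observations and the values of ν (nu) and λ (lam) are illustrative:

```python
import numpy as np

# SGD for MF on a partially observed R, following Eqs. (6)-(7).
I, J, K = 4, 3, 2                    # users, items, latent dimensions
nu, lam = 0.01, 0.1                  # learning rate, regularization (illustrative)
observations = [(0, 0, 5.0), (0, 1, 4.0), (1, 0, 4.0), (2, 2, 5.0), (3, 2, 4.0)]

rng = np.random.default_rng(0)
U = 0.1 * rng.standard_normal((I, K))
V = 0.1 * rng.standard_normal((J, K))

for epoch in range(100):
    for i, j, r_ij in observations:
        u_i = U[i].copy()                       # use the old u_i in both updates
        e_ij = r_ij - u_i @ V[j]                # prediction error
        U[i] += nu * (e_ij * V[j] - lam * u_i)  # Eq. (6), all ranks k at once
        V[j] += nu * (e_ij * u_i - lam * V[j])  # Eq. (7)
```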

Problems with the SVD approach
- The learning rate and regularization parameters need to be tuned manually.
- Overfitting.
Solution: Bayesian Probabilistic Matrix Factorization (BPMF) [1].

BPMF Model
The likelihood term of BPMF is:
p(R | Θ) = Π_{(i,j) ∈ Ω} N(r_ij | u_i^T v_j, τ^{-1}),    (8)
where u_i is the latent factor vector for the i-th user, v_j is the latent factor vector for the j-th item, τ is the model precision, Ω is the set of all observations, and Θ is the set of all the model parameters.

Priors
Independent priors are placed on all the model parameters in Θ:
p(U) = Π_{i=1}^{I} N(u_i | µ_u, Λ_u^{-1}),    (9)
p(V) = Π_{j=1}^{J} N(v_j | µ_v, Λ_v^{-1}).    (10)

Hyperpriors
Normal-Wishart priors are further placed on all the hyperparameters Θ_H = {{µ_u, Λ_u}, {µ_v, Λ_v}}:
p(µ_u, Λ_u) = NW(µ_u, Λ_u | µ_0, β_0, W_0, ν_0)
            = N(µ_u | µ_0, (β_0 Λ_u)^{-1}) W(Λ_u | W_0, ν_0),    (11)
p(µ_v, Λ_v) = NW(µ_v, Λ_v | µ_0, β_0, W_0, ν_0),    (12)
where
W(Λ | W_0, ν_0) = (1/C) |Λ|^{(ν_0 − D − 1)/2} exp(−(1/2) Tr(W_0^{-1} Λ)).

Joint Distribution
The joint distribution of the observations and the hidden variables can be written as:
p(R, Θ, Θ_H | Θ_0) = p(R | Θ) p(U) p(V) p(µ_u, Λ_u | Θ_0) p(µ_v, Λ_v | Θ_0),    (13)
where Θ_0 = {µ_0, β_0, W_0, ν_0}.

Inference
Evaluation of the joint distribution in Eq. (13) is intractable. However, all the model parameters are conditionally conjugate, so a Gibbs sampler has closed-form updates. Substituting Eqs. (8)-(12) into Eq. (13), the sampling distribution of u_i can be written as:
p(u_i | −) = N(u_i | µ*, (Λ*)^{-1}),    (14)
Λ* = Λ_u + τ Σ_{j ∈ Ω_i} v_j v_j^T,    (15)
µ* = (Λ*)^{-1} (Λ_u µ_u + τ Σ_{j ∈ Ω_i} r_ij v_j).    (16)
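A sketch of the Gibbs update for the user factors, assuming V, Λ_u, µ_u, τ and the per-user observation lists are already available (function and variable names are illustrative):

```python
import numpy as np

# One Gibbs sweep over the user vectors, following Eqs. (14)-(16).
def sample_user_factors(V, ratings_by_user, Lambda_u, mu_u, tau, rng):
    K = V.shape[1]
    U = np.empty((len(ratings_by_user), K))
    for i, obs in enumerate(ratings_by_user):        # obs is a list of (j, r_ij)
        js = [j for j, _ in obs]
        rs = np.array([r for _, r in obs])
        V_i = V[js]                                  # rows v_j for j in Omega_i
        Lambda_star = Lambda_u + tau * V_i.T @ V_i   # Eq. (15)
        mu_star = np.linalg.solve(Lambda_star,
                                  Lambda_u @ mu_u + tau * V_i.T @ rs)  # Eq. (16)
        U[i] = rng.multivariate_normal(mu_star, np.linalg.inv(Lambda_star))  # Eq. (14)
    return U
```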

Results
Figure 2: RMSE vs. iterations [figure omitted]

Factorization Models
- Matrix factorization [2].
- Tensor factorization [3].
- Specific models such as SVD++ [4], TimeSVD++ [5], FPMC [6], and BPTF [3] have been developed.
- Several learning techniques, such as SGD, ALS, variational Bayes, and MCMC (Gibbs sampling), have been developed.

Factorization Models vs. Feature-Based Techniques
Factorization models
- Advantage: scalability and performance.
- Problem: deriving inference techniques for each individual model is a time-consuming task and requires considerable expertise.
Feature-based techniques
- Advantage: a generic approach; problems can be solved using standard tools like LIBSVM or SVMLight.
- Problem: cannot handle very sparse data.

Factorization Machine
- A generic framework [7] proposed by Steffen Rendle.
- Combines the advantages of both factorization models and feature-based models.
- Can subsume many state-of-the-art factorization models such as SVD++, TimeSVD++, FPMC, and PITF.
- Performs well on sparse data where SVMs fail.

Feature Representation of Factorization Machine
Example transactions: (U1, M1, G1, 2), (U1, M3, G2, 5), (U2, M2, G3, 4), (U3, M1, G1, 5).

      User        Movie       Genre       Target y
      U1 U2 U3    M1 M2 M3    G1 G2 G3
x1     1  0  0     1  0  0     1  0  0    y1 = 2
x2     1  0  0     0  0  1     0  1  0    y2 = 5
x3     0  1  0     0  1  0     0  0  1    y3 = 4
x4     0  0  1     1  0  0     1  0  0    y4 = 5
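A short sketch that builds this design matrix from the four example transactions (variable names are illustrative):

```python
import numpy as np

# One-hot FM design matrix for the example transactions above.
users, movies, genres = ["U1", "U2", "U3"], ["M1", "M2", "M3"], ["G1", "G2", "G3"]
data = [("U1", "M1", "G1", 2), ("U1", "M3", "G2", 5),
        ("U2", "M2", "G3", 4), ("U3", "M1", "G1", 5)]

D = len(users) + len(movies) + len(genres)
X = np.zeros((len(data), D))
y = np.zeros(len(data))
for n, (u, m, g, rating) in enumerate(data):
    X[n, users.index(u)] = 1.0                              # user block
    X[n, len(users) + movies.index(m)] = 1.0                # movie block
    X[n, len(users) + len(movies) + genres.index(g)] = 1.0  # genre block
    y[n] = rating
```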

Factorization Machine
The FM model equation is:
ŷ_n = w_0 + Σ_{i=1}^{D} w_i x_ni + Σ_{i=1}^{D} Σ_{j=i+1}^{D} (Σ_{k=1}^{K} v_ik v_jk) x_ni x_nj.    (17)
The assumptions of FM are:
- Regression: y_n | x_n, θ ~ N(ŷ(x_n, θ), α^{-1})
- Classification: y_n | x_n, θ ~ Bernoulli(b(ŷ(x_n, θ)))
- L2 regularization on θ.
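The pairwise term in Eq. (17) can be evaluated in O(KD) time using the reformulation from the FM paper [7], Σ_{i<j} (Σ_k v_ik v_jk) x_i x_j = (1/2) Σ_k [(Σ_i v_ik x_i)^2 − Σ_i v_ik^2 x_i^2]. A minimal prediction sketch:

```python
import numpy as np

# FM prediction, Eq. (17), with the O(KD) reformulation of the pairwise term.
def fm_predict(x, w0, w, V):
    """x: (D,) features, w0: bias, w: (D,) linear weights, V: (D, K) factors."""
    s = V.T @ x                   # sum_i v_ik x_i, one entry per k
    s2 = (V ** 2).T @ (x ** 2)    # sum_i v_ik^2 x_i^2, one entry per k
    return w0 + w @ x + 0.5 * np.sum(s ** 2 - s2)
```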

Mimic MF
ŷ = w_0 + w_u + w_i + v_u^T v_i    (18)
Feature representation in FM: D = |U| + |I|, and x_j = δ(j = u ∨ j = i).

Mimic SVD++
ŷ = w_0 + w_u + w_i + v_u^T v_i + (1/√|N_u|) Σ_{l ∈ N_u} v_i^T v_l    (19)
Feature representation in FM: D = |U| + |I| + |L|, and
x_j = 1 if j = u or j = i,
    = 1/√|N_u| if j ∈ N_u,
    = 0 otherwise.

Algorithms for Factorization Machine
- Stochastic gradient descent is the simplest algorithm for solving FM (SGD-FM).
- MCMC-based Bayesian Factorization Machine gives state-of-the-art performance (MCMC-FM).
- Alternating least squares and adaptive stochastic gradient descent have also been proposed.

Comparison of the Learning Techniques
SGD-FM
- Pros: SGD is an online algorithm and more scalable.
- Cons: costly cross-validation of hyperparameters.
MCMC-FM
- Pros: performance; no cross-validation required.

SGD Learning
The FM model equation is:
ŷ_n = w_0 + Σ_{i=1}^{D} w_i x_ni + Σ_{i=1}^{D} Σ_{j=i+1}^{D} (Σ_{k=1}^{K} v_ik v_jk) x_ni x_nj.    (20)
The cost function is:
Σ_{n=1}^{N} (y_n − ŷ_n)^2 + L2 regularization.    (21)
SGD update equation for the factor weights:
v_ik = v_ik + ν (2 (y_n − ŷ_n) x_ni Σ_{l=1, l≠i}^{D} v_lk x_nl − 2 λ v_ik).    (22)
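A sketch of one SGD step for FM regression under Eqs. (20)-(22); ν and λ are illustrative, and the gradient of the pairwise term uses ∂ŷ/∂v_ik = x_ni (Σ_l v_lk x_nl − v_ik x_ni):

```python
import numpy as np

# One SGD step for FM regression, following Eqs. (20)-(22).
def fm_sgd_step(x, y, w0, w, V, nu=0.01, lam=0.1):
    s = V.T @ x                                      # sum_l v_lk x_nl, per k
    y_hat = w0 + w @ x + 0.5 * np.sum(s ** 2 - (V ** 2).T @ (x ** 2))
    err = y - y_hat
    grad_V = np.outer(x, s) - (x[:, None] ** 2) * V  # d y_hat / d v_ik
    w0 += nu * 2 * err
    w += nu * (2 * err * x - 2 * lam * w)
    V += nu * (2 * err * grad_V - 2 * lam * V)
    return w0, w, V
```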

MCMC-FM
Likelihood:
y_n ~ N(y_n | ŷ_n, τ^{-1})
Priors:
p(w_0) = N(w_0 | µ_0, σ_0)
p(w_i) = N(w_i | µ_w, σ_w)
p(v_ik) = N(v_ik | µ_k, σ_k)
Hyperpriors:
p(µ_w) = N(µ_w | µ_0, σ_w ν_0)
p(σ_w) = G(σ_w | α_0, β_0)
p(µ_k) = N(µ_k | µ_0, σ_k ν_0)
p(σ_k) = G(σ_k | α_0, β_0)

Results [figure omitted]

Datasets
Yelp challenge datasets:
- User information
- Social information
- Location
- Time
- Ratings
- Reviews

References
[1] R. Salakhutdinov and A. Mnih, "Bayesian probabilistic matrix factorization using Markov chain Monte Carlo," in Proc. of ICML, pp. 880-887, 2008.
[2] R. Salakhutdinov and A. Mnih, "Probabilistic matrix factorization," in Proc. of NIPS, 2007.
[3] L. Xiong, X. Chen, T.-K. Huang, J. Schneider, and J. G. Carbonell, "Temporal collaborative filtering with Bayesian probabilistic tensor factorization," in Proc. of SDM, 2010.
[4] Y. Koren, "Factorization meets the neighborhood: A multifaceted collaborative filtering model," in Proc. of KDD, pp. 426-434, 2008.
[5] Y. Koren, "Collaborative filtering with temporal dynamics," in Proc. of KDD, pp. 447-456, 2009.
[6] S. Rendle, C. Freudenthaler, and L. Schmidt-Thieme, "Factorizing personalized Markov chains for next-basket recommendation," in Proc. of WWW, pp. 811-820, 2010.
[7] S. Rendle, "Factorization machines," in Proc. of ICDM, 2010.

THANK YOU