Journal of Machine Learning Research 17 (2016) 1-5. Submitted 7/15; Revised 5/16; Published 10/16.

fastFM: A Library for Factorization Machines

Immanuel Bayer
University of Konstanz, 78457 Konstanz, Germany
immanuel.bayer@uni-konstanz.de

arXiv:1505.00641v3 [cs.LG] 23 Nov 2016

Editor: Cheng Soon Ong

Abstract

Factorization Machines (FM) are currently only used in a narrow range of applications and are not yet part of the standard machine learning toolbox, despite their great success in collaborative filtering and click-through rate prediction. However, Factorization Machines are a general model to deal with sparse and high dimensional features. Our Factorization Machine implementation (fastFM) provides easy access to many solvers and supports regression, classification and ranking tasks. Such an implementation simplifies the use of FM for a wide range of applications. Therefore, our implementation has the potential to improve understanding of the FM model and drive new development.

Keywords: Python, MCMC, matrix factorization, context-aware recommendation

1. Introduction

This work aims to facilitate research for matrix factorization based machine learning (ML) models. Factorization Machines are able to express many different latent factor models and are widely used for collaborative filtering tasks (Rendle, 2012b). An important advantage of FM is that the model equation

$$\hat{y}^{FM}(x) := w_0 + \sum_{i=1}^{p} w_i x_i + \sum_{i=1}^{p} \sum_{j>i} \langle v_i, v_j \rangle x_i x_j, \qquad w_0 \in \mathbb{R},\; x, w \in \mathbb{R}^p,\; v_i \in \mathbb{R}^k \tag{1}$$

conforms to the standard notation for vector based ML. FM learn a factorized coefficient $\langle v_i, v_j \rangle$ for each feature pair $x_i x_j$ (eq. 1). This makes it possible to model very sparse feature interactions, as for example, encoding a sample as $x = \{\ldots, 0, \underbrace{1}_{x_i}, 0, \ldots, 0, \underbrace{1}_{x_j}, 0, \ldots\}$ yields $\hat{y}^{FM}(x) = w_0 + w_i + w_j + v_i^T v_j$, which is equivalent to (biased) matrix factorization $R_{i,j} \approx b_0 + b_i + b_j + u_i^T v_j$ (Srebro et al., 2004). Please refer to Rendle (2012b) for more encoding examples.
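
To make the equivalence with biased matrix factorization concrete, the following minimal sketch evaluates eq. (1) directly in plain Python. It is an illustration only, not fastFM code: the function name `fm_predict` and all parameter values are assumptions chosen for this example.

```python
from itertools import combinations


def fm_predict(x, w0, w, V):
    """Evaluate eq. (1) directly: bias + linear terms + pairwise interactions.

    x: list of p feature values, w: list of p weights,
    V: list of p latent vectors, each of length k.
    (Hypothetical helper for illustration; not part of fastFM.)
    """
    linear = sum(w_i * x_i for w_i, x_i in zip(w, x))
    pairwise = 0.0
    for i, j in combinations(range(len(x)), 2):  # all pairs with j > i
        if x[i] != 0 and x[j] != 0:  # zero features contribute nothing
            dot = sum(a * b for a, b in zip(V[i], V[j]))
            pairwise += dot * x[i] * x[j]
    return w0 + linear + pairwise


# Toy parameters (illustrative values only): p = 5 features, rank k = 2.
w0 = 0.5
w = [0.1, -0.2, 0.3, 0.0, 0.4]
V = [[1.0, 0.0], [0.5, 0.5], [0.0, 2.0], [1.0, 1.0], [-1.0, 0.5]]

# Two-hot encoding: index 1 plays the role of a user, index 4 of an item.
x = [0, 1, 0, 0, 1]
y_hat = fm_predict(x, w0, w, V)

# Biased matrix factorization form: b_0 + b_i + b_j + <u_i, v_j>.
mf = w0 + w[1] + w[4] + sum(a * b for a, b in zip(V[1], V[4]))
```

For a sample with exactly two active features, `y_hat` and `mf` agree up to floating point rounding, which is the equivalence stated in the text.
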
FM have been the top performing model in various machine learning competitions (Rendle and Schmidt-Thieme, 2009; Rendle, 2012a; Bayer and Rendle, 2013) with different objectives (e.g. What Do You Know? Challenge (footnote 1), EMI Music Hackathon (footnote 2)). fastFM includes solvers for regression, classification and ranking problems (see Table 1) and addresses the following needs of the research community: (i) easy interfacing for dynamic and interactive languages such as R, Python and Matlab; (ii) a Python interface allowing interactive work; (iii) a publicly available test suite that strongly simplifies modifications or the addition of new features; (iv) code released under the BSD license, allowing integration into (almost) any open source project.

1. http://www.kaggle.com/c/whatdoyouknow
2. http://www.kaggle.com/c/musichackathon

© 2016 Immanuel Bayer.

2. Design Overview

The fastFM library has a multi-layered software architecture (Figure 1) that separates the interface code from the performance critical parts (fastFM-core). The core contains the solvers, is written in C and can be used stand-alone. Two user interfaces are available: a command line interface (CLI) and a Python interface. Cython (Behnel et al., 2011) is used to create a Python extension from the C library. Both the Python and the C interface serve as reference implementations for bindings to additional languages.

Figure 1: Library architecture (components shown: fastFM (Py), Cython, CLI, fastFM-core (C)).

2.1 fastFM-core

FM are usually applied to very sparse design matrices, often with a sparsity over 95%, due to their ability to model interactions between very high dimensional categorical features. We use the standard compressed row storage (CRS) matrix format as the underlying data structure and rely on the CXSparse library (footnote 3; Davis, 2006) for fast sparse matrix / vector operations. This simplifies the code and makes memory sharing between Python and C straightforward.

fastFM contains a test suite that is run on each commit to the GitHub repository via a continuous integration server (footnote 4). Solvers are tested using state of the art techniques, such as Posterior Quantiles (Cook et al., 2006) for the MCMC sampler and Finite Differences for the SGD based solvers.

2.2 Solver and Loss Functions

fastFM provides a range of solvers for all supported tasks (Table 1). The MCMC solver implements the Bayesian Factorization Machine model (Freudenthaler et al., 2011) via Gibbs sampling.
We use the pairwise Bayesian Personalized Ranking (BPR) loss (Rendle et al., 2009) for ranking. More details on the classification and regression solvers can be found in Rendle (2012b).

Task            Solver           Loss
Regression      ALS, MCMC, SGD   Square Loss
Classification  ALS, MCMC, SGD   Probit (MAP), Probit, Sigmoid
Ranking         SGD              BPR (Rendle et al., 2009)

Table 1: Supported solvers and tasks

3. CXSparse is LGPL licensed.
4. https://travis-ci.org/ibayer/fastfm-core
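
The pairwise BPR loss mentioned above can be sketched in a few lines. Following Rendle et al. (2009), the per-pair term is the negative log-sigmoid of the score difference between a preferred and a non-preferred sample. This is a minimal illustration under that definition; the function name `bpr_loss` is an assumption, regularization is omitted, and the code does not reproduce fastFM's internal C implementation.

```python
import math


def bpr_loss(score_pos, score_neg):
    """Pairwise BPR loss for one (preferred, non-preferred) pair.

    Computes -ln(sigmoid(score_pos - score_neg)), where the scores would be
    model predictions (e.g. FM outputs) for the two samples.
    (Hypothetical helper for illustration; not part of fastFM.)
    """
    diff = score_pos - score_neg
    # -ln(sigmoid(d)) == ln(1 + exp(-d)); branch so exp() never overflows.
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


# The loss is small when the preferred sample scores higher, large otherwise.
loss_correct_order = bpr_loss(2.0, 0.0)
loss_wrong_order = bpr_loss(0.0, 2.0)
loss_tie = bpr_loss(1.0, 1.0)
```

Because the loss depends only on the score difference, a ranking solver needs pairs of samples rather than absolute targets.
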

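Section 2.1 mentions Finite Differences as a testing technique for the SGD based solvers. The sketch below shows the general idea on a toy FM with squared loss: an analytic gradient with respect to the latent factors is compared against a central-difference approximation. It is an assumption-laden illustration, not the fastFM test suite; all function names and values are hypothetical, and the gradient formula is the standard one for FM (Rendle, 2012b).

```python
def fm_predict(x, w0, w, V):
    """FM prediction using the linear-time form of the pairwise term."""
    y = w0 + sum(w_i * x_i for w_i, x_i in zip(w, x))
    for f in range(len(V[0])):
        s = sum(V[i][f] * x[i] for i in range(len(x)))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(len(x)))
        y += 0.5 * (s * s - s_sq)
    return y


def loss(x, y, w0, w, V):
    """Squared loss 0.5 * (prediction - target)^2 for a single sample."""
    return 0.5 * (fm_predict(x, w0, w, V) - y) ** 2


def analytic_grad_V(x, y, w0, w, V):
    """Analytic gradient of the loss with respect to every entry of V."""
    err = fm_predict(x, w0, w, V) - y
    k = len(V[0])
    sums = [sum(V[j][f] * x[j] for j in range(len(x))) for f in range(k)]
    return [[err * x[i] * (sums[f] - V[i][f] * x[i]) for f in range(k)]
            for i in range(len(x))]


def numeric_grad_V(x, y, w0, w, V, eps=1e-6):
    """Central finite differences: (L(v + eps) - L(v - eps)) / (2 * eps)."""
    grad = []
    for i in range(len(V)):
        row = []
        for f in range(len(V[0])):
            original = V[i][f]
            V[i][f] = original + eps
            up = loss(x, y, w0, w, V)
            V[i][f] = original - eps
            down = loss(x, y, w0, w, V)
            V[i][f] = original  # restore the parameter
            row.append((up - down) / (2 * eps))
        grad.append(row)
    return grad


# Toy data (illustrative values only): p = 4 features, rank k = 2.
x, y = [1.0, 0.0, 2.0, 1.0], 1.5
w0, w = 0.1, [0.2, -0.1, 0.05, 0.3]
V = [[0.1, -0.2], [0.3, 0.4], [-0.5, 0.2], [0.25, 0.1]]

ga = analytic_grad_V(x, y, w0, w, V)
gn = numeric_grad_V(x, y, w0, w, V)
max_diff = max(abs(a - n) for ra, rn in zip(ga, gn) for a, n in zip(ra, rn))
```

A solver test would then fail if `max_diff` exceeded a small tolerance, which catches sign or indexing errors in a hand-derived gradient.
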
2.3 Python Interface

The Python interface is compatible with the API of the widely used scikit-learn library (Pedregosa et al., 2011), which opens the library to a large user base. The following code snippet shows how to use MCMC sampling for an FM classifier and how to make predictions on new data.

    from fastFM import mcmc

    # X_train, y_train and X_test are assumed to be defined by the caller.
    fm = mcmc.FMClassification(init_std=0.01, rank=8)
    y_pred = fm.fit_predict(X_train, y_train, X_test)

fastFM provides additional features such as warm starting a solver from a previous solution (see MCMC example).

    from fastFM import als

    fm = als.FMRegression(init_std=0.01, rank=8, l2_reg=2)
    fm.fit(X_train, y_train)

3. Experiments

libFM (footnote 5) is the reference implementation for FM and the only one that provides ALS and MCMC solvers. Our experiments show that the ALS and MCMC solvers in fastFM compare favorably to libFM with respect to runtime (Figure 2) and are indistinguishable in terms of accuracy. The experiments have been conducted on the MovieLens 10M data set using the original split with a fixed number of 200 iterations for all experiments. The x-axis indicates the number of latent factors (rank), and the y-axis the runtime in seconds. The plots show that the runtime scales linearly with the rank for both implementations.

Figure 2: A runtime comparison between fastFM and libFM (panels: ALS and MCMC; x-axis: rank; y-axis: runtime [s]). The evaluation is done on the MovieLens 10M data set.

The code snippet below shows how simple it is to write Python code that allows model inspection after every iteration. The induced Python function call overhead occurs only once per iteration and is therefore negligible. This feature can be used for Bayesian Model Checking as demonstrated in Figure 3. The figure shows MCMC summary statistics for the first order hyperparameter $\sigma_w$. Please note that the MCMC solver uses Gaussian priors for the model parameters (Freudenthaler et al., 2011).

5. http://libfm.org

    from fastFM import mcmc

    fm = mcmc.FMRegression(n_iter=0)
    # initialize coefficients
    fm.fit_predict(X_train, y_train, X_test)

    for i in range(number_of_iterations):
        # run one more MCMC iteration, continuing from the current state
        y_pred = fm.fit_predict(X_train, y_train, X_test, n_more_iter=1)
        # save, or modify (hyper) parameter
        print(fm.w_, fm.V_, fm.hyper_param_)

Many other analyses and experiments can be realized with a few lines of Python code without the need to read or recompile the performance critical C code.

Figure 3: MCMC chain analysis and convergence diagnostics example for the hyperparameter $\sigma_w$ evaluated on the MovieLens 10M data set (panels: trace of $\sigma_w$ over iterations; density of $\sigma_w$).

4. Related Work

Factorization Machines are available in the large scale machine learning libraries GraphLab (Low et al., 2014) and BidMach (Canny and Zhao, 2013). The toolkit SVDFeature by Chen et al. (2012) provides a general MF model that is similar to an FM. The implementations in GraphLab, BidMach and SVDFeature only support SGD solvers and don't provide a ranking loss. It's not our objective to replace these distributed machine learning frameworks, but to provide an FM implementation that is easy to use and easy to extend without sacrificing performance.

Acknowledgments

This work was supported by the DFG under grant Re 3311/2-1.

References

Immanuel Bayer and Steffen Rendle. Factor models for recommending given names. In ECML/PKDD 2013 Discovery Challenge Workshop, part of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, pages 81-89, 2013.

Stefan Behnel, Robert Bradshaw, Craig Citro, Lisandro Dalcin, Dag Sverre Seljebotn, and Kurt Smith. Cython: The best of both worlds. Computing in Science & Engineering, 13(2):31-39, 2011.

John Canny and Huasha Zhao. Bidmach: Large-scale learning with zero memory allocation. In BigLearn Workshop, NIPS, 2013.

Tianqi Chen, Weinan Zhang, Qiuxia Lu, Kailong Chen, Zhao Zheng, and Yong Yu. Svdfeature: a toolkit for feature-based collaborative filtering. The Journal of Machine Learning Research, 13(1):3619-3622, 2012.

Samantha R Cook, Andrew Gelman, and Donald B Rubin. Validation of software for Bayesian models using posterior quantiles. Journal of Computational and Graphical Statistics, 15(3), 2006.

Timothy A Davis. Direct methods for sparse linear systems, volume 2. SIAM, 2006.

Christoph Freudenthaler, Lars Schmidt-Thieme, and Steffen Rendle. Bayesian factorization machines. In Proceedings of the NIPS Workshop on Sparse Representation and Low-rank Approximation, 2011.

Yucheng Low, Joseph E Gonzalez, Aapo Kyrola, Danny Bickson, Carlos E Guestrin, and Joseph Hellerstein. Graphlab: A new framework for parallel machine learning. arXiv preprint arXiv:1408.2041, 2014.

F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825-2830, 2011.

Steffen Rendle. Social network and click-through prediction with factorization machines. In KDD-Cup Workshop, 2012a.

Steffen Rendle. Factorization machines with libFM. ACM Trans. Intell. Syst. Technol., 3(3):57:1-57:22, May 2012b. ISSN 2157-6904.

Steffen Rendle and Lars Schmidt-Thieme. Factor models for tag recommendation in BibSonomy. In ECML/PKDD 2008 Discovery Challenge Workshop, part of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, pages 235-243, 2009.

Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. BPR: Bayesian personalized ranking from implicit feedback. In UAI '09, pages 452-461, Arlington, Virginia, United States, 2009.

Nathan Srebro, Jason Rennie, and Tommi S Jaakkola. Maximum-margin matrix factorization. In Advances in Neural Information Processing Systems, pages 1329-1336, 2004.