arxiv: v2 [cs.lg] 5 May 2015

Similar documents
arxiv: v3 [cs.lg] 23 Nov 2016

Temporal and spatial approaches for land cover classification.

Multi-Plant Photovoltaic Energy Forecasting Challenge: Second place solution

arxiv: v1 [stat.ml] 12 May 2016

Scaling Neighbourhood Methods

Leveraging Machine Learning for High-Resolution Restoration of Satellite Imagery

Factorization Machines with libfm

Factorization Models for Context-/Time-Aware Movie Recommendations

Machine Learning Analyses of Meteor Data

PCA and Autoencoders

NetBox: A Probabilistic Method for Analyzing Market Basket Data

Recommender Systems EE448, Big Data Mining, Lecture 10. Weinan Zhang Shanghai Jiao Tong University

SQL-Rank: A Listwise Approach to Collaborative Ranking

Personalized Ranking for Non-Uniformly Sampled Items

Knowledge Tracing Machines: Families of models for predicting student performance

Term Paper: How and Why to Use Stochastic Gradient Descent?

MATDAT18: Materials and Data Science Hackathon. Name Department Institution Jessica Kong Chemistry University of Washington

Abstention Protocol for Accuracy and Speed

FACTORIZATION MACHINES AS A TOOL FOR HEALTHCARE CASE STUDY ON TYPE 2 DIABETES DETECTION

Matrix Factorization and Factorization Machines for Recommender Systems

Fast, Cheap and Deep Scaling machine learning

Learning Optimal Ranking with Tensor Factorization for Tag Recommendation

On the Interpretability of Conditional Probability Estimates in the Agnostic Setting

Training neural networks for financial forecasting: Backpropagation vs Particle Swarm Optimization

Scalable Bayesian Matrix Factorization

Click-Through Rate prediction: TOP-5 solution for the Avazu contest

Neural Networks Architecture Evaluation in a Quantum Computer

* Matrix Factorization and Recommendation Systems

Convex Factorization Machines

Decoupled Collaborative Ranking

Matrix Factorization for Speech Enhancement

Relational Stacked Denoising Autoencoder for Tag Recommendation. Hao Wang

What is Happening Right Now... That Interests Me?

EXPONENTIAL MACHINES ABSTRACT 1 INTRODUCTION. Under review as a conference paper at ICLR Alexander Novikov 1,2

SEA surface temperature, SST for short, is an important

Matrix Factorization Techniques for Recommender Systems

CS249: ADVANCED DATA MINING

Clustering based tensor decomposition

Planet Labels - How do we use our planet?

SCMF: Sparse Covariance Matrix Factorization for Collaborative Filtering

Generalized Linear Models in Collaborative Filtering

Binary Principal Component Analysis in the Netflix Collaborative Filtering Task

Recent Advances in Bayesian Inference Techniques

Recurrent Latent Variable Networks for Session-Based Recommendation

Collaborative topic models: motivations cont

Electronic Supporting Information Topological design of porous organic molecules

A Fast Augmented Lagrangian Algorithm for Learning Low-Rank Matrices

Matrix and Tensor Factorization from a Machine Learning Perspective

Logistic Regression. COMP 527 Danushka Bollegala

CS 229 Final Report: Data-Driven Prediction of Band Gap of Materials

arxiv: v2 [cs.ir] 14 May 2018

A Latent-Feature Plackett-Luce Model for Dyad Ranking Completion

On the inconsistency of l 1 -penalised sparse precision matrix estimation

Binary Classification, Multi-label Classification and Ranking: A Decision-theoretic Approach

Aggregated Temporal Tensor Factorization Model for Point-of-interest Recommendation

LEARNING SPARSE STRUCTURED ENSEMBLES WITH STOCASTIC GTADIENT MCMC SAMPLING AND NETWORK PRUNING

Scalable Logit Gaussian Process Classification

Large-scale Ordinal Collaborative Filtering

Collaborative Filtering with Aspect-based Opinion Mining: A Tensor Factorization Approach

A Generic Coordinate Descent Framework for Learning from Implicit Feedback

Preference Relation-based Markov Random Fields for Recommender Systems

Ranking and Filtering

Pairwise Interaction Tensor Factorization for Personalized Tag Recommendation

Probabilistic Graphical Models

Content-based Recommendation

Searching for Long-Period Comets with Deep Learning Tools

Click Prediction and Preference Ranking of RSS Feeds

Multi-way Interacting Regression via Factorization Machines

Bayesian Deep Learning

AquaCat: Radar and Machine Learning for Fluid and Powder Identification

Polynomial Networks and Factorization Machines: New Insights and Efficient Training Algorithms

De-biasing of the velocity determination for double station meteor observations from CILBO

IMPROVING INFRASTRUCTURE FOR TRANSPORTATION SYSTEMS USING CLUSTERING

CS Homework 3. October 15, 2009

Collaborative Filtering via Different Preference Structures

Online Collaborative Filtering with Implicit Feedback

Bagging During Markov Chain Monte Carlo for Smoother Predictions

Predicting Correctness of Protein Complex Binding Orientations

arxiv: v1 [cs.ir] 15 Nov 2016

Regression. Goal: Learn a mapping from observations (features) to continuous labels given a training set (supervised learning)

Personalized Multi-relational Matrix Factorization Model for Predicting Student Performance

Mixed Membership Matrix Factorization

6.034 Introduction to Artificial Intelligence

Regression. Goal: Learn a mapping from observations (features) to continuous labels given a training set (supervised learning)

From safe screening rules to working sets for faster Lasso-type solvers

arxiv: v1 [cs.lg] 3 Jan 2017

Multiverse Recommendation: N-dimensional Tensor Factorization for Context-aware Collaborative Filtering

The Linearization of Belief Propagation on Pairwise Markov Random Fields

A Spectral Regularization Framework for Multi-Task Structure Learning

Latent Dirichlet Allocation (LDA)

Maximum Margin Matrix Factorization

Bayesian leave-one-out cross-validation for large data sets

Large-scale Collaborative Prediction Using a Nonparametric Random Effects Model

Review: Probabilistic Matrix Factorization. Probabilistic Matrix Factorization (PMF)

Pattern Recognition and Machine Learning

Optimizing Factorization Machines for Top-N Context-Aware Recommendations

Probabilistic Traffic Models for Occupancy Counting

LambdaFM: Learning Optimal Ranking with Factorization Machines Using Lambda Surrogates

Yo Home to Bel-Air: Predicting Crime on The Streets of Philadelphia

Deep Convolutional Neural Networks for Pairwise Causality

Transcription:

fastfm: A Library for Factorization Machines Immanuel Bayer University of Konstanz 78457 Konstanz, Germany immanuel.bayer@uni-konstanz.de arxiv:505.0064v [cs.lg] 5 May 05 Editor: Abstract Factorization Machines (FM) are only used in a narrow range of applications and are not part of the standard toolbox of machine learning models. This is a pity, because even though FMs are recognized as being very successful for recommender system type applications they are a general model to deal with sparse and high dimensional features. Our Factorization Machine implementation provides easy access to many solvers and supports regression, classification and ing tasks. Such an implementation simplifies the use of FM s for a wide field of applications. This implementation has the potential to improve our understanding of the FM model and drive new development. Keywords: Matrix Factorization, Recommender Systems, MCMC. Introduction This work aims to facilitate research for matrix factorization based machine learning models. Factorization Machines are able to express many different latent factor models that are widely used for collaborative filtering tasks Rendle (0b). An important advantage of FM s is that side information can be easily included. ŷ F M (x) := w 0 + w 0 R, x, w R p, v i R k w i x i + i= x i i= j>i v i, v j x i x j () {}}{{}}{ For example, encoding a sample as x = {,,,, } yields ŷ F M (x) = w 0 +w i + w j + vi T v j which is equivalent to (biased) matrix factorization R i,j b 0 + b i + b j + u T i v j Srebro et al. (004). Please refer to Rendle (0b) for more encoding examples. FM s are, however, more flexible than standard matrix factorization by learning all higher order interactions between features (implemented up to second order [eq. ]). It has been the top performing model in machine learning competitions Rendle and Schmidt-Thieme (009), Rendle (0a), Bayer and Rendle (0) with different objectives (e.g. What Do You Know? Challenge, EMI Music Hackathon ). This library includes solvers for regression, classification and ing problems (see table ) and addresses the following needs of the research community: x j. http://www.kaggle.com/c/whatdoyouknow. http://www.kaggle.com/c/musichackathon

Bayer (i) Easy interfacing for dynamic and interactive languages such as R, Python and Matlab. (ii) A Python interface that allows interactive work. (iii) A publicly available testsuite that strongly simplifies modifications or adding of new features. (iv) Code is released under the BSD-license which allows the integration in (almost) any open source project.. Design Overview The fastfm library has a multi layered software architecture (table ) that clearly separates the interface code from the performance critical parts (fastfm-core). The core contains the solvers, is written in C and can be used stand alone. Two user interfaces are available, a command line interface (CLI) and a Python interface. They serve as reference implementation for bindings to additional languages.. fastfm-core fastfm (Py) Cython CLI fastfm-core (C) Table : Library Architecture The strength of FM s is the ability to perform well on problems with very high dimensional categorical features as they frequently occur for example in click through rate prediction. This often leads to design matrices with more then 95% sparsity. We use the standard compressed row storage (CRS) matrix format as underlying data structure and rely on CXSparse for fast sparse matrix / vector operations Davis (006). This simplifies the code and makes memory sharing between Python and C easy, which is critical as memory copies would reduce performance for important operations. A test suite that provides high test coverage and routines that check that the code is Valgrind-clean ensures that the code is easy to extend. Solvers are tested using state of the art techniques such as Posterior Quantiles (Cook et al., 006) for the MCMC sampler or Finite Differences for SGD based solvers.. Solver and Loss Functions Table lists the supported tasks and the solvers available for each. The MCMC solver implements the Bayesian Factorization Machine model (Freudenthaler et al.) via Gibbs sampling. We use the Bayesian Personalized Ranking (BPR) pairwise loss (Rendle et al., 009) for ing. More details on the classification and regression solver can be found in Rendle (0b). Task Solver Loss Regression ALS, MCMC, SGD Square Loss Classification ALS, MCMC, SGD Probit (MAP), Probit, Sigmoid Ranking SGD BPR Rendle et al. (009) Table : Supported Solvers and Tasks.. CXSparse is LGPL licensed.

. Python Interface The Python Interface is compatible to the API of the widely-used SCIKIT-LEARN library Pedregosa et al. (0) which opens the library to a large user base. The following code snippet shows how to use MCMC sampling for a FM classifier and make predictions on new data. fm = mcmc.fmclassification(init std=0.0, =8) y pred = fm.fit predict(x train, y train, X test) Additional features are available such as warm starting a solver from a previous solution. fm = als.fmregression(init std=0.0, =8, l reg=) fm.fit(x train, y train) fm.fit(x train, y train, n more iter=0). Experiments We first show that the ALS and MCMC solver are comparable with libfm 4 with respect to runtime and convergence (figure ). A fixed number of 50 iterations is used for all comparisons. The x-axis indicates the number of latent factors () and the upper plots compare the implementations with respect to decrease in rmse error. The lower plots show how the runtime scales with the. The figure indicates that fastfm has competitive performance, with both the CLI and the high-level Python interface. Figure () compares MCMC chain traces using warm-start (dashed) and retraining the model (solid) line. Many other analysis are simplified by interactive access to solvers and model parameter. The experimental details behind figures and can be found in the on-line documentation 5. 4. Related Work Factorization Machines are available in the large scale machine learning libraries GraphLab (Low et al., 04) and Bidmach (Canny and Zhao, 0). The toolkit Svdfeatures by Chen et al. (0) provides a general MF model that is similar to a FM. The implementations in GraphLab, Bidmach and Svdfeatures only support SGD solvers and can t be used for ing problems. libfm contains all the solvers available in our fastfm implementation but lack s an interactive user interface and a publicly available test-suit. fastfm s main objective is to be easy to use and extend without sacrificing performance. Acknowledgments This work was supported by the DFG under grant Re /-. 4. http://libfm.org 5. Source code and on-line documentation is available at https://github.com/ibayer/fastfm.

Bayer rmse 0.80 0.75 0.70 0.65 ALS rmse 0.950 0.945 0.940 0.95 MCMC sec ALS sec MCMC fastfm fastfm core libfm Figure : Runtime and convergence comparison between the two fastfm interfaces and libfm. The evaluation is done on the movielens ml-00-k dataset and train error is reported. 0.985 0.980 0.975 0.970 0.965 0.960 0.955 0.950 test rmse.90.85.80.75.70.65.60.55.50 alpha 5.4 5. 5.0 4.8 4.6 4.4 4. 4.0 lambda_w.8 5 0 5 0 5 0 5 40 45 50 90 80 70 60 50 40 mu_w 0 5 0 5 0 5 0 5 40 45 50 Figure : MCMC chain analysis and convergence diagnostics example. 4

References Immanuel Bayer and Steffen Rendle. Factor models for recommending given names. ECML PKDD Discovery Challenge, page 8, 0. John Canny and Huasha Zhao. Bidmach: Large-scale learning with zero memory allocation. In BigLearn Workshop, NIPS, 0. Tianqi Chen, Weinan Zhang, Qiuxia Lu, Kailong Chen, Zhao Zheng, and Yong Yu. Svdfeature: a toolkit for feature-based collaborative filtering. The Journal of Machine Learning Research, ():69 6, 0. Samantha R Cook, Andrew Gelman, and Donald B Rubin. Validation of software for bayesian models using posterior quantiles. Journal of Computational and Graphical Statistics, 5(), 006. Timothy A Davis. Direct methods for sparse linear systems, volume. Siam, 006. Christoph Freudenthaler, Lars Schmidt-thieme, and Steffen Rendle. Bayesian factorization machines. Yucheng Low, Joseph E Gonzalez, Aapo Kyrola, Danny Bickson, Carlos E Guestrin, and Joseph Hellerstein. Graphlab: A new framework for parallel machine learning. arxiv preprint arxiv:408.04, 04. F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, :85 80, 0. Steffen Rendle. Social network and click-through prediction with factorization machines. 0a. Steffen Rendle. Factorization machines with libfm. ACM Trans. Intell. Syst. Technol., ():57: 57:, May 0b. ISSN 57-6904. Steffen Rendle and Lars Schmidt-Thieme. Factor models for tag recommendation in bibsonomy. In ECML/PKDD 008 Discovery Challenge Workshop, part of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, pages 5 4, 009. Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. Bpr: Bayesian personalized ing from implicit feedback. UAI 09, pages 45 46, Arlington, Virginia, United States, 009. Nathan Srebro, Jason Rennie, and Tommi S Jaakkola. Maximum-margin matrix factorization. In Advances in neural information processing systems, pages 9 6, 004. 5