NetBox: A Probabilistic Method for Analyzing Market Basket Data


NetBox: A Probabilistic Method for Analyzing Market Basket Data
José Miguel Hernández-Lobato, joint work with Zoubin Ghahramani
Department of Engineering, Cambridge University
October 22, 2012

Market Basket Data
- A store sells a large set of products $P = \{p_1, \ldots, p_d\}$.
- A transaction (basket) $t_i \subseteq P$ contains the products bought by a customer during a particular visit to the store.
- The transactions $t_1, \ldots, t_n$ can be encoded as a binary matrix $X$, with one row per transaction and one column per product.
- $X$ can be very large, e.g. $10^8 \times 10^4$.
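The encoding of baskets as a binary matrix can be sketched as follows. This is only an illustration: the function name `encode_baskets` and the toy products are assumptions, not part of the talk, and a real $10^8 \times 10^4$ matrix would need a sparse representation rather than nested lists.

```python
def encode_baskets(transactions, products):
    """Return the n x d binary matrix X as a list of rows.

    X[i][j] is 1 if product j appears in transaction i, else 0.
    """
    column = {p: j for j, p in enumerate(products)}
    X = [[0] * len(products) for _ in transactions]
    for i, basket in enumerate(transactions):
        for p in basket:
            X[i][column[p]] = 1
    return X


# Hypothetical toy data (not from the talk).
products = ["bread", "jelly", "milk", "peanut butter"]
transactions = [
    {"peanut butter", "jelly", "bread"},
    {"milk", "bread"},
    {"milk"},
]
X = encode_baskets(transactions, products)
```

Each row of `X` is one basket, so the row sums give basket sizes and the column sums give product purchase counts.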

Market Basket Analysis (MBA) and Association Rules
- MBA allows us to identify patterns in customer purchases. Ideally we would like to answer questions like:
  - What products are usually bought together?
  - What products may benefit from promotion?
  - What are the best cross-selling opportunities?
- Association rules are a popular method for MBA [Agrawal and Srikant, 1994].
- The method generates rules of the form $A \Rightarrow B$, where $A, B \subseteq P$ and $A \cap B = \emptyset$.
- $A \Rightarrow B$ means that if $A \subseteq t$ holds, then we should expect $B \subseteq t$ to hold also, with high probability. Example: {peanut butter, jelly} $\Rightarrow$ {bread}.
- Problem: the number of possible rules grows exponentially with $d$.
- Solution: filter the rules using minimum support and confidence thresholds:
  - $\text{support}(A \Rightarrow B) = P(A \cup B \subseteq t)$.
  - $\text{confidence}(A \Rightarrow B) = P(B \subseteq t \mid A \subseteq t)$.
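As a rough illustration of the two filtering quantities, the sketch below estimates support and confidence by counting baskets. The function names and the toy transactions are assumptions made for this example, and the code assumes the antecedent occurs in at least one basket.

```python
def support(itemset, transactions):
    """Empirical P(itemset ⊆ t): fraction of baskets containing every item."""
    itemset = set(itemset)
    return sum(itemset.issubset(t) for t in transactions) / len(transactions)


def confidence(A, B, transactions):
    """Empirical P(B ⊆ t | A ⊆ t) = support(A ∪ B) / support(A)."""
    return support(set(A) | set(B), transactions) / support(A, transactions)


# Hypothetical toy data (not from the talk).
transactions = [
    {"peanut butter", "jelly", "bread"},
    {"peanut butter", "jelly"},
    {"bread", "milk"},
    {"peanut butter", "jelly", "bread", "milk"},
]
A, B = {"peanut butter", "jelly"}, {"bread"}
rule_support = support(A | B, transactions)       # 2 of the 4 baskets
rule_confidence = confidence(A, B, transactions)  # 2 of the 3 baskets containing A
```

A rule survives the filter only if both values exceed the chosen thresholds.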

Some Disadvantages of Association Rules (ARules)
- There is no obvious procedure for selecting the support and confidence thresholds:
  - Too large, and many interesting associations can be missed.
  - Too small, and we obtain an explosion of non-significant rules.
- ARules usually generates a very large number of rules. Identifying the few interesting rules among the many obvious or redundant ones can be difficult.
- Importantly, ARules, as an unsupervised learning method, is usually outperformed by other techniques when making predictions. This means that there are some patterns in the data which are not fully captured by ARules.

NetBox: A Probabilistic Method for MBA I
NetBox addresses the previous disadvantages of ARules as follows:
- NetBox follows a Bayesian approach. Every hyper-parameter value is either marginalized out or tuned automatically to the data without any human supervision.
- Instead of rules, NetBox generates a network of products [Raeder and Chawla, 2011]. The networks generated often contain several connected components, or clusters of products. By focusing on these clusters, we avoid examining huge lists with many redundant or uninteresting rules.
- NetBox has better predictive performance than ARules, and it is competitive with or better than alternative state-of-the-art methods at a lower computational cost.

NetBox: A Probabilistic Method for MBA II
- Let $P_x$ be an ideal distribution such that any arbitrary row $x = (x_1, \ldots, x_d)^T$ of the transaction matrix $X$ is sampled from $P_x$.
- We want to specify a model for $P_x$ that can be adjusted to the available data.
- For this, we follow the framework of dependency networks [Heckerman et al. 2001] and attempt to learn the conditional distributions $P(x_1 \mid x_{-1}), \ldots, P(x_d \mid x_{-d})$, where $x_{-i}$ denotes $x$ with its $i$-th entry removed.
- We assume that each conditional $P(x_i \mid x_{-i})$ is a mixture of the predictive distributions of different models.
- In its current form, NetBox mixes the predictions of two models:
  - A sparse binary classifier (NetBox-SBC).
  - A conditional model based on matrix factorizations (NetBox-CMF).

NetBox-SBC
$P(x_i \mid w, \epsilon, x_{-i}) = \epsilon + (1 - 2\epsilon)\, \Theta\big[(2x_i - 1)(x_{-i}^T w_{-d} + w_d)\big]$,
$P(w \mid z) = \prod_{i=1}^{d} \big[z_i\, \mathcal{N}(w_i \mid 0, v) + (1 - z_i)\, \delta(w_i)\big]$,
$P(z) = \prod_{i=1}^{d} \text{Bern}(z_i \mid p_i)$,
$P(\epsilon) = \text{Beta}(\epsilon \mid a_0, b_0)$,
where $a_0 = 1$, $b_0 = 9$, $p_1, \ldots, p_{d-1} = 0.5$ and $p_d = 1$.
The posterior distribution is approximated by
$Q(w, \epsilon, z) = \text{Beta}(\epsilon \mid \tilde{a}, \tilde{b}) \prod_{i=1}^{d} \big[\mathcal{N}(w_i \mid \tilde{m}_i, \tilde{v}_i)\, \text{Bern}(z_i \mid \tilde{p}_i)\big]$
using assumed density filtering [Opper, 1998].
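To make the NetBox-SBC likelihood concrete, here is a minimal sketch that evaluates $P(x_i \mid w, \epsilon, x_{-i})$ for fixed parameters. It assumes that $\Theta$ is the Heaviside step function (with $\Theta(0) = 1$ as an arbitrary convention) and that the last weight $w_d$ acts as a bias term; the function names and numbers are hypothetical, and the sketch does not perform the assumed density filtering used to fit the model.

```python
def heaviside(a):
    """Step function Θ(a); the value at a == 0 is an assumed convention."""
    return 1.0 if a >= 0 else 0.0


def sbc_likelihood(x_i, x_rest, w, eps):
    """eps + (1 - 2 eps) * Θ[(2 x_i - 1) * (x_rest . w[:-1] + w[-1])].

    x_rest holds the d-1 other binary entries; w holds d weights,
    the last of which is treated as the bias w_d.
    """
    activation = sum(x * wj for x, wj in zip(x_rest, w[:-1])) + w[-1]
    return eps + (1.0 - 2.0 * eps) * heaviside((2 * x_i - 1) * activation)


# Hypothetical parameters (not from the talk).
w = [2.0, 0.0, -1.0]   # two product weights plus a bias
x_rest = [1, 0]        # the first other product is in the basket
p_bought = sbc_likelihood(1, x_rest, w, eps=0.1)
p_not_bought = sbc_likelihood(0, x_rest, w, eps=0.1)
```

With these numbers the activation is positive, so the formula gives $1 - \epsilon$ for $x_i = 1$ and $\epsilon$ for $x_i = 0$; the two outcomes sum to one.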

NetBox-CMF
$P(X \mid U, V) = \prod_{i=1}^{n} \prod_{j=1}^{d} \mathcal{N}(x_{i,j} \mid u_i v_j^T, \sigma^2)$,
$P(U) = \prod_{i=1}^{n} \prod_{j=1}^{k} \mathcal{N}(u_{i,j} \mid 0, t_j^U)$,
$P(V) = \prod_{i=1}^{d} \prod_{j=1}^{k} \mathcal{N}(v_{i,j} \mid 0, s_j^V)$.
The posterior distribution is approximated by
$Q(U, V) = \Big[\prod_{i=1}^{n} \prod_{j=1}^{k} \mathcal{N}(u_{i,j} \mid \tilde{m}_{i,j}^U, \tilde{v}_{i,j}^U)\Big] \Big[\prod_{i=1}^{d} \prod_{j=1}^{k} \mathcal{N}(v_{i,j} \mid \tilde{m}_{i,j}^V, \tilde{v}_{i,j}^V)\Big]$
using variational Bayes and the analytic method of Nakajima et al. 2010.
The conditional is modeled assuming $P(x_i \mid x_{-i}, w) = \mathcal{N}(x_i \mid x_{-i}^T w, \sigma^2)$, where $w$ is here a vector of $d - 1$ weights specific to NetBox-CMF.
The posterior of $w$ is approximated with $Q(w) = \prod_{i=1}^{d-1} \mathcal{N}(w_i \mid \tilde{m}_i, \tilde{v}_i)$ by matching the predictive mean and variance of the MF model.
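The Gaussian likelihood of the factorization model can be evaluated directly, as in the sketch below. It only computes $\log P(X \mid U, V)$ for given factor matrices; the variational Bayes fit and the analytic solution of Nakajima et al. are not reproduced here. The function name and the toy matrices are assumptions for illustration.

```python
import math


def cmf_log_likelihood(X, U, V, sigma2):
    """log P(X | U, V) = sum over i, j of log N(x_ij | u_i . v_j, sigma2)."""
    total = 0.0
    for x_row, u in zip(X, U):
        for x, v in zip(x_row, V):
            mean = sum(a * b for a, b in zip(u, v))
            total += -0.5 * math.log(2 * math.pi * sigma2)
            total -= (x - mean) ** 2 / (2 * sigma2)
    return total


# Hypothetical toy example with n = 2 baskets, d = 2 products, k = 1 factor.
X = [[1, 0], [0, 1]]
U = [[1.0], [0.0]]   # one latent row vector per basket
V = [[1.0], [0.0]]   # one latent row vector per product
ll = cmf_log_likelihood(X, U, V, sigma2=1.0)
```

In this toy case three entries are reconstructed exactly and one has a residual of 1, so the log-likelihood is four Gaussian normalizing terms minus 0.5.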

Model Mixing
We compute the average log marginal likelihood on the available data:
$l_i^{\text{SBC}} = n^{-1} \sum_{j=1}^{n} \log \big[x_{j,i}\, P_{\text{SBC}}(x_{j,i} = 1 \mid x_{j,-i}) + (1 - x_{j,i})\big(1 - P_{\text{SBC}}(x_{j,i} = 1 \mid x_{j,-i})\big)\big]$,
$l_i^{\text{CMF}} = n^{-1} \sum_{j=1}^{n} \log P_{\text{CMF}}(x_{j,i} \mid x_{j,-i})$.
Let $\pi_i$ be the mixing weight for NetBox-SBC. Then, we estimate $\pi_i$ as
$\hat{\pi}_i = \exp(l_i^{\text{SBC}}) \big[\exp(l_i^{\text{SBC}}) + \exp(l_i^{\text{CMF}})\big]^{-1}$.
Finally, we generate predictions using
$P_{\text{NetBox}}(x_i = 1 \mid x_{-i}) = \hat{\pi}_i\, P_{\text{SBC}}(x_i = 1 \mid x_{-i}) + (1 - \hat{\pi}_i)\, x_{-i}^T \tilde{m}$,
where $\tilde{m}$ is the mean of the NetBox-CMF approximation $Q(w)$.
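The mixing step reduces to a two-way softmax over the average log-likelihoods. The sketch below shows one way to compute it; subtracting the maximum before exponentiating is an implementation choice added here for numerical stability, and the function names and input values are hypothetical.

```python
import math


def mixing_weight(l_sbc, l_cmf):
    """pi_hat = exp(l_sbc) / (exp(l_sbc) + exp(l_cmf)), computed stably."""
    m = max(l_sbc, l_cmf)
    e_sbc, e_cmf = math.exp(l_sbc - m), math.exp(l_cmf - m)
    return e_sbc / (e_sbc + e_cmf)


def netbox_predict(pi_hat, p_sbc, cmf_mean):
    """Mix the SBC probability with the CMF predictive mean."""
    return pi_hat * p_sbc + (1.0 - pi_hat) * cmf_mean


# Hypothetical values (not from the talk).
pi_equal = mixing_weight(-0.3, -0.3)          # equally good models
pi_extreme = mixing_weight(-1000.0, -1001.0)  # would underflow without the shift
prediction = netbox_predict(0.5, p_sbc=0.8, cmf_mean=0.4)
```

The model with the higher average log-likelihood for product $i$ receives the larger weight, and equal log-likelihoods give a weight of 0.5.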

Generating a Network of Products
We assign a weight $w(j, i)$ to the edge connecting products $j$ and $i$ as
$w(j, i) = P_{\text{NetBox}}(x_i = 1 \mid x_j = 1, x_{-j} = 0) - P_{\text{NetBox}}(x_i = 1 \mid x_{-i} = 0)$.
We identify the relevant connections using a statistical test:
- We generate $X^{\text{Rand}}$ with the same marginals as $X$ but independent entries.
- NetBox is run on $X^{\text{Rand}}$ to obtain a collection of weights $w^{\text{Rand}}(j, i)$.
- Critical values are obtained by fitting a generalized Pareto distribution (GPD) to $\{w^{\text{Rand}}(k, i) : k = 1, \ldots, d\}$.
- We set the non-significant weights to zero.
Finally, we prune edges to maximize the number of connected components in the network.
[Figure: density plot (vertical axis: Density).]
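The randomization and thresholding steps might be sketched as follows. This is an assumption-laden illustration: shuffling each column independently is just one way to keep the product marginals while breaking dependencies between products, the critical value is passed in directly instead of being obtained from a GPD fit, and all names and numbers are hypothetical.

```python
import random


def randomize_columns(X, seed=0):
    """Shuffle every column of X independently.

    Column sums (product marginals) are preserved exactly, while the
    co-occurrence structure between products is destroyed.
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    X_rand = [[0] * d for _ in range(n)]
    for j in range(d):
        column = [X[i][j] for i in range(n)]
        rng.shuffle(column)
        for i in range(n):
            X_rand[i][j] = column[i]
    return X_rand


def zero_nonsignificant(weights, critical_value):
    """Set to zero every edge weight that does not exceed the critical value."""
    return {edge: (w if w > critical_value else 0.0) for edge, w in weights.items()}


# Hypothetical toy data and edge weights (not from the talk).
X = [[1, 1, 0], [1, 1, 0], [0, 0, 1], [0, 0, 1]]
X_rand = randomize_columns(X, seed=1)
weights = {(0, 1): 0.40, (0, 2): 0.01, (1, 2): 0.02}
pruned = zero_nonsignificant(weights, critical_value=0.05)
```

In the procedure described above, the critical value for product $i$ would come from the tail of the weights obtained on the randomized data rather than from a hand-picked constant.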

Evaluation of the Prediction Accuracy of NetBox
- The data are split into disjoint sets of training and test transactions.
- 15% of the products in the test transactions are removed.
- We try to identify the products missing from each test transaction.
- Performance measure: recall at 10.
- Benchmark methods:
  - Association rules (ARules).
  - Asymmetric matrix factorization (AMF) [Pan and Scholz, 2009].
  - Rank optimized matrix factorization (ROMF) [Rendle et al., 2009].
  - Ranking based on frequency (Freq).
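The slide does not spell out how recall at 10 is computed; a common reading, assumed in the sketch below, is the fraction of removed products that appear among the 10 highest-ranked candidates for a test transaction. The function name and the toy ranking are hypothetical.

```python
def recall_at_k(ranked_products, held_out, k=10):
    """Fraction of held-out products found in the top-k of the ranking."""
    top_k = set(ranked_products[:k])
    return len(top_k & set(held_out)) / len(held_out)


# Hypothetical ranking of 20 candidate products for one test transaction.
ranked_products = [f"p{j}" for j in range(20)]
held_out = {"p3", "p12"}   # products removed from the transaction
score = recall_at_k(ranked_products, held_out, k=10)
```

Here only `p3` falls within the first ten positions, so half of the removed products are recovered; averaging this value over test transactions would give an overall score.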

Results

Networks of Products

Rules Generated by ARules in the Small Netflix Dataset
- ARules generated more than 100,000 rules.
- We list the top rules according to lift.
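Lift is not defined on the slides; the sketch below assumes the usual definition, $\text{lift}(A \Rightarrow B) = P(A \cup B \subseteq t) / [P(A \subseteq t)\, P(B \subseteq t)]$, estimated by counting baskets. The function names and toy transactions are hypothetical, and both itemsets are assumed to occur at least once.

```python
def support(itemset, transactions):
    """Fraction of baskets that contain every item of the itemset."""
    itemset = set(itemset)
    return sum(itemset.issubset(t) for t in transactions) / len(transactions)


def lift(A, B, transactions):
    """support(A ∪ B) / (support(A) * support(B)); 1.0 means independence."""
    joint = support(set(A) | set(B), transactions)
    return joint / (support(A, transactions) * support(B, transactions))


# Hypothetical toy data (not from the talk).
transactions = [
    {"peanut butter", "jelly", "bread"},
    {"peanut butter", "jelly"},
    {"bread", "milk"},
    {"peanut butter", "jelly", "bread", "milk"},
]
lift_value = lift({"peanut butter", "jelly"}, {"bread"}, transactions)
```

Values above 1 indicate that the two itemsets co-occur more often than expected under independence; in this toy data the value is 8/9, slightly below 1.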

More Rules...

And More Rules...

Top Connected Components: NetBox, Netflix Dataset

Top Frequent Itemsets: MaxEnt, Netflix Dataset

Top Connected Components: NetBox, Pubmed Dataset

Top Frequent Itemsets: MaxEnt, Pubmed Dataset

Top Connected Components: NetBox, Books Dataset

Top Frequent Itemsets: MaxEnt, Books Dataset

Conclusions
NetBox is a probabilistic method for market basket analysis which:
- Follows a Bayesian approach and does not require the user to specify any hyper-parameter value.
- Produces a network of products in which related items are connected to each other. These networks are easier to interpret than a list of rules.
- Obtains very good predictive performance.
- Identifies patterns whose support is too low to be identified by frequent itemset methods based on entropy measures.

References
- Agrawal, Rakesh and Srikant, Ramakrishnan. Fast algorithms for mining association rules in large databases. In VLDB, pp. 487-499, 1994.
- Raeder, Troy and Chawla, Nitesh. Market basket analysis with networks. Social Network Analysis and Mining, 1:97-113, 2011.
- Pan, Rong and Scholz, Martin. Mind the gaps: weighting the unknown in large-scale one-class collaborative filtering. In KDD, pp. 667-676, 2009.
- Heckerman, David, Chickering, David Maxwell, Meek, Christopher, Rounthwaite, Robert, and Kadie, Carl. Dependency networks for inference, collaborative filtering, and data visualization. The Journal of Machine Learning Research, 1:49-75, 2001.
- Opper, Manfred. A Bayesian approach to on-line learning. In On-line Learning in Neural Networks, pp. 363-378. Cambridge University Press, New York, NY, USA, 1998.
- Nakajima, Shinichi, Sugiyama, Masashi, and Tomioka, Ryota. Global analytic solution for variational Bayesian matrix factorization. In NIPS, pp. 1768-1776, 2010.
- Rendle, Steffen, Freudenthaler, Christoph, Gantner, Zeno, and Schmidt-Thieme, Lars. BPR: Bayesian personalized ranking from implicit feedback. In UAI, pp. 452-461, 2009.
- De Bie, Tijl. Maximum entropy models and subjective interestingness: an application to tiles in binary databases. Data Mining and Knowledge Discovery, 23:407-446, 2011.

Thank you for your attention!