arxiv: v1 [stat.ml] 5 Dec 2016

A Nonparametric Latent Factor Model For Location-Aware Video Recommendations
arXiv:1612.01481v1 [stat.ML] 5 Dec 2016

Ehtsham Elahi
Algorithms Engineering
Netflix, Inc.
Los Gatos, CA 95032
eelahi@netflix.com

Abstract
We are interested in learning customers' video preferences from their historic viewing patterns and geographical location. We consider a Bayesian latent factor modeling approach for this task. To tune the complexity of the model to best represent the data, we make use of Bayesian nonparametric techniques. We describe an inference technique that can scale to large real-world data sets. Finally, we show results obtained by applying the model to a large internal Netflix data set, which illustrate that the model was able to capture interesting relationships between viewing patterns and geographical location.

1 Introduction
In a web application we are provided with a rich view of each user. For example, in a video streaming application like Netflix, we can observe not only users' preferences for different types of content but also how those preferences change with respect to context, such as time of day, day of week, device, and so on. An important contextual variable that influences a customer's preferences is their geographical location. It is reasonable to assume that customers who live in close proximity may have similar viewing preferences. Hence, we need a model that captures not only a customer's latent viewing preferences but also the relationship between those preferences and their location. To capture both aspects, we model them in a unified model so that location and viewing behavior can each take advantage of the information in the other modality. For this task, we employ a nonparametric latent factor model to jointly model a customer's viewing history and their geographical location. Nonparametric mixed membership techniques have shown great promise in modeling large collections of documents [1].
Given that more information is available for a document than just its content (author, date of publication, metadata, etc.), it seems natural to extend these approaches to model all of these modalities jointly. Hence there have been many attempts to apply nonparametric latent factor modeling to such data sets [2]. Our approach uses a model structure similar to that of [2], which models document-level features along with the content of documents. For the problem under consideration, we view a customer's viewing history as an unordered collection of discrete view events from Netflix's video catalog. Geographical locations of customers are expressed as longitudes and latitudes, so they can be viewed as points on a 2-sphere. Therefore, similar to [3], we use the Von Mises-Fisher distribution to describe geographical data. The full model combines these sub-components (viewing history and geographical location) and is able to learn embeddings for customers' viewing history data, geographical location data, and the interactions between the two. The following sections detail how we model these components, how we perform inference in the model (in a way that scales to large data sets), and finally results from an internal Netflix data set that illustrate that the model is indeed able to capture interactions between geographical information and viewing preferences.

30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.

2 Model Details
The component of our model that describes customers' streaming history data is a nonparametric mixed membership model that uses a hierarchical Dirichlet process to learn latent video factors, each of which is a multinomial distribution over the content catalog. Similarly, the component that models geographical locations uses a hierarchical Dirichlet process to learn latent factors for geographical locations, each of which is a Von Mises-Fisher distribution on the 2-sphere. Finally, the relationship between the two latent spaces is expressed through a Dirichlet process over the product of the video and location latent factor spaces. We summarize our modeling assumptions as follows and then comment on the different components of the model.

1: $\phi_0 \sim \mathrm{DP}(\cdot \mid \alpha_{\phi_0}, H(\mu, c))$, $\quad \pi_0 \sim \mathrm{DP}(\cdot \mid \alpha_{\pi_0}, H(\beta))$
2: $\omega \sim \mathrm{DP}(\cdot \mid \alpha_\omega, H(\mathrm{DP}(\cdot \mid \alpha_\phi, \phi_0) \times \mathrm{DP}(\cdot \mid \alpha_\pi, \pi_0)))$
3: for customer $d$ in data set $D$ do
4:   $(\phi_d, \pi_d) \sim \omega$
5:   $(\mu_d, c_d) \sim \phi_d$
6:   $\mathrm{location}_d \sim \text{Von Mises-Fisher}(\cdot \mid \mu_d, c_d)$
7:   for $j_d$ in video history $J_d$ do
8:     $\beta_{j_d} \sim \pi_d$
9:     $v_{j_d} \sim \mathrm{Multinomial}(\cdot \mid \beta_{j_d}, 1)$
10:  end for
11: end for

2.1 Modeling Location Data
For customers' geographical location data, we need a distribution that can express the spherical nature of the data. We use the Von Mises-Fisher distribution for modeling locations, with the following parameterization:

$$\Pr(x \mid \mu, c) = C_D(c)\, \exp(c\, \mu^\top x), \qquad C_D(c) = \frac{c^{0.5D-1}}{(2\pi)^{0.5D}\, I_{0.5D-1}(c)} \tag{1}$$

where $\mu$ and $c$ are the parameters of the distribution and $I_{0.5D-1}(c)$ is the modified Bessel function of the first kind of order $0.5D-1$ evaluated at $c$. This parameterization requires locations to be expressed as unit vectors in Euclidean coordinates; hence, we convert geo-spherical coordinates to the Euclidean system.
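Because the locations lie on the 2-sphere, $D = 3$ and the Bessel function in Eq. (1) has a closed form, $I_{1/2}(c) = \sqrt{2/(\pi c)}\,\sinh c$, which gives $C_3(c) = c / (4\pi \sinh c)$. The Python sketch below is a minimal illustration under that assumption, not the paper's implementation: it converts latitude/longitude to unit vectors and evaluates the Von Mises-Fisher log-density. The function names and the axis convention (cos lat cos lon, cos lat sin lon, sin lat) are illustrative assumptions.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def latlon_to_unit(lat_deg: float, lon_deg: float) -> Vec3:
    """Convert latitude/longitude in degrees to a unit vector on the 2-sphere."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))


def log_c3(c: float) -> float:
    """log C_3(c), where C_3(c) = c / (4 * pi * sinh(c)); requires c > 0."""
    # log sinh(c) = c + log(1 - exp(-2c)) - log 2, computed stably via expm1.
    log_sinh = c + math.log(-math.expm1(-2.0 * c)) - math.log(2.0)
    return math.log(c) - math.log(4.0 * math.pi) - log_sinh


def vmf_log_density(x: Sequence[float], mu: Sequence[float], c: float) -> float:
    """log Pr(x | mu, c) = log C_3(c) + c * mu^T x, i.e. Eq. (1) with D = 3."""
    dot = sum(xi * mi for xi, mi in zip(x, mu))
    return log_c3(c) + c * dot
```

Working in log space keeps the density finite for the large concentrations that tightly clustered locations can produce, where $\sinh c$ itself would overflow.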
The prior distributions for $\mu$ and $c$ are:

$$\Pr(\mu \mid \mu_0, c_0) = \text{Von Mises-Fisher}(\mu \mid \mu_0, c_0) \tag{2}$$

$$\Pr(c \mid m_c, \sigma_c) = \text{log-normal}(c \mid m_c, \sigma_c) \tag{3}$$

The prior distribution of $\mu$ is chosen to be a Von Mises-Fisher distribution itself, which is conjugate to the Von Mises-Fisher likelihood. The concentration parameter $c$ does not have a conjugate prior; similar to [3], we use a log-normal prior for $c$.

2.2 Modeling Video History Data
As mentioned above, we view a customer's video streaming history as an unordered collection of videos watched from Netflix's catalog. We use a Dirichlet-Multinomial conjugate model to represent customers' video streaming history:

$$\Pr(v \mid \beta) = \mathrm{Multinomial}(v \mid \beta, 1) \tag{4}$$

$$\Pr(\beta \mid \gamma) = \mathrm{Dirichlet}(\beta \mid \gamma) \tag{5}$$

Here $\mathrm{Multinomial}(v \mid \beta, 1)$ represents a single draw from the multinomial distribution over a video catalog of size $V$.
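Integrating $\beta$ out of Eqs. (4)-(5) leaves a closed-form posterior predictive: a view of video $v$ under latent video factor $z$ has probability $(n_{z,v} + \gamma_v) / (n_{z,\cdot} + \sum_{v'=1}^{V} \gamma_{v'})$, where $n_{z,v}$ counts views of $v$ currently assigned to $z$ and $n_{z,\cdot}$ counts all views assigned to $z$. The minimal Python sketch below keeps these counts; the class and method names are hypothetical, and it deliberately omits the indicator-sampling loop itself.

```python
from collections import defaultdict
from typing import Dict, Sequence


class CollapsedVideoFactors:
    """Sufficient statistics for latent video factors with beta integrated out.

    gamma[v] is the Dirichlet pseudo-count for video v; the catalog size is V = len(gamma).
    """

    def __init__(self, gamma: Sequence[float]) -> None:
        self.gamma = list(gamma)
        self.gamma_sum = sum(self.gamma)
        # n_zv[z][v]: number of views of video v assigned to factor z.
        self.n_zv: Dict[int, Dict[int, int]] = defaultdict(lambda: defaultdict(int))
        # n_z[z]: total number of views assigned to factor z.
        self.n_z: Dict[int, int] = defaultdict(int)

    def add(self, z: int, v: int) -> None:
        self.n_zv[z][v] += 1
        self.n_z[z] += 1

    def remove(self, z: int, v: int) -> None:
        self.n_zv[z][v] -= 1
        self.n_z[z] -= 1

    def predictive(self, z: int, v: int) -> float:
        """Pr(v | views assigned to z) = (n_zv + gamma_v) / (n_z + sum(gamma))."""
        return (self.n_zv[z][v] + self.gamma[v]) / (self.n_z[z] + self.gamma_sum)
```

For a factor with no assigned views, the predictive reduces to $\gamma_v / \sum_{v'} \gamma_{v'}$, which is the value a newly created factor would use. A Gibbs sweep would call `remove`, score the candidate factors, and then `add` the view back under the sampled factor.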

2.3 Modeling Interaction of Video and Location Latent Factors
The interaction of the video and geographical latent spaces is modeled by a Dirichlet process with the product base measure $\mathrm{DP}(\cdot \mid \alpha_\phi, \phi_0) \times \mathrm{DP}(\cdot \mid \alpha_\pi, \pi_0)$, i.e., the base measure is over atoms that are pairs of Dirichlet processes drawn from the Dirichlet processes on location and video latent factors, respectively. This construction allows the model to flexibly learn as many interactions between video preferences and geo-locations as needed to best express the data.

2.4 Inference
We use a sampling-based approach for posterior inference. Due to Dirichlet-Multinomial conjugacy in the video component of the model, we collapse out $\beta$ for each latent video factor. For the location component, the prior distribution of $\mu$ (Von Mises-Fisher) is conjugate to the Von Mises-Fisher likelihood; hence we collapse out $\mu$ as well for each latent location factor. The prior distribution of $c$ (log-normal) is not conjugate to the Von Mises-Fisher likelihood, so we use the Metropolis-Hastings algorithm to sample $c$ for each latent location factor.

For the nonparametric components, we make use of the direct assignment scheme described in [1]: instead of sampling atoms, we sample indicators to those atoms. Specifically, $t_d$ (taking values in $t = 1, 2, \dots$) is the indicator to the atom $(\phi_{t_d}, \pi_{t_d})$, $s_d$ (taking values in $s = 1, 2, \dots$) is the indicator to the atom $(\mu_{s_d}, c_{s_d})$, and $z_{j_d}$ (taking values in $z = 1, 2, \dots$) is the indicator to the atom $\beta_{z_{j_d}}$. Additionally, we sample the global Dirichlet processes $\phi_0$ and $\pi_0$ according to the direct assignment scheme in [1]. The sampling distributions for these latent variables are as follows:

$$\Pr(t_d = t \mid \dots) \propto n^{-d}_{t} \, \frac{n^{-d}_{t,s_d} + \alpha_\phi \phi_{0,s_d}}{n^{-d}_{1,t,\cdot} + \alpha_\phi} \prod_{j_d=1}^{J_d} \frac{n^{-d}_{t,z_{j_d}} + \alpha_\pi \pi_{0,z_{j_d}}}{n^{-d}_{2,t,\cdot} + \alpha_\pi} \tag{6}$$

$$\Pr(t_d = t^{\text{new}} \mid \dots) \propto \alpha_\omega \, \phi_{0,s_d} \prod_{j_d=1}^{J_d} \pi_{0,z_{j_d}} \tag{7}$$

$$\Pr(s_d = s \mid \dots) \propto \left(n^{-d}_{t_d,s} + \alpha_\phi \phi_{0,s}\right) \frac{C_D(c_s)\; C_D\!\left(\left\lVert c_s \sum_{l:\, l \neq d,\, s_l = s} \mathrm{location}_l + c_0 \mu_0 \right\rVert\right)}{C_D\!\left(\left\lVert c_s \sum_{l:\, s_l = s} \mathrm{location}_l + c_0 \mu_0 \right\rVert\right)} \tag{8}$$

$$\Pr(s_d = s^{\text{new}} \mid \dots) \propto \alpha_\phi \phi_{0,s^{\text{new}}} \, \frac{C_D(c_{s^{\text{new}}})\; C_D(\lVert c_0 \mu_0 \rVert)}{C_D\!\left(\lVert c_{s^{\text{new}}} \mathrm{location}_d + c_0 \mu_0 \rVert\right)} \tag{9}$$

$$\Pr(z_{j_d} = z \mid \dots) \propto \left(n^{-j_d}_{t_d,z} + \alpha_\pi \pi_{0,z}\right) \frac{n^{-j_d}_{z,v_{j_d}} + \gamma_{v_{j_d}}}{n^{-j_d}_{z,\cdot} + \sum_{v=1}^{V} \gamma_v} \tag{10}$$

$$\Pr(z_{j_d} = z^{\text{new}} \mid \dots) \propto \alpha_\pi \pi_{0,z^{\text{new}}} \, \frac{\gamma_{v_{j_d}}}{\sum_{v=1}^{V} \gamma_v} \tag{11}$$

$$\Pr(c_s \mid \dots) \propto \text{log-normal}(c_s \mid m_c, \sigma_c) \, \frac{C_D(c_s)^{n_s}\; C_D(c_0)}{C_D\!\left(\left\lVert c_s \sum_{d:\, s_d = s} \mathrm{location}_d + c_0 \mu_0 \right\rVert\right)} \tag{12}$$

Above, $\Pr(\text{variable} \mid \dots)$ denotes the complete conditional distribution of the variable. Notation like $n^{-d}_{t,s_d}$ denotes conditional counts, e.g., the joint count of $t$ and $s_d$ excluding customer $d$. Notation like $n^{-d}_{1,t,\cdot}$ and $n^{-d}_{2,t,\cdot}$ denotes marginal counts of $t$, marginalizing over $s_d$ and $z_{j_d}$ respectively, for all customers except $d$ (the subscripts 1 and 2 differentiate the two marginals involving $t$). In Eq. (12), $n_s$ is the number of customers with $s_d = s$.

3 Experiments
To scale our sampling-based posterior inference, we use the approximate parallel Gibbs sampling approach described in [4]. For our experiment we use an internal data set that contains the video viewing history of one million Netflix customers along with their geographical locations. We include some examples of the latent video and geographical factors learned by our model, as well as the top three video topics for the two geographical latent factors found in the United States of America.

Figure 1: Video Latent Factors capturing Romantic Shows and Documentaries. (a) Romantic Shows Topic; (b) Documentaries Topic.
Figure 2: Two Example Geographical Latent Factors Found in the United States of America.
Figure 3: Top Video Topics for the geographical latent factor
Figure 4: Top Video Topics for the geographical latent factor

4 Conclusion
We use Bayesian nonparametric machinery to combine geographical and viewing behavior information of Netflix customers for location-aware video recommendations. The approach presented can also be helpful when viewing history data is sparse, as in cold-start scenarios.
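The Metropolis-Hastings update for the concentration parameters described in Section 2.4 needs only the unnormalized complete conditional of $c_s$: the log-normal prior times the collapsed Von Mises-Fisher likelihood $C_D(c_s)^{n_s} / C_D(\lVert c_s \sum_{d: s_d = s} \mathrm{location}_d + c_0 \mu_0 \rVert)$ (the constant $C_D(c_0)$ can be dropped). The Python sketch below assumes $D = 3$, treats $m_c$ and $\sigma_c$ as the mean and standard deviation of $\log c$, and uses a Gaussian random walk on $\log c_s$. The paper does not state its proposal, so the proposal, the step size, and the function names are illustrative assumptions. Because the walk is in log space, the acceptance ratio includes the Hastings correction $c'_s / c_s$.

```python
import math
import random
from typing import Sequence


def log_c3(c: float) -> float:
    """log C_3(c) = log(c / (4 pi sinh c)), the vMF normalizer on the 2-sphere (c > 0)."""
    log_sinh = c + math.log(-math.expm1(-2.0 * c)) - math.log(2.0)
    return math.log(c) - math.log(4.0 * math.pi) - log_sinh


def log_target(c: float, n_s: int, loc_sum: Sequence[float], mu0: Sequence[float],
               c0: float, m_c: float, sigma_c: float) -> float:
    """Unnormalized log complete conditional of c_s (D = 3, constant C_3(c0) dropped)."""
    # Log-normal prior, assuming m_c and sigma_c are the mean and std of log(c).
    log_prior = (-math.log(c * sigma_c * math.sqrt(2.0 * math.pi))
                 - (math.log(c) - m_c) ** 2 / (2.0 * sigma_c ** 2))
    # Norm of c * (sum of unit location vectors in factor s) + c0 * mu0.
    r = math.sqrt(sum((c * s + c0 * m) ** 2 for s, m in zip(loc_sum, mu0)))
    return log_prior + n_s * log_c3(c) - log_c3(r)


def mh_step_c(c: float, n_s: int, loc_sum: Sequence[float], mu0: Sequence[float],
              c0: float, m_c: float, sigma_c: float,
              rng: random.Random, step: float = 0.2) -> float:
    """One Metropolis-Hastings update of c_s using a Gaussian random walk on log(c)."""
    c_prop = c * math.exp(rng.gauss(0.0, step))
    log_accept = (log_target(c_prop, n_s, loc_sum, mu0, c0, m_c, sigma_c)
                  - log_target(c, n_s, loc_sum, mu0, c0, m_c, sigma_c)
                  + math.log(c_prop / c))  # Hastings correction for the log-space walk
    if log_accept >= 0.0 or rng.random() < math.exp(log_accept):
        return c_prop
    return c
```

With $n_s = 0$ and no assigned locations, the target reduces to the log-normal prior, so the chain's average of $\log c_s$ should settle near $m_c$; this gives a simple sanity check before running the sampler on real factors.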

References
[1] Teh, Y.W., Jordan, M.I., Beal, M.J., & Blei, D.M. (2006). Hierarchical Dirichlet Processes. Journal of the American Statistical Association, 101, 1566-1581.
[2] Nguyen, V., Phung, D., Nguyen, X., Venkatesh, S., & Bui, H.H. (2014). Bayesian Nonparametric Multilevel Clustering with Group-Level Contexts. Proceedings of ICML.
[3] Gopal, S., & Yang, Y. (2014). Von Mises-Fisher Clustering Models. Proceedings of ICML.
[4] Newman, D., Asuncion, A., Smyth, P., & Welling, M. (2009). Distributed Algorithms for Topic Models. Journal of Machine Learning Research, 10(Aug), 1801-1828.