Recurrent Latent Variable Networks for Session-Based Recommendation

Similar documents
Deep learning / Ian Goodfellow, Yoshua Bengio and Aaron Courville. - Cambridge, MA ; London (table of contents)

Lecture 16 Deep Neural Generative Models

Probabilistic & Bayesian deep learning. Andreas Damianou

Variational Autoencoders

Density Estimation. Seungjin Choi

Auto-Encoding Variational Bayes

A graph contains a set of nodes (vertices) connected by links (edges or arcs)

Fully Bayesian Deep Gaussian Processes for Uncertainty Quantification

Variational Bayesian Logistic Regression

Variational Inference via Stochastic Backpropagation

Bayesian Semi-supervised Learning with Deep Generative Models

Variational Autoencoder

Reading Group on Deep Learning Session 1

Bayesian Deep Learning

STA414/2104 Statistical Methods for Machine Learning II

Learning Recurrent Neural Networks with Hessian-Free Optimization: Supplementary Materials

STA 4273H: Statistical Machine Learning

The connection of dropout and Bayesian statistics

Scaling Neighbourhood Methods

Gaussian Process Approximations of Stochastic Differential Equations

Deep Sequence Models. Context Representation, Regularization, and Application to Language. Adji Bousso Dieng

Afternoon Meeting on Bayesian Computation 2018 University of Reading

Non-Parametric Bayes

Collaborative topic models: motivations cont

Large-scale Ordinal Collaborative Filtering

Probabilistic Graphical Models

Latent Variable Models

Reinforcement Learning and NLP

Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm

Discrete Latent Variable Models

Default Priors and Efficient Posterior Computation in Bayesian

Dropout as a Bayesian Approximation: Insights and Applications

σ(a) ≈ Φ(a) = ∫_{-∞}^{a} N(x; 0, 1²) dx

Natural Gradients via the Variational Predictive Distribution

* Matrix Factorization and Recommendation Systems

REINFORCE Framework for Stochastic Policy Optimization and its use in Deep Learning

NCDREC: A Decomposability Inspired Framework for Top-N Recommendation

Statistical learning. Chapter 20, Sections 1-3

Chapter 20. Deep Generative Models

arxiv: v2 [cs.cl] 1 Jan 2019

Variational Dropout via Empirical Bayes

Probabilistic Graphical Models

GraphRNN: A Deep Generative Model for Graphs (24 Feb 2018)

Machine Learning Final Exam May 5, 2015

STA 4273H: Statistical Machine Learning

Probability and Information Theory. Sargur N. Srihari

VCMC: Variational Consensus Monte Carlo

Recent Advances in Bayesian Inference Techniques

Variational Inference for Monte Carlo Objectives

Variational Inference in TensorFlow. Danijar Hafner Stanford CS University College London, Google Brain

Variational Inference and Learning. Sargur N. Srihari

Auto-Encoding Variational Bayes. Stochastic Backpropagation and Approximate Inference in Deep Generative Models

CS Homework 3. October 15, 2009

PATTERN RECOGNITION AND MACHINE LEARNING

Lecture : Probabilistic Machine Learning

Statistical Machine Learning (BE4M33SSU) Lecture 5: Artificial Neural Networks

Diversity-Promoting Bayesian Learning of Latent Variable Models

Development of a Deep Recurrent Neural Network Controller for Flight Applications

Deep Feedforward Networks

RECURRENT NETWORKS I. Philipp Krähenbühl

Probabilistic Matrix Factorization

Probabilistic Reasoning in Deep Learning

VIME: Variational Information Maximizing Exploration

Machine Learning. Lecture 4: Regularization and Bayesian Statistics. Feng Li.

Machine Learning Techniques for Computer Vision

Advances in Variational Inference

Click Prediction and Preference Ranking of RSS Feeds

Approximate Bayesian inference

Deep Feedforward Networks. Seung-Hoon Na Chonbuk National University

Faster Stochastic Variational Inference using Proximal-Gradient Methods with General Divergence Functions

Introduction to Machine Learning

Statistical learning. Chapter 20, Sections 1-4

STA 4273H: Statistical Machine Learning

Greedy Layer-Wise Training of Deep Networks

Machine Learning Basics: Maximum Likelihood Estimation

Probabilistic Graphical Models for Image Analysis - Lecture 4

A Gradient-based Adaptive Learning Framework for Efficient Personal Recommendation

ECE521 Lectures 9 Fully Connected Neural Networks

The Success of Deep Generative Models

ECE521 week 3: 23/26 January 2017

Reinforcement Learning as Variational Inference: Two Recent Approaches

Gaussian Process Approximations of Stochastic Differential Equations

A Bayesian Perspective on Residential Demand Response Using Smart Meter Data

Lecture 13 : Variational Inference: Mean Field Approximation

Linear classifiers: Overfitting and regularization

Machine Learning and Bayesian Inference. Unsupervised learning. Can we find regularity in data without the aid of labels?

STA414/2104. Lecture 11: Gaussian Processes. Department of Statistics

Bayesian Inference: Principles and Practice 3. Sparse Bayesian Models and the Relevance Vector Machine

Doubly Stochastic Inference for Deep Gaussian Processes. Hugh Salimbeni Department of Computing Imperial College London

Dynamic Poisson Factorization

A Variance Modeling Framework Based on Variational Autoencoders for Speech Enhancement

Bayesian Regression Linear and Logistic Regression

Statistical learning. Chapter 20, Sections 1-3

NOWADAYS, Collaborative Filtering (CF) [14] plays an

Notes on Machine Learning for and

Using Gaussian Processes for Variance Reduction in Policy Gradient Algorithms *

Variational Autoencoders (VAEs)

Deep Latent-Variable Models of Natural Language

Learning Gaussian Process Models from Uncertain Data

Distributed Bayesian Learning with Stochastic Natural-gradient EP and the Posterior Server

Transcription:

Recurrent Latent Variable Networks for Session-Based Recommendation. Panayiotis Christodoulou, Cyprus University of Technology, paa.christodoulou@edu.cut.ac.cy. 27/8/2017.

Overview: 1. Motivation; 2. Proposed Approach; 3. Experiments.

Motivation. Recommender Systems (RS) constitute a key part of modern e-commerce websites. Recent research on RS has mainly focused on matrix factorization (MF) methods and neighborhood search-type models. Session-based recommendation cannot be properly addressed by such conventional methodologies. In session-based RS, recommendation is based only on the actions of a user during a specific browsing session. This gives rise to hard challenges that stem from the unavailability of rich user profiles (data sparsity). On the other hand, user session data constitute action sequences potentially entailing rich, complex, and subtle temporal dynamics.

Motivation. We posit that effective extraction of these underlying dynamics may facilitate addressing the problems stemming from data sparsity. In the past, Markov Decision Process (MDP) models have been used to this end, with only limited success. Recently, the Deep Learning breakthrough, and especially advances in Recurrent Neural Network (RNN) models, has inspired new, immensely promising lines of research in the field. Characteristically, Hidasi et al. (2016) employed gated recurrent unit (GRU) RNNs that are presented with data regarding the items a user selects and clicks on during a given session.

Proposed Approach: Main Rationale. We seek to increase the capability of RNN-driven session-based RS to ameliorate the negative effects of data sparsity. To this end, we attempt to account for uncertainty by means of Bayesian inference. Our proposed approach is founded upon the fundamental assumption that the hidden units of the postulated RNNs constitute latent variables, on which we impose an appropriate prior distribution. Then, we use the training data to infer the corresponding posteriors. For scalability, we employ amortized variational inference (AVI), which (i) parameterizes the inferred posteriors via conventional neural networks (inference networks), and (ii) casts inference as optimization.

Proposed Approach: ReLaVaR. ReLaVaR Formulation. We frame the session-based recommendation problem as a sequence-based prediction problem. We denote a user session as $\{x_i\}_{i=1}^{n}$; here, $x_i$ is the $i$th clicked item, which constitutes a selection among $m$ alternatives and is encoded in the form of a 1-hot representation. We formulate session-based recommendation as the problem of predicting the score vector $y_{i+1} = [y_{i+1,j}]_{j=1}^{m}$ of the available items with respect to the following user action, where $y_{i+1,j} \in \mathbb{R}$ is the predicted score of the $j$th item. At each time point, we select the top-$k$ items (as ranked via $y$) to recommend to the user.
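
As a toy illustration of this formulation (a sketch, not the authors' code; the catalogue size, session, and scores below are made up), a session of clicked item indices can be encoded as 1-hot vectors, and one recommendation step amounts to ranking an m-dimensional score vector:

    import numpy as np

    m = 5                    # hypothetical catalogue size (the paper's dataset has 37,483 items)
    session = [2, 0, 3]      # made-up session: indices of the clicked items x_1, x_2, x_3

    X = np.eye(m)[session]   # 1-hot encoding of the session, shape (n, m)

    rng = np.random.default_rng(0)
    y_next = rng.normal(size=m)        # stand-in for the predicted score vector y_{i+1}

    k = 2
    top_k = np.argsort(-y_next)[:k]    # indices of the top-k items to recommend
    print(X.shape, top_k)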

Proposed Approach: ReLaVaR. ReLaVaR Formulation. Formally, the activation vector $h$ of the recurrent units of a GRU is updated at time $i$ according to the following expressions:
$h_i = (1 - z_i) \odot h_{i-1} + z_i \odot \hat{h}_i$  (1)
where
$z_i = \tau(W_z x_i + U_z h_{i-1})$  (2)
$\hat{h}_i = \tanh(W x_i + U(r_i \odot h_{i-1}))$  (3)
$r_i = \tau(W_r x_i + U_r h_{i-1})$  (4)
and $\tau(\cdot)$ is the logistic sigmoid function. ReLaVaR extends upon these design principles by rendering the developed GRU model amenable to Bayesian inference.
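
A minimal numpy sketch of one GRU step following Eqs. (1)-(4); the weight shapes, scales, and toy dimensions are illustrative assumptions rather than the configuration used in the paper:

    import numpy as np

    def sigmoid(a):                                    # tau(.) on the slide
        return 1.0 / (1.0 + np.exp(-a))

    def gru_step(x, h_prev, W_z, U_z, W_r, U_r, W, U):
        """One GRU update, Eqs. (1)-(4)."""
        z = sigmoid(W_z @ x + U_z @ h_prev)            # update gate, Eq. (2)
        r = sigmoid(W_r @ x + U_r @ h_prev)            # reset gate, Eq. (4)
        h_hat = np.tanh(W @ x + U @ (r * h_prev))      # candidate activation, Eq. (3)
        return (1.0 - z) * h_prev + z * h_hat          # new state, Eq. (1)

    m, d = 5, 4                                        # toy input and hidden sizes
    rng = np.random.default_rng(0)
    W_z, W_r, W = (rng.normal(scale=0.1, size=(d, m)) for _ in range(3))
    U_z, U_r, U = (rng.normal(scale=0.1, size=(d, d)) for _ in range(3))
    h = gru_step(np.eye(m)[2], np.zeros(d), W_z, U_z, W_r, U_r, W, U)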

Proposed Approach: ReLaVaR. ReLaVaR Formulation. We consider that the component recurrent unit activations are stochastic latent variables; we impose a simple prior distribution:
$p(h_i) = \mathcal{N}(h_i \mid 0, I)$  (5)
Then, we seek an efficient means of inferring the corresponding posterior distributions, given the available training data. We draw inspiration from AVI; we postulate that the sought posteriors, $q(h)$, approximately take the form of Gaussians with means and isotropic covariance matrices parameterized via GRU networks:
$q(h_i; \theta) = \mathcal{N}(h_i \mid \mu_\theta(x_i), \sigma^2_\theta(x_i) I)$  (6)

Proposed Approach: ReLaVaR. ReLaVaR Formulation. Hence, we have:
$[\mu_\theta(x_i), \log \sigma^2_\theta(x_i)] = (1 - z_i) \odot [\mu_\theta(x_{i-1}), \log \sigma^2_\theta(x_{i-1})] + z_i \odot \hat{h}_i$  (7)
where
$z_i = \tau(W_z x_i + U_z [\mu_\theta(x_{i-1}), \log \sigma^2_\theta(x_{i-1})])$  (8)
$\hat{h}_i = \tanh(W x_i + U(r_i \odot [\mu_\theta(x_{i-1}), \log \sigma^2_\theta(x_{i-1})]))$  (9)
and
$r_i = \tau(W_r x_i + U_r [\mu_\theta(x_{i-1}), \log \sigma^2_\theta(x_{i-1})])$  (10)
while $[\xi, \zeta]$ denotes the concatenation of vectors $\xi$ and $\zeta$. Then, the values of the latent variables (stochastic unit activations) $h_i$ are drawn as samples from the inferred posterior density (6).
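
A sketch of one ReLaVaR-style recurrence step under Eqs. (7)-(10): the GRU state is the concatenation [mu_theta(x_{i-1}), log sigma^2_theta(x_{i-1})], and the latent activation h_i is then drawn from the posterior (6). Shapes and initialization are illustrative assumptions, and the final sampling line already uses the reparameterization introduced on a later slide:

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def relavar_step(x, s_prev, W_z, U_z, W_r, U_r, W, U, rng):
        """s_prev = [mu, log_var] of the previous step; returns the new state and a sample h_i."""
        z = sigmoid(W_z @ x + U_z @ s_prev)            # Eq. (8)
        r = sigmoid(W_r @ x + U_r @ s_prev)            # Eq. (10)
        s_hat = np.tanh(W @ x + U @ (r * s_prev))      # Eq. (9)
        s = (1.0 - z) * s_prev + z * s_hat             # Eq. (7)
        mu, log_var = np.split(s, 2)                   # [mu_theta(x_i), log sigma^2_theta(x_i)]
        h = mu + np.exp(0.5 * log_var) * rng.standard_normal(mu.shape)   # sample from q(h_i; theta), Eq. (6)
        return s, h

    m, d = 5, 4                                        # d latent units -> GRU state of size 2d
    rng = np.random.default_rng(0)
    W_z, W_r, W = (rng.normal(scale=0.1, size=(2 * d, m)) for _ in range(3))
    U_z, U_r, U = (rng.normal(scale=0.1, size=(2 * d, 2 * d)) for _ in range(3))
    s, h = relavar_step(np.eye(m)[2], np.zeros(2 * d), W_z, U_z, W_r, U_r, W, U, rng)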

Proposed Approach: ReLaVaR. ReLaVaR Formulation. We impose a suitable Multinoulli distribution over the generated output variables of our model, $y_{i+1}$, conditional on the corresponding latent vectors, $h_i$:
$p(y_{i+1,j} = 1 \mid h_i) \propto \tau(w_y^j h_i)$  (11)
where $W_y = [w_y^j]_{j=1}^{m}$ are trainable parameters of the output layer of the model. This selection can be viewed as giving rise to a pointwise ranking criterion, with the negative conditional log-likelihood from (11), $L_s$, corresponding to the Categorical Cross-Entropy loss function.
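
One way to read Eq. (11) in code (a sketch; the exact normalization of the Multinoulli and the output-layer details are our assumptions, not spelled out on the slide):

    import numpy as np

    def item_scores(h, W_y):
        """Unnormalized per-item scores tau(w_y^j h_i) from Eq. (11)."""
        return 1.0 / (1.0 + np.exp(-(W_y @ h)))

    def cross_entropy_loss(h, W_y, clicked_item):
        """Categorical cross-entropy L_s: negative log-probability of the clicked item
        under the Multinoulli obtained by normalizing the per-item scores."""
        s = item_scores(h, W_y)
        p = s / s.sum()
        return -np.log(p[clicked_item])

    m, d = 5, 4
    rng = np.random.default_rng(0)
    W_y = rng.normal(scale=0.1, size=(m, d))   # rows are the trainable vectors w_y^j
    h = rng.normal(size=d)
    print(cross_entropy_loss(h, W_y, clicked_item=3))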

Proposed Approach: ReLaVaR. ReLaVaR Training Algorithm. Let us consider a training dataset $\mathcal{D}$, which comprises a number of click sequences (sessions), pertaining to a multitude of users. AVI for ReLaVaR consists in maximization of the evidence lower-bound (ELBO) expression:
$\log p(\mathcal{D}) \geq \sum_i \big\{ -\mathrm{KL}\big[\, q(h_i; \theta) \,\|\, p(h_i) \,\big] - \mathbb{E}[L_s] \big\}$  (12)
Here, $\mathrm{KL}[\, q \,\|\, p \,]$ is the KL divergence between $q(\cdot)$ and $p(\cdot)$. Unfortunately, the posterior expectation $\mathbb{E}[L_s]$ cannot be computed analytically; hence, its gradient becomes intractable. Approximating this expectation via Monte-Carlo (MC) samples would result in estimators with unacceptably high variance.
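
For the diagonal-Gaussian posterior (6) and the standard-normal prior (5), the KL term in (12) has the usual closed form; a sketch (standard VAE algebra, not taken from the slides):

    import numpy as np

    def kl_to_std_normal(mu, log_var):
        """KL[ N(mu, diag(exp(log_var))) || N(0, I) ], summed over latent dimensions."""
        return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var)

    print(kl_to_std_normal(np.array([0.3, -0.1]), np.array([-0.5, 0.2])))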

Proposed Approach: ReLaVaR. ReLaVaR Training Algorithm. We ameliorate this problem by means of a smart reparameterization of the MC samples drawn from the Gaussian posterior density:
$h^{(\gamma)} = \mu_\theta(\cdot) + \sigma_\theta(\cdot) \odot \epsilon^{(\gamma)}$  (13)
where $\epsilon^{(\gamma)}$ is white random noise with unitary variance, i.e. $\epsilon^{(\gamma)} \sim \mathcal{N}(0, I)$. By adopting this reparameterization, the drawn posterior MC samples are expressed as differentiable functions of the sought parameter sets, $\theta$, and a random noise variable, yielding gradient estimators with low variance. Then, we employ the gradient of the reparameterized ELBO [via (13)], in the context of Adagrad, to train the model.
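
Putting Eqs. (12) and (13) together, a single-sample Monte-Carlo estimate of one step's ELBO contribution might be sketched as follows (toy shapes; the normalization of Eq. (11) is assumed as above, and in practice the gradient of this quantity would drive an Adagrad update):

    import numpy as np

    def sample_h(mu, log_var, rng):
        """Reparameterized draw, Eq. (13): h = mu + sigma * eps, with eps ~ N(0, I)."""
        return mu + np.exp(0.5 * log_var) * rng.standard_normal(mu.shape)

    def elbo_term(mu, log_var, W_y, clicked_item, rng):
        """-KL[q || p] - L_s for one time step, with L_s estimated from one MC sample of h."""
        kl = 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var)
        h = sample_h(mu, log_var, rng)
        s = 1.0 / (1.0 + np.exp(-(W_y @ h)))           # per-item scores, Eq. (11)
        log_lik = np.log(s[clicked_item] / s.sum())    # log Multinoulli probability of the click
        return -kl + log_lik

    rng = np.random.default_rng(0)
    d, m = 4, 5
    mu, log_var = rng.normal(size=d), rng.normal(scale=0.1, size=d)
    W_y = rng.normal(scale=0.1, size=(m, d))
    print(elbo_term(mu, log_var, W_y, clicked_item=3, rng=rng))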

Proposed Approach: ReLaVaR. Prediction Generation. Recommendation generation in the context of a user session can be performed by computing the predicted ratings $y_{i+1}$ and selecting the top-$k$ of them to recommend to the user. To this end, we sample the latent variables $h_i$ from the corresponding variational posterior distributions, under a standard MC-type rationale.

Experiments. We exploit the RecSys Challenge 2015 benchmark. This comprises click-stream data pertaining to user sessions with an e-commerce website. We split the available data into one set comprising 7,966,257 sessions (with a total of 31,637,239 click actions), and another one comprising the remaining 5,324 sessions (with a total of 71,222 click actions); we use the former for training and the latter for testing. Both sets entail a total of 37,483 items that a user may select to click on. Thus, we are dealing with a very sparse dataset. Our source code has been developed in Python/Theano.

Experiments. ReLaVaR Model Configuration. We perform Glorot Normal initialization of the model parameters. For regularization purposes, we apply Dropout on the component GRUs, with a rate equal to 0.5. To facilitate convergence, Nesterov momentum is additionally applied during training. We do not perform any tedious data augmentation procedure or model pretraining.
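
For concreteness, Glorot Normal initialization and (inverted) dropout amount to the following standard recipes; a sketch of the general techniques, not code from the paper:

    import numpy as np

    def glorot_normal(fan_in, fan_out, rng):
        """Glorot (Xavier) Normal: zero-mean Gaussian with std = sqrt(2 / (fan_in + fan_out))."""
        std = np.sqrt(2.0 / (fan_in + fan_out))
        return rng.normal(0.0, std, size=(fan_out, fan_in))

    def dropout(a, rate, rng):
        """Inverted dropout at the stated rate of 0.5, applied only during training."""
        mask = rng.random(a.shape) >= rate
        return a * mask / (1.0 - rate)

    rng = np.random.default_rng(0)
    W = glorot_normal(fan_in=5, fan_out=4, rng=rng)
    h = dropout(rng.normal(size=4), rate=0.5, rng=rng)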

Experiments. Evaluation Metrics. The Recall@20 metric expresses the frequency at which the desired item in the test data appears among the 20 highest-ranked items suggested by the evaluated approach. MRR@20 is the mean reciprocal rank of the desired items in the test data, with the reciprocal rank set to zero if the desired item does not make it to the top-20 list of ranked items.
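
A sketch of how these two metrics can be computed from ranked score vectors (standard definitions; the toy data are illustrative):

    import numpy as np

    def recall_and_mrr_at_k(score_vectors, target_items, k=20):
        """Recall@k: fraction of test events whose target item appears in the top-k.
        MRR@k: mean reciprocal rank of the target item, counted as zero outside the top-k."""
        hits, rr = 0, 0.0
        for scores, target in zip(score_vectors, target_items):
            rank = int(np.where(np.argsort(-scores) == target)[0][0]) + 1   # 1-based rank
            if rank <= k:
                hits += 1
                rr += 1.0 / rank
        n = len(target_items)
        return hits / n, rr / n

    rng = np.random.default_rng(0)
    scores = rng.normal(size=(3, 50))     # 3 toy test events over a 50-item catalogue
    targets = [4, 17, 42]
    print(recall_and_mrr_at_k(scores, targets, k=20))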

Experiments. Table: Best performance results of the evaluated methods.
Method | Model Size | Recall@20 | MRR@20
BPR-MF | - | 0.2574 | 0.0618
GRU w/ BPR Loss | 1000 | 0.6322 | 0.2467
GRU w/ TOP1 Loss | 1000 | 0.6206 | 0.2693
M2 | 100 | 0.7129 | 0.3091
M4 | 1000 | 0.6676 | 0.2847
ReLaVaR | 1500 | 0.6507 | 0.3527

Experiments. Table: ReLaVaR model performance for different selections of the employed loss function, L_s (results correspond to the best network configuration).
Loss Function | TOP1 | Cross-Entropy
# Latent Units | 1000 | 1500
Step Size | 0.1 | 0.05
Momentum Weight | 0 | 0
Recall@20 | 0.6250 | 0.6507
MRR@20 | 0.2727 | 0.3527

Experiments. Table: Comparison of computational times (in seconds), for various selections of the employed loss function. Results pertain to the network configurations yielding best accuracy.
Method | Network Size | Training time (total, s) | Prediction time per click event (average, s)
GRU w/ BPR Loss | 1000 units | 48692.48 | 0.683
GRU w/ TOP1 Loss | 1000 units | 44716.60 | 0.627
ReLaVaR w/ TOP1 Loss | 1000 units | 42357.84 | 0.595
ReLaVaR w/ Cross-Entropy Loss | 1500 units | 60109.86 | 0.844

Experiments. Figure: ReLaVaR performance fluctuation with the number of latent variables.

Experiments. Figure: ReLaVaR performance fluctuation with the number of latent variables: use of the TOP1 loss function.

Questions...?

The project leading to this research has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 692251.