A Spatial-Temporal Probabilistic Matrix Factorization Model for Point-of-Interest Recommendation

Downloaded 9/13/17 to 152.15.112.71. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

Huayu Li, Richang Hong+, Zhiang Wu, Yong Ge

Abstract

With the rapid development of Location-based Social Network (LBSN) services, a large number of Points-of-Interest (POIs) have become available, which creates a great demand for personalized POI recommender systems. A personalized POI recommender system can significantly help users find their preferred POIs and help POI owners attract more customers. However, POI recommender systems face many challenges, because a user's checkin decision is influenced by many factors, such as POI distance and a region's prosperity, and because a user's preference changes over time. Although different latent factor based methods (e.g., probabilistic matrix factorization) have been proposed, most of them do not successfully incorporate both geographical influence and temporal effect into latent factor models. To this end, in this paper, we propose a new Spatial-Temporal Probabilistic Matrix Factorization model that models a user's preference for a POI as the combination of his geographical preference and his other general interest in the POI. Furthermore, in addition to the static general interest of a user, we capture the temporal dynamics of the user's interest by modeling checkin data in a unique way. To evaluate the proposed model, we conduct extensive experiments with many state-of-the-art baseline methods and evaluation metrics on two real-world data sets. The experimental results clearly demonstrate the effectiveness of our proposed model.
Keywords: POI recommendation, Spatial-Temporal Probabilistic Matrix Factorization

Computer Science Department, UNC Charlotte, {hli38,yong.ge}@uncc.edu. + Hefei University of Technology, hongrc@hfut.edu.cn. Nanjing University of Finance and Economics, zawu@seu.edu.cn.

1 Introduction

Recent years have witnessed the rapid spread of location-based social network (LBSN) services such as Foursquare, Jiepang, and Facebook Places, which significantly facilitate users' outdoor activities by providing a large number of nearby Points-of-Interest (POIs) in real time. A variety of user interaction data with these LBSN services, such as searching locations, providing checkin information and posting tips after visiting a POI, has been accumulated and provides a good opportunity for developing personalized POI recommender systems.

Indeed, accurate and personalized POI recommendation is a crucial demand in LBSN services. First, given the massive number of locations, it is very difficult for users to find their preferred ones efficiently. A personalized POI recommender system helps users easily find relevant POIs without spending much time on searching, particularly when a user is in a new region. Also, it is very challenging for POI owners to deliver the right POIs to various users. A personalized POI recommender system is able not only to ease this burden, but also to attract more customers with the recommended POIs.

While developing personalized POI recommender systems could greatly benefit both users and POI owners, it is a very challenging problem due to two main factors. First, a user's checkin decision making process is very complex and can be influenced by many different kinds of factors. For example, the distance of a POI might influence a user's preference for it. A user tends to prefer a nearby POI over a faraway one.
Meanwhile, a user may choose a POI located in a relatively prosperous region for the sake of high-quality services and crowded human activities. In addition, whether a user checks in at a POI may depend on his specific purpose. For instance, when people want to have lunch, they choose POIs relevant to food rather than sights. Some people may be used to going to a certain region for the same purpose, such as food. These factors play an important role in a user's preference for a POI. Second, a user's preference may change over time. For example, a user likely chooses a restaurant next to his work place for convenience on weekdays, but he possibly goes to a bar for entertainment at weekends. These factors make building POI recommender systems difficult.

In the literature, some related works on POI recommendation have been proposed [11, 6, 12, 24, 1, 27]. For example, [23] leveraged the power law property between the checkin probability and the distance of pairwise visited locations for POI recommendation. [15] proposed to model a user's preference for a location as a multiplication of his interest in the POI, the popularity of the POI, and the distance between him and the POI, based on Bayesian non-negative matrix factorization. Moreover, [6] assumed each individual user had a different latent preference vector in each temporal state in order to capture the temporal effect. However, these models have not incorporated both geographical information and temporal effects together into latent factor models such as probabilistic matrix factorization. In addition, the geographical information used in these models is limited: only the distance of a POI is leveraged to predict a user's preference for it. In fact, as mentioned earlier, many other geographical factors implicitly affect a user's checkin decision making process.

To address these limitations, in this paper, we propose a novel model that incorporates multiple geographical features and temporal characteristics into matrix factorization for POI recommendation. To this end, we first propose a new framework that incorporates multiple geographical features into matrix factorization. The preference of a user for a POI is modeled as the combination of his geographical preference for the POI and his general interest in the POI. A user's geographical preference for a POI is modeled based on three types of geographical features: distance, prosperity and category. Unlike traditional matrix factorization methods that consider a user's general interest as static, in this paper we also model the temporal dynamics of the user's general interest. Specifically, we propose to represent each checkin event with a tuple including POI category and checkin time.
We treat a checkin event as a term and one user's historical checkins as a document, and then adapt topic modeling to model all observed checkins. Each user's historical checkins are characterized by a set of topics, each of which is essentially a distribution over all tuples and represents a particular kind of checkin pattern. As checkin time is included in each tuple, the learned topic distribution for each user naturally captures the temporal dynamics of the user's general interest. Consequently, we propose our Spatial-Temporal Probabilistic Matrix Factorization model that takes all these factors into account. To evaluate the proposed model, we conduct extensive experiments with two real-world data sets and compare our method with many state-of-the-art baseline methods based on different validation metrics. The experimental results clearly demonstrate the effectiveness of our model.

2 Method

Let us first introduce the formal definitions of Geographical Feature and Checkin Event, and then present a general framework that incorporates geographical features into matrix factorization and learns user preference in a more robust way. Finally, we present the specific Spatial-Temporal Probabilistic Matrix Factorization model and our estimation method.

2.1 Definitions.

Recent works on POI recommendation have revealed that the distance of a POI has great influence on a user's preference for POIs [23, 2, 5, 26, 14, 15, 13]. In fact, other geographical factors also significantly affect a user's checkin decision making process. For example, the business prosperity of the region where a POI is located might affect a user's preference. Users are likely to choose a POI whose surrounding environment is relatively prosperous, as this often indicates high-credibility and high-quality services and crowded human activities. In addition, users are often used to visiting a certain region for the same purpose, such as shopping or eating.
In fact, this is reflected in many real-world checkin data sets, where each POI belongs to one or more categories such as food, sights, etc. For example, we pick one user's historical checkins from our data and show them in Figure 1(b), where a red marker represents category nightlife and a black marker indicates category sights. Two regions are observed: one top red box area and one bottom red box area, where there are about 2689 and 125 services in their neighborhoods respectively. We can see that: (1) he goes to the top region for visiting sights and prefers the bottom region for nightlife; (2) he likes POIs in rather prosperous regions. Thus, a group of geographical features is defined as follows.

Definition 1. (Geographical Features) Geographical features of a POI include the business prosperity of the region where the POI is located, the distance of the POI, and the number of the user's historical POIs in this region associated with each category. The geographical features of POI $j$ for user $i$ can then be represented with a multi-dimensional vector:

$$f_{ij} = (\mathrm{prosperity}(j),\ d(i, j),\ n_{i1}, \ldots, n_{iC})^T, \tag{2.1}$$

where $\mathrm{prosperity}(j)$ is the prosperity of the region where POI $j$ is located and can be measured by the number of business services in this region, $d(i, j)$ is the distance between the home of user $i$ and POI $j$, and $n_{ic}$ is the number of POIs within category $c$ visited by user $i$ in this region. Note that other geographical information can be included in $f_{ij}$ if it is available for a particular data set.

A user's temporal preference for checkins follows a cyclic pattern [6, 7, 5]. Therefore, we divide time into two parts: time slot in a day and day in a week. Specifically, the time slots in a day are morning, noon, afternoon, evening and midnight. While traditional methods simply regard each checkin as a location, we represent a checkin event with category and temporal information as follows.

Definition 2. (Checkin Event) A checkin event is a tuple consisting of the category of the POI, the checkin day in a week, and the checkin time slot in a day.

For example, a checkin event can be represented as <Restaurant, Saturday, Evening>, where Restaurant is the category. As can be seen, a checkin event essentially describes what a user is doing at a particular time. In the above example, it indicates that the user is eating at a restaurant on Saturday evening.

Figure 1: (a) The graphical model of the proposed model. (b) An example of a user's historical check-ins. Red box represents a region, and each color indicates a category.

2.2 The General Framework.

We propose a general framework to generate accurate POI recommendations for $N$ users over $M$ locations. In reality, many latent factors affect users' preference for POIs. In particular, unlike when consuming movies, when users visit or check in at POIs, most of them take into account geographical factors [23] such as distance and region prosperity, in addition to other general factors such as intrinsic properties of POIs. Thus, based on the matrix factorization method, we propose to model the rating (i.e., checkin frequency) $\hat{R}_{ij}$ of user $i$ for POI $j$ as a combination of the general preference and the geographical preference of user $i$ for POI $j$:

$$\hat{R}_{ij} = U_i^T V_j + g_i W_i^T f_{ij}, \tag{2.2}$$

where $f_{ij}$ represents the observed geographical features defined in Section 2.1.
$W_i$ is the $i$-th user's latent preference for geographical features and is assumed to be drawn from a Gaussian distribution with zero mean and covariance matrix $\lambda_w^{-1} I_G$ (where $G$ is the dimension of the geographical feature vector). $g_i$ is a weight indicating how much the $i$-th user's checkin decision is affected by geographical factors. The term $W_i^T f_{ij}$ captures the geographical effect on the check-in decision. $U_i$ and $V_j$ are the $K$-dimensional user-specific and location-specific latent vectors respectively, both of which are drawn from Gaussian distributions. $U_i^T V_j$ is assumed to capture the $i$-th user's preference for the $j$-th POI's other intrinsic properties, such as environment, price and service, that also affect the checkin decision.

Furthermore, a user's general preference may change over time. Thus we argue that a user's preference consists of two parts: a stationary preference that has nothing to do with time, and a temporal preference that changes over time. Specifically, the $i$-th user's general preference can be decomposed into a stationary preference $U_i^s$ and a temporal preference $U_i^b$ as $U_i = U_i^s + U_i^b + \epsilon_K$, where $\epsilon_K$ is $K$-dimensional Gaussian noise. In other words, $U_i$ can be regarded as drawn from the Gaussian distribution with $U_i^s + U_i^b$ as mean and $\lambda_u^{-1} I_K$ as covariance matrix:

$$U_i \sim \mathcal{N}(U_i^s + U_i^b,\ \lambda_u^{-1} I_K). \tag{2.3}$$

In this paper, we adapt topic modeling to discover the temporal preference $U_i^b$ based on the defined checkin events, as discussed in detail in the following section. In addition, as mentioned in [22, 6], the location characteristics captured by $V_j$ are inherent properties and do not change much as time goes on. Therefore, we only model the dynamics of user preference. $U_i^s$ is assumed to be drawn from a Gaussian distribution, while $U_i^b$ is a latent vector that can be modeled in various ways depending on the application.
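As a minimal sketch of how Eq. (2.2) combines the two preference terms, the following Python snippet evaluates the predicted rating for one user-location pair. The helper names and all numeric values are made up for illustration and are not taken from the paper; the feature vector here assumes G = 3 (prosperity, distance, and a single category count).

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def predict_rating(u_i, v_j, g_i, w_i, f_ij):
    """Eq. (2.2): general preference U_i^T V_j plus weighted geographical preference."""
    return dot(u_i, v_j) + g_i * dot(w_i, f_ij)


# Toy values (hypothetical): K = 2 latent dimensions, G = 3 geographical features.
u_i = [0.5, -0.2]        # user latent vector U_i
v_j = [1.0, 0.4]         # location latent vector V_j
w_i = [0.1, -0.3, 0.2]   # user preference over geographical features W_i
f_ij = [2.0, 1.5, 4.0]   # (prosperity, distance, category count), arbitrary units
g_i = 0.5                # how strongly geography affects this user

rating = predict_rating(u_i, v_j, g_i, w_i, f_ij)
```

With these numbers the general term contributes 0.42 and the geographical term 0.5 × 0.55 = 0.275, so a negative weight on distance in `w_i` lowers the score of faraway POIs.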
We want to emphasize that the proposed way of incorporating geographical information differs from [13, 15, 23]. [15, 23] only model POI distance and [13] only considers an area's influence, so none of them can take other geographical factors into consideration for users' checkin decision making process. Our proposed framework is much more general and can incorporate different kinds of geographical features into matrix factorization.

2.3 The Model.

Two recent works have been proposed to incorporate temporal effect into matrix factorization for improving the accuracy of recommendation [1, 6]. However, these two methods have some shortcomings. First, both of them use multiple user-specific latent vectors to model each individual's preference at different temporal states. Each temporal state usually corresponds to a time window. For example, [1] considers each day as a different temporal state. And as the cyclic pattern is taken into account, [6] regards a pair of hour and day as a temporal state. Both of them have many different temporal states for each user. As a result, the number of observed ratings for fitting each user latent vector at each temporal state decreases significantly, and learning so many user-specific latent vectors for each individual makes recommender systems very inefficient. Second, neither method can predict the rating of a user well when the corresponding temporal state does not appear in the training set. In fact, for the location checkin application, only a small set of ratings is observed per user, and these belong to a very limited number of temporal states. Thus these methods cannot make reasonable predictions for users at new temporal states. All these limitations motivate us to find a more feasible and effective way to capture a user's temporal preference.

If we take a close look at users' checkin records, we find that some underlying topics can explain why users check in at POIs at particular times. For example, users are very likely to frequently check in at similar locations such as bars or restaurants around the evening at weekends for a similar purpose such as a party. In other words, many checkin events that have similar categories and close times often appear together in a user's checkin records. A set of such checkin events essentially describes a purpose of checkin behavior. As each checkin event has both category and time information, such a checkin purpose (e.g., party) provides a possible interpretation of why users check in at these POIs at a particular time. Each user has a different preference for checkin purposes. Based on this assumption, for user $i$, we treat a checkin event $l_{in}$ as a term and his historical checkins $\{l_{in} \mid n = 1, \ldots, N_i\}$ as a document ($N_i$ is the number of checkin events for user $i$), and then employ topic modeling to model all observed checkins. Each learned topic is a distribution over all checkin events. Each individual's checkins are a mixture over a set of interpretable topics.
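The term/document analogy can be sketched in a few lines of Python: each raw checkin is mapped to a (category, day, time slot) tuple, and a user's history becomes a bag of such tuples. The hour boundaries of the five time slots are not given in the text, so the `SLOT_BOUNDS` values below are an assumption chosen only for illustration, as are the function names.

```python
from collections import Counter
from datetime import datetime

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]

# Assumed hour ranges [start, end); the paper names the slots but not their limits.
SLOT_BOUNDS = [(6, 11, "Morning"), (11, 14, "Noon"),
               (14, 18, "Afternoon"), (18, 23, "Evening")]


def time_slot(hour):
    """Map an hour (0-23) to one of the five slots; remaining hours are Midnight."""
    for start, end, name in SLOT_BOUNDS:
        if start <= hour < end:
            return name
    return "Midnight"


def checkin_event(category, timestamp):
    """Build the checkin-event tuple of Definition 2: (category, day, slot)."""
    return (category, DAYS[timestamp.weekday()], time_slot(timestamp.hour))


def user_document(checkins):
    """Treat one user's checkins as a bag of terms (event -> count)."""
    return Counter(checkin_event(cat, ts) for cat, ts in checkins)


history = [
    ("Restaurant", datetime(2016, 1, 2, 19, 30)),  # a Saturday evening
    ("Restaurant", datetime(2016, 1, 9, 20, 0)),   # the following Saturday evening
    ("Sights", datetime(2016, 1, 3, 9, 15)),       # a Sunday morning
]
doc = user_document(history)
```

Two dinners on different Saturdays collapse into the same term, which is what lets a topic model pick up recurring weekly patterns from sparse data.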
As a checkin event includes checkin time, each individual's topic distribution naturally captures his temporal preference. Consequently, we develop a specific method, namely the Spatial-Temporal Probabilistic Matrix Factorization model. Its generative process and graphical model are shown in Table 1 and Figure 1(a) respectively. In the model, $\phi_t$ is the distribution over checkin events for the $t$-th topic and is drawn from a $\mathrm{Dirichlet}(\beta)$ prior. Each checkin event $l_{in}$ is drawn from the $\phi$ corresponding to its hidden topic $z_{in}$, which is drawn from the individual-specific mixture weights $\theta_i$ over $T$ topics. Thus, based on Eq. (2.3), we assume that the user's general preference consists of a static part and a temporal part, and is drawn from the Gaussian distribution with $U_i^s + X\theta_i$ as the mean vector:

$$U_i \sim \mathcal{N}(U_i^s + X\theta_i,\ \lambda_u^{-1} I_K), \tag{2.4}$$

where $X$ is a matrix that maps the user topic space into the user latent space, and $X\theta_i$ indicates the user's temporal preference. Also, we place a zero-mean spherical Gaussian prior on each entry of $X$.

Table 1: The generative process for the model.
1. For global context, draw each entry in matrix $X_{ij} \sim \mathcal{N}(0, \lambda_x^{-1})$.
2. For each user $i$,
   a. Draw topic proportion $\theta_i \sim \mathrm{Dir}(\alpha)$.
   b. Draw user static preference $U_i^s \sim \mathcal{N}(0, \lambda_s^{-1} I_K)$.
   c. Draw user general preference $U_i \sim \mathcal{N}(U_i^s + X\theta_i, \lambda_u^{-1} I_K)$.
   d. Draw user geographical preference $W_i \sim \mathcal{N}(0, \lambda_w^{-1} I_G)$.
   e. Draw user geographical influence $g_i \sim \mathcal{N}(0, \lambda_g^{-1})$.
   f. For each check-in event $l_{in}$,
      i. Draw topic $z_{in} \sim \mathrm{Mult}(\theta_i)$.
      ii. Draw check-in event $l_{in} \sim \mathrm{Mult}(\phi_{z_{in}})$.
3. For each topic $t$, draw $\phi_t \sim \mathrm{Dir}(\beta)$.
4. For each location $j$,
   a. Draw location latent vector $V_j \sim \mathcal{N}(0, \lambda_v^{-1} I_K)$.
5. For each user-location pair $(i, j)$,
   a. Draw rating $R_{ij} \sim \mathcal{N}(U_i^T V_j + g_i W_i^T f_{ij}, \lambda^{-1})$.
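The user-side part of the generative process in Table 1 can be simulated directly, which is a convenient way to sanity-check the roles of $\theta_i$, $X$ and $\phi$. The sketch below uses only the Python standard library; the function names, the dictionary of precisions `lam`, and every numeric setting are illustrative assumptions rather than values from the paper. Checkin events are represented by integer indices into the event vocabulary.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalizing independent Gamma(alpha_k, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_gaussian(mean, precision, rng):
    """Draw a vector from N(mean, precision^-1 * I)."""
    std = precision ** -0.5
    return [rng.gauss(m, std) for m in mean]


def sample_user(X, phi, alpha, n_events, G, lam, rng):
    """Steps 2a-2f of Table 1 for a single user."""
    K, T = len(X), len(alpha)
    theta = sample_dirichlet(alpha, rng)                          # 2a: topic proportions
    u_static = sample_gaussian([0.0] * K, lam["s"], rng)          # 2b: static preference
    x_theta = [sum(X[k][t] * theta[t] for t in range(T)) for k in range(K)]
    mean_u = [s + b for s, b in zip(u_static, x_theta)]
    u = sample_gaussian(mean_u, lam["u"], rng)                    # 2c: general preference
    w = sample_gaussian([0.0] * G, lam["w"], rng)                 # 2d: geographical preference
    g = rng.gauss(0.0, lam["g"] ** -0.5)                          # 2e: geographical influence
    events = []
    for _ in range(n_events):                                     # 2f: checkin events
        z = rng.choices(range(T), weights=theta)[0]               # topic of this event
        events.append(rng.choices(range(len(phi[z])), weights=phi[z])[0])
    return theta, u_static, u, w, g, events


rng = random.Random(0)
K, T, L, G = 2, 3, 4, 3   # latent dims, topics, distinct events, geo features (toy sizes)
lam = {"x": 1.0, "s": 1.0, "u": 10.0, "w": 1.0, "g": 1.0}  # hypothetical precisions
X = [[rng.gauss(0.0, lam["x"] ** -0.5) for _ in range(T)] for _ in range(K)]  # step 1
phi = [sample_dirichlet([1.1] * L, rng) for _ in range(T)]                    # step 3
theta, u_static, u, w, g, events = sample_user(X, phi, [1.1] * T, 5, G, lam, rng)
```

Location vectors $V_j$ and ratings $R_{ij}$ (steps 4 and 5) follow the same pattern and are omitted to keep the sketch short.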
There are three advantages to adopting this method to capture users' temporal preference: (1) the extracted topics provide a novel interpretation of a user's checkin events; (2) extracting topics from large numbers of checkin events is a kind of dimensionality reduction, so we do not need to model multiple user-specific preferences at different temporal states for each individual; (3) it addresses the prediction problem mentioned before, because each user has a topic distribution learned from the training data that is not explicitly tied to any specific temporal state. Thus, unlike [1, 6], even though some of a user's temporal states may not be observed in the training set, the learned topic distribution can still be used as his dynamic preference for prediction.

2.4 Parameter Estimation.

We employ Maximum-A-Posteriori (MAP) estimation for the parameters of the model. Given the observed ratings $R$, geographical features $f$ and hyperparameters, maximizing the posterior is equivalent to maximizing the log-likelihood of the latent parameters. The log-likelihood is obtained as follows:

$$\mathcal{L} = -\lambda \sum_{i}^{N}\sum_{j}^{M} I_{ij}\,(R_{ij} - U_i^T V_j - g_i W_i^T f_{ij})^2 + \sum_{i}^{N}\sum_{n}^{N_i} \log \sum_{t}^{T} \phi_{t,l_{in}}\theta_{it} + \sum_{i}^{N}\sum_{t}^{T} (\alpha_t - 1)\log\theta_{it} + \sum_{t}^{T}\sum_{l}^{L} (\beta_l - 1)\log\phi_{tl} - \|\Theta\|_F^2, \tag{2.5}$$

$$\text{s.t. } 0 < \phi_{tl},\ \theta_{it} \le 1,\quad \sum_{l=1}^{L}\phi_{tl} = 1,\quad \sum_{t}^{T}\theta_{it} = 1,$$

where $I_{ij}$ is the indicator function that is equal to 1 if the $i$-th user checks in at the $j$-th location and equal to 0 otherwise. $\|\Theta\|_F^2$ is defined as

$$\|\Theta\|_F^2 = \lambda_u\|U - U^s - X\theta\|_F^2 + \lambda_s\|U^s\|_F^2 + \lambda_v\|V\|_F^2 + \lambda_w\|W\|_F^2 + \lambda_x\|X\|_F^2 + \lambda_g\|g\|_F^2.$$

Because the summation over $t$ of the term $\phi_{t,l_{in}}\theta_{it}$ in Eq. (2.5) is inside the log function, it is intractable to estimate $\theta$ by maximizing the log-likelihood directly. To overcome this problem, we adopt a method similar to the one proposed in [21] to obtain a relaxed lower bound. Specifically, we first define $q(z_{in} = t) = \psi_{int}$, and then apply Jensen's inequality. The objective function with respect to $\theta$ then has the following lower bound:

$$\sum_{i}^{N}\sum_{n}^{N_i} \log \sum_{t}^{T} \phi_{t,l_{in}}\theta_{it} \ \ge\ \sum_{i}^{N}\sum_{n}^{N_i}\sum_{t}^{T} \psi_{int}\,[\log(\theta_{it}\phi_{t,l_{in}}) - \log\psi_{int}],$$

$$\text{s.t. } \forall t,\ 0 < \psi_{int} \le 1, \text{ and } \sum_{t}^{T}\psi_{int} = 1.$$

Therefore, we can relax the objective function by using variational inference for the latent variable $Z$. The lower bound of the objective function is obtained by substituting the above inequality back into Eq. (2.5):

$$\mathcal{L} = -\lambda \sum_{i}^{N}\sum_{j}^{M} I_{ij}\,(R_{ij} - U_i^T V_j - g_i W_i^T f_{ij})^2 + \sum_{i}^{N}\Big\{\sum_{n}^{N_i}\sum_{t}^{T} \psi_{int}\,[\log(\theta_{it}\phi_{t,l_{in}}) - \log\psi_{int}] + \sum_{t}^{T}(\alpha_t - 1)\log\theta_{it}\Big\} + \sum_{t}^{T}\sum_{l}^{L}(\beta_l - 1)\log\phi_{tl} - \|\Theta\|_F^2, \tag{2.6}$$

$$\text{s.t. } 0 < \phi_{tl},\ \theta_{it},\ \psi_{int} \le 1,\quad \sum_{l=1}^{L}\phi_{tl} = 1,\quad \sum_{t}^{T}\theta_{it} = \sum_{t}^{T}\psi_{int} = 1.$$

Alternating Least Squares (ALS) is a popular optimization method leading to more accurate parameter estimation and faster convergence. Thus, we use ALS to compute each latent variable while fixing the other variables when maximizing the relaxed log-likelihood in Eq. (2.6). The update equation for each variable of interest, shown in Table 2, is obtained by setting the gradient of the relaxed log-likelihood with respect to that variable to zero. However, it is difficult to obtain a closed form for the variable $\theta$. Therefore, we use a gradient-based search algorithm to find its optimal solution. Fixing the other variables, the objective function with respect to $\theta$ is:

$$\hat{\theta}_{it} = \arg\min_{\theta_{it}} \Big\{\lambda_u (U_i - U_i^s - X\theta_i)^T (U_i - U_i^s - X\theta_i) - \sum_{n=1}^{N_i}\psi_{int}\log\theta_{it} - (\alpha_t - 1)\log\theta_{it}\Big\}, \tag{2.7}$$

$$\text{s.t. } \forall t,\ 0 < \theta_{it} \le 1, \text{ and } \sum_{t}^{T}\theta_{it} = 1.$$

To optimize $\theta$, gradient descent is employed with the following gradient:

$$\frac{\partial \mathcal{L}(\theta_{it})}{\partial \theta_{it}} = 2\lambda_u (X\theta_i + U_i^s - U_i)^T (X)_t - \frac{\sum_{n=1}^{N_i}\psi_{int} + \alpha_t - 1}{\theta_{it}}, \tag{2.8}$$

where $(X)_t$ represents the $t$-th column of matrix $X$. More details of the algorithm are shown in Table 2.
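To make the relationship between Eq. (2.7) and Eq. (2.8) concrete, the sketch below evaluates the objective for one user's full vector $\theta_i$ (summing the log terms over all topics) and compares the analytic gradient with a central finite difference. It checks the unconstrained partial derivatives only; the simplex constraint on $\theta_i$, which the constrained gradient descent in the text has to enforce, is not handled here. All names and numbers are hypothetical.

```python
import math


def theta_objective(theta, u, u_s, X, psi_sum, alpha, lam_u):
    """Eq. (2.7) for one user: lam_u * ||U_i - U_i^s - X theta||^2 minus the log terms.

    psi_sum[t] stands for sum_n psi_int, the expected number of events in topic t.
    """
    K, T = len(u), len(theta)
    resid = [u[k] - u_s[k] - sum(X[k][t] * theta[t] for t in range(T))
             for k in range(K)]
    quad = lam_u * sum(r * r for r in resid)
    logs = sum((psi_sum[t] + alpha[t] - 1.0) * math.log(theta[t]) for t in range(T))
    return quad - logs


def theta_gradient(theta, u, u_s, X, psi_sum, alpha, lam_u):
    """Eq. (2.8): 2 lam_u (X theta + U_i^s - U_i)^T X_t - (psi_sum_t + alpha_t - 1) / theta_t."""
    K, T = len(u), len(theta)
    resid = [sum(X[k][t] * theta[t] for t in range(T)) + u_s[k] - u[k]
             for k in range(K)]
    return [2.0 * lam_u * sum(resid[k] * X[k][t] for k in range(K))
            - (psi_sum[t] + alpha[t] - 1.0) / theta[t]
            for t in range(T)]


# Toy problem (made-up numbers): K = 2 latent dimensions, T = 3 topics.
X = [[0.3, -0.5, 0.8], [1.2, 0.1, -0.4]]
u, u_s = [0.7, -0.2], [0.1, 0.4]
psi_sum, alpha, lam_u = [2.5, 1.0, 1.5], [1.1, 1.1, 1.1], 0.5
theta = [0.2, 0.5, 0.3]

analytic = theta_gradient(theta, u, u_s, X, psi_sum, alpha, lam_u)

eps = 1e-6
numeric = []
for t in range(len(theta)):
    plus = theta[:t] + [theta[t] + eps] + theta[t + 1:]
    minus = theta[:t] + [theta[t] - eps] + theta[t + 1:]
    numeric.append((theta_objective(plus, u, u_s, X, psi_sum, alpha, lam_u)
                    - theta_objective(minus, u, u_s, X, psi_sum, alpha, lam_u))
                   / (2 * eps))
```

The two gradient vectors agree to several decimal places, which confirms that the sign pattern in Eq. (2.8) corresponds to minimizing Eq. (2.7).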
In addition, $C_i$ is a diagonal matrix with $\lambda I_{ij}$ ($j = 1, \ldots, M$) as its diagonal elements, denoted as $C_i = (\lambda I_{ij})_{j=1}^{M}$. Similarly, we denote $C_j = (\lambda I_{ij})_{i=1}^{N}$, $F_i = (g_i f_{ij})_{j=1}^{M}$, $R_{i,\backslash uv} = (R_{ij} - U_i^T V_j)_{j=1}^{M}$, and $R_{i,\backslash wf} = (R_{ij} - g_i W_i^T f_{ij})_{j=1}^{M}$.

Table 2: The Algorithm for the Model.
Step 1: Randomly initialize variables.
Step 2: Execute E-Step and M-Step in each iteration repeatedly until the log-likelihood in Eq. (2.6) converges:
  E-Step:
    $V_j = (U C_j U^T + \lambda_v I_K)^{-1}\, U C_j R_{j,\backslash wf}$
    $U_i = (V C_i V^T + \lambda_u I_K)^{-1}\, [V C_i R_{i,\backslash wf} + \lambda_u (U_i^s + X\theta_i)]$
    $W_i = (F_i C_i F_i^T + \lambda_w I_G)^{-1}\, F_i C_i R_{i,\backslash uv}$
    $U_i^s = \dfrac{\lambda_u (U_i - X\theta_i)}{\lambda_u + \lambda_s}$
    $X = (\lambda_u \theta\theta^T + \lambda_x I_T)^{-1}\, \lambda_u (U - U^s)\theta^T$
    $\psi_{int} = \theta_{it}\,\phi_{t,l_{in}}$; normalize $\psi_{int}$.
    $\phi_{tl} = \sum_{i,n}^{N,N_i} \psi_{int}\, I(l_{in} = l) + \beta_l - 1$; normalize $\phi_{tl}$.
    Update $\theta_{it}$: use gradient descent with constraints to find the optimal solution of $\theta_{it}$, with the objective function in Eq. (2.7) and the gradient in Eq. (2.8), subject to $\forall t,\ 0 < \theta_{it} \le 1$ and $\sum_{t}^{T}\theta_{it} = 1$.
  M-Step:
    $g_i = \dfrac{\lambda \sum_{j=1}^{M} I_{ij}\,(R_{ij} - U_i^T V_j)\, W_i^T f_{ij}}{\lambda_g + \lambda \sum_{j=1}^{M} I_{ij}\,(W_i^T f_{ij})^2}$
Step 3: After obtaining the optimal $\hat{U}$, $\hat{V}$, $\hat{W}$ and $\hat{g}$, the predicted rating of the $i$-th user for the $j$-th location is calculated as $\hat{R}_{ij} = \hat{U}_i^T \hat{V}_j + \hat{g}_i \hat{W}_i^T f_{ij}$.

3 Experimental Results

3.1 Datasets.

In this paper, we use two Foursquare data sets to evaluate our proposed model: (1) Data 1 [9] contains the checkin history of users who live in California, ranging from December 2009 to January 2013; (2) Data 2 [4] contains checkin data from September 2010 to January 2011. Each checkin record in both data sets includes user ID, location ID and timestamp, and each location has latitude, longitude and category information. After merging the overlapping categories, we obtain 6 (or 7) different categories in total in Data 1 (or Data 2). We also remove users who have visited fewer than 5 locations and locations that are visited by fewer than 2 users.
The data statistics for the two data sets are reported in Table 3. We can observe that Data 2 is nearly 10 times sparser than Data 1.

Table 3: Statistics of Data Sets
Data Set   #User    #Location   #Checkin   Sparsity
Data 1     3,375    36,866      312,568    0.25%
Data 2     74,343   198,161     3,51,68    0.024%

In addition, we divide the space into a set of grids based on latitude and longitude, and each grid is regarded as a region. In our experiment, the size of a region is about 5 km × 5 km (i.e., 0.045 by 0.045 square degrees). We then crawl all venues for each individual region with the Foursquare API. As the Foursquare API returns at most 50 venues (locations) for a specified center point and search radius, we use a simple method to crawl all venues in each region. First, if the Foursquare API returns fewer than 50 venues for a region, we take all of them and stop searching. Otherwise, we divide the region into four smaller areas and repeat the above step in each area until each area contains fewer than 50 venues or its size is less than 1m × 1m. Second, we merge all crawled venues in the same region according to the venue ID. We use the number of venues in each region to indicate its prosperity degree. Moreover, in the experiments we adopt the method of [4] to locate a user's home from all of his historical checkins in order to calculate the POI distance.

In a recommendation system, we aim to recommend unvisited locations to users. Thus, we split the training and testing data as follows: for each user, we (1) aggregate the checkins for each individual location; (2) sort the locations according to the first time the user checked in; and (3) select the earliest 80% to train the model and use the remaining 20% for testing.

3.2 Experimental Settings. Parameters λ and α are set as 0.1 and 1.1, respectively. Meanwhile, other parameters are specified as.

3.3 Evaluation Metrics. The model is quantitatively evaluated in terms of prediction accuracy, ranking accuracy and Top-K recommendation performance.

Prediction Accuracy. RMSE [19] is used to directly measure the prediction accuracy for the unobserved ratings in the testing set.

Ranking Accuracy. Both the Kendall Tau coefficient (Tau) and Normalized Discounted Cumulative Gain (nDCG) are used to measure the overall ranking accuracy. Suppose each location i is associated with a ground-truth rating $x_i$ and a predicted rating $y_i$, and that there are n such $(x_i, y_i)$ pairs. A pair of locations $\langle i, j \rangle$ is said to be concordant if both $x_i > x_j$ and $y_i > y_j$, or if both $x_i < x_j$ and $y_i < y_j$. It is said to be discordant if both $x_i > x_j$ and $y_i < y_j$, or if both $x_i < x_j$ and $y_i > y_j$. Tau is then defined as $(\#conc - \#disc) / (\frac{1}{2} n(n-1))$, where #conc is the number of concordant pairs and #disc is the number of discordant pairs. DCG is defined as $\sum_{i=1}^{N} \frac{2^{rel_i} - 1}{\log_2(1+i)}$, where $rel_i$ is the testing rating. Given the ideal DCG (IDCG), i.e., the DCG of the ground-truth ratings, nDCG is defined as DCG/IDCG.

Top-K Recommendation. Precision@K and Recall@K are used to evaluate the Top-K recommendation performance. We also adopt the MAP metric, the mean of average precision (AP). AP is computed as $AP_i = \frac{\sum_{j=1}^{N} p(j) \times rel(j)}{\#\text{relevant locations}}$.
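The ranking-accuracy metrics (Tau and nDCG) defined above can be sketched in plain Python as follows. This is a minimal illustration, not the evaluation code used in the paper: the function names are hypothetical, ties are counted as neither concordant nor discordant (which is what the definition above implies), and ranking items by descending predicted rating before computing DCG is an assumption.

```python
import math


def kendall_tau(truth, pred):
    """(#conc - #disc) / (n(n-1)/2) over all item pairs; requires n >= 2."""
    n = len(truth)
    conc = disc = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (truth[i] - truth[j]) * (pred[i] - pred[j])
            if sign > 0:      # same order in truth and prediction
                conc += 1
            elif sign < 0:    # opposite order
                disc += 1
    return (conc - disc) / (0.5 * n * (n - 1))


def dcg(rels):
    """Sum of (2^rel_i - 1) / log2(1 + i) with ranks i starting at 1."""
    return sum((2 ** r - 1) / math.log2(1 + i) for i, r in enumerate(rels, start=1))


def ndcg(truth, pred):
    """DCG of the truth ratings ordered by prediction, divided by the ideal DCG."""
    order = sorted(range(len(truth)), key=lambda k: pred[k], reverse=True)
    ranked = [truth[k] for k in order]
    idcg = dcg(sorted(truth, reverse=True))
    return dcg(ranked) / idcg if idcg > 0 else 0.0
```

The quadratic pair loop in `kendall_tau` is kept for readability; for large test sets a library routine would be preferable.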
Precision@K and Recall@K are calculated as $Precision@K = \frac{\sum_{L_i \in L} |T_K(L_i)|}{\sum_{L_i \in L} |R_K(L_i)|}$ and $Recall@K = \frac{\sum_{L_i \in L} |T_K(L_i)|}{\#\text{relevant locations}}$, respectively. Specifically, $R_K(L_i)$ denotes the top-K locations recommended to user i, $T_K(L_i)$ denotes all truly relevant locations among $R_K(L_i)$, L represents the set of locations in the testing set, j is the position in the ranked list, N is the number of returned items in the list, p(j) is the precision of the ranked list cut off at position j, and rel(j) is an indicator function. The term relevant locations refers to the locations in the testing set.

3.4 Baseline Methods. To demonstrate the effectiveness of our model, we compare it with five baseline methods: (1) the method of [6], which assumes that users have different preferences in each temporal state and uses multiple user-specific latent vectors for different temporal states; (2) the method of [23], which considers both the geographical effect, based on power-law characteristics, and user interest, based on user-based collaborative filtering, for POI recommendation; (3) the method of [19], which assumes that the user and location latent vectors are drawn from Gaussian distributions and estimates a user's preference for a location as the dot product of the user-specific and location-specific latent vectors; (4) the method of [20], a Bayesian non-negative matrix factorization algorithm that places exponential priors on the latent vectors and uses a Gibbs sampler to approximate the posteriors of the latent variables; (5) F, user-based collaborative filtering with Pearson correlation as the similarity measure.
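For the Top-K metrics of Section 3.3 (Precision@K, Recall@K and AP/MAP), a minimal Python sketch is given below. It is an illustration under assumptions rather than the paper's evaluation code: the function names and the dictionary-based inputs are hypothetical, rel(j) is taken to be 1 when the item at position j is in the user's test set and 0 otherwise, and "#relevant locations" in Recall@K is read as the total number of test locations summed over users.

```python
def precision_recall_at_k(recommended, relevant, k):
    """recommended: user -> ranked list of locations; relevant: user -> set of test locations."""
    hits = returned = total_relevant = 0
    for user, ranked in recommended.items():
        top_k = ranked[:k]                       # R_K(L_i)
        rel = relevant.get(user, set())
        hits += sum(1 for loc in top_k if loc in rel)   # |T_K(L_i)|
        returned += len(top_k)
        total_relevant += len(rel)
    precision = hits / returned if returned else 0.0
    recall = hits / total_relevant if total_relevant else 0.0
    return precision, recall


def average_precision(ranked, rel):
    """AP = sum_j p(j) * rel(j) / #relevant, with p(j) the precision of the list cut at j."""
    hits = 0
    score = 0.0
    for j, loc in enumerate(ranked, start=1):
        if loc in rel:
            hits += 1
            score += hits / j
    return score / len(rel) if rel else 0.0


def mean_average_precision(recommended, relevant):
    """MAP: mean of the per-user AP values."""
    aps = [average_precision(r, relevant.get(u, set())) for u, r in recommended.items()]
    return sum(aps) / len(aps) if aps else 0.0
```

Pooling hits over all users before dividing follows the summed form of the Precision@K and Recall@K definitions; averaging per-user precision instead would give a different number when users receive lists of different lengths.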
Figure 2: Performance comparison in terms of prediction accuracy and ranking performance. Panels: (a) RMSE on Data 1; (b) RMSE on Data 2; (c) Kendall Tau on Data 1; (d) Kendall Tau on Data 2; (e) nDCG on Data 1; (f) nDCG on Data 2.

3.5 Performance Comparison. We evaluate our proposed model against the baseline methods. Specifically, the topic number is set to the same value as the latent feature number; we discuss this choice further in Section 3.7.

3.5.1 Prediction Accuracy Performance. We examine the prediction accuracy of the different models. Figures 2(a) and 2(b) show the RMSE of our proposed model versus the baseline methods on the two data sets. Since one of the baselines only outputs the probability of visiting an unvisited location, we do not include it in this comparison. The RMSE of F on Data 1 and Data 2 is 0.6443 and 0.6591, and the RMSE of another baseline on Data 2 ranges from 0.5929 to 0.611. As these values are far larger than those of the other models, we do not plot them in the figure. A lower RMSE indicates predictions that are closer to the ground-truth ratings. From the results, we can see that our model achieves the best performance on both data sets, while F performs the worst. The superior performance of our model is due to its use of geographical information and dynamic characteristics. As the observed data is very sparse, the chance of finding users with similar POI preferences becomes much smaller, which leads to poor prediction performance for F. The method of [6] can also model temporal effects on user preference, but when the data set becomes much sparser, the number of observed ratings available to fit each user-specific latent vector in each temporal state decreases significantly, so it becomes hard to fit the model well. This explains why it performs better than one of the other baselines on Data 1 but slightly worse on Data 2, which is much sparser than Data 1. We can also observe that one baseline does not perform well on the RMSE metric, probably because it is difficult to optimize its latent vectors on checkin data.

Figure 3: Performance comparisons in terms of precision and recall with top 5 and 10 on two data sets. Panels: (a) Precision@5 on Data 1; (b) Precision@10 on Data 1; (c) Recall@5 on Data 1; (d) Recall@10 on Data 1; (e) Precision@5 on Data 2; (f) Precision@10 on Data 2; (g) Recall@5 on Data 2; (h) Recall@10 on Data 2.

3.5.2 Ranking Accuracy Performance.
Figure 2(c) (Figure 2(d)) and Figure 2(e) (Figure 2(f)) show the overall ranking performance of our model and the baseline methods in terms of Tau and nDCG on Data 1 (Data 2). From the results, we find that our model is clearly superior to the baseline methods on both data sets in terms of both metrics. Since our model can more accurately capture a user's checkin decision-making process, it leads to better results. By exploiting the temporal effect to model user-specific preference, the method of [6] is better than most of the other baselines, including F. The performance of [6] and [23] is similar, which shows the importance of geographical and temporal effects in location prediction. However, both perform much worse than our model, because (1) it is hard for [6] to fit each user-specific latent vector with so few observed ratings, and (2) [23] only utilizes the distance characteristic. The sparseness of the data makes F perform poorly, especially on Data 2, where the sparseness is more evident and F has a much worse result. Another baseline fails to work well because additional explicit factors affect a user's checkin decision-making process. The sparseness also causes two of the baselines to perform very poorly.

3.5.3 Top-K Recommendation Performance. The Precision@K, Recall@K and MAP results for the different models on the two data sets are reported in Figures 3, 4(a) and 4(b). Overall, our model performs the best among all methods on these three metrics, while F and two other baselines are nearly the worst, likely because they do not model the geographical and temporal effects. The method of [23] is superior to that of [6], especially on Data 2, for two reasons. (1) Most users visit nearby locations, which can be modeled with the distance property, so [23] gives more accurate results when recommending the top K locations among all locations. (2) When the data becomes much sparser, each user-specific latent vector for each temporal state in [6] is fitted with far fewer observed ratings. This also illustrates that geographical effects play a more important role in POI recommendation. Our model appropriately leverages multiple geographical features to learn a user's interests and, as a result, improves the recommendation accuracy. It can also be observed that the performance on Data 1 is worse than on Data 2 (which is much sparser), possibly because a much smaller set of observed data is used to train the model on Data 1.

Figure 4: (a), (b) Performance comparison in terms of MAP on Data 1 and Data 2. (c) Performance comparison with different $g_i$ values. (d) Relationship between user latent feature and topic number.

3.6 Geographical Influence. The numeric value $g_i$ indicates how much the checkin decision of user i at a POI is affected by geographical influence. In our model, $g_i$ can be either a manually specified value or automatically learned by the optimization algorithm (denoted as Geo-Learned). We therefore conduct the following experiments to examine the geographical influence on location prediction. We fix $g_i$ to 0, 0.5 and 1, denoted as Geo-0, Geo-0.5 and Geo-1, respectively. The result based on Tau on Data 1 is shown in Figure 4(c). Based on the results, we make the following observations. First, our proposed method always outperforms [6]. In particular, as the value of $g_i$ increases, the improvement becomes more obvious. Even when our model only captures the dynamics of a user's general preference (i.e., $g_i$ is 0), it outperforms [6]. This directly indicates that modeling a user's general temporal preference through his topic distribution is superior to using multiple user-specific latent vectors for different temporal states. Second, as the value of $g_i$ increases, our model performs much better.
This illustrates that users pay more attention to geographical information when choosing a POI. Third, the performance with the learned $g_i$ is slightly worse than that with $g_i = 1$, but better than with the other values. This happens because it is difficult to find a very accurate optimal solution for $g_i$ with the limited observed ratings. However, automatically learning the geographical influence is meaningful, as the geographical influence on the checkin decision is very different for each individual user.

3.7 User Latent- and Topical-Space Relationship. In the model, we utilize the learned topic distribution to capture the dynamics of a user's general preference. It is meaningful to examine the relationship between the user latent space and the user topical space, which is reflected in the relationship between the user latent feature number and the topic number. Therefore, Figure 4(d) shows the ranking accuracy in terms of Tau on Data 1 for our model with different topic numbers, with the latent feature dimension set to 3. The Tau performance of a baseline is also reported for comparison. Based on the results, we can see that no matter what the topic number is, our model always performs better than this baseline. This indicates that incorporating geographical influence and modeling the dynamics of a user's general preference benefit POI recommendation. Moreover, we observe that when the topic number is 3, the same value as the specified feature number, the performance is nearly the best. This illustrates that the topical space dimension should be consistent with the user latent feature dimension. If the topical dimension is larger than the user latent feature dimension, it becomes more difficult to align the topical space with the user latent feature space. As a result, the performance is not as good as when the two spaces match.
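To make the role of $g_i$ in Section 3.6 concrete, the sketch below evaluates the prediction rule $\hat{R}_{ij} = U_i^T V_j + g_i W_i^T f_{ij}$ and the closed-form $g_i$ update as we read them from Table 2, using NumPy. It is an illustration under assumptions, not the authors' implementation: the function names, array shapes and argument names (`feats_i`, `lam`, `lam_g`) are hypothetical, and `feats_i` holds the raw features $f_{ij}$ as columns rather than the scaled matrix $F_i$.

```python
import numpy as np


def predict(U_i, V, W_i, feats_i, g_i):
    """Predicted ratings of one user for all M locations.

    U_i: (K,), V: (K, M), W_i: (G,), feats_i: (G, M) with f_ij as columns.
    Setting g_i = 0 switches the geographical term off (the Geo-0 setting).
    """
    return U_i @ V + g_i * (W_i @ feats_i)


def update_g(R_i, I_i, U_i, V, W_i, feats_i, lam, lam_g):
    """Closed-form g_i: ridge-style fit of the residual R_ij - U_i^T V_j
    on the geographical score W_i^T f_ij over observed entries (I_ij = 1)."""
    geo = W_i @ feats_i          # W_i^T f_ij for every j
    resid = R_i - U_i @ V        # part not explained by general preference
    num = lam * np.sum(I_i * resid * geo)
    den = lam_g + lam * np.sum(I_i * geo ** 2)
    return num / den
```

With `lam_g = 0` and noise-free ratings the update recovers the weight that generated the data exactly; a positive `lam_g` shrinks the learned $g_i$ toward 0, which is one way to read why few observed ratings make the learned value less accurate.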
4 Related Work

Related work can be grouped into two categories. The first category is about matrix factorization [18, 20]. For example, [19] assumed that users and items are modeled by latent factors with Gaussian priors, and that the observed ratings are drawn from a Gaussian distribution whose mean is the dot product of the latent user and location factors. [20] treated both user and item latent vectors as non-negative with exponential priors and proposed to use a Gibbs sampler to approximate the posteriors of the latent factors.

The second category is about POI recommendation, which mainly consists of two sub-categories. The first sub-category focuses on modeling geographical influence. For example, [23] proposed to leverage a power-law distribution to estimate the check-in probability from the distance between any pair of visited POIs, motivated by the spatial clustering phenomenon exhibited in LBSNs. To model the geographical distance, [5, 2, 26] instead adopted Gaussian-based models and [26, 14] utilized kernel density estimation. In addition, [15] proposed to exploit the framework of. Based on weighted matrix factorization, [13] modeled a POI's geographical influence with the user's activity area and the POI's influence area, and [16] considered two types of geographical characteristics: geographical neighborhood and region effect.

The second sub-category focuses on modeling temporal effects [3, 17, 22, 8]. For example, [6] distinguished a user's latent factors in different temporal states and then applied several strategies to aggregate the user's check-in preferences across temporal states. [7, 5] employed a multi-center Gaussian mixture model to capture a user's temporal preference. Moreover, [25] adopted summation to fuse a user's interest in a POI with geographical influence, where the user's interest in a POI is learned by an extension of user-based collaborative filtering with temporal states.

However, our work differs from these existing works: (1) the geographical features taken into account for POI recommendation can be extended to include multiple different factors; (2) it is novel to use a method similar to topic modeling to capture the temporal effect in Point-of-Interest recommendation.

5 Conclusions

In this paper, we proposed a novel model for POI recommendation that incorporates both geographical influence and temporal effect into matrix factorization.
We modeled a user's preference for a POI as the sum of his geographical preference for the POI and his general interest in the POI. The user's geographical preference is captured by leveraging multiple geographical features, while the user's general preference is decomposed into a static part and a dynamic part. Specifically, we treated a checkin event as a term and a user's checkins as a document, and employed topic modeling to model all observed checkins. As a checkin event includes the checkin time and each topic is characterized by a distribution over checkin events, the learned topic distribution naturally captures the dynamics of the user's general interest. To this end, a specific spatial-temporal probabilistic matrix factorization model was proposed. Finally, the experimental results on real-world data sets demonstrate the improvements of our proposed model.

6 Acknowledgements

This research was supported in part by NIH (1R21AA23975-1), NSFC (7157193, 71372188, 6157232), and National Center for International Joint Research on E-Business Information Processing (213B135).

References

[1] J. Bao, Y. Zheng, and M. F. Mokbel. Location-based and preference-aware recommendation using sparse geo-social networking data. In SIGSPATIAL, pages 199-208, 2012.
[2] C. Cheng, H. Yang, I. King, and M. R. Lyu. Fused matrix factorization with geographical and social influence in location-based social networks. In AAAI, 2012.
[3] C. Cheng, H. Yang, M. R. Lyu, and I. King. Where you like to go next: Successive point-of-interest recommendation. In IJCAI, pages 2605-2611, 2013.
[4] Z. Cheng, J. Caverlee, K. Lee, and D. Z. Sui. Exploring millions of footprints in location sharing services. In ICWSM, 2011.
[5] E. Cho, S. A. Myers, and J. Leskovec. Friendship and mobility: User movement in location-based social networks. In SIGKDD, pages 1082-1090, 2011.
[6] H. Gao, J. Tang, X. Hu, and H. Liu. Exploring temporal effects for location recommendation on location-based social networks. In RecSys, 2013.
[7] H. Gao, J. Tang, X. Hu, and H. Liu. Modeling temporal effects of human mobile behavior on location-based social networks. In CIKM, 2013.
[8] S. Hanhuai and B. Arindam. Generalized probabilistic matrix factorizations for collaborative filtering. In ICDM, 2010.
[9] G. Huiji and L. Huan. Location-based social network data repository, 2014.
[10] Y. Koren. Collaborative filtering with temporal dynamics. In KDD, pages 447-456, 2009.
[11] H. Li, R. Hong, S. Zhu, and Y. Ge. Point-of-interest recommender systems: A separate-space perspective. In ICDM, pages 231-240, 2015.
[12] X. Li, G. Cong, X.-L. Li, T.-A. N. Pham, and S. Krishnaswamy. Rank-GeoFM: A ranking based geographical factorization method for point of interest recommendation. In SIGIR, 2015.
[13] D. Lian, C. Zhao, X. Xie, G. Sun, E. Chen, and Y. Rui. GeoMF: Joint geographical modeling and matrix factorization for point-of-interest recommendation. In KDD, pages 831-840, 2014.
[14] M. Lichman and P. Smyth. Modeling human location data with mixture of kernel densities. In KDD, 2014.
[15] B. Liu, Y. Fu, Z. Yao, and H. Xiong. Learning geographical preferences for point-of-interest recommendation. In KDD, 2013.
[16] Y. Liu, W. Wei, A. Sun, and C. Miao. Exploiting geographical neighborhood characteristics for location recommendation. In CIKM, pages 739-748, 2014.
[17] Q. Yuan, G. Cong, and A. Sun. Graph-based point-of-interest recommendation with geographical and temporal influences. In CIKM, 2014.
[18] S. Ruslan and M. Andriy. Bayesian probabilistic matrix factorization using Markov chain Monte Carlo. In ICML, 2008.
[19] R. Salakhutdinov and A. Mnih. Probabilistic matrix factorization. In NIPS, pages 1257-1264, 2007.
[20] M. N. Schmidt, O. Winther, and L. K. Hansen. Bayesian non-negative matrix factorization. In ICASS, 2009.
[21] C. Wang and D. M. Blei. Collaborative topic modeling for recommending scientific articles. In KDD, 2011.
[22] M. Ye, K. Janowicz, C. Mülligann, and W.-C. Lee. What you are is when you are: The temporal dimension of feature types in location-based social networks. In GIS, pages 102-111, 2011.
[23] M. Ye, P. Yin, W.-C. Lee, and D.-L. Lee. Exploiting geographical influence for collaborative point-of-interest recommendation. In SIGIR, 2011.
[24] H. Yin, Y. Sun, B. Cui, Z. Hu, and L. Chen. LCARS: A location-content-aware recommender system. In KDD, 2013.
[25] Q. Yuan, G. Cong, Z. Ma, A. Sun, and N. Magnenat-Thalmann. Time-aware point-of-interest recommendation. In SIGIR, 2013.
[26] J. Zhang and C. Chow. iGSLR: Personalized geo-social location recommendation - a kernel density estimation approach. In SIGSPATIAL, pages 334-343, 2013.
[27] Y. Zheng and X. Xie. Learning travel recommendations from user-generated GPS traces. In TIST, 2011.