A Modified PMF Model Incorporating Implicit Item Associations

Qiang Liu, Chengwei Wang, Congfu Xu
Institute of Artificial Intelligence, College of Computer Science, Zhejiang University, Hangzhou 310027, China
Email: 01dtd@gmail.com, packywang@gmail.com, xucongfu@zju.edu.cn

Abstract

As a state-of-the-art recommendation technique, collaborative filtering (CF) computes recommendations by leveraging a historical data set of users' ratings for items. So far, the best-performing CF methods are latent factor models. The probabilistic matrix factorization (PMF) model, a widely used latent factor model, offers a probabilistic foundation for regularization. In this paper, we present a novel CF method that incorporates implicit relationships between items into the basic PMF model. We first mine the implicit correlations between items with a matrix factorization model that utilizes contextual information, and then generate recommendations by incorporating the obtained item relationships into the basic PMF model. We validate our approach on two datasets, and the experimental results show that the proposed method outperforms several existing CF models.

Keywords: Recommender systems; Collaborative filtering; Probabilistic matrix factorization; Contextual information

I. INTRODUCTION

Recommender systems aim at suggesting information items (books, movies, music, etc.) that are likely to suit a user's taste. As the most popular methods in recommender systems, collaborative filtering (CF) approaches make recommendations based on the ratings of a set of users whose rating profiles are most similar to that of the active user.
Traditional CF methods are limited in many cases, as they take into account only the binary events of user actions and generate recommendations purely by mining the user-item rating matrix. To provide more personalized and accurate recommendations, researchers have started to employ additional information, such as contextual information and social trust information, to improve standard CF models; context-aware and trust-aware recommender systems have thus been broadly studied. However, these methods suffer from several inherent weaknesses. (1) For context-aware methods, the first challenge is data sparsity: with the introduction of context, the transaction information to be processed becomes high-dimensional, which aggravates the data sparsity problem. The second challenge is computational complexity: to process high-dimensional data, context-aware methods such as tensor-factorization-based models often incur a high computational cost. (2) For trust-aware methods, the acquisition of social information is a challenge. In most traditional e-commerce websites it is difficult to obtain explicit user social information, so trust-aware methods can hardly be implemented. Moreover, incorporating user social information significantly increases the computational complexity, as the number of users in most applications is huge.

In this paper, we present a modified PMF model that incorporates implicit contextualized item associations. Our work is based on the following intuitions: (1) In a recommender system, the correlation between two items can be quite different in different contexts. (2) A contextualized item profile can be characterized by latent features extracted by factorizing the item-context rating matrix. (3) By measuring the similarity between contextualized item profiles, we can establish the implicit association between items.
(4) The implicit association can be incorporated into the standard PMF model to generate more accurate recommendations. Based on these intuitions, we propose an extensible method to mine implicit contextualized item associations and a modified PMF model to make recommendations. Experimental results show that our method outperforms state-of-the-art collaborative filtering models.

The rest of this paper is organized as follows. We review related work in Section II. Section III briefly reviews the basic PMF model. Section IV gives a detailed description of the implicit association mining algorithm and the modified PMF model. Our experiments are reported in Section V. Finally, we present our conclusions and discuss future work in Section VI.

II. RELATED WORK

The matrix factorization (MF) model is currently the best-performing method for traditional CF recommendation. MF models map both users and items to a joint latent factor space of dimensionality $f$, such that user-item interactions are modeled as inner products in that space [1]. Singular value decomposition (SVD), a well-established technique for identifying latent semantic factors, is applied in recent work on MF [2][3]. These models are learned by fitting the previously observed ratings, while overfitting is avoided by regularizing the learned parameters. In [4], Salakhutdinov and Mnih presented the Probabilistic Matrix Factorization (PMF) model, which scales linearly with the number of observations.

Based on the intuition that users are easily influenced by the friends they trust and prefer their friends' recommendations, trust-aware CF methods [5][6][7] exploit user social information to improve recommendation accuracy. However, as mentioned above, trust-aware methods can hardly be implemented in most traditional e-commerce websites, because these websites have no explicit social network and trust information is difficult to obtain. Moreover, as the number of users in most applications is typically huge, incorporating user social information significantly increases the computational complexity.

Other researchers have tried to utilize contextual information to improve recommendation accuracy. The so-called context-aware methods [8][9][10] are based on the intuition that, although users' general interests can be relatively stable, the exact evaluation of an item can be influenced by many additional and varying contextual factors. Although context-aware CF methods can provide more personalized and accurate recommendations in many situations, they suffer from two drawbacks: (1) The introduction of additional contextual dimensions exacerbates the data sparsity problem. (2) To deal with high-dimensional data, context-aware models typically incur a high computational cost.

III. PRELIMINARY

In a typical collaborative filtering scenario, we have a set of users $U = \{u_1, u_2, \ldots, u_m\}$, a set of items $I = \{i_1, i_2, \ldots, i_n\}$, and a user-item rating matrix $R = [R_{ui}]_{m \times n}$ whose rows correspond to the $m$ users and whose columns correspond to the $n$ items. The entry $R_{ui}$ (also written $r_{ui}$) denotes the rating of item $i$ given by user $u$.
In Probabilistic Matrix Factorization (PMF), the conditional distribution over the observed ratings is defined as

$$p(R \mid U, V, \sigma_R^2) = \prod_{u=1}^{m} \prod_{i=1}^{n} \left[ \mathcal{N}(R_{ui} \mid U_u^T V_i, \sigma_R^2) \right]^{I^R_{ui}} \tag{1}$$

where $\mathcal{N}(x \mid \mu, \sigma^2)$ is the probability density function of the Gaussian distribution with mean $\mu$ and variance $\sigma^2$, and $I^R_{ui}$ is the indicator function that equals 1 if user $u$ rated item $i$ and 0 otherwise. Zero-mean spherical Gaussian priors are also placed on the user and item feature vectors:

$$p(U \mid \sigma_U^2) = \prod_{u=1}^{m} \mathcal{N}(U_u \mid 0, \sigma_U^2 I), \qquad p(V \mid \sigma_V^2) = \prod_{i=1}^{n} \mathcal{N}(V_i \mid 0, \sigma_V^2 I) \tag{2}$$

IV. OUR MODEL

A. Mining contextualized relationship between items

In the field of recommender systems, context can be any information that is relevant to the interaction between a user and an item. In this section, we present an algorithm for mining the implicit association between items. The algorithm is based on the intuition that the relevance of two items can differ under different contexts. For example, when the movie genre context is considered, "Love Letter" and "Titanic 3D" may have a relatively high degree of correlation, as both are romance movies. If the time context is taken into account, however, the latter may be more closely related to "Wrath of the Titans", as both were recently released in cinemas.

Assume $C = \{C_1, \ldots, C_k, \ldots, C_K\}$ is the set of contexts taken into consideration. The size of $C$ is denoted as $K = |C|$, and the size of the instance space of $C$ is

$$D = \prod_{k=1}^{K} |C_k| \tag{3}$$

For example, in a movie recommender scenario, let $C = \{C_1 = \text{Time}, C_2 = \text{MovieGenre}, C_3 = \text{Mood}\}$, where the value spaces of $C_1$, $C_2$ and $C_3$ are {January, ..., December}, {Action, Romance, Comedy} and {Happy, Depression, Calm} respectively. Then $K = 3$, $D = 12 \times 3 \times 3 = 108$, and a particular context may look like $c = \{c_1 = \{\text{March}\}, c_2 = \{\text{Action}, \text{Comedy}\}, c_3 = \{\text{Happy}\}\}$.
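As a quick check of Eq. (3) on the movie example above, the size of the context instance space is simply the product of the sizes of the individual value spaces. The short Python sketch below reproduces $K = 3$ and $D = 108$; the dictionary `context_values` and the helper `instance_space_size` are illustrative names introduced here, not part of the original method.

```python
import calendar
from math import prod

# Illustrative value spaces for the three context variables of the example.
context_values = {
    "Time": list(calendar.month_name)[1:],  # the twelve month names
    "MovieGenre": ["Action", "Romance", "Comedy"],
    "Mood": ["Happy", "Depression", "Calm"],
}


def instance_space_size(values_by_context):
    """Eq. (3): D is the product of |C_k| over all context variables."""
    return prod(len(values) for values in values_by_context.values())


K = len(context_values)
D = instance_space_size(context_values)
print(K, D)  # prints: 3 108
```

Because $D$ grows multiplicatively with every added context variable, modeling contexts as extra tensor dimensions quickly becomes sparse, which is the motivation for the per-context factor matrices introduced next.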
Assume that the item-context rating matrix is $R \in \mathbb{R}^{|I| \times D}$. The element in the $i$th row and $c$th column, denoted $r_{ic}$, is the rating for item $i$ under context $c$. We approximate the matrix $R$ by a sum of products of matrices:

$$\hat{R} = \sum_{C_k \in C} P^{C_k} Q^{C_k} \tag{4}$$

where $P^{C_k} \in \mathbb{R}^{|I| \times F}$, $Q^{C_k} \in \mathbb{R}^{F \times |C_k|}$, and $F$ is the number of latent factors. Each context $C_k$ corresponds to two such latent factor matrices, and for simplicity we use the same $F$ for all considered contexts. More specifically, we use the following formula to predict $r_{ic}$:

$$\hat{r}_{ic} = b_i + \sum_{C_k \in C} \left( P_i^{C_k} \cdot \frac{1}{|c_k|} \sum_{j \in c_k} Q_j^{C_k} \right) \tag{5}$$

where $b_i$ is the bias for item $i$, and $P_i^{C_k}$ and $Q_j^{C_k}$ are latent factor vectors. Note that $c_k$ is a set, because a particular context variable $C_k$ can be multi-valued. For example, a movie may belong to more than one genre: if $C_k = \text{MovieGenre}$, then a specific value of $C_k$ may be $c_k = \{\text{Romance}, \text{Comedy}\}$. By minimizing the objective function

$$L = \frac{1}{2} \sum_{(i,c) \in R} (\hat{r}_{ic} - r_{ic})^2 + \frac{\lambda_1}{2} \|b\|^2 + \frac{\lambda_2}{2} \sum_{C_k \in C} \left( \|P^{C_k}\|_{Fro}^2 + \|Q^{C_k}\|_{Fro}^2 \right) \tag{6}$$

we obtain the matrices $P^{C_1}, \ldots, P^{C_k}, \ldots, P^{C_K}$. We then use the $i$th rows of these matrices to form a profile for item $i$:

$$\mathrm{Prof}(i) = (P_i^{C_1}, \ldots, P_i^{C_k}, \ldots, P_i^{C_K}) \tag{7}$$

where $P_i^{C_k}$ is a row vector, $\mathrm{Prof}(i) \in \mathbb{R}^{1 \times L}$ and $L = |C| \cdot F$. The contextualized correlation between item $i$ and item $j$ can thus be defined as

$$S_{ij} = \cos(\mathrm{Prof}(i), \mathrm{Prof}(j)) \tag{8}$$

$S_{ij}$ can be interpreted as the degree of similarity between item $i$ and item $j$ when the contextual factors in $C$ are taken into consideration. Details of the mining process are shown in Algorithm 1.

Algorithm 1 Mining Implicit Item Associations
Require: rating matrix $R$, latent feature size $F$, learning rate $\eta$, regularization coefficients $\lambda_1, \lambda_2$
Ensure: association matrix $S$
    initialize matrices $P^{C_k}$, $Q^{C_k}$ ($k = 1, 2, \ldots, K$) and bias vector $b$ with small random numbers
    initialize association matrix $S = 0$
    while error on validation set decreases do
        for all $(i, c) \in R$ do
            $\hat{r}_{ic} = b_i + \sum_{C_k \in C} ( P_i^{C_k} \cdot \frac{1}{|c_k|} \sum_{j \in c_k} Q_j^{C_k} )$
            $e = r_{ic} - \hat{r}_{ic}$
            $b_i \mathrel{+}= \eta (e - \lambda_1 b_i)$
            for $k = 1, 2, \ldots, K$ do
                for each $j \in c_k$ do
                    for $f = 1, 2, \ldots, F$ do
                        $P_{if}^{C_k} \mathrel{+}= \eta ( e \cdot \frac{1}{|c_k|} Q_{jf}^{C_k} - \lambda_2 P_{if}^{C_k} )$
                        $Q_{jf}^{C_k} \mathrel{+}= \eta ( e \cdot \frac{1}{|c_k|} P_{if}^{C_k} - \lambda_2 Q_{jf}^{C_k} )$
                    end for
                end for
            end for
        end for
    end while
    for each $i \in I$ do
        for each $j \in I$ do
            $\mathrm{Prof}(i) = (P_i^{C_1}, \ldots, P_i^{C_k}, \ldots, P_i^{C_K})$
            $\mathrm{Prof}(j) = (P_j^{C_1}, \ldots, P_j^{C_k}, \ldots, P_j^{C_K})$
            $S_{ij} = \cos(\mathrm{Prof}(i), \mathrm{Prof}(j))$
        end for
    end for
    return $S$

B. A Modified Probabilistic Matrix Factorization Model

In this section we present a modified PMF model that incorporates the contextualized implicit item associations obtained by the algorithm of the previous subsection. The corresponding graphical model is shown in Figure 1, and the conditional distribution over the observed ratings is the same as in the basic PMF model, see Eq. (1).

Figure 1. A Modified PMF Model

In our model, the feature vector of item $i$ is the weighted average of the feature vectors of its neighbors. In vector form:

$$\hat{V}_i = \frac{\sum_{j \in N_i} S_{ij} V_j}{\sum_{j \in N_i} S_{ij}} \tag{9}$$

where $N_i$ is the set of neighbors of item $i$, and $S_{ij}$ is the implicit correlation degree between items $i$ and $j$ obtained from Algorithm 1. Compared with the basic PMF model, the major difference is that in our model an item's latent feature vector is affected by its neighbors. The conditional distribution of the item latent features is

$$p(V \mid S, \sigma_V^2, \sigma_S^2) \propto p(V \mid S, \sigma_S^2)\, p(V \mid \sigma_V^2) \tag{10}$$

where $p(V \mid S, \sigma_S^2)$ is the conditional distribution of the item latent features given the latent features of their neighbors:

$$p(V \mid S, \sigma_S^2) = \prod_{i=1}^{n} \mathcal{N}\Big(V_i \,\Big|\, \sum_{j \in N_i} S_{ij} V_j,\; \sigma_S^2 I\Big) \tag{11}$$

Hence, the posterior probability of the latent variables $U$ and $V$ is

$$\begin{aligned} p(U, V \mid R, S, \sigma_R^2, \sigma_S^2, \sigma_U^2, \sigma_V^2) &\propto p(R \mid U, V, \sigma_R^2)\, p(U \mid \sigma_U^2)\, p(V \mid S, \sigma_V^2, \sigma_S^2) \\ &= \prod_{u=1}^{m} \prod_{i=1}^{n} \left[ \mathcal{N}(R_{ui} \mid U_u^T V_i, \sigma_R^2) \right]^{I^R_{ui}} \times \prod_{u=1}^{m} \mathcal{N}(U_u \mid 0, \sigma_U^2 I) \\ &\quad \times \prod_{i=1}^{n} \mathcal{N}\Big(V_i \,\Big|\, \sum_{j \in N_i} S_{ij} V_j,\; \sigma_S^2 I\Big) \times \prod_{i=1}^{n} \mathcal{N}(V_i \mid 0, \sigma_V^2 I) \end{aligned} \tag{12}$$
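The mining procedure of Algorithm 1 (Eqs. (5) to (8)) and the neighbor-weighted average of Eq. (9) can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy ratings, the hyperparameter values, the zero-initialized bias and the fixed number of epochs (instead of stopping on validation error) are all choices made here, and every function name is hypothetical.

```python
import math
import random


def predict(i, ctx, b, P, Q):
    """Eq. (5): item bias plus, for each context variable k, the inner product of
    P[k][i] with the mean of the Q[k][j] vectors over the active values j."""
    r_hat = b[i]
    for k, values in enumerate(ctx):
        for j in values:
            r_hat += sum(p * q for p, q in zip(P[k][i], Q[k][j])) / len(values)
    return r_hat


def sgd_epoch(data, b, P, Q, eta, lam1, lam2):
    """One pass of the update rules of Algorithm 1; returns the mean squared error."""
    total = 0.0
    for i, ctx, r in data:
        e = r - predict(i, ctx, b, P, Q)
        total += e * e
        b[i] += eta * (e - lam1 * b[i])
        for k, values in enumerate(ctx):
            w = 1.0 / len(values)
            for j in values:
                p, q = P[k][i], Q[k][j]
                for f in range(len(p)):
                    p_f = p[f]  # old value, so both updates use the same point
                    p[f] += eta * (e * w * q[f] - lam2 * p_f)
                    q[f] += eta * (e * w * p_f - lam2 * q[f])
    return total / len(data)


def cosine(a, b):
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def associations(P, n_items):
    """Eqs. (7)-(8): concatenate each item's rows of all P[k], then take cosines."""
    prof = [[x for P_k in P for x in P_k[i]] for i in range(n_items)]
    return [[cosine(prof[i], prof[j]) for j in range(n_items)] for i in range(n_items)]


def neighbour_average(i, V, S, neighbours):
    """Eq. (9): S-weighted average of the neighbours' feature vectors."""
    z = sum(S[i][j] for j in neighbours)
    return [sum(S[i][j] * V[j][f] for j in neighbours) / z for f in range(len(V[i]))]


def small():
    return random.uniform(-0.1, 0.1)


# Toy data (made up): 3 items, contexts Time (2 values) and Genre (3 values).
# Each record is (item, (time values, genre values), mean rating).
data = [
    (0, ((0,), (0, 1)), 4.0), (0, ((1,), (0, 1)), 3.5),
    (1, ((0,), (0,)), 4.5), (2, ((0,), (2,)), 2.5), (2, ((1,), (2,)), 2.0),
]
random.seed(0)
n_items, sizes, F = 3, (2, 3), 2
b = [0.0] * n_items
P = [[[small() for _ in range(F)] for _ in range(n_items)] for _ in sizes]
Q = [[[small() for _ in range(F)] for _ in range(size)] for size in sizes]

errors = [sgd_epoch(data, b, P, Q, eta=0.05, lam1=0.01, lam2=0.01) for _ in range(200)]
S = associations(P, n_items)

# Eq. (9) on a hand-made association matrix: item 1 averages items 0 and 2.
S_toy = [[1.0, 0.5, 0.0], [0.5, 1.0, 0.5], [0.0, 0.5, 1.0]]
V_toy = [[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]]
v_hat = neighbour_average(1, V_toy, S_toy, [0, 2])
```

Note that only the $P^{C_k}$ matrices enter the item profiles; the $Q^{C_k}$ matrices are needed during training but are discarded once $S$ has been computed.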

The logarithm of the posterior distribution is

$$\begin{aligned} \ln p(U, V \mid R, S, \sigma_R^2, \sigma_S^2, \sigma_U^2, \sigma_V^2) = &-\frac{1}{2\sigma_R^2} \sum_{u=1}^{m} \sum_{i=1}^{n} I^R_{ui} (R_{ui} - U_u^T V_i)^2 - \frac{1}{2\sigma_U^2} \sum_{u=1}^{m} U_u^T U_u - \frac{1}{2\sigma_V^2} \sum_{i=1}^{n} V_i^T V_i \\ &-\frac{1}{2\sigma_S^2} \sum_{i=1}^{n} \Big( \big(V_i - \sum_{j \in N_i} S_{ij} V_j\big)^T \big(V_i - \sum_{j \in N_i} S_{ij} V_j\big) \Big) \\ &-\frac{1}{2} \Big( \sum_{u=1}^{m} \sum_{i=1}^{n} I^R_{ui} \Big) \ln \sigma_R^2 - \frac{1}{2} (m F) \ln \sigma_U^2 - \frac{1}{2} (n F) (\ln \sigma_V^2 + \ln \sigma_S^2) + C \end{aligned} \tag{13}$$

where $F$ is the dimensionality of the latent feature vectors and $C$ is a constant that does not depend on the parameters. Maximizing the log-posterior over the two sets of latent features is equivalent to minimizing the following sum-of-squared-errors objective function with quadratic regularization terms:

$$\begin{aligned} \mathcal{F} = &\frac{1}{2} \sum_{u=1}^{m} \sum_{i=1}^{n} I^R_{ui} (R_{ui} - U_u^T V_i)^2 + \frac{\lambda_U}{2} \sum_{u=1}^{m} \|U_u\|_{Fro}^2 + \frac{\lambda_V}{2} \sum_{i=1}^{n} \|V_i\|_{Fro}^2 \\ &+ \frac{\lambda_S}{2} \sum_{i=1}^{n} \Big( \big(V_i - \sum_{j \in N_i} S_{ij} V_j\big)^T \big(V_i - \sum_{j \in N_i} S_{ij} V_j\big) \Big) \end{aligned} \tag{14}$$

where $\lambda_U = \sigma_R^2 / \sigma_U^2$, $\lambda_V = \sigma_R^2 / \sigma_V^2$, $\lambda_S = \sigma_R^2 / \sigma_S^2$, and $\|\cdot\|_{Fro}$ denotes the Frobenius norm. A local minimum of the objective function in Eq. (14) can be found by performing gradient descent on $U_u$ and $V_i$:

$$\frac{\partial \mathcal{F}}{\partial U_u} = \sum_{i=1}^{n} I^R_{ui} V_i (U_u^T V_i - R_{ui}) + \lambda_U U_u \tag{15}$$

$$\frac{\partial \mathcal{F}}{\partial V_i} = \sum_{u=1}^{m} I^R_{ui} U_u (U_u^T V_i - R_{ui}) + \lambda_V V_i + \lambda_S \Big( V_i - \sum_{j \in N_i} S_{ij} V_j \Big) - \lambda_S \sum_{j:\, i \in N_j} \Big( V_j - \sum_{x \in N_j} S_{jx} V_x \Big) \tag{16}$$

The training process of the modified PMF model is described in Algorithm 2.

Algorithm 2 A Factorization Model Incorporating Implicit Item Associations
Require: rating matrix $R \in \mathbb{R}^{m \times n}$, latent feature size $F$, learning rate $\eta$, regularization coefficients $\lambda_u, \lambda_v, \lambda_s$
    initialize matrices $U$, $V$ with small random numbers
    while error on validation set decreases do
        for each $i \in I$ do
            $x = V_i - \sum_{j \in N_i} S_{ij} V_j$
            $y = \sum_{j:\, i \in N_j} ( V_j - \sum_{k \in N_j} S_{jk} V_k )$
            for each $(u, i) \in R$ do
                $\hat{r}_{ui} = U_u^T V_i$
                $e = r_{ui} - \hat{r}_{ui}$
                for $f = 1, 2, \ldots, F$ do
                    $U_{uf} \mathrel{-}= \eta ( -e V_{if} + \lambda_u U_{uf} )$
                    $V_{if} \mathrel{-}= \eta ( -e U_{uf} + \lambda_v V_{if} + \lambda_s x_f - \lambda_s y_f )$
                end for
            end for
        end for
    end while

C. Discussion

Our model is able to provide context-aware recommendations. Two strategies can be used in the phase of mining implicit item associations: (1) Filter: the Filter strategy retains only the most influential contextual factors and filters out all others. For example, if we are convinced that Time and Location have the greatest impact on recommendations, then under the Filter strategy only the Time and Location factors are taken into account when mining implicit item associations. (2) Weight: the Weight strategy takes into account all contextual factors that affect the recommendation results, giving them different weights. Again, if Time and Location are the main factors influencing recommendations, then under the Weight strategy they receive higher weights than the other contextual factors.

In our method, we establish the implicit correlation between items by leveraging contextual information. Unlike other mainstream context-aware CF models, we do not model context as additional dimensions, so high computational cost and an aggravated data sparsity problem are avoided. In addition, our model is readily extensible in terms of context: any contextual factor that affects recommendations can be included in the set $C$.

Trust-aware CF models (e.g., the Social MF model) explore the impact of explicit social information on recommendations. By contrast, our model explores the impact of implicit item associations. Our method has better extensibility, considering that in many application areas (e.g., e-commerce websites) it is a great challenge to obtain explicit social information, while mining the implicit item associations is relatively easy.

Note that the proposed implicit correlation mining algorithm can also be used as a novel similarity computation method. By utilizing contextual information, we can mine the implicit associations between either items or users. Compared with the traditional similarity computation methods used in memory-based CF, our approach can measure the correlation between two entities more accurately and is easier to interpret, because it employs various contextual factors.

V. EXPERIMENTS

To show the performance improvement of our model, we also implement the classic item-based kNN method [11] and the baseline MF approach [1] for comparison.

A. Datasets

1) The MovieLens Dataset: We use the MovieLens 10M dataset (see footnote 2) to conduct our experiments. The whole MovieLens dataset contains 10,000,054 ratings applied to 10,681 movies by 71,567 users of the online movie recommender service MovieLens. In our experiments, we sort the ratings made by each user temporally and pick out the most recent 20 percent to form a validation set.

2) The Yahoo! Music Dataset: The dataset (see footnote 3) comprises 262,810,175 ratings of 624,961 music items by 1,000,990 users collected during 1999-2010. To conduct our experiments, we sampled 46,870,551 ratings applied to 8,76 albums by 518,819 users from the original training set. We further split the obtained subset into local training and test sets, as was done for the MovieLens dataset.

3) Contextualized Rating Aggregation: In our experiments, we take into account two types of contextual factors, namely Time and Genre. Thus the set $C = \{C_1 = \text{Time}, C_2 = \text{Genre}\}$. To mine the implicit item associations, we first need to aggregate the original rating matrix to form the item-context rating matrix $R$. We consider month-level time information. For example, the MovieLens dataset spans 15 years (from 1995 to 2009) and each year has 12 months, so the size of the value space of $C_1$ is $|C_1| = 180$. Both datasets provide item genre information. In the MovieLens dataset, a movie can have one or more genres, and the size of the genre value space is $|C_2| = 18$. The Yahoo! Music dataset has a relatively larger genre value space, $|C_2| = 99$. Details about the value space corresponding to each context are shown in Table I.
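The aggregation step described above can be sketched as follows: each rating is assigned a month-level time index, paired with the genre set of the rated item, and the ratings falling into the same (item, context) cell are averaged. This is only an illustration under assumptions made here: ratings are taken to be `(item, unix_timestamp, rating)` triples with timestamps interpreted in UTC, and the names `month_index`, `aggregate` and `ts` are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone

FIRST_YEAR = 1995  # first year of the MovieLens ratings, as stated in the text


def month_index(timestamp, first_year=FIRST_YEAR):
    """Map a Unix timestamp to a month-level index (0 = January of first_year)."""
    d = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    return (d.year - first_year) * 12 + (d.month - 1)


def aggregate(ratings, item_genres):
    """Mean rating of each item under each (month, genre set) context."""
    acc = defaultdict(lambda: [0.0, 0])
    for item, timestamp, rating in ratings:
        context = (month_index(timestamp), frozenset(item_genres[item]))
        cell = acc[(item, context)]
        cell[0] += rating
        cell[1] += 1
    return {key: total / count for key, (total, count) in acc.items()}


def ts(year, month, day):
    """Helper for the toy data: midnight UTC of the given date as a timestamp."""
    return datetime(year, month, day, tzinfo=timezone.utc).timestamp()


# Toy data (made up): three ratings of one Action/Adventure movie.
ratings = [(7, ts(1996, 9, 3), 4.0), (7, ts(1996, 9, 20), 5.0), (7, ts(1996, 10, 1), 3.0)]
item_genres = {7: ["Action", "Adventure"]}
R_ctx = aggregate(ratings, item_genres)
```

With 15 years of data the month index ranges from 0 to 179, which matches the value-space size of 180 for the Time context on MovieLens.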
Table I
DATASET STATISTICS

Statistics        MovieLens    Yahoo! Music
Time |C_1|        180          10
Genre |C_2|       18           99

Each rating element $r_{ic}$ in $R$ is obtained by calculating the mean rating for item $i$ under context $c$. For example, if $c = \{\{1996.09\}, \{\text{Action}, \text{Adventure}\}\}$, then the rating $r_{ic}$ is the mean rating for item $i$ in September 1996.

Footnote 2: http://www.grouplens.org/node/73
Footnote 3: http://kddcup.yahoo.com/

B. Experimental Results

In this section, we report the experimental results of our approach and compare it with other CF methods. To make the results reproducible, we provide detailed parameters. When evaluating the approach on the MovieLens dataset, we used the following values for the meta parameters: $\eta = 0.001$, $\lambda_U = \lambda_V = 0.0$, $\lambda_S = 0.00$. Table II summarizes the performance of the different methods on the MovieLens dataset. Prediction accuracy is measured by RMSE for varying values of the factor number $F$.

Table II
PERFORMANCE COMPARISON ON MOVIELENS DATASET

Method       F = 20    F = 50    F = 100    F = 200
KNN          0.918     0.918     0.918      0.918
SVD          0.816     0.812     0.810      0.809
Our Model    0.803     0.799     0.798      0.797

Note that the parameter $F$ does not exist in the kNN model; the item-based kNN approach obtains its best result when the neighbor size $k$ is set to 20. As the table shows, our model achieves better RMSE than the other two CF methods. When $F = 20$, our model reduces RMSE by 12.6% compared with the kNN model and by 1.60% compared with the SVD model. When $F = 200$, our method reduces RMSE by 13.2% compared with the kNN model and by 1.49% compared with the SVD model.

Table III
PERFORMANCE COMPARISON ON YAHOO! MUSIC DATASET

Method       F = 20    F = 50    F = 100    F = 200
KNN          27.793    27.793    27.793     27.793
SVD          27.677    27.35     27.154     27.021
Our Model    25.894    25.759    25.715     25.687

Table III reports the experimental results on the Yahoo! Music dataset. We set $\eta = 0.0001$, $\lambda_U = \lambda_V = 0.8$, $\lambda_S = 0.0$ as the default parameters. On the Yahoo! Music dataset, the kNN method obtains its best result when the neighbor size $k$ is set to 10.
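For reference, the two quantities used in this comparison, RMSE and the relative RMSE reduction with respect to a baseline, can be computed as in the sketch below. The function names are illustrative, and the inputs are the MovieLens RMSE values from Table II for the largest factor size. Because the table entries are rounded to three decimals, the recomputed percentages can differ from the percentages quoted in the text in the last digit.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equally long rating sequences."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


def relative_reduction(baseline_rmse, model_rmse):
    """Percentage by which model_rmse improves on baseline_rmse."""
    return 100.0 * (baseline_rmse - model_rmse) / baseline_rmse


# MovieLens RMSE values from Table II, last column (largest F).
knn, svd, ours = 0.918, 0.809, 0.797
print(f"vs kNN: {relative_reduction(knn, ours):.2f}%")  # prints: vs kNN: 13.18%
print(f"vs SVD: {relative_reduction(svd, ours):.2f}%")  # prints: vs SVD: 1.48%
```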
On the Yahoo! Music dataset, when $F = 20$ our model reduces RMSE by 6.84% compared with the kNN model and by 6.45% compared with the SVD model. When $F = 200$, it reduces RMSE by 7.58% compared with the kNN model and by 4.94% compared with the SVD model. Figures 2 and 3 show the convergence of our model and provide a more detailed comparison with the other two CF methods. As with other factorization models (e.g., the SVD model), a larger latent factor size $F$ tends to improve the results of our model. With the same $F$, our model obtains better results than the SVD model, which indicates that introducing the implicit item relationships helps our model capture the latent characteristics of the items more accurately.
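To make the training procedure behind these results more concrete, the following sketch runs the stochastic updates of Algorithm 2 (derived from the gradients in Eqs. (15) and (16)) on a tiny made-up problem. It is a hedged illustration, not the authors' code. The ratings, association matrix, neighbor sets, hyperparameters and fixed number of passes are assumptions made here. In addition, the reverse-neighbor term is weighted by $S_{ji}$, which is what differentiating Eq. (14) yields; this weight is our assumption and does not appear in the printed Eq. (16).

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def residual(i, V, S, N):
    """V_i minus the S-weighted sum of its neighbours' vectors (the term in Eq. (14))."""
    return [V[i][f] - sum(S[i][j] * V[j][f] for j in N[i]) for f in range(len(V[i]))]


def train_pass(ratings, U, V, S, N, eta, lam_u, lam_v, lam_s):
    """One pass in the style of Algorithm 2 over all items and their ratings."""
    F = len(V[0])
    for i in range(len(V)):
        x = residual(i, V, S, N)
        # Items j that list i as a neighbour; the S[j][i] weight is our assumption.
        y = [0.0] * F
        for j in range(len(V)):
            if i in N[j]:
                res_j = residual(j, V, S, N)
                for f in range(F):
                    y[f] += S[j][i] * res_j[f]
        for u, item, r in ratings:
            if item != i:
                continue
            e = r - dot(U[u], V[i])
            for f in range(F):
                u_f = U[u][f]  # old value, so both updates use the same point
                U[u][f] += eta * (e * V[i][f] - lam_u * u_f)
                V[i][f] += eta * (e * u_f - lam_v * V[i][f] - lam_s * x[f] + lam_s * y[f])


def rmse(ratings, U, V):
    return (sum((r - dot(U[u], V[i])) ** 2 for u, i, r in ratings) / len(ratings)) ** 0.5


# Toy problem (made up): 3 users, 3 items, ratings as (user, item, value).
ratings = [(0, 0, 5.0), (0, 1, 4.0), (1, 1, 4.0), (1, 2, 2.0), (2, 0, 4.0), (2, 2, 1.0)]
S = [[1.0, 0.5, 0.0], [0.5, 1.0, 0.5], [0.0, 0.5, 1.0]]
N = {0: [1], 1: [0, 2], 2: [1]}

random.seed(0)
F = 2
U = [[random.uniform(0.0, 0.5) for _ in range(F)] for _ in range(3)]
V = [[random.uniform(0.0, 0.5) for _ in range(F)] for _ in range(3)]

rmse_before = rmse(ratings, U, V)
for _ in range(300):
    train_pass(ratings, U, V, S, N, eta=0.02, lam_u=0.02, lam_v=0.02, lam_s=0.002)
rmse_after = rmse(ratings, U, V)
```

Setting `lam_s` to zero removes both neighbor terms and reduces the update to plain regularized matrix factorization, which is a convenient way to isolate the contribution of the item associations on a given dataset.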

Figure 2. Our approach with different values of F on the MovieLens dataset. (Plot of RMSE against training epochs; curves: kNN (k = 20), MF (F = 200), Our Model (F = 20), Our Model (F = 200).)

Figure 3. Our approach with different values of F on the Yahoo! Music dataset. (Plot of RMSE against training epochs; curves: kNN (k = 10), MF (F = 200), Our Model (F = 20), Our Model (F = 200).)

VI. CONCLUSION AND FUTURE WORK

In this work, we proposed a CF method aiming at providing more accurate recommendations. Our model mainly consists of two parts: (1) An extensible method to mine implicit contextualized item correlations. (2) An improved PMF model to make personalized recommendations. We validated our approach on two real datasets, and the results demonstrate that our method outperforms several existing CF models.

Compared with trust-aware and context-aware CF models, our method has better extensibility: (1) Mining the implicit item associations is relatively easier than obtaining social relations or trust information, because many application areas (e.g., e-commerce websites) have no explicit social networks at all. (2) Any contextual factor (time, weather, mood, etc.) that has an impact on recommendation can be included in the mining process to affect the item associations. Unlike other mainstream context-aware CF models, we do not model context as additional dimensions, so we avoid high computational cost and do not have to face an aggravated data sparsity problem.

Future work can be carried out in the following directions: (1) Further explore how different contextual factors affect the performance of our model, and propose a selection standard. (2) Explore the effectiveness of our model on different types of users. (3) Apply the proposed implicit association mining algorithm in other scenarios (e.g., memory-based CF methods) where measuring the similarity between two entities is needed. (4) Validate our approach in practical applications.

ACKNOWLEDGMENT

This research is supported by the Natural Science Foundations of China (No. 60970081) and the National Basic Research Program of China (No. 2010CB327903).

REFERENCES

[1] Y. Koren, R. Bell, and C. Volinsky, "Matrix factorization techniques for recommender systems," Computer, vol. 42, no. 8, pp. 30-37, 2009.
[2] A. Paterek, "Improving regularized singular value decomposition for collaborative filtering," in Proceedings of KDD Cup and Workshop, vol. 2007, 2007, pp. 5-8.
[3] Y. Koren, "Factorization meets the neighborhood: a multifaceted collaborative filtering model," in Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2008, pp. 426-434.
[4] R. Salakhutdinov and A. Mnih, "Probabilistic matrix factorization," Advances in Neural Information Processing Systems, vol. 20, pp. 1257-1264, 2008.
[5] H. Ma, I. King, and M. Lyu, "Learning to recommend with social trust ensemble," in Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 2009, pp. 203-210.
[6] H. Ma, M. Lyu, and I. King, "Learning to recommend with trust and distrust relationships," in Proceedings of the Third ACM Conference on Recommender Systems. ACM, 2009, pp. 189-196.
[7] M. Jamali and M. Ester, "A transitivity aware matrix factorization model for recommendation in social networks," in Twenty-Second International Joint Conference on Artificial Intelligence, 2011.
[8] G. Adomavicius and A. Tuzhilin, "Context-aware recommender systems," Recommender Systems Handbook, pp. 217-253, 2011.
[9] A. Karatzoglou, X. Amatriain, L. Baltrunas, and N. Oliver, "Multiverse recommendation: n-dimensional tensor factorization for context-aware collaborative filtering," in Proceedings of the Fourth ACM Conference on Recommender Systems. ACM, 2010, pp. 79-86.
[10] U. Panniello and M. Gorgoglione, "A contextual modeling approach to context-aware recommender systems," in Workshop on Context-Aware Recommender Systems (CARS-2011), Chicago, IL (USA), October, vol. 3, 2011.
[11] G. Linden, B. Smith, and J. York, "Amazon.com recommendations: Item-to-item collaborative filtering," IEEE Internet Computing, vol. 7, no. 1, pp. 76-80, 2003.