Constrained Preference Embedding for Item Recommendation

Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence (IJCAI-16)

Constrained Preference Embedding for Item Recommendation

Xin Wang, Congfu Xu, Yunhui Guo, Hui Qian
College of Computer Science and Technology, Zhejiang University, China
{cswangxinm, xucongfu, gyhui, qianhui}@zju.edu.cn

Abstract

To learn users' preference, their feedback information is commonly modeled as scalars and integrated into matrix factorization (MF) based algorithms. Based on MF techniques, the preference degree is computed by the product of user and item vectors, which is also represented by scalars. On the contrary, in this paper, we express users' feedback as constrained vectors, and call the idea constrained preference embedding (CPE); it means that we regard users, items and all users' behavior as vectors. We find that this viewpoint is more flexible and powerful than traditional MF for item recommendation. For example, under the proposed assumption, users' heterogeneous actions can be coherently mined because all entities and actions can be transferred to a space of the same dimension. In addition, CPE is able to model feedback of uncertain preference degree. To test our assumption, we propose two models called CPE-s and CPE-ps based on CPE for item recommendation, and show that the popular pair-wise ranking model BPR-MF can be deduced by some restrictions and variations on CPE-s. In the experiments, we test CPE and the proposed algorithms, and demonstrate their effectiveness.

1 Introduction

How to represent customers' behavior is an important aspect of designing item recommendation algorithms. Unfortunately, there are no general and ideal solutions for different application scenarios so far. As for modeling users' explicit feedback such as rating scores, a successful assumption is to represent them as different integers.
The primary methods for leveraging them are matrix factorization (MF) techniques, according to which users and items are represented by low-rank latent factors (i.e., numeric vectors), and preference degree is computed by the product of the related vectors. However, absolutely correlating scalars with users' feedback may lead to some problems. For example, although the rating scores are uniformly distributed, the preference degree may not be linear. As we know, users' attitudes tend to follow a long-tail distribution, which means most users prefer giving 3, 4 and 5 stars, and hence the difference between 3 stars and 1 star should be more obvious than the difference between 5 stars and 3 stars. What's more, in applications, users' behavior is not limited to rating scores, but can consist of heterogeneous actions. For example, a user may give some tags to a favorite book, add a pair of shoes into a shopping cart and visit some pages about children's clothing. Traditional MF based approaches may face two problems in this situation. First, it is difficult to ascertain the preference degree of those actions. Different from modeling rating scores, translating "giving some tags" and "adding an item into a shopping cart" into real numbers is a difficult task because we cannot exactly assign values to the preference degree. The situation is even worse when we are not sure whether one type of behavior is more positive than another. Most MF based algorithms ignore those problems and directly express all the heterogeneous feedback as integer 1 and the unobserved correlations as 0, and hence fail to capture differentiated information from each type of feedback. Second, for most MF based item recommendation algorithms, different kinds of preference information are finally translated into the same user and item latent space, hence the value in each dimension is hard to explain.

∗Corresponding author
Simply embedding heterogeneous information into two types of vectors is inadequate and inflexible when we import more and more kinds of information from e-commerce sites into recommendation algorithms. To deal with the discussed problems, in this paper, we introduce a novel method called constrained preference embedding (CPE) to model users' behavior. For CPE, we no longer regard the behavior information as numerical values, but embed it in a high-dimensional space together with users and items. In other words, all entities and feedback are represented by d-dimensional vectors, e.g., the rating scores from one to five are expressed as five vectors. Then, a modified add approximation is employed to model (user, item) correlations. This process is similar to some knowledge relationship mining methods for relationship discovery [Bordes et al., 2013; Chen et al., 2013], but we emphasize fine-grained actions with degree information. For CPE, the $L_2$-norm of the feedback vectors is used to represent preference degree in a relative way, and hence it avoids correlating them with explicit

numbers. Another advantage of the proposed idea is that users' behavior can share an isomorphic structure and be congruently modeled by a unified method. Because we express preference degree in a relative way, each type of behavior will be assigned an auto-adjusted satisfaction value according to the dataset rather than an absolute number. In addition, by embedding users' feedback, CPE can translate different kinds of behavior information to feedback vectors instead of restricting it to the user and item space.

In Section 2, we start by discussing matrix factorization (MF) techniques and some related MF based item recommendation approaches. Then, we introduce our constrained preference embedding (CPE) method in Section 3. Based on CPE, we propose two item recommendation algorithms, CPE-s and CPE-ps, in Section 4. In Section 5, we test CPE's performance on real-world datasets. Finally, we draw conclusions in Section 6.

2 Background

2.1 Matrix Factorization

Based on the low-rank assumption, matrix factorization methods [Koren et al., 2009; Koren, 2008] represent each user and item as a d-dimensional vector. The preference of user u for item i is represented as a scalar $r_{u,i} \in R$. If we denote the vector of u as $v_u \in R^d$ and the vector of i as $v_i \in R^d$, the preference of u for i can be predicted by $v_u^T v_i$. In order to learn the factors, a probabilistic version of MF (PMF) [Salakhutdinov and Mnih, 2008] assumes $r_{u,i}$ to be sampled from a Gaussian distribution with mode $v_u^T v_i$, and tries to optimize the maximum posterior on all observed $(u, r_{u,i}, i)$ triplets.

2.2 Preference Ranking

To learn a ranked list of items, related point-wise, pair-wise and list-wise preference ranking algorithms have been proposed based on matrix factorization assumptions.
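As a concrete illustration of the MF prediction rule described above, the estimate $\hat{r}_{u,i} = v_u^T v_i$ is simply a dot product of learned latent factors. The sketch below uses made-up factor values purely for illustration; it is not the authors' code:

```python
import numpy as np

d = 3  # latent dimension (toy value)

# Hypothetical learned latent factors for one user and one item.
v_u = np.array([0.2, 1.0, -0.5])
v_i = np.array([1.0, 0.5, 0.0])

# MF predicts the preference of u for i as the inner product v_u^T v_i.
r_hat = v_u @ v_i
print(r_hat)  # 0.2*1.0 + 1.0*0.5 + (-0.5)*0.0 = 0.7
```

In PMF, this same inner product serves as the mode of the Gaussian from which the observed rating is assumed to be drawn.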
For example, iMF [Hu et al., 2008; Lin et al., 2014] and OCCF [Pan et al., 2008] consider both observed and unobserved (user, item) correlations in a point-wise way and try to optimize the following loss function:

$$\sum_{(u,i)\in D \cup \bar{D}} (v_u^T v_i - r_{u,i})^2 + \lambda\Omega \qquad (1)$$

where $D$ and $\bar{D}$ are the observed set and the unobserved set, and $\lambda\Omega$ is the regularization term $\lambda(\sum_u \|v_u\|_2^2 + \sum_i \|v_i\|_2^2)$. The MF based pair-wise methods are similar to iMF or OCCF, but they adopt different preference comparing structures. For example, the state-of-the-art pair-wise algorithm BPR-MF [Rendle et al., 2009; Pan and Chen, 2013] assumes user u may prefer an observed item i to an unobserved item j, and directly optimizes their relationship by logistic regression with the parameter $(v_u^T v_i - v_u^T v_j)$. Finally, we maximize the following equation:

$$\sum_{(u,i)\in D,\ (u,j)\in \bar{D}} \ln \sigma(v_u^T v_i - v_u^T v_j) + \lambda\Omega \qquad (2)$$

For MF based list-wise algorithms, each action is compared with a list of actions. For example, CofiRank [Weimer et al., 2007] is proposed to directly optimize NDCG [Valizadegan et al., 2009] scores based on maximum margin matrix factorization models, and ListRank [Cao et al., 2007; Shi et al., 2010] tries to optimize a cross-entropy function over item lists based on PMF.

[Figure 1: Illustration of CPE. Traditional MF expresses users and items as vectors and predicts rating scores by the product $v_u^T v_i$; the scores are scalars. CPE regards rating scores as vectors, and directly mines the three types of vectors (users $\{V_{u_1},\dots,V_{u_n}\}$, items $\{V_{i_1},\dots,V_{i_m}\}$, feedback) in the high-dimensional space by the add approximation $v_u + v_f \approx v_i$ and the preference constraint $\|v_{f_0}\|_2 > \|v_{f_1}\|_2 > \dots > \|v_{f_5}\|_2$.]
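For concreteness, one stochastic ascent step on the BPR-MF objective of Eq. (2) can be sketched as follows. This is a simplified illustration with random toy data and hypothetical hyperparameter values, not the reference implementation; regularization is folded directly into the update:

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, d = 5, 8, 4
V_u = rng.normal(scale=0.1, size=(n_users, d))  # user factors
V_i = rng.normal(scale=0.1, size=(n_items, d))  # item factors

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bpr_step(u, i, j, lr=0.05, reg=0.01):
    """One ascent step on ln sigma(v_u^T v_i - v_u^T v_j) minus L2 penalties."""
    vu, vi, vj = V_u[u].copy(), V_i[i].copy(), V_i[j].copy()
    x_uij = vu @ (vi - vj)              # the ranking margin
    g = 1.0 - sigmoid(x_uij)            # derivative of ln(sigma(x)) w.r.t. x
    V_u[u] += lr * (g * (vi - vj) - reg * vu)
    V_i[i] += lr * (g * vu - reg * vi)
    V_i[j] += lr * (-g * vu - reg * vj)

# Repeatedly pushing item 2 above item 6 for user 0 should widen the margin.
before = V_u[0] @ (V_i[2] - V_i[6])
for _ in range(200):
    bpr_step(0, 2, 6)
after = V_u[0] @ (V_i[2] - V_i[6])
print(before, after)  # the ranking margin increases with training
```

The key point is that BPR-MF never fits absolute scores; it only enlarges the gap between observed and unobserved items per user.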
As we can see from the above loss functions, those MF based algorithms are powerful for modeling isomorphic feedback with assigned preference degree information, but some of them may be inflexible and ineffective when the following situations occur: (1) It is difficult to determine the preference degree $p(f)$ of feedback (e.g., $\{p(f_a)=?,\ p(f_b)=?,\ p(f_c)=?\}$). (2) We are only given the relative positiveness of some types of feedback (e.g., $\{p(f_a) < p(f_b),\ p(f_c) < p(f_d)\}$). (3) We try to leverage heterogeneous feedback and combine it in a single model (e.g., $\{p(f_a)=1,\ p(f_b)=2,\ p(f_c)=?,\ p(f_d) < p(f_e),\ p(f_g)=?,\ p(f_h)=?,\dots\}$), where $p(\cdot)$ denotes preference degree. Note that here $f$ is a type of feedback, e.g., "giving 2 stars", "click" or "browse". For example, if we denote "giving two stars" as feedback $f_b$, then $p(f_b)$ may be 2, i.e., $p(f_b) = r_{i,j}$ if $r_{i,j} = 2$.

3 Constrained Preference Embedding

In this section, we introduce our proposed method, constrained preference embedding (CPE), and discuss some of its characteristics. We denote a kind of feedback as $v_f \in R^d$. Therefore, the rating scores {1, 2, 3, 4, 5} are represented as $\{v_{f(1\,star)}, v_{f(2\,stars)}, v_{f(3\,stars)}, v_{f(4\,stars)}, v_{f(5\,stars)}\}$, and users' preference can be optimized by the following function:

$$\min \sum_{u\in U,\ f\in F,\ i\in I} L(v_u, v_f, v_i) \qquad (3)$$

where $U$, $I$ and $F$ are the user, item and feedback sets, and $v_u$ and $v_i$ are the user vector and item vector. In this paper, we assume users, items and preference obey the add approximation rule, which means $v_u + v_f$ should be close to $v_i$ if $u$ gives $f$ to $i$. The idea is illustrated in Figure 1. Based on the rating matrix and add approximation, we should optimize $v_{u_1} + v_{f(1\,star)} \rightarrow v_{i_2}$, $v_{u_1} + v_{f(5\,stars)} \rightarrow v_{i_5}$,

$v_{u_3} + v_{f(1\,star)} \rightarrow v_{i_3}$ and $v_{u_4} + v_{f(2\,stars)} \rightarrow v_{i_5}$, where $\rightarrow$ indicates "approximates". For item recommendation, we also need to introduce collaborative filtering [Sarwar et al., 2001] features and preference degree information by placing some constraints on users' feedback. We assume that if $u$ has strong preference for $i$, $v_u$ and $v_i$ should be close; if $u$ has similar preference for $i_1$ and $i_2$, $v_{i_1}$ and $v_{i_2}$ should be close; if $u_1$ and $u_2$ have similar preference for $i$, $v_{u_1}$ and $v_{u_2}$ should be close. To achieve this, we control the $L_2$-norm of $v_f$ to make sure that it is smaller if $f$ is more positive. Because the $L_2$-norm of $v_f$ (i.e., $\|v_f\|_2$) is its Euclidean length, under the former rules, the stronger the preference of $u$ for $i$, the shorter the distance between $v_u$ and $v_i$. It means that the correlated (user, item), (user, user) or (item, item) pairs should have similar directions and lengths in terms of their vectors.

4 CPE for Item Recommendation

In this section, we propose two item recommendation methods called CPE-s and CPE-ps based on our CPE assumption. For CPE-s, we adopt the add approximation rule and a pair-wise preference comparison strategy, and optimize a loss function with soft constraints. CPE-ps is similar to CPE-s, but it is based on vector projection. In this paper, we primarily consider rating scores and unobserved (user, item) correlations (denoted as "unobserved feedback") for studying our models.

4.1 CPE-s

We assume that if user u has a strong preference for item i, she will give it a higher rating score. Therefore, our task is to minimize the difference between $v_u + v_f$ and $v_i$ for each triplet $(u, f, i)$. Given users' rating matrix, the loss function on the overall triplets is as follows:

$$L = \sum_{(u,f,i)\in D} d(v_u + v_f, v_i)^2 + \lambda\Omega \qquad (4)$$
$$\text{s.t. } \{\|v_{f_p}\|_2^2 < \|v_{f_q}\|_2^2 \mid p > q\},\ f_p, f_q \in F$$

where $(u,f,i) \in D$ denotes the observed and unobserved triplets; $d(\cdot)$ is the Euclidean distance; $\lambda\Omega$ controls the norms of the parameters.
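The add approximation term of Eq. (4) is a translation-style distance, similar in spirit to knowledge-graph translation embeddings. A minimal sketch with hand-picked toy vectors of our own choosing (not learned values):

```python
import numpy as np

def add_approx_loss(v_u, v_f, v_i):
    """Squared Euclidean distance d(v_u + v_f, v_i)^2, the per-triplet term of Eq. (4)."""
    diff = v_u + v_f - v_i
    return float(diff @ diff)

v_u  = np.array([0.1, 0.2, 0.0])
v_f5 = np.array([0.1, 0.0, 0.1])   # short vector: very positive feedback (5 stars)
v_f0 = np.array([1.0, -1.0, 1.0])  # long vector: unobserved "feedback"

v_i = np.array([0.2, 0.2, 0.1])    # an item that u rated highly

# The rated item sits close to v_u + v_f5, far from v_u + v_f0.
assert add_approx_loss(v_u, v_f5, v_i) < add_approx_loss(v_u, v_f0, v_i)
print(add_approx_loss(v_u, v_f5, v_i))
```

Because the positive feedback vector is short, minimizing this loss simultaneously pulls $v_u$ toward the items the user likes, which is exactly the collaborative-filtering behavior the constraints are designed to induce.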
For the rating scores, $v_{f_k}$ is the vector of the "giving k star(s)" action and comes from the set $\{v_{f_k} \mid k = 1, 2, 3, 4, 5\}$, and the unobserved feedback is denoted as $v_{f_0}$. For our model, $f_p$ is assumed to be more positive than $f_q$ for all $p > q$, which means the correlations of observed triplets are stronger than the unobserved ones. All the vectors are randomly initialized. The possible $v_u$, $v_i$ and $v_f$ in the above loss function are optimized with $L_2$-regularization terms. This constraint is important for CPE-s because it prevents the learning algorithm from trivially minimizing the optimization function by artificially increasing the norms of $v_u$, $v_i$ and $v_f$. Instead of directly optimizing the loss function with constraints on preference, we convert it to the following soft unconstrained function:

$$L = \sum_{(u,f,i)\in D} d(v_u + v_f, v_i)^2 - \lambda_e \sum_{p>q} w_{f_p,f_q} \ln C(\|v_{f_q}\|_2^2 - \|v_{f_p}\|_2^2) \qquad (5)$$

where $f_p$ and $f_q$ belong to $F$; $\lambda_e$ is a hyperparameter; $w_{f_p,f_q}$ denotes the weight of the preference comparison between $f_p$ and $f_q$; it is computed by the product of $f_p$'s and $f_q$'s frequencies in the triplets, and guarantees that the feedback priority can be adjusted according to the dataset instead of some predetermined values. Here $C(x)$ can be any monotonically increasing function; we adopt the sigmoid function.

4.2 CPE-ps

For CPE-s, if both $u_1$ and $u_2$ give $i$ the same rating score, $v_{u_1}$ and $v_{u_2}$ should be very similar, which means CPE-s may suppress exploiting personalized information. The reason for this consequence is that for the elements in $F$ (i.e., $\{f_k \mid k = 0, 1, 2, 3, 4, 5\}$), we keep the same user and item vectors. To overcome this shortcoming, we adopt a projection method inspired by [Wang et al., 2014b] to model vector relationships. We create a plane $P_f$ with normal vector $w_f$ for each type of feedback $v_f$. Then, for each triplet $(v_u, v_f, v_i)$, we project $v_u$ and $v_i$ onto $P_f$, denoting the projected vectors as $v_{\perp,u}$ and $v_{\perp,i}$ respectively.
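The feedback-specific projection just described mirrors the translating-on-hyperplanes idea of [Wang et al., 2014b]: a vector is projected onto the hyperplane with unit normal $w_f$ by removing its component along $w_f$. A sketch with made-up vectors:

```python
import numpy as np

def project(v, w):
    """Project v onto the hyperplane with unit normal w: v - (w^T v) w."""
    w = w / np.linalg.norm(w)   # keep ||w||_2 = 1, as the model requires
    return v - (w @ v) * w

w_f = np.array([0.0, 0.0, 1.0])   # normal vector of the plane for feedback f
v_u = np.array([0.3, -0.2, 0.7])
v_perp_u = project(v_u, w_f)

print(v_perp_u)                       # the component along w_f is removed
assert abs(w_f @ v_perp_u) < 1e-12   # projected vector lies in the plane
```

Two user vectors that differ only along $w_f$ project to the same point, which is precisely how CPE-ps shares preference through projections while retaining personalized components.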
Finally, we optimize $(v_{\perp,u}, v_f, v_{\perp,i})$ similarly to CPE-s, where $v_{\perp,u}$ and $v_{\perp,i}$ are computed according to the following equations:

$$v_{\perp,u} = v_u - w_f^T v_u\, w_f, \qquad v_{\perp,i} = v_i - w_f^T v_i\, w_f \qquad (6)$$

Therefore, the loss function can be described as $d(v_{\perp,u} + v_f, v_{\perp,i})$, as illustrated in Figure 2(a). The $L_2$-norm of $w_f$ is restricted to 1 to control $v_{\perp,u}$ and $v_{\perp,i}$. Based on our CPE assumption, the loss function is

$$L = \sum_{(u,f,i)\in D} d(v_{\perp,u} + v_f, v_{\perp,i})^2 \qquad (7)$$
$$\text{s.t. } \{\|v_{f_p}\|_2^2 < \|v_{f_q}\|_2^2 \mid p > q\},\ f_p, f_q \in F$$
$$\|w_f\|_2^2 = 1,\quad |w_f^T v_f| / \|v_f\|_2 \le \epsilon,\quad f \in F$$

where $d(\cdot)$ is the Euclidean distance; $|w_f^T v_f| / \|v_f\|_2 \le \epsilon$ guarantees that $v_f$ lies in the translated plane. Similar to CPE-s, we do not directly optimize the above function but convert it to the following soft unconstrained loss function:

$$L = \sum_{(u,f,i)\in D} d(v_{\perp,u} + v_f, v_{\perp,i})^2 - \lambda_e \sum_{p>q} w_{f_p,f_q} \ln C(\|v_{f_q}\|_2^2 - \|v_{f_p}\|_2^2) + \lambda_w \sum_{f\in F} \frac{(w_f^T v_f)^2}{\|v_f\|_2^2} \qquad (8)$$

where each $\|w_f\|_2^2$ is constrained to 1 in the learning procedure, and $\lambda_w$ is the weight of the projection term. CPE-ps is better than CPE-s because it can help exploit more personalized information. For example, $v_{u_1}$ and $v_{u_2}$ can be different even if $v_{\perp,u_1}$ equals $v_{\perp,u_2}$. That is, users' preference can be shared through projected correlations, while some personalized information can be retained.

4.3 Learning and Item Recommendation

The proposed algorithms are carried out by stochastic gradient descent (SGD) to optimize $v_u$, $v_f$ and $v_i$. Specifically,

[Figure 2: Illustration of CPE-ps (a) and SCPE (b).]

for preference embedding and positiveness learning, in each iteration we randomly sample $t_1$ triplets according to the distribution of each feedback in $D$ and $t_2$ unobserved feedback, and optimize them according to Eq. (5) and Eq. (8) until they converge to a stable state. Note that for item recommendation in the experiments, we directly update the related $v_u$ and $v_i$ on the $\ln C(\|v_{f_q}\|_2^2 - \|v_{f_p}\|_2^2)$ part for better learning efficiency and performance; the form can be expressed by Eq. (13), and we will discuss it later. The preference of $u$ for $i$ is computed according to Eq. (9) for CPE-s and Eq. (10) for CPE-ps, and the top k items with the greatest $p_{u,i}$ are recommended to $u$:

$$p_{u,i} = 1 / d(v_i, v_u) \qquad (9)$$
$$p_{u,i} = 1 / d(v_{\perp,i}, v_{\perp,u}) \qquad (10)$$

where $p_{u,i}$ denotes the preference degree.

4.4 Discussion

In this section, we discuss some features of CPE and explain why they are flexible for preference learning.

Semi-constrained preference embedding (SCPE). An advantage of CPE is that it can model users' heterogeneous actions in a unified way. In applications, although ratings can be directly modeled by their scores, it is hard to assign some types of implicit feedback (e.g., "click" and "browse") a certain preference degree, because we are not sure, for example, whether "click" and "browse" are more positive than "giving 2 stars". The MF based algorithms cannot directly model the latter information well; a compromise is to represent all types of users' behavior as implicit feedback and correlate them with 1 (e.g., BPR-MF). Those approaches may have some shortcomings because the diversity of behavior information is lost. On the contrary, a modified CPE method called semi-constrained preference embedding (SCPE) may provide a possible fine-grained solution. Specifically, for SCPE, all the elements in $F$ are considered in the add approximation part, but the $L_2$-norms of the unknown feedback are not constrained in the learning process.
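Item scoring for CPE-s in Eq. (9) ranks items by the inverse user–item distance. A toy sketch with hypothetical two-dimensional "learned" vectors:

```python
import numpy as np

def recommend(v_u, V_items, k=2):
    """Rank items for u by p_{u,i} = 1 / d(v_i, v_u) and return the top k indices."""
    dists = np.linalg.norm(V_items - v_u, axis=1)
    scores = 1.0 / (dists + 1e-12)   # tiny epsilon avoids division by zero
    return np.argsort(-scores)[:k]

v_u = np.array([0.0, 0.0])
V_items = np.array([[3.0, 0.0],    # item 0: far from u
                    [0.1, 0.0],    # item 1: closest to u
                    [1.0, 1.0]])   # item 2: in between
print(recommend(v_u, V_items))  # item 1 first, then item 2
```

For CPE-ps (Eq. (10)), the same ranking is applied after projecting both vectors onto the feedback hyperplane.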
For example, given the set $\{f_k \mid k = a, b, c, d, e\}$ and the experience that $p(f_a) > p(f_b)$, $p(f_c) > p(f_d)$, and $p(f_e) = ?$, the loss function can be

$$\min L(v_u, v_i, v_{f_a}, v_{f_b}, v_{f_c}, v_{f_d}, v_{f_e}) \qquad (11)$$
$$\text{s.t. } \|v_{f_b}\|_2^2 > \|v_{f_a}\|_2^2,\quad \|v_{f_d}\|_2^2 > \|v_{f_c}\|_2^2$$

where $f_e$ is modeled in $L(\cdot)$ to help correlate (user, item) pairs and predict users' potential preference, as illustrated in Figure 2(b). We also find that, besides benefiting item recommendation, SCPE can help learn the preference degree of $f_e$ with the help of $f_{i\in\{a,b,c,d\}}$. Therefore, besides item recommendation, SCPE may also be used to study whether a user's behavior is positive or not according to some assistant behavior. We will study it in the next section.

Relation between CPE-s and BPR-MF. As we discussed above, BPR-MF is an effective pair-wise ranking algorithm for item recommendation, and its loss function is represented in Eq. (2). For our CPE-s, if we take $C(\cdot)$ in Eq. (5) to be the sigmoid function, and denote $f_p$ as positive feedback and $f_n$ as negative feedback, the main part of the soft preference restriction can be expressed as

$$\ln \sigma(\|v_{f_n}\|_2^2 - \|v_{f_p}\|_2^2) \qquad (12)$$

With the pair-wise CPE assumption, for two triplets $(u, f_p, i)$ and $(u, f_n, j)$, the distance between $v_u$ and $v_i$ should be shorter than the distance between $v_u$ and $v_j$. Based on the add approximation, we replace $v_{f_p}$ with $v_i - v_u$ and $v_{f_n}$ with $v_j - v_u$, and constrain the $L_2$-norm of the item vectors to a constant c. Hence, our task is to maximize the following function:

$$\ln \sigma(\|v_j - v_u\|_2^2 - \|v_i - v_u\|_2^2) + \lambda\Omega$$
$$= \ln \sigma(2[v_u^T v_i - v_u^T v_j] + \|v_j\|_2^2 - \|v_i\|_2^2) + \lambda\Omega$$
$$= \ln \sigma(2[v_u^T v_i - v_u^T v_j]) + \lambda\Omega \qquad (13)$$
$$\text{s.t. } \|v_i\|_2 = \|v_j\|_2 = c$$

As we can see from Eq. (13), by some restrictions and variations, we can deduce BPR-MF with a constraint on item $L_2$-norms from pair-wise CPE. This reveals the potential relation between CPE-s and BPR-MF, and also shows why it is reasonable to express $p_{u,i}$ as $1/d(v_i, v_u)$.

5 Experiments

5.1 Datasets and Evaluation Metrics

In this section, we study CPE on real-world datasets from different domains and categories.
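The algebra relating pairwise CPE to BPR-MF above — that with $\|v_i\|_2 = \|v_j\|_2 = c$ the squared-distance margin reduces to $2[v_u^T v_i - v_u^T v_j]$ — can be checked numerically. A sketch of ours with random vectors:

```python
import numpy as np

rng = np.random.default_rng(42)
d = 6
v_u = rng.normal(size=d)

# Two item vectors constrained to the same L2-norm c.
c = 2.0
v_i = rng.normal(size=d); v_i *= c / np.linalg.norm(v_i)
v_j = rng.normal(size=d); v_j *= c / np.linalg.norm(v_j)

# ||v_j - v_u||^2 - ||v_i - v_u||^2  expands to
# 2(v_u^T v_i - v_u^T v_j) + ||v_j||^2 - ||v_i||^2, and the norm terms cancel.
lhs = np.sum((v_j - v_u) ** 2) - np.sum((v_i - v_u) ** 2)
rhs = 2.0 * (v_u @ v_i - v_u @ v_j)
print(lhs, rhs)  # the two quantities agree
```

Since the sigmoid is monotone, feeding either quantity to $\ln \sigma(\cdot)$ yields the same objective, which is the deduction's core step.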
The numbers of users, the numbers of items and the sparsity information are listed in Table 1. The first 5 datasets are from Amazon.com [McAuley and Leskovec, 2013]; the movies, books and music datasets are from DouBan.com [Wang et al., 2014a]. Each dataset is subdivided into three parts: 80% of it is used for training, 10% is used for validation and the last 10% is left for testing. The evaluation metrics we use are NDCG [Valizadegan et al., 2009], Precision, Recall and F1 [Wang et al., 2014a] scores.

5.2 Baselines and Parameter Settings

To comprehensively study our proposed models, we compare them with the following state-of-the-art ranking methods:

Table 1: Datasets used in the experiments.

dataset                          #users    #items    sparsity
beauty (Amazon)                  67,725    29,…      …
tools&games (Amazon)             283,5…    5,…       …
clothing&accessories (Amazon)    28,794    66,…      …
shoes (Amazon)                   73,59…    48,…      …
industrial&scientific (Amazon)   29,59…    22,…      …
movies (Douban)                  5,664     1,…       …
books (Douban)                   1,24…     …         …
music (Douban)                   1,28…     …         …

point-wise: iMF (explicit, implicit)
pair-wise: BPR (implicit), GBPR (implicit)
list-wise: ListRank (explicit), CofiRank (explicit)

The baselines are carefully chosen to make sure that each algorithm is typical of its discussed class. ListRank [Cao et al., 2007; Shi et al., 2010] and CofiRank [Weimer et al., 2007] mainly optimize observed (user, item) correlations, while iMF [Hu et al., 2008; Lin et al., 2014], BPR [Rendle et al., 2009] and GBPR [Pan and Chen, 2013] consider both observed and unobserved correlations. Some details of those methods are discussed in the previous sections. For all the approaches, the learning rate is set to 0.5, and the latent dimension is set to 10 (i.e., d = 10). The regularization coefficient is selected from {1, 0.1, 0.01, 0.001}; $t_1 = t_2 = 2$. Because of the adopted weighted sampling method for SGD, we set $\lambda_e$, $w_{f_p,f_q}$ and $\lambda_w$ to 1. For GBPR-MF, the group size is …

5.3 Analysis of Preference Embedding

We reconstruct the $L_2$-norms of the feedback vectors (i.e., $\{\|v_{f_i}\|_2 \mid i = 0, 1, 2, 3, 4, 5\}$) after learning CPE-s or CPE-ps, and illustrate them by the column charts in Figure 3. Due to limited space, we only provide the results learned by CPE-s. According to the figure, $\|v_{f_0}\|_2$ is much greater than $\|v_{f_{i\in\{1,2,3,4,5\}}}\|_2$ for all datasets; it implies that whether a user gives a rating score to an item at all is more significant than which score she chooses. The results are due to the fact that the number of $(u, f_0, i)$ triplets is much larger than that of the other triplets, and the assumption that the members of $\{f_1, f_2, f_3, f_4, f_5\}$ should be more positive than $f_0$. Hence, the difference between $\|v_{f_0}\|_2$ and $\|v_{f_{i\in\{1,2,3,4,5\}}}\|_2$ has an important influence on model optimization.
For the same reason, we can also explain why the difference between $\|v_{f_{i\in\{4,5\}}}\|_2$ and $\|v_{f_{j\in\{1,2\}}}\|_2$ is greater compared with $\|v_{f_4}\|_2 - \|v_{f_5}\|_2$ or $\|v_{f_1}\|_2 - \|v_{f_2}\|_2$. Hence, the lengths of the feedback vectors learned by our models can intuitively reflect the behavior distribution of the real-world datasets.

5.4 Analysis of Item Recommendation

The item recommendation results on the 8 real-world datasets are listed in Table 2. It is interesting that the selected point-wise and pair-wise algorithms are better than the list-wise ranking methods, which is inconsistent with our intuition. The reason for these outcomes is that the compared list-wise ranking methods mainly consider explicit rating scores and ignore unobserved (user, item) pairs. For our sparse datasets, CofiRank and ListRank cannot take full advantage of the available data due to the shortage of the former information.

[Figure 3: $L_2$-norms of the learned feedback vectors. Here U represents the unobserved (user, item) pairs, and the numbers denote the feedback of the related rating scores, with the bars indicating their vector lengths.]

[Figure 4: $L_2$-norm comparison of two feedback vectors learned by CPE with some assistant feedback (outer iterations, beauty dataset; panels (a) and (b)).]

It is obvious that our proposed models are better than BPR and GBPR on most datasets. On average, CPE-s (or CPE-ps) helps improve precision-5 by 4.75%, recall-5 by 5.4%, and NDCG-5 by 6.7%, due to the preference embedding assumptions. First, CPE-s and CPE-ps consider 6 different levels of users' feedback, whereas BPR and GBPR primarily focus on binary information; therefore, our models can exploit more information than BPR and GBPR. Second, our methods transfer each type of behavior to its related vector rather than restricting it to the user and item space. Therefore, they are more effective compared with the other baselines.
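For reference, the NDCG-K metric used throughout these comparisons can be computed as follows. This is a standard formulation of the measure, not the authors' evaluation script:

```python
import numpy as np

def ndcg_at_k(ranked_rels, k):
    """NDCG@K for one user; ranked_rels are relevance values in ranked order."""
    rels = np.asarray(ranked_rels, dtype=float)[:k]
    discounts = 1.0 / np.log2(np.arange(2, rels.size + 2))  # 1/log2(rank+1)
    dcg = float(np.sum((2.0 ** rels - 1.0) * discounts))
    # Ideal DCG: the same relevances sorted from best to worst.
    ideal = np.sort(np.asarray(ranked_rels, dtype=float))[::-1][:k]
    idcg = float(np.sum((2.0 ** ideal - 1.0) * discounts[:ideal.size]))
    return dcg / idcg if idcg > 0 else 0.0

# A perfect ranking scores 1.0; misplacing the best item lowers the score.
print(ndcg_at_k([3, 2, 1, 0], k=4))
print(ndcg_at_k([0, 2, 1, 3], k=4))
```

Precision-K and Recall-K are computed analogously over the top-K recommended items, without the position discount.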
The latent dimension $d \in \{3, 5, 10, 15\}$ is varied to test the stability of the compared approaches. The outcomes of NDCG-5 on beauty and clothing&accessories are shown in Figure 5. We find that when d is large, the advantage of CPE is obvious; this is because the feedback vectors may help capture more preference-related information as well as loosen the low-rank assumption when d is bigger. The benefit shrinks as d decreases, but due to the effectiveness of preference embedding, the performance of CPE is still better than the baselines'. Finally, we study NDCG-K, precision-K and recall-K scores with different recommendation sizes $K \in \{1, 3, 5, 7\}$, and show some selected results in Figure 6. It is clear that CPE-s and CPE-ps are stable when K varies.

Table 2: Precision, Recall and NDCG scores on the 8 real-world datasets. The size of the recommendation list is 5. [Table values omitted; compared methods: iMF, BPR, GBPR, ListRank, CofiRank, CPE-s, CPE-ps, on beauty, tools&games, clothing&accessories, shoes, industrial&scientific (Amazon) and movies, books, music (Douban).]

[Figure 5: Performance comparison with different latent dimension d (NDCG-5 on beauty and clothing&accessories).]

5.5 Analysis of Semi-CPE (SCPE)

We use rating scores and unobserved correlations to study SCPE's ability to identify the positiveness of behavior. Specifically, two types of feedback $f_p$ and $f_q$ are randomly selected from $\{f_k \mid k = 0, 1, 2, 3, 4, 5\}$ to simulate unknown behavior; then every feedback vector is embedded by SCPE-s, with $f_p$ and $f_q$ not constrained in the optimizing procedure. We test whether SCPE-s can correctly rank $p(f_p)$ and $p(f_q)$. In the experiments, we choose the following two cases: $\{p = 3, q = 1\}$ and $\{p = 4, q = 2\}$. $\|v_{f_1}\|_2$ and $\|v_{f_3}\|_2$ learned by SCPE-s on the beauty dataset are plotted in Figure 4(a), and $\|v_{f_2}\|_2$ and $\|v_{f_4}\|_2$ are shown in Figure 4(b). It is clear that when the algorithm converges, $\|v_{f_1}\|_2$ is persistently greater than $\|v_{f_3}\|_2$, and $\|v_{f_2}\|_2$ is greater than $\|v_{f_4}\|_2$, which means that, with other assistant feedback, our SCPE-s can automatically infer that giving 3 stars is more positive than giving 1 star, and giving 4 stars is more positive than giving 2 stars. The results are in line with what we expected, and can be explained by the collaborative filtering features of SCPE discussed before. Therefore, SCPE may provide a possible way of analyzing and comparing users' heterogeneous feedback for e-commerce websites.

[Figure 6: Performance comparison with different recommendation list size (NDCG-K, Precision-K and Recall-K on books).]
6 Conclusion

In this paper, we introduced the constrained preference embedding (CPE) assumption and two models (i.e., CPE-s and CPE-ps) for preference learning. We discussed semi-constrained preference embedding (SCPE) and showed its effectiveness in modeling users' feedback of uncertain preference degree. Finally, we demonstrated the relationship between CPE and BPR-MF. In the experiments, CPE was proved effective from different perspectives.

Acknowledgements

This work is supported by the National Natural Science Foundation of China (Grant No. … and Grant No. …). We thank Dr. Zhongyuan Wang of MSRA for helpful discussions on embedding techniques.

References

[Bordes et al., 2013] Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko. Translating embeddings for modeling multi-relational data. In Advances in Neural Information Processing Systems, NIPS'13, 2013.

[Cao et al., 2007] Zhe Cao, Tao Qin, Tie-Yan Liu, Ming-Feng Tsai, and Hang Li. Learning to rank: From pairwise approach to listwise approach. In Proceedings of the 24th International Conference on Machine Learning, ICML'07. ACM, 2007.

[Chen et al., 2013] Danqi Chen, Richard Socher, Christopher D. Manning, and Andrew Y. Ng. Learning new facts from knowledge bases with neural tensor networks and semantic word vectors. arXiv preprint arXiv:1301.3618, 2013.

[Hu et al., 2008] Yifan Hu, Yehuda Koren, and Chris Volinsky. Collaborative filtering for implicit feedback datasets. In Proceedings of the 8th International Conference on Data Mining, ICDM'08. IEEE, 2008.

[Koren et al., 2009] Yehuda Koren, Robert Bell, and Chris Volinsky. Matrix factorization techniques for recommender systems. Computer, 42(8):30–37, 2009.

[Koren, 2008] Yehuda Koren. Factorization meets the neighborhood: a multifaceted collaborative filtering model. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2008.

[Lin et al., 2014] Christopher H. Lin, Ece Kamar, and Eric Horvitz. Signals in the silence: Models of implicit feedback in a recommendation system for crowdsourcing. In Proceedings of the 28th AAAI Conference on Artificial Intelligence, AAAI'14, 2014.

[McAuley and Leskovec, 2013] Julian McAuley and Jure Leskovec. Hidden factors and hidden topics: understanding rating dimensions with review text. In Proceedings of the 7th ACM Conference on Recommender Systems, RecSys'13. ACM, 2013.

[Pan and Chen, 2013] Weike Pan and Li Chen. GBPR: Group preference based Bayesian personalized ranking for one-class collaborative filtering. In Proceedings of the 23rd International Joint Conference on Artificial Intelligence, IJCAI'13. AAAI Press, 2013.

[Pan et al., 2008] Rong Pan, Yunhong Zhou, Bin Cao, Nathan Nan Liu, Rajan Lukose, Martin Scholz, and Qiang Yang. One-class collaborative filtering. In Proceedings of the 8th International Conference on Data Mining, ICDM'08. IEEE, 2008.

[Rendle et al., 2009] Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. BPR: Bayesian personalized ranking from implicit feedback. In Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence. AUAI Press, 2009.

[Salakhutdinov and Mnih, 2008] Ruslan Salakhutdinov and Andriy Mnih. Probabilistic matrix factorization. In Advances in Neural Information Processing Systems, volume 20 of NIPS'08, 2008.

[Sarwar et al., 2001] Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, WWW'01. ACM, 2001.

[Shi et al., 2010] Yue Shi, Martha Larson, and Alan Hanjalic. List-wise learning to rank with matrix factorization for collaborative filtering. In Proceedings of the 4th ACM Conference on Recommender Systems, RecSys'10. ACM, 2010.

[Valizadegan et al., 2009] Hamed Valizadegan, Rong Jin, Ruofei Zhang, and Jianchang Mao. Learning to rank by optimizing NDCG measure. In Advances in Neural Information Processing Systems, NIPS'09, 2009.

[Wang et al., 2014a] Xin Wang, Weike Pan, and Congfu Xu. HGMF: Hierarchical group matrix factorization for collaborative recommendation. In Proceedings of the 23rd ACM International Conference on Information and Knowledge Management. ACM, 2014.

[Wang et al., 2014b] Zhen Wang, Jianwen Zhang, Jianlin Feng, and Zheng Chen. Knowledge graph embedding by translating on hyperplanes. In Proceedings of the 28th AAAI Conference on Artificial Intelligence, AAAI'14, 2014.

[Weimer et al., 2007] Markus Weimer, Alexandros Karatzoglou, Quoc Viet Le, and Alex Smola. Maximum margin matrix factorization for collaborative ranking. In Advances in Neural Information Processing Systems, NIPS'07, 2007.


Collaborative Filtering on Ordinal User Feedback Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence Collaborative Filtering on Ordinal User Feedback Yehuda Koren Google yehudako@gmail.com Joseph Sill Analytics Consultant

More information

arxiv: v2 [cs.ir] 14 May 2018

arxiv: v2 [cs.ir] 14 May 2018 A Probabilistic Model for the Cold-Start Problem in Rating Prediction using Click Data ThaiBinh Nguyen 1 and Atsuhiro Takasu 1, 1 Department of Informatics, SOKENDAI (The Graduate University for Advanced

More information

2017 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media,

2017 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, 2017 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising

More information

Rating Prediction with Topic Gradient Descent Method for Matrix Factorization in Recommendation

Rating Prediction with Topic Gradient Descent Method for Matrix Factorization in Recommendation Rating Prediction with Topic Gradient Descent Method for Matrix Factorization in Recommendation Guan-Shen Fang, Sayaka Kamei, Satoshi Fujita Department of Information Engineering Hiroshima University Hiroshima,

More information

Content-based Recommendation

Content-based Recommendation Content-based Recommendation Suthee Chaidaroon June 13, 2016 Contents 1 Introduction 1 1.1 Matrix Factorization......................... 2 2 slda 2 2.1 Model................................. 3 3 flda 3

More information

Andriy Mnih and Ruslan Salakhutdinov

Andriy Mnih and Ruslan Salakhutdinov MATRIX FACTORIZATION METHODS FOR COLLABORATIVE FILTERING Andriy Mnih and Ruslan Salakhutdinov University of Toronto, Machine Learning Group 1 What is collaborative filtering? The goal of collaborative

More information

Gaussian Process Regression Models for Predicting Stock Trends

Gaussian Process Regression Models for Predicting Stock Trends Gaussian Process Regression Models or Predicting Stock Trends M. Todd Farrell Andrew Correa December 5, 7 Introduction Historical stock price data is a massive amount o time-series data with little-to-no

More information

Learning to Recommend Point-of-Interest with the Weighted Bayesian Personalized Ranking Method in LBSNs

Learning to Recommend Point-of-Interest with the Weighted Bayesian Personalized Ranking Method in LBSNs information Article Learning to Recommend Point-of-Interest with the Weighted Bayesian Personalized Ranking Method in LBSNs Lei Guo 1, *, Haoran Jiang 2, Xinhua Wang 3 and Fangai Liu 3 1 School of Management

More information

Collaborative Recommendation with Multiclass Preference Context

Collaborative Recommendation with Multiclass Preference Context Collaborative Recommendation with Multiclass Preference Context Weike Pan and Zhong Ming {panweike,mingz}@szu.edu.cn College of Computer Science and Software Engineering Shenzhen University Pan and Ming

More information

Probabilistic Local Matrix Factorization based on User Reviews

Probabilistic Local Matrix Factorization based on User Reviews Probabilistic Local Matrix Factorization based on User Reviews Xu Chen 1, Yongfeng Zhang 2, Wayne Xin Zhao 3, Wenwen Ye 1 and Zheng Qin 1 1 School of Software, Tsinghua University 2 College of Information

More information

COT: Contextual Operating Tensor for Context-Aware Recommender Systems

COT: Contextual Operating Tensor for Context-Aware Recommender Systems Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence COT: Contextual Operating Tensor for Context-Aware Recommender Systems Qiang Liu, Shu Wu, Liang Wang Center for Research on Intelligent

More information

Collaborative Filtering. Radek Pelánek

Collaborative Filtering. Radek Pelánek Collaborative Filtering Radek Pelánek 2017 Notes on Lecture the most technical lecture of the course includes some scary looking math, but typically with intuitive interpretation use of standard machine

More information

Recommender Systems EE448, Big Data Mining, Lecture 10. Weinan Zhang Shanghai Jiao Tong University

Recommender Systems EE448, Big Data Mining, Lecture 10. Weinan Zhang Shanghai Jiao Tong University 2018 EE448, Big Data Mining, Lecture 10 Recommender Systems Weinan Zhang Shanghai Jiao Tong University http://wnzhang.net http://wnzhang.net/teaching/ee448/index.html Content of This Course Overview of

More information

Matrix and Tensor Factorization from a Machine Learning Perspective

Matrix and Tensor Factorization from a Machine Learning Perspective Matrix and Tensor Factorization from a Machine Learning Perspective Christoph Freudenthaler Information Systems and Machine Learning Lab, University of Hildesheim Research Seminar, Vienna University of

More information

Restricted Boltzmann Machines for Collaborative Filtering

Restricted Boltzmann Machines for Collaborative Filtering Restricted Boltzmann Machines for Collaborative Filtering Authors: Ruslan Salakhutdinov Andriy Mnih Geoffrey Hinton Benjamin Schwehn Presentation by: Ioan Stanculescu 1 Overview The Netflix prize problem

More information

Preference Relation-based Markov Random Fields for Recommender Systems

Preference Relation-based Markov Random Fields for Recommender Systems JMLR: Workshop and Conference Proceedings 45:1 16, 2015 ACML 2015 Preference Relation-based Markov Random Fields for Recommender Systems Shaowu Liu School of Information Technology Deakin University, Geelong,

More information

RaRE: Social Rank Regulated Large-scale Network Embedding

RaRE: Social Rank Regulated Large-scale Network Embedding RaRE: Social Rank Regulated Large-scale Network Embedding Authors: Yupeng Gu 1, Yizhou Sun 1, Yanen Li 2, Yang Yang 3 04/26/2018 The Web Conference, 2018 1 University of California, Los Angeles 2 Snapchat

More information

Location Regularization-Based POI Recommendation in Location-Based Social Networks

Location Regularization-Based POI Recommendation in Location-Based Social Networks information Article Location Regularization-Based POI Recommendation in Location-Based Social Networks Lei Guo 1,2, * ID, Haoran Jiang 3 and Xinhua Wang 4 1 Postdoctoral Research Station of Management

More information

NetBox: A Probabilistic Method for Analyzing Market Basket Data

NetBox: A Probabilistic Method for Analyzing Market Basket Data NetBox: A Probabilistic Method for Analyzing Market Basket Data José Miguel Hernández-Lobato joint work with Zoubin Gharhamani Department of Engineering, Cambridge University October 22, 2012 J. M. Hernández-Lobato

More information

Time-aware Collaborative Topic Regression: Towards Higher Relevance in Textual Item Recommendation

Time-aware Collaborative Topic Regression: Towards Higher Relevance in Textual Item Recommendation Time-aware Collaborative Topic Regression: Towards Higher Relevance in Textual Item Recommendation Anas Alzogbi Department of Computer Science, University of Freiburg 79110 Freiburg, Germany alzoghba@informatik.uni-freiburg.de

More information

Optimizing Factorization Machines for Top-N Context-Aware Recommendations

Optimizing Factorization Machines for Top-N Context-Aware Recommendations Optimizing Factorization Machines for Top-N Context-Aware Recommendations Fajie Yuan 1(B), Guibing Guo 2, Joemon M. Jose 1,LongChen 1, Haitao Yu 3, and Weinan Zhang 4 1 University of Glasgow, Glasgow,

More information

SCMF: Sparse Covariance Matrix Factorization for Collaborative Filtering

SCMF: Sparse Covariance Matrix Factorization for Collaborative Filtering SCMF: Sparse Covariance Matrix Factorization for Collaborative Filtering Jianping Shi Naiyan Wang Yang Xia Dit-Yan Yeung Irwin King Jiaya Jia Department of Computer Science and Engineering, The Chinese

More information

Ranking and Filtering

Ranking and Filtering 2018 CS420, Machine Learning, Lecture 7 Ranking and Filtering Weinan Zhang Shanghai Jiao Tong University http://wnzhang.net http://wnzhang.net/teaching/cs420/index.html Content of This Course Another ML

More information

Recurrent Latent Variable Networks for Session-Based Recommendation

Recurrent Latent Variable Networks for Session-Based Recommendation Recurrent Latent Variable Networks for Session-Based Recommendation Panayiotis Christodoulou Cyprus University of Technology paa.christodoulou@edu.cut.ac.cy 27/8/2017 Panayiotis Christodoulou (C.U.T.)

More information

What is Happening Right Now... That Interests Me?

What is Happening Right Now... That Interests Me? What is Happening Right Now... That Interests Me? Online Topic Discovery and Recommendation in Twitter Ernesto Diaz-Aviles 1, Lucas Drumond 2, Zeno Gantner 2, Lars Schmidt-Thieme 2, and Wolfgang Nejdl

More information

ROBUST STABILITY AND PERFORMANCE ANALYSIS OF UNSTABLE PROCESS WITH DEAD TIME USING Mu SYNTHESIS

ROBUST STABILITY AND PERFORMANCE ANALYSIS OF UNSTABLE PROCESS WITH DEAD TIME USING Mu SYNTHESIS ROBUST STABILITY AND PERFORMANCE ANALYSIS OF UNSTABLE PROCESS WITH DEAD TIME USING Mu SYNTHESIS I. Thirunavukkarasu 1, V. I. George 1, G. Saravana Kumar 1 and A. Ramakalyan 2 1 Department o Instrumentation

More information

arxiv: v2 [cs.lg] 5 May 2015

arxiv: v2 [cs.lg] 5 May 2015 fastfm: A Library for Factorization Machines Immanuel Bayer University of Konstanz 78457 Konstanz, Germany immanuel.bayer@uni-konstanz.de arxiv:505.0064v [cs.lg] 5 May 05 Editor: Abstract Factorization

More information

Little Is Much: Bridging Cross-Platform Behaviors through Overlapped Crowds

Little Is Much: Bridging Cross-Platform Behaviors through Overlapped Crowds Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI-16) Little Is Much: Bridging Cross-Platform Behaviors through Overlapped Crowds Meng Jiang, Peng Cui Tsinghua University Nicholas

More information

Binary Principal Component Analysis in the Netflix Collaborative Filtering Task

Binary Principal Component Analysis in the Netflix Collaborative Filtering Task Binary Principal Component Analysis in the Netflix Collaborative Filtering Task László Kozma, Alexander Ilin, Tapani Raiko first.last@tkk.fi Helsinki University of Technology Adaptive Informatics Research

More information

Predicting the Next Location: A Recurrent Model with Spatial and Temporal Contexts

Predicting the Next Location: A Recurrent Model with Spatial and Temporal Contexts Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence AAAI-16 Predicting the Next Location: A Recurrent Model with Spatial and Temporal Contexts Qiang Liu, Shu Wu, Liang Wang, Tieniu

More information

Transfer Learning for Collective Link Prediction in Multiple Heterogenous Domains

Transfer Learning for Collective Link Prediction in Multiple Heterogenous Domains Transfer Learning for Collective Link Prediction in Multiple Heterogenous Domains Bin Cao caobin@cse.ust.hk Nathan Nan Liu nliu@cse.ust.hk Qiang Yang qyang@cse.ust.hk Hong Kong University of Science and

More information

Relational Stacked Denoising Autoencoder for Tag Recommendation. Hao Wang

Relational Stacked Denoising Autoencoder for Tag Recommendation. Hao Wang Relational Stacked Denoising Autoencoder for Tag Recommendation Hao Wang Dept. of Computer Science and Engineering Hong Kong University of Science and Technology Joint work with Xingjian Shi and Dit-Yan

More information

The achievable limits of operational modal analysis. * Siu-Kui Au 1)

The achievable limits of operational modal analysis. * Siu-Kui Au 1) The achievable limits o operational modal analysis * Siu-Kui Au 1) 1) Center or Engineering Dynamics and Institute or Risk and Uncertainty, University o Liverpool, Liverpool L69 3GH, United Kingdom 1)

More information

Probabilistic Matrix Factorization

Probabilistic Matrix Factorization Probabilistic Matrix Factorization David M. Blei Columbia University November 25, 2015 1 Dyadic data One important type of modern data is dyadic data. Dyadic data are measurements on pairs. The idea is

More information

Factorization Models for Context-/Time-Aware Movie Recommendations

Factorization Models for Context-/Time-Aware Movie Recommendations Factorization Models for Context-/Time-Aware Movie Recommendations Zeno Gantner Machine Learning Group University of Hildesheim Hildesheim, Germany gantner@ismll.de Steffen Rendle Machine Learning Group

More information

A Latent-Feature Plackett-Luce Model for Dyad Ranking Completion

A Latent-Feature Plackett-Luce Model for Dyad Ranking Completion A Latent-Feature Plackett-Luce Model for Dyad Ranking Completion Dirk Schäfer dirk.schaefer@jivas.de Abstract. Dyad ranking is a specific type of preference learning problem, namely the problem of learning

More information

Aggregated Temporal Tensor Factorization Model for Point-of-interest Recommendation

Aggregated Temporal Tensor Factorization Model for Point-of-interest Recommendation Aggregated Temporal Tensor Factorization Model for Point-of-interest Recommendation Shenglin Zhao 1,2B, Michael R. Lyu 1,2, and Irwin King 1,2 1 Shenzhen Key Laboratory of Rich Media Big Data Analytics

More information

Matrix Factorization Techniques For Recommender Systems. Collaborative Filtering

Matrix Factorization Techniques For Recommender Systems. Collaborative Filtering Matrix Factorization Techniques For Recommender Systems Collaborative Filtering Markus Freitag, Jan-Felix Schwarz 28 April 2011 Agenda 2 1. Paper Backgrounds 2. Latent Factor Models 3. Overfitting & Regularization

More information

Introduction to Logistic Regression

Introduction to Logistic Regression Introduction to Logistic Regression Guy Lebanon Binary Classification Binary classification is the most basic task in machine learning, and yet the most frequent. Binary classifiers often serve as the

More information

NOWADAYS, Collaborative Filtering (CF) [14] plays an

NOWADAYS, Collaborative Filtering (CF) [14] plays an JOURNAL OF L A T E X CLASS FILES, VOL. 4, NO. 8, AUGUST 205 Multi-behavioral Sequential Prediction with Recurrent Log-bilinear Model Qiang Liu, Shu Wu, Member, IEEE, and Liang Wang, Senior Member, IEEE

More information

Joint user knowledge and matrix factorization for recommender systems

Joint user knowledge and matrix factorization for recommender systems World Wide Web (2018) 21:1141 1163 DOI 10.1007/s11280-017-0476-7 Joint user knowledge and matrix factorization for recommender systems Yonghong Yu 1,2 Yang Gao 2 Hao Wang 2 Ruili Wang 3 Received: 13 February

More information

Asymmetric Correlation Regularized Matrix Factorization for Web Service Recommendation

Asymmetric Correlation Regularized Matrix Factorization for Web Service Recommendation Asymmetric Correlation Regularized Matrix Factorization for Web Service Recommendation Qi Xie, Shenglin Zhao, Zibin Zheng, Jieming Zhu and Michael R. Lyu School of Computer Science and Technology, Southwest

More information

Distance Metric Learning in Data Mining (Part II) Fei Wang and Jimeng Sun IBM TJ Watson Research Center

Distance Metric Learning in Data Mining (Part II) Fei Wang and Jimeng Sun IBM TJ Watson Research Center Distance Metric Learning in Data Mining (Part II) Fei Wang and Jimeng Sun IBM TJ Watson Research Center 1 Outline Part I - Applications Motivation and Introduction Patient similarity application Part II

More information

Learning in Probabilistic Graphs exploiting Language-Constrained Patterns

Learning in Probabilistic Graphs exploiting Language-Constrained Patterns Learning in Probabilistic Graphs exploiting Language-Constrained Patterns Claudio Taranto, Nicola Di Mauro, and Floriana Esposito Department of Computer Science, University of Bari "Aldo Moro" via E. Orabona,

More information

Combining Heterogenous Social and Geographical Information for Event Recommendation

Combining Heterogenous Social and Geographical Information for Event Recommendation Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence Combining Heterogenous Social and Geographical Information for Event Recommendation Zhi Qiao,3, Peng Zhang 2, Yanan Cao 2, Chuan

More information

Collaborative Filtering Applied to Educational Data Mining

Collaborative Filtering Applied to Educational Data Mining Journal of Machine Learning Research (200) Submitted ; Published Collaborative Filtering Applied to Educational Data Mining Andreas Töscher commendo research 8580 Köflach, Austria andreas.toescher@commendo.at

More information

Collaborative topic models: motivations cont

Collaborative topic models: motivations cont Collaborative topic models: motivations cont Two topics: machine learning social network analysis Two people: " boy Two articles: article A! girl article B Preferences: The boy likes A and B --- no problem.

More information

Recommender System for Yelp Dataset CS6220 Data Mining Northeastern University

Recommender System for Yelp Dataset CS6220 Data Mining Northeastern University Recommender System for Yelp Dataset CS6220 Data Mining Northeastern University Clara De Paolis Kaluza Fall 2016 1 Problem Statement and Motivation The goal of this work is to construct a personalized recommender

More information

A Randomized Approach for Crowdsourcing in the Presence of Multiple Views

A Randomized Approach for Crowdsourcing in the Presence of Multiple Views A Randomized Approach for Crowdsourcing in the Presence of Multiple Views Presenter: Yao Zhou joint work with: Jingrui He - 1 - Roadmap Motivation Proposed framework: M2VW Experimental results Conclusion

More information

Unifying Nearest Neighbors Collaborative Filtering

Unifying Nearest Neighbors Collaborative Filtering Unifying Nearest Neighbors Collaborative Filtering Koen Verstrepen University of Antwerp Antwerp, Belgium {koenverstrepen,bartgoethals}@uantwerpbe Bart Goethals ABSTRACT We study collaborative filtering

More information

A Generic Coordinate Descent Framework for Learning from Implicit Feedback

A Generic Coordinate Descent Framework for Learning from Implicit Feedback A Generic Coordinate Descent Framewor for Learning from Implicit Feedbac ABSTRACT Immanuel Bayer Swiss Re Management Ltd immanuel_bayer@swissre.com Bhargav Kanagal Google Inc., USA bhargav@google.com In

More information

Mixed Membership Matrix Factorization

Mixed Membership Matrix Factorization Mixed Membership Matrix Factorization Lester Mackey 1 David Weiss 2 Michael I. Jordan 1 1 University of California, Berkeley 2 University of Pennsylvania International Conference on Machine Learning, 2010

More information

Prediction of Citations for Academic Papers

Prediction of Citations for Academic Papers 000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050

More information

Mixture-Rank Matrix Approximation for Collaborative Filtering

Mixture-Rank Matrix Approximation for Collaborative Filtering Mixture-Rank Matrix Approximation for Collaborative Filtering Dongsheng Li 1 Chao Chen 1 Wei Liu 2 Tun Lu 3,4 Ning Gu 3,4 Stephen M. Chu 1 1 IBM Research - China 2 Tencent AI Lab, China 3 School of Computer

More information

A Translation-Based Knowledge Graph Embedding Preserving Logical Property of Relations

A Translation-Based Knowledge Graph Embedding Preserving Logical Property of Relations A Translation-Based Knowledge Graph Embedding Preserving Logical Property of Relations Hee-Geun Yoon, Hyun-Je Song, Seong-Bae Park, Se-Young Park School of Computer Science and Engineering Kyungpook National

More information

A Unified Algorithm for One-class Structured Matrix Factorization with Side Information

A Unified Algorithm for One-class Structured Matrix Factorization with Side Information A Unified Algorithm for One-class Structured Matrix Factorization with Side Information Hsiang-Fu Yu University of Texas at Austin rofuyu@cs.utexas.edu Hsin-Yuan Huang National Taiwan University momohuang@gmail.com

More information

Large-Scale Feature Learning with Spike-and-Slab Sparse Coding

Large-Scale Feature Learning with Spike-and-Slab Sparse Coding Large-Scale Feature Learning with Spike-and-Slab Sparse Coding Ian J. Goodfellow, Aaron Courville, Yoshua Bengio ICML 2012 Presented by Xin Yuan January 17, 2013 1 Outline Contributions Spike-and-Slab

More information

Missing Data Prediction in Multi-source Time Series with Sensor Network Regularization

Missing Data Prediction in Multi-source Time Series with Sensor Network Regularization Missing Data Prediction in Multi-source Time Series with Sensor Network Regularization Weiwei Shi, Yongxin Zhu, Xiao Pan, Philip S. Yu, Bin Liu, and Yufeng Chen School of Microelectronics, Shanghai Jiao

More information

3. Several Random Variables

3. Several Random Variables . Several Random Variables. Two Random Variables. Conditional Probabilit--Revisited. Statistical Independence.4 Correlation between Random Variables. Densit unction o the Sum o Two Random Variables. Probabilit

More information

Matrix Factorization Techniques for Recommender Systems

Matrix Factorization Techniques for Recommender Systems Matrix Factorization Techniques for Recommender Systems By Yehuda Koren Robert Bell Chris Volinsky Presented by Peng Xu Supervised by Prof. Michel Desmarais 1 Contents 1. Introduction 4. A Basic Matrix

More information

arxiv: v1 [cs.ir] 15 Nov 2016

arxiv: v1 [cs.ir] 15 Nov 2016 A Generic Coordinate Descent Framewor for Learning from Implicit Feedbac arxiv:1611.04666v1 [cs.ir] 15 Nov 2016 ABSTRACT Immanuel Bayer University of Konstanz, Germany immanuel.bayer@unionstanz.de Bhargav

More information

Probabilistic Model of Error in Fixed-Point Arithmetic Gaussian Pyramid

Probabilistic Model of Error in Fixed-Point Arithmetic Gaussian Pyramid Probabilistic Model o Error in Fixed-Point Arithmetic Gaussian Pyramid Antoine Méler John A. Ruiz-Hernandez James L. Crowley INRIA Grenoble - Rhône-Alpes 655 avenue de l Europe 38 334 Saint Ismier Cedex

More information

Local Low-Rank Matrix Approximation with Preference Selection of Anchor Points

Local Low-Rank Matrix Approximation with Preference Selection of Anchor Points Local Low-Rank Matrix Approximation with Preference Selection of Anchor Points Menghao Zhang Beijing University of Posts and Telecommunications Beijing,China Jack@bupt.edu.cn Binbin Hu Beijing University

More information

Mixed Membership Matrix Factorization

Mixed Membership Matrix Factorization Mixed Membership Matrix Factorization Lester Mackey University of California, Berkeley Collaborators: David Weiss, University of Pennsylvania Michael I. Jordan, University of California, Berkeley 2011

More information

arxiv: v1 [cs.ai] 24 Mar 2016

arxiv: v1 [cs.ai] 24 Mar 2016 arxiv:163.774v1 [cs.ai] 24 Mar 216 Probabilistic Reasoning via Deep Learning: Neural Association Models Quan Liu and Hui Jiang and Zhen-Hua Ling and Si ei and Yu Hu National Engineering Laboratory or Speech

More information

LoPub: High-Dimensional Crowdsourced Data Publication with Local Differential Privacy

LoPub: High-Dimensional Crowdsourced Data Publication with Local Differential Privacy 1 LoPub: High-Dimensional Crowdsourced Data Publication with Local Dierential Privacy Xuebin Ren, Chia-Mu Yu, Weiren Yu, Shusen Yang, Xinyu Yang, Julie A. McCann, and Philip S. Yu Abstract High-dimensional

More information

Impact of Data Characteristics on Recommender Systems Performance

Impact of Data Characteristics on Recommender Systems Performance Impact of Data Characteristics on Recommender Systems Performance Gediminas Adomavicius YoungOk Kwon Jingjing Zhang Department of Information and Decision Sciences Carlson School of Management, University

More information

arxiv: v3 [cs.lg] 23 Nov 2016

arxiv: v3 [cs.lg] 23 Nov 2016 Journal of Machine Learning Research 17 (2016) 1-5 Submitted 7/15; Revised 5/16; Published 10/16 fastfm: A Library for Factorization Machines Immanuel Bayer University of Konstanz 78457 Konstanz, Germany

More information

Chapter 6 Reliability-based design and code developments

Chapter 6 Reliability-based design and code developments Chapter 6 Reliability-based design and code developments 6. General Reliability technology has become a powerul tool or the design engineer and is widely employed in practice. Structural reliability analysis

More information

arxiv: v2 [cs.cl] 28 Sep 2015

arxiv: v2 [cs.cl] 28 Sep 2015 TransA: An Adaptive Approach for Knowledge Graph Embedding Han Xiao 1, Minlie Huang 1, Hao Yu 1, Xiaoyan Zhu 1 1 Department of Computer Science and Technology, State Key Lab on Intelligent Technology and

More information

A Neighborhood-based Matrix Factorization Technique for Recommendation

A Neighborhood-based Matrix Factorization Technique for Recommendation Ann. Data. Sci. (2015) 2(3):301 316 DOI 10.1007/s40745-015-0056-6 A Neighborhood-based Matrix Factorization Technique for Recommendation Meng-jiao Guo 1 Jin-guang Sun 1 Xiang-fu Meng 1 Received: 29 November

More information

Social Relations Model for Collaborative Filtering

Social Relations Model for Collaborative Filtering Social Relations Model for Collaborative Filtering Wu-Jun Li and Dit-Yan Yeung Shanghai Key Laboratory of Scalable Computing and Systems Department of Computer Science and Engineering, Shanghai Jiao Tong

More information

Recommendation Systems

Recommendation Systems Recommendation Systems Pawan Goyal CSE, IITKGP October 29-30, 2015 Pawan Goyal (IIT Kharagpur) Recommendation Systems October 29-30, 2015 1 / 61 Recommendation System? Pawan Goyal (IIT Kharagpur) Recommendation

More information

L 2,1 Norm and its Applications

L 2,1 Norm and its Applications L 2, Norm and its Applications Yale Chang Introduction According to the structure of the constraints, the sparsity can be obtained from three types of regularizers for different purposes.. Flat Sparsity.

More information

Incorporating User Control into Recommender Systems Based on Naive Bayesian Classification

Incorporating User Control into Recommender Systems Based on Naive Bayesian Classification Incorporating User Control into Recommender Systems Based on Naive Bayesian Classiication ABSTRACT Verus Pronk Philips Research Europe High Tech Campus 34 5656 AE Eindhoven The Netherlands verus.pronk@philips.com

More information

Context-aware Ensemble of Multifaceted Factorization Models for Recommendation Prediction in Social Networks

Context-aware Ensemble of Multifaceted Factorization Models for Recommendation Prediction in Social Networks Context-aware Ensemble of Multifaceted Factorization Models for Recommendation Prediction in Social Networks Yunwen Chen kddchen@gmail.com Yingwei Xin xinyingwei@gmail.com Lu Yao luyao.2013@gmail.com Zuotao

More information

Incremental Matrix Factorization: A Linear Feature Transformation Perspective

Incremental Matrix Factorization: A Linear Feature Transformation Perspective Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence IJCAI-7 Incremental Matrix actorization: A Linear eature Transformation Perspective Xunpeng Huang, Le Wu, Enhong

More information

Cross-Domain Recommendation via Cluster-Level Latent Factor Model

Cross-Domain Recommendation via Cluster-Level Latent Factor Model Cross-Domain Recommendation via Cluster-Level Latent Factor Model Sheng Gao 1, Hao Luo 1, Da Chen 1, and Shantao Li 1 Patrick Gallinari 2, Jun Guo 1 1 PRIS - Beijing University of Posts and Telecommunications,

More information

Large-scale Collaborative Ranking in Near-Linear Time

Large-scale Collaborative Ranking in Near-Linear Time Large-scale Collaborative Ranking in Near-Linear Time Liwei Wu Depts. of Statistics and Computer Science University of California, Davis liwu@ucdavis.edu ABSTRACT In this paper, we consider the Collaborative

More information

Hao Ren, Wim J. van der Linden and Qi Diao

Hao Ren, Wim J. van der Linden and Qi Diao psychometrika vol. 82, no. 2, 498 522 June 2017 doi: 10.1007/s11336-017-9553-1 CONTINUOUS ONLINE ITEM CALIBRATION: PARAMETER RECOVERY AND ITEM UTILIZATION Hao Ren, Wim J. van der Linden and Qi Diao PACIFIC

More information

Collaborative Filtering via Different Preference Structures

Collaborative Filtering via Different Preference Structures Collaborative Filtering via Different Preference Structures Shaowu Liu 1, Na Pang 2 Guandong Xu 1, and Huan Liu 3 1 University of Technology Sydney, Australia 2 School of Cyber Security, University of

More information

Scattered Data Approximation of Noisy Data via Iterated Moving Least Squares

Scattered Data Approximation of Noisy Data via Iterated Moving Least Squares Scattered Data Approximation o Noisy Data via Iterated Moving Least Squares Gregory E. Fasshauer and Jack G. Zhang Abstract. In this paper we ocus on two methods or multivariate approximation problems

More information

APPLICATIONS OF MINING HETEROGENEOUS INFORMATION NETWORKS

APPLICATIONS OF MINING HETEROGENEOUS INFORMATION NETWORKS APPLICATIONS OF MINING HETEROGENEOUS INFORMATION NETWORKS Yizhou Sun College of Computer and Information Science Northeastern University yzsun@ccs.neu.edu July 25, 2015 Heterogeneous Information Networks

More information

Synergies that Matter: Efficient Interaction Selection via Sparse Factorization Machine

Synergies that Matter: Efficient Interaction Selection via Sparse Factorization Machine Synergies that Matter: Efficient Interaction Selection via Sparse Factorization Machine Jianpeng Xu, Kaixiang Lin, Pang-Ning Tan, Jiayu Zhou Department of Computer Science and Engineering, Michigan State

More information

Matrix Factorization In Recommender Systems. Yong Zheng, PhDc Center for Web Intelligence, DePaul University, USA March 4, 2015

Matrix Factorization In Recommender Systems. Yong Zheng, PhDc Center for Web Intelligence, DePaul University, USA March 4, 2015 Matrix Factorization In Recommender Systems Yong Zheng, PhDc Center for Web Intelligence, DePaul University, USA March 4, 2015 Table of Contents Background: Recommender Systems (RS) Evolution of Matrix

More information

arxiv: v2 [stat.ml] 12 Mar 2017

arxiv: v2 [stat.ml] 12 Mar 2017 Recommendation under Capacity Constraints Konstantina Christakopoulou 1, Jaya Kawale 2, and Arindam Banerjee 1 1 Department of Computer Science & Engineering, University of Minnesota, USA 2 Netflix, USA

More information

arxiv: v1 [cs.ir] 9 May 2018

arxiv: v1 [cs.ir] 9 May 2018 Learning Heterogeneous Knowledge Base Embeddings for Explainable Recommendation Qingyao Ai, Vahid Azizi, Xu Chen, Yongfeng Zhang Department of Computer Science, Rutgers University, Piscataway, NJ 08854,

More information

arxiv: v1 [cs.lg] 26 Oct 2012

arxiv: v1 [cs.lg] 26 Oct 2012 Selective Transfer Learning for Cross Domain Recommendation Zhongqi Lu Erheng Zhong Lili Zhao Evan Xiang Weike Pan Qiang Yang arxiv:1210.7056v1 [cs.lg] 26 Oct 2012 Abstract Collaborative filtering (CF)

More information

Recommendation Systems

Recommendation Systems Recommendation Systems Pawan Goyal CSE, IITKGP October 21, 2014 Pawan Goyal (IIT Kharagpur) Recommendation Systems October 21, 2014 1 / 52 Recommendation System? Pawan Goyal (IIT Kharagpur) Recommendation

More information

CS 175: Project in Artificial Intelligence. Slides 4: Collaborative Filtering

CS 175: Project in Artificial Intelligence. Slides 4: Collaborative Filtering CS 175: Project in Artificial Intelligence Slides 4: Collaborative Filtering 1 Topic 6: Collaborative Filtering Some slides taken from Prof. Smyth (with slight modifications) 2 Outline General aspects

More information

A Gradient-based Adaptive Learning Framework for Efficient Personal Recommendation

A Gradient-based Adaptive Learning Framework for Efficient Personal Recommendation A Gradient-based Adaptive Learning Framework for Efficient Personal Recommendation Yue Ning 1 Yue Shi 2 Liangjie Hong 2 Huzefa Rangwala 3 Naren Ramakrishnan 1 1 Virginia Tech 2 Yahoo Research. Yue Shi

More information