Incorporating Heterogeneous Information for Personalized Tag Recommendation in Social Tagging Systems


Incorporating Heterogeneous Information for Personalized Tag Recommendation in Social Tagging Systems

Wei Feng, Tsinghua University, Beijing, China
Jianyong Wang, Tsinghua University, Beijing, China

ABSTRACT

A social tagging system provides users an effective way to collaboratively annotate and organize items with their own tags. A social tagging system contains heterogeneous information like users' tagging behaviors, social networks, tag semantics and item profiles. All the heterogeneous information helps alleviate the cold start problem due to data sparsity. In this paper, we model a social tagging system as a multi-type graph. To learn the weights of different types of nodes and edges, we propose an optimization framework, called OptRank. OptRank can be characterized as follows: (1) Edges and nodes are represented by features, and different types of edges and nodes have different sets of features. (2) OptRank learns the best feature weights by maximizing the average AUC (Area Under the ROC Curve) of the tag recommender. We conducted experiments on two publicly available datasets, i.e., Delicious and Last.fm. Experimental results show that: (1) OptRank outperforms the existing graph-based methods when only the <user, tag, item> relation is available. (2) OptRank successfully improves the results by incorporating the social network, tag semantics and item profiles.

Categories and Subject Descriptors: H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval - Information Filtering, Retrieval Models, Selection Process

General Terms: Algorithms

Keywords: Recommender System, Social Tagging System

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. KDD'12, August 12-16, 2012, Beijing, China. Copyright 2012 ACM.

1. INTRODUCTION

In social tagging systems, users can annotate and organize items with their own tags for future search and sharing. For example, users can annotate and share Web pages in Delicious. Besides Delicious, there are many other social tagging systems like Last.fm and YouTube in the entertainment domain and CiteULike in the research domain.

Figure 1: A social tagging system, containing users u_1, ..., u_|U|, tags t_1, ..., t_|T|, and items i_1, ..., i_|I|.

Personalized tag recommendation is a key part of a social tagging system. When a user wants to annotate an item, the user may have his/her own vocabulary to organize items. Personalized tag recommendation tries to find the tags that precisely describe the item with the user's vocabulary. A social tagging system, as shown in Figure 1, contains heterogeneous information and can be modeled as a graph:

Users (U), tags (T) and items (I) co-exist in the graph.

Inter-relation. Edges between users, tags and items can be derived from the annotation behaviors <user, tag, item>. Suppose we have u ∈ U and t ∈ T; the weight of <u, t> is the number of times tag t has been used by user u. The same rule applies to <u, i> and <i, t>, where i ∈ I.

Intra-relation. (1) The social network among users. (2) The tag semantic network based on semantic relatedness. (3) The item network based on content similarities.

While the inter-relation has been well studied in previous work [4, 5, 12, 13, 16, 17], little work tries to incorporate all the intra-relations into a unified model.
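To make the inter-relation construction concrete, the following sketch (Python; our own illustration rather than the authors' code, using hypothetical toy data) derives the <user, tag>, <user, item> and <item, tag> co-occurrence weights from a list of posts:

    from collections import Counter

    # posts: <user, tag, item> annotation triples (toy, assumed data).
    posts = [("u1", "python", "page1"), ("u1", "web", "page1"),
             ("u2", "python", "page2")]

    # Project the ternary relation onto each pair of dimensions; the edge
    # weight is the number of co-occurrences in the posts.
    w_ut = Counter((u, t) for u, t, i in posts)  # times tag t used by user u
    w_ui = Counter((u, i) for u, t, i in posts)  # times item i annotated by u
    w_it = Counter((i, t) for u, t, i in posts)  # times item i annotated with t

    print(w_ut[("u1", "python")])  # -> 1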
Incorporating the intra-relations may alleviate the cold start problem due to data sparsity: users in a social network may influence each other by sharing annotated items; semantically related tags may co-occur to describe an item; and items that have similar contents may be annotated with the same tags.

When a user u wants to annotate an item i, the recommended tags should meet two requirements: (1) they should be highly relevant to user u, because users have their own ways to organize items; (2) they should be highly relevant to item i, because tags should precisely describe the item.

To rank the tags, we can perform a random walk that restarts at user u and item i to assign each tag a visiting probability, which is used as the ranking score. Only tags that are relevant to both u and i can get high scores. However, two problems arise when the random walk is performed on the multi-type graph:

(1) Different types of edges have different meanings and thus are measured in different metrics. For example, the edge weights of a social network may be binary, and they have completely different meanings from other types of edges, such as the edges formed by the tagging behaviors <user, tag, item>. To perform a random walk, they need to be measured under the same metric.

(2) The random walker can restart either from the user u or from the item i. The probabilities of restarting at u and at i should be estimated.

To solve the above two problems, we propose an optimization framework called OptRank. OptRank can be characterized as follows. Edges are represented by features, and different types of edges have different sets of features. For example, an edge <u_1, u_2> (u_1, u_2 ∈ U) in the social network is represented by the feature set {the number of common tags, the number of common items}, while an edge <t, u> (u ∈ U, t ∈ T) is represented by the feature {the number of times t has been used by u}. Each feature has a feature weight, and the edge weight is decided by both the features and the feature weights. The user u and the item i given for recommendation are represented by a constant feature, but their feature weights are learned separately. OptRank learns the feature weights by maximizing the average AUC (Area Under the ROC Curve) of the tag recommender.

Although graph-based methods have been studied in the field of personalized tag recommendation by many researchers [4, 5, 17], most of them belong to the unsupervised approach, in which the edge weights and the restart probabilities at u and i are empirically assigned. Inspired by the recent development of semi-supervised learning [3] and graph-based learning [1], we are able to turn the existing unsupervised graph-based methods into supervised ones. More specifically, we extend the supervised random walk proposed in [1] for link prediction to the setting of personalized tag recommendation. This paper has two major differences from [1]: (1) The graph in our setting contains different types of edges; each type has its own set of features, and the corresponding feature weights are learned separately. (2) Since we have two nodes for restart, we further introduce node features.

To summarize, our contributions are as follows: (1) To solve the cold start problem due to data sparsity, we are among the first to explore three additional relations: the social network, tag semantic relatedness and item content similarities. (2) We propose a graph model and extend the random walk with restart to the multi-type graph to handle different types of relations uniformly. (3) We propose an optimization framework to learn the best edge weights and node weights by maximizing the average AUC of the tag recommender.

The remainder of this paper is organized as follows. The problem we address is formulated in Section 2. The graph model and the random walk with restart are introduced in Section 3. Our optimization framework OptRank is introduced in Section 4. The experimental study is described in Section 5. Related work is introduced in Section 6. We conclude the paper and discuss future work in Section 7.

2. PROBLEM STATEMENT AND BASIC FRAMEWORK

Personalized Tag Recommendation.
Given a user u and an item i, personalized tag recommendation tries to find tags that describe or classify the item i precisely according to u's vocabulary. Inter-relations and intra-relations among users, items and tags are considered, which makes the graph a multi-type graph as shown in Figure 1. Highly ranked tags should be relevant to both u and i. To achieve this goal, a random walk is performed on the multi-type graph with restart at user u and item i. Only tags that are near to both u and i can get a high visiting probability. Formally, the random walk with restart is performed according to the following equation:

(p_U; p_T; p_I)^{t+1} = (1 - α) Â (p_U; p_T; p_I)^{t} + α (q̂_U; 0; q̂_I)    (1)

where α is the restart probability; with probability 1 - α, the random walker performs a random jump based on his current state. p = (p_U; p_T; p_I) is the vector of visiting probabilities of all nodes, and p_T contains the ranking scores of the tags. Â is the transition matrix that stores the graph structure information; Â is obtained by normalizing each column of the adjacency matrix A to sum to 1. q̂ = (q̂_U; 0; q̂_I) is the preference vector that contains the restart probability of each node; q̂ is obtained by normalizing the node weight vector q to sum to 1. The transition matrix Â and the preference vector q̂ will be introduced in detail in Section 3.

Optimization Framework. To get a good ranking from Equation 1, the transition matrix Â and the preference vector q̂ need to be carefully assigned. Thus we develop an optimization framework called OptRank. Given a user u and an item i for personalized tag recommendation, suppose u has finally annotated i with tags t_1, t_2, ..., t_k. These tags are defined to be positive tags, denoted by PT. The remaining tags are defined to be negative tags, denoted by NT. In other words, the whole tag set T is divided into two parts, i.e., T = PT ∪ NT. A good ranking function defined by Equation 1 should rank all the positive tags higher than the negative tags: for a randomly picked positive tag t_1 and negative tag t_2, a good ranking function has a high probability of ranking t_1 higher than t_2. This is the idea of the AUC (Area Under the ROC Curve) metric. Formally, the AUC is defined by the following equation:

AUC = ( Σ_{i ∈ PT} Σ_{j ∈ NT} I(p_T(i) - p_T(j)) ) / ( |PT| · |NT| )    (2)
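For concreteness, Equation 1 and the AUC of Equation 2 can be sketched as follows (Python/NumPy; a minimal illustration of ours, assuming a column-stochastic transition matrix hat_A, a normalized preference vector hat_q, and index lists pos/neg for the positive and negative tags):

    import numpy as np

    def rwr(hat_A, hat_q, alpha, tol=1e-10, max_iter=1000):
        # Equation 1: p = (1 - alpha) * hat_A @ p + alpha * hat_q
        p = hat_q.copy()
        for _ in range(max_iter):
            p_next = (1 - alpha) * hat_A @ p + alpha * hat_q
            if np.abs(p_next - p).sum() < tol:
                break
            p = p_next
        return p_next

    def auc(p_tag, pos, neg):
        # Equation 2: fraction of (positive, negative) pairs ranked correctly.
        correct = sum(p_tag[i] > p_tag[j] for i in pos for j in neg)
        return correct / (len(pos) * len(neg))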

where I(x) is 1 when x > 0 and 0 otherwise. Our goal is to find the transition matrix Â and the preference vector q̂ that maximize the AUC. To achieve this, edges are represented by features X, and the nodes u and i are represented by features Y. To better illustrate the idea, assume for the moment that the adjacency matrix A only contains edges of a single type; A with different types of edges will be introduced in Section 3.1. Each edge <v, u> (u, v ∈ U ∪ T ∪ I) is represented by a feature vector X(u, v). Let θ denote the vector of feature weights; the edge weight A(u, v) is computed as A(u, v) = f_edge(θᵀ X(u, v)), where f_edge: R → R⁺. User u and item i are respectively represented by the feature vectors Y_U = (1) and Y_I = (1). (Nodes are allowed to have more than one feature, so Y_U and Y_I are still written as vectors.) Let ξ denote the feature weights. The node weights q_U(u) and q_I(i) are computed as q_U(u) = f_node(ξ_Uᵀ Y_U) and q_I(i) = f_node(ξ_Iᵀ Y_I), where f_node: R → R⁺. All other entries of q_U and q_I are 0.

According to the above representation, the transition matrix Â and the adjacency matrix A can be rewritten as Â(θ) and A(θ), and q̂ and q can be rewritten as q̂(ξ) and q(ξ); that is, they are respectively decided by the parameters θ and ξ. Since the random walk is defined by Â(θ) and q̂(ξ) according to Equation 1, p can be rewritten as p(θ, ξ), which means the final ranking scores are parameterized by θ and ξ. To keep the following formulae uncluttered, however, we will not spell out these parameters in the notation.

With edges and nodes parameterized by θ and ξ, we give a formal description of our optimization framework. Given a user u and an item i for tag recommendation and the positive tag set, the optimization problem is

max_{θ,ξ} AUC(θ, ξ) = ( Σ_{i ∈ PT} Σ_{j ∈ NT} I(p_T(i) - p_T(j)) ) / ( |PT| · |NT| )

However, the above equation only considers a single training instance. When m instances {<u_k, i_k, PT_k>}_{k=1}^{m} are considered, the cost function J(θ, ξ) is defined as the average AUC:

max_{θ,ξ} J(θ, ξ) = (1/m) Σ_{k=1}^{m} ( Σ_{i ∈ PT_k} Σ_{j ∈ NT_k} I(p_T(i) - p_T(j)) ) / ( |PT_k| · |NT_k| )    (3)

where NT_k = T - PT_k. The optimization framework OptRank and its solution will be introduced in Section 4.

3. GRAPH MODEL

Before introducing the optimization problem, we first give more details about Equation 1. Section 3.1 introduces the transition matrix. Section 3.2 describes the preference vector. Section 3.3 gives more intuitions and details of the random walk with restart.

3.1 Transition Matrix

The transition matrix stores the graph structure information. Before defining it, we first introduce how to construct a graph from a social tagging system. The graph shown in Figure 1 is constructed in three steps: (1) Users, tags and items are mapped to nodes. (2) All the binary relations, i.e., the social network, tag semantic relatedness and item content similarities, are mapped to edges. (3) For the ternary relation <user, tag, item>, where three nodes are involved, binary relations can be derived by projections on each dimension. For example, given <u, t, i> (u ∈ U, t ∈ T, i ∈ I), the relation <i, t> can be derived by projecting out the user dimension; <i, t> is described by the feature "the number of times i has been annotated with t".

Now we define the adjacency matrix. Let G denote the whole graph as shown in Figure 1 and A denote its adjacency matrix. Let G_MN (M, N ∈ {U, T, I}) denote the sub-graph made up of the relations <m, n> (m ∈ M, n ∈ N), and let A_MN denote its adjacency matrix. We have G = ∪_{M,N ∈ {U,T,I}} G_MN, and A is composed of the sub-matrices A_MN:

A = ( A_UU  A_UT  A_UI
      A_TU  A_TT  A_TI
      A_IU  A_IT  A_II )    (4)

Recall that edges are represented by features.
In Section 2, the edge feature set is denoted by X and the feature weights by θ. Since different types of edges have different features and feature weights, we have X = {X_MN | M, N ∈ {U, T, I}} and θ = {θ_MN | M, N ∈ {U, T, I}}. Given an edge <m, n> (m ∈ M, n ∈ N), A_MN(m, n) is defined by

A_MN(m, n) = f_edge(θ_MNᵀ X_MN(m, n))    (5)

Note that X_MN(m, n) is a vector, so X_MN is a three-dimensional array. In this paper, f_edge: R → R⁺ is the sigmoid function:

f_edge(x) = 1 / (1 + e^{-x})    (6)

The transition matrix Â is obtained by normalizing each column of A:

Â = ( A_UU D_U^{-1}  A_UT D_T^{-1}  A_UI D_I^{-1}
      A_TU D_U^{-1}  A_TT D_T^{-1}  A_TI D_I^{-1}
      A_IU D_U^{-1}  A_IT D_T^{-1}  A_II D_I^{-1} )    (7)

where D_U, D_T and D_I are diagonal matrices. The i-th entry on the diagonal of D_U is the out-degree of the i-th user. For u ∈ U, we have

D_U(u, u) = Σ_{M ∈ {U,T,I}} Σ_{k=1}^{|M|} A_MU(k, u)    (8)

D_T and D_I are defined in the same way. Following this definition, each column of Â is normalized to sum to 1.

3.2 Preference Vector

Given a user u and an item i for tag recommendation, the preference vector q̂ = (q̂_U; 0; q̂_I) specifies the restart probabilities at u and i. As introduced in Section 2, user u and item i are respectively represented by the feature vectors Y_U = (1) and Y_I = (1). Let ξ = {ξ_U, ξ_I} denote the feature weights. The node weight q_M(m) (M ∈ {U, I}, m ∈ {u, i}) is computed as

q_M(m) = f_node(ξ_Mᵀ Y_M)    (9)

The other entries of q_U and q_I are all set to 0. In this paper, f_node: R → R⁺ is the sigmoid function:

f_node(x) = 1 / (1 + e^{-x})    (10)
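A minimal sketch of Equations 5-10 (Python/NumPy; our own illustration for a toy graph with a single edge type, where X[m, n, :] is assumed to hold the feature vector of edge <m, n>):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def adjacency(X, theta):
        # Equations 5 and 6: A(m, n) = f_edge(theta^T X(m, n)), entrywise.
        return sigmoid(X @ theta)

    def column_normalize(A):
        # Equations 7 and 8: divide each column by its sum (the out-degree).
        D = A.sum(axis=0)
        return A / np.where(D > 0, D, 1.0)

    def node_weights(n_nodes, u, i, xi_u, xi_i):
        # Equations 9 and 10 with the constant features Y_U = Y_I = (1);
        # the final normalization to sum 1 anticipates Equations 11 and 12.
        q = np.zeros(n_nodes)
        q[u] = sigmoid(np.dot(xi_u, np.ones(1)))
        q[i] = sigmoid(np.dot(xi_i, np.ones(1)))
        return q / q.sum()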

Figure 2: The random walker restarts at u_1 and i_1 within no more than 2 hops (an example graph with users u_1, u_2, u_3, tags t_1, ..., t_4 and items i_1, i_2, i_3).

The preference vector q̂ = (q̂_U; 0; q̂_I) is obtained by normalizing q = (q_U; 0; q_I) to sum to 1:

q̂ = ( q_U / D_q ; 0 ; q_I / D_q )    (11)

where D_q is the sum of all the entries in q_U and q_I. Formally, D_q is defined by the following equation:

D_q = Σ_{M ∈ {U,I}} Σ_{k=1}^{|M|} q_M(k)    (12)

Equation 11 ensures that q̂ sums to 1.

3.3 Random Walk With Restart

In this section, we give more intuitions about the random walk with restart for personalized tag recommendation. As introduced in Section 2, the random walker frequently restarts at u and i to rank the tags. We illustrate this idea with the example shown in Figure 2, where we want to recommend tags for user u_1 to annotate i_1, so the random walker restarts frequently from u_1 and i_1. The edges indicate how the random walker jumps from node to node. u_1 has annotated i_2 before, and i_1 has been annotated by u_3. Besides the annotation relation, u_2 is a friend of u_1, i_3 has similar contents to i_1, and t_4 has high semantic relatedness with t_1. Now we discuss how the random walker behaves within no more than two hops from u_1 and i_1.

When the random walker is only allowed to jump one hop from user u_1 and item i_1, the recommended tags either have been used by user u_1 or have been annotated on item i_1 by other users. As we can see from Figure 2, t_1 is such a tag. When u_1 has annotated many items and i_1 has been annotated by many users, the random walker will find the best common tags between u_1's tags and i_1's tags.

When the random walker is allowed to jump within two hops, the recommended tags come from different sources: (1) Items annotated by u_1. For example, i_2 has been annotated by u_1 and i_2 has a tag t_2; t_2 may reflect the interests of u_1. (2) Users that have annotated item i_1. Since u_3 has annotated i_1, the tags used by u_3 may reflect the content of i_1. (3) Friends of u_1. u_2 is a friend of u_1, and his/her tags may also be adopted by u_1. (4) Similar items. Since i_3 and i_1 have similar content, the tags of i_3 may also be the tags of i_1. (5) Semantically related tags. t_4 and t_1 are semantically related, which means that they may co-occur in annotations.

When data is sparse, i.e., u and i are both inactive, more information can be taken into account by jumping more than two hops away.

Now we introduce another intuition behind the random walk. With the transition matrix Â defined by Equation 7, we can rewrite Equation 1 as follows:

p_U = (1 - α) (Â_UU p_U + Â_UT p_T + Â_UI p_I) + α q̂_U    (13)

p_T = (1 - α) (Â_TU p_U + Â_TT p_T + Â_TI p_I)    (14)

p_I = (1 - α) (Â_IU p_U + Â_IT p_T + Â_II p_I) + α q̂_I    (15)

where Â_MN = A_MN D_N^{-1} (M, N ∈ {U, T, I}). The term Â_MN p_N means that p_N is spread to its neighboring nodes through the transition matrix Â_MN.

First we discuss the extreme case where α equals 0. Taking p_T as an example, p_T receives scores from p_U through Â_TU, from p_T through Â_TT and from p_I through Â_TI. For t ∈ T, p_T(t) will have a high score if t has highly ranked user neighbors, tag neighbors and item neighbors. The same rule applies to p_U and p_I. In other words, users, tags and items reinforce each other iteratively until a stable state is reached. However, no personalized information is considered in this case. Given a user u and an item i for tag recommendation, when α is greater than 0, the random walker restarts at u and i. Besides the reinforcement rule, p_U, p_T and p_I are then also influenced by the distance from u and i: nodes that are near to u and i get a higher ranking.
4. OPTIMIZATION BASED FRAMEWORK

In this section, we focus on how to find the best feature weights to achieve an optimal random walk with restart. Section 4.1 describes the objective function. Section 4.2 introduces how to solve the optimization problem. Section 4.3 derives the derivatives of the random walk with respect to the feature weights, which are the key detail in solving the optimization problem.

4.1 Objective Function

As introduced in Section 2, we want to maximize the average AUC of the tag recommender according to Equation 3. To convert this into a minimization problem, we rewrite Equation 2 in an equivalent form:

AUC = 1 - ( Σ_{i ∈ PT} Σ_{j ∈ NT} I(p_T(j) - p_T(i)) ) / ( |PT| · |NT| )    (16)

This equation tells us that maximizing the AUC is equivalent to minimizing Σ_{i ∈ PT} Σ_{j ∈ NT} I(p_T(j) - p_T(i)) / (|PT| · |NT|). We therefore propose a minimization problem equivalent to Equation 3:

min_{θ,ξ} J(θ, ξ) = (1/m) Σ_{k=1}^{m} ( Σ_{i ∈ PT_k} Σ_{j ∈ NT_k} I(p_T(j) - p_T(i)) ) / ( |PT_k| · |NT_k| )    (17)

Since J(θ, ξ) is not differentiable, we use the sigmoid function with a parameter β as a differentiable approximation of the indicator:

S(x; β) = 1 / (1 + e^{-βx})    (18)

The bigger β is, the smaller the approximation error is. However, when β is big, the steep gradient causes numerical problems, so β is assigned empirically.
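To illustrate the surrogate (a minimal sketch of ours, not the authors' implementation, assuming a score vector p_tag indexed by tag and index lists pos/neg for PT and NT):

    import numpy as np

    def S(x, beta):
        # Equation 18: sigmoid surrogate for the indicator I(x > 0).
        return 1.0 / (1.0 + np.exp(-beta * x))

    def smoothed_loss(p_tag, pos, neg, beta):
        # Smoothed count of mis-ordered (positive, negative) tag pairs,
        # averaged over the |PT| * |NT| pairs.
        deltas = np.array([p_tag[j] - p_tag[i] for i in pos for j in neg])
        return S(deltas, beta).mean()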

Now we have a new objective function:

min_{θ,ξ} J(θ, ξ) = (1/m) Σ_{k=1}^{m} ( Σ_{i ∈ PT_k} Σ_{j ∈ NT_k} S(p_T(j) - p_T(i); β) ) / ( |PT_k| · |NT_k| )    (19)

4.2 Solving the Optimization Problem

We use gradient descent to solve the optimization problem. The basic idea of gradient descent is to find the direction (the gradient) along which the objective function drops and to make a small step in that direction to update θ and ξ. However, the cost function defined in Equation 19 requires summing over all the training instances to perform one update, which is too costly. So we update θ and ξ based on each training instance, which is called stochastic gradient descent. The procedure is shown in Algorithm 1.

Algorithm 1: Stochastic Gradient Descent
Input: m training instances; lr: learning rate
Output: optimal θ and ξ
1  t = 0;
2  initialize θ^0 and ξ^0;
3  while J(θ, ξ) has not converged do
4      Randomly shuffle the m training instances;
       foreach training instance k do
5          θ^{t+1} = θ^t - lr · ∂J_k(θ^t, ξ^t)/∂θ;
6          ξ^{t+1} = ξ^t - lr · ∂J_k(θ^t, ξ^t)/∂ξ;
7          t = t + 1;

where J_k(θ, ξ) is the cost on the k-th instance:

J_k(θ, ξ) = ( Σ_{i ∈ PT_k} Σ_{j ∈ NT_k} S(p_T(j) - p_T(i); β) ) / ( |PT_k| · |NT_k| )    (20)

The learning rate lr decides the step size in the descent direction. The random shuffle at Line 4 is required by stochastic gradient descent for convergence. The updating rules for θ and ξ are shown in Lines 5 and 6. We now discuss how to compute ∂J_k(θ, ξ)/∂θ and ∂J_k(θ, ξ)/∂ξ in detail:

∂J_k(θ, ξ)/∂θ = (1 / (|PT_k| · |NT_k|)) Σ_{i ∈ PT_k} Σ_{j ∈ NT_k} (∂S(δ_ji)/∂δ_ji) (∂p_T(j)/∂θ - ∂p_T(i)/∂θ)    (21)

∂J_k(θ, ξ)/∂ξ = (1 / (|PT_k| · |NT_k|)) Σ_{i ∈ PT_k} Σ_{j ∈ NT_k} (∂S(δ_ji)/∂δ_ji) (∂p_T(j)/∂ξ - ∂p_T(i)/∂ξ)    (22)

where δ_ji = p_T(j) - p_T(i). The factor ∂S(δ_ji)/∂δ_ji is easy to compute: from Equation 18 we can derive that ∂S(δ_ji)/∂δ_ji = β S(δ_ji)(1 - S(δ_ji)). The remaining question is how to compute ∂p_T(j)/∂θ and ∂p_T(j)/∂ξ, which is discussed in the next section.

4.3 Derivatives of the Random Walk

In this section, we discuss how to compute the derivatives of the random walk. With p = (p_U; p_T; p_I), we want to compute ∂p/∂θ and ∂p/∂ξ. The basic idea is that we can derive an iterative scheme for the derivatives that mirrors the definition of the random walk itself.

Derivatives with respect to θ. Since ∂p/∂θ is composed of ∂p/∂θ_MN (M, N ∈ {U, T, I}), without loss of generality we introduce how to compute ∂p/∂θ_UU. Taking the derivatives with respect to θ_UU on both sides of Equations 13, 14 and 15, we get

∂p_U/∂θ_UU = (1 - α) Σ_{N ∈ {U,T,I}} ( (∂Â_UN/∂θ_UU) p_N + Â_UN (∂p_N/∂θ_UU) )    (23)

∂p_T/∂θ_UU = (1 - α) Σ_{N ∈ {U,T,I}} ( (∂Â_TN/∂θ_UU) p_N + Â_TN (∂p_N/∂θ_UU) )    (24)

∂p_I/∂θ_UU = (1 - α) Σ_{N ∈ {U,T,I}} ( (∂Â_IN/∂θ_UU) p_N + Â_IN (∂p_N/∂θ_UU) )    (25)

Following the same rule, we can compute the derivatives with respect to any θ_MN (M, N ∈ {U, T, I}); they all have the same form as the above three equations. To make the connection between computing p and computing ∂p/∂θ_MN clearer, we rewrite the above three equations in matrix form, with θ_UU replaced by a generic θ_MN:

∂p/∂θ_MN = (1 - α) Â (∂p/∂θ_MN) + (1 - α) (∂Â/∂θ_MN) (p_U; p_T; p_I)    (26)

where Â is the transition matrix defined in the original random walk. Comparing this equation with Equation 1 for computing p, we find two differences: (1) p is replaced by ∂p/∂θ_MN. (2) The last term on the right side is different. However, only the first term, (1 - α) Â (∂p/∂θ_MN), decides whether Equation 26 converges to a stable state. More details about the convergence are given in the appendix.

The last detail is how to compute ∂Â/∂θ_MN. Without loss of generality, we discuss ∂Â/∂θ_UU. Recall that Â is composed of the sub-matrices {Â_MN | M, N ∈ {U, T, I}}, and not all Â_MN are related to θ_UU. According to Equation 7, only Â_UU, Â_TU and Â_IU can be influenced by θ_UU, so we only need to compute ∂Â_UU/∂θ_UU, ∂Â_TU/∂θ_UU and ∂Â_IU/∂θ_UU. Taking ∂Â_UU/∂θ_UU as an example, we get

∂Â_UU/∂θ_UU = (∂A_UU/∂θ_UU) D_U^{-1} + A_UU (∂D_U^{-1}/∂θ_UU)    (27)

Each entry of A_UU is defined according to Equation 5. For u_1, u_2 ∈ U, we have

∂A_UU(u_1, u_2)/∂θ_UU = f'_edge(θ_UUᵀ X_UU(u_1, u_2)) X_UU(u_1, u_2)    (28)

Each entry on the diagonal of D_U is the out-degree of a user.
According to Equation 8, for u ∈ U, the derivative is

∂D_U^{-1}(u, u)/∂θ_UU = - ( Σ_{M ∈ {U,T,I}} Σ_{k=1}^{|M|} ∂A_MU(k, u)/∂θ_UU ) / ( Σ_{M ∈ {U,T,I}} Σ_{k=1}^{|M|} A_MU(k, u) )²    (29)

So far we have explained how to compute ∂Â_UU/∂θ_UU; the same process can be used to compute ∂Â_TU/∂θ_UU and ∂Â_IU/∂θ_UU.
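Putting Algorithm 1 and Equation 26 together, one stochastic update can be sketched as follows (Python/NumPy; our own illustration, where grad_hat_A stands for ∂Â/∂θ_MN with respect to a single feature weight and is assumed to be precomputed via Equations 27-29):

    import numpy as np

    def dp_dtheta(hat_A, grad_hat_A, p, alpha, n_iter=100):
        # Equation 26, iterated to a fixed point:
        # dp = (1 - alpha) * (hat_A @ dp + grad_hat_A @ p)
        dp = np.zeros_like(p)
        for _ in range(n_iter):
            dp = (1 - alpha) * (hat_A @ dp + grad_hat_A @ p)
        return dp

    def sgd_step(theta, xi, grad_theta, grad_xi, lr):
        # Algorithm 1, lines 5-6: move against the per-instance gradients
        # assembled from Equations 21 and 22.
        return theta - lr * grad_theta, xi - lr * grad_xi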

Derivatives with respect to ξ. Computing ∂p/∂ξ is analogous to computing ∂p/∂θ. Since ∂p/∂ξ is composed of ∂p/∂ξ_M (M ∈ {U, I}), without loss of generality we first focus on how to compute ∂p/∂ξ_U. Taking the partial derivatives with respect to ξ_U on both sides of Equations 13, 14 and 15, we get

∂p_U/∂ξ_U = (1 - α) Σ_{N ∈ {U,T,I}} Â_UN (∂p_N/∂ξ_U) + α ∂q̂_U/∂ξ_U    (30)

∂p_T/∂ξ_U = (1 - α) Σ_{N ∈ {U,T,I}} Â_TN (∂p_N/∂ξ_U) + α ∂q̂_T/∂ξ_U    (31)

∂p_I/∂ξ_U = (1 - α) Σ_{N ∈ {U,T,I}} Â_IN (∂p_N/∂ξ_U) + α ∂q̂_I/∂ξ_U    (32)

Following the same rule, ∂p/∂ξ_I can also be obtained. Replacing ξ_U with a generic ξ_M (M ∈ {U, I}), we can rewrite the above three equations as a single equation in matrix form:

∂p/∂ξ_M = (1 - α) Â (∂p/∂ξ_M) + α (∂q̂_U/∂ξ_M; ∂q̂_T/∂ξ_M; ∂q̂_I/∂ξ_M)    (33)

From this equation, we can see that computing ∂p/∂ξ_M also has the same form as Equation 1; more details on the convergence are given in the appendix. The last detail is how to compute ∂q̂/∂ξ_M (M ∈ {U, I}). Without loss of generality, suppose M is U. According to Equation 11, we have

∂q̂/∂ξ_U = (1/D_q) (∂q_U/∂ξ_U; 0; 0) + (∂D_q^{-1}/∂ξ_U) (q_U; 0; q_I)    (34)

Each entry of q_U is defined according to Equation 9. For u ∈ U, we have

∂q_U(u)/∂ξ_U = f'_node(ξ_Uᵀ Y_U) Y_U    (35)

When f_node is the sigmoid function, df_node(x)/dx = f_node(x)(1 - f_node(x)). D_q is defined according to Equation 12, and its derivative is

∂D_q^{-1}/∂ξ_U = - ( Σ_{M ∈ {U,I}} Σ_{k=1}^{|M|} ∂q_M(k)/∂ξ_U ) / ( Σ_{M ∈ {U,I}} Σ_{k=1}^{|M|} q_M(k) )²    (36)

So far we have described how to compute ∂q̂/∂ξ_U; the same process can be performed to compute ∂q̂/∂ξ_I. To sum up, we have introduced how to compute ∂p/∂θ and ∂p/∂ξ, which is summarized in Algorithm 2.

Algorithm 2: Derivatives of the random walk
Input: transition matrix Â and preference vector q̂
Output: ∂p/∂θ and ∂p/∂ξ
1   t = 0;
2   Initialize p^0;
3   while p has not converged do
4       p^{t+1} = (1 - α) Â p^t + α q̂;
5       t = t + 1;
6   t = 0;
7   Initialize (∂p/∂θ)^0;
8   while ∂p/∂θ has not converged do
9       Compute (∂p/∂θ)^{t+1} according to Equation 26;
10      t = t + 1;
11  t = 0;
12  Initialize (∂p/∂ξ)^0;
13  while ∂p/∂ξ has not converged do
14      Compute (∂p/∂ξ)^{t+1} according to Equation 33;
15      t = t + 1;

5. EXPERIMENTAL STUDY

5.1 Datasets

We test OptRank on two publicly available datasets, Delicious and Last.fm, which were published by [2] as benchmarks. Delicious contains posts involving 1867 users and their tags and items, along with 5328 user relations, tag relations and 597 item relations; all the types of intra-relations we studied are included in Delicious. Posts are represented by <user, tag, item>. Last.fm contains 2464 posts involving 1892 users, 9749 tags and 2523 items, together with user relations. Last.fm is a smaller dataset, and only user relations are available. We introduce each type of relation and its features as follows.

Inter-relation. For an edge <m, n> (m ∈ M, n ∈ N; M, N ∈ {U, T, I}; M ≠ N), the feature vector is X_MN(m, n) = (the number of times m co-occurred with n in the posts). For example, for <u, t> (u ∈ U, t ∈ T), X_UT(u, t) = (the number of times u co-occurred with t in the posts), i.e., the number of times t has been used by u. In our experiments, we use the same feature set for <m, n> and <n, m>; this means that A_MN and A_NM are both decided by X_MN and θ_MN.

User Relation. User relations are formed by the social network. Each relation is bi-directional and binary weighted. To find the strength of a user relation, we check the items and tags the two users have in common. More formally, user u can be represented by an item vector A_IU(:, u) and a tag vector A_TU(:, u). Each entry of A_IU and A_TU is re-weighted by TF-IDF; users and items can be viewed as documents and words, as in information retrieval. Let Ã_MN (M, N ∈ {U, T, I}) denote A_MN re-weighted by TF-IDF. For an edge <u_1, u_2> (u_1, u_2 ∈ U), the feature vector is X_UU(u_1, u_2) = (cos(Ã_TU(:, u_1), Ã_TU(:, u_2)), cos(Ã_IU(:, u_1), Ã_IU(:, u_2))).

Tag Relation. Tag semantic relatedness is computed with the help of Wikipedia. To be more specific, 47% of the tags are article titles in Wikipedia. Articles link to each other by anchor texts, so the semantic relatedness of a tag pair can be inferred from the number of links between the corresponding article pair. We use WikipediaMiner [10], an off-the-shelf tool, to calculate semantic relatedness.
Only tag pairs that have semantic relatedness larger than 0.25 are retained. To refine the edge weights, tags are also represented by user vectors and item vectors: we apply the same TF-IDF weighting to A_UT and A_IT, and let Ã_UT and Ã_IT denote the weighted matrices. For an edge <t_1, t_2> (t_1, t_2 ∈ T), the feature vector is X_TT(t_1, t_2) = (semantic relatedness, cos(Ã_UT(:, t_1), Ã_UT(:, t_2)), cos(Ã_IT(:, t_1), Ã_IT(:, t_2))).
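The TF-IDF re-weighting and cosine features used for the user and tag relations above (and for the item relation below) can be sketched as follows (Python/NumPy; our own illustration, with A_TU and A_IU given as count matrices whose columns index users):

    import numpy as np

    def tfidf(A):
        # Re-weight a count matrix (rows play the role of words,
        # columns the role of documents).
        tf = A / np.maximum(A.sum(axis=0, keepdims=True), 1)
        df = (A > 0).sum(axis=1, keepdims=True)
        idf = np.log(A.shape[1] / np.maximum(df, 1))
        return tf * idf

    def cosine(a, b):
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        return float(a @ b / denom) if denom > 0 else 0.0

    def user_edge_features(A_TU, A_IU, u1, u2):
        # X_UU(u1, u2): cosine similarity of the users' tag vectors and of
        # their item vectors, both TF-IDF weighted.
        At, Ai = tfidf(A_TU), tfidf(A_IU)
        return np.array([cosine(At[:, u1], At[:, u2]),
                         cosine(Ai[:, u1], Ai[:, u2])])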

Item Relation. We calculate item similarities based on Web page titles in Delicious. A title is a vector of words with TF-IDF weighting on each entry. Besides content similarities, we refine the edge weights with the TF-IDF weighted Ã_UI and Ã_TI. For <i_1, i_2> (i_1, i_2 ∈ I), X_II(i_1, i_2) = (cos(title_1, title_2), cos(Ã_UI(:, i_1), Ã_UI(:, i_2)), cos(Ã_TI(:, i_1), Ã_TI(:, i_2))).

As in logistic regression, we add a constant feature to each feature set X_MN, and all the features are normalized to have mean 0 and standard deviation 1.

5.2 Baselines

Since OptRank is an extension of existing graph-based methods, we want to establish two points: (1) OptRank outperforms existing graph-based methods when only <user, tag, item> is available. (2) OptRank further improves the performance by incorporating social networks, tag semantic relatedness and item content similarities. We choose two graph-based methods as our baselines.

Random Walk with Restart. Random Walk with Restart, called RWR for short, is the unsupervised version of OptRank. RWR runs on the graph defined by <user, tag, item>. The weight of the edge <m, n> (m ∈ M, n ∈ N; M, N ∈ {U, T, I}; M ≠ N) is the number of times m co-occurred with n in the posts. Given a user u and an item i for tag recommendation, when the random walker decides to restart, it restarts at u with probability 0.5 and at i with probability 0.5. RWR has been adopted in [6] to incorporate social networks, but there the different types of edges are normalized empirically, in a way that is hard to reproduce.

FolkRank. FolkRank is a state-of-the-art graph-based algorithm. The graph is defined in the same way as for RWR. FolkRank can be summarized in three steps: (1) Calculate a global PageRank score p_global for each node. (2) Calculate a personalized PageRank score p_pref for each node, with special preference given to u and i. (3) Calculate the FolkRank score as the wins and losses between the personalized PageRank and the global PageRank, i.e., score = p_pref - p_global. In our experiments, we set the damping factor to 0.7, which achieves the best performance for FolkRank. In the following, FolkRank is denoted by FR.

We are aware that there are many methods based on tensor factorization [12, 13, 15]. However, tensor factorization needs to learn a low-rank approximation vector for each user, item and tag. In OptRank, a user need not even exist in the training set and can still receive recommendations if she/he has neighbors in the test set. Moreover, OptRank only needs about 3000 training instances to reach its best performance, while tensor factorization would fail with such a small training set, which would make a comparison unfair. For this reason, we did not choose these methods as baselines.

5.3 Evaluation Methodology

Performance Measurements. We use average precision, the precision-recall curve and the average AUC (Area Under the ROC Curve) to measure performance. We are aware that the optimum of the AUC is not necessarily the optimum of average precision/recall. To trade off between the best AUC and the best precision, we choose the model that has both high AUC and high precision on the cross validation set; the model is then evaluated on the test set.

Training/Cross Validation/Test Set. Posts are aggregated into records <u, i, PT> (u ∈ U, i ∈ I). For each dataset, we randomly picked 5000, 3000, and 3000 records as the training set, cross validation set, and test set, respectively.
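A sketch of the precision measurement (our own illustration; average precision over the test records is the mean of these values):

    def precision_at_k(ranked_tags, positive_tags, k):
        # Fraction of the top-k recommended tags the user actually applied.
        top_k = ranked_tags[:k]
        return sum(t in positive_tags for t in top_k) / k

    # Example: P@2 for a toy recommendation list.
    print(precision_at_k(["python", "web", "news"], {"python", "news"}, 2))  # 0.5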
5.4 Parameters

β and learning rate. β in Equation 18 controls the error of approximating I(x): the bigger β is, the smaller the approximation error. However, when β gets too big, the derivative around x = 0 gets too steep and causes numerical problems; when β gets too small, minimizing J(θ, ξ) fails to maximize the AUC. From Equations 21 and 22, we know that the summation of the derivatives is divided by |PT_k| · |NT_k|. Since a large dataset has a big |NT_k|, we can use a big β. In our experiments, β is 10^9 in Delicious and 10^6 in Last.fm. The learning rate lr is strongly related to β: when lr gets too big, stochastic gradient descent fails to converge. lr is set to 10 in both datasets.

Restart Probability α. α controls how frequently the random walker chooses to restart. We evaluate how the AUC and precision change by varying α from 0.2 to 0.8 on Delicious, with OptRank run on the inter-relation formed by <user, tag, item>. The results are shown in Figure 3.

Figure 3: The effect of α on precision and AUC.

When precision and AUC are both considered, α ∈ [0.6, 0.8] is a good choice, and we set α within this range.

5.5 Experimental Results

Results on Delicious. The results on Delicious are shown in Table 1 and Figure 4. FR denotes FolkRank; OptRank_Edge, OptRank_Node and OptRank_EN denote OptRank with only edge features enabled, with only node features enabled, and with both enabled, respectively.

First, we compare the algorithms that run only on the inter-relations formed by <user, tag, item>. Since RWR always performs better than FR, we only compare OptRank with RWR in the following. When only edge features are enabled, OptRank_Edge has performance comparable to RWR; this indicates that the original transition matrix of RWR and FR is nearly optimal. When only node features are enabled, OptRank_Node learns the best weights for u and i, which improves the top-1 precision by 3.3% over RWR; this indicates that the original node weights are not optimal. When edge features and node features are both enabled, OptRank_EN further improves the top-1 precision by 2.4% over OptRank_Node. From Figure 4 we can see that OptRank_EN outperforms RWR at top-5, but the advantage disappears at top-10. However, since a user usually annotates an item with fewer than 5 tags, top-5 performance is considered more important than top-10 performance.

Figure 4: Precision-recall curves on Delicious for FR, RWR, OptRank_EN, OptRank_I and OptRank_UTI.

Table 1: Precision (P@1, P@2, P@3) and AUC on Delicious for FR, RWR, OptRank_Edge, OptRank_Node, OptRank_EN, OptRank_U, OptRank_T, OptRank_I and OptRank_UTI.

In terms of AUC, FolkRank has relatively poor performance, worse than its precision. In contrast, RWR has a much better average AUC. This is probably because FolkRank is an empirically designed algorithm and relies too much on global information. We can also see that a high precision does not indicate a high AUC.

Now we discuss how OptRank performs when the extra user relations, tag relations and item relations are used; OptRank_U, OptRank_T and OptRank_I denote the models that add each relation separately, and OptRank_UTI combines all of them. When each type of relation is considered separately, OptRank_U, OptRank_T and OptRank_I improve the top-1 precision by around 1% over OptRank_EN, which is not very significant compared with the previous improvements. However, as we can see from Figure 4, the top-10 performance of OptRank_I is significantly improved compared with OptRank_EN. Since OptRank_U, OptRank_T and OptRank_I are comparable, only OptRank_I is shown in Figure 4. When all the relations are combined, OptRank_UTI achieves the best performance at every top-k; in terms of AUC, OptRank_UTI also achieves the best performance.

Results on Last.fm. The results are shown in Figure 5 and Table 2. They are significantly better than the results on Delicious, which can be explained in terms of data sparsity: when only inter-relations are considered, a post can be viewed as an entry in the three-dimensional array spanned by users, tags and items, and a larger fraction of these entries is known in Last.fm than in Delicious. Thus Last.fm is less sparse and more predictable than Delicious.

Figure 5: Precision-recall curves on Last.fm for FR, RWR, OptRank_EN and OptRank_U.

Table 2: Precision (P@1, P@2, P@3) and AUC on Last.fm for FR, RWR, OptRank_EN and OptRank_U.

In Last.fm, all algorithms have comparable performance at top-10, so we mainly focus on the top-5 performance in this experiment; this is reasonable since users usually annotate an item with at most 5 tags. From Figure 5 we can see that FolkRank and RWR have comparable precision, which differs from the results on Delicious. When node features and edge features are both considered, OptRank_EN improves P@1, P@2 and P@3 by 1.2%, 1.5% and 1.5%, respectively, compared with FR and RWR. Although Last.fm is less sparse than Delicious, when the social network is added, OptRank_U still improves the top-1 precision by 3.7% compared with the two baselines. In terms of AUC, the empirically designed FR again falls behind the other methods, while OptRank_U achieves the highest AUC.

To sum up, we draw two conclusions from the experiments: (1) When only <user, tag, item> is available, OptRank outperforms RWR and FolkRank. (2) OptRank successfully combines extra relations to improve the performance.

Now we discuss some details of the training process. Since Delicious is bigger and takes more time, the training size and running time are reported for Delicious.

Over-fitting Issues. Over-fitting does not seem to be a problem in our model, since we only have 18 parameters when all the relations are combined (each inter-relation has 2 parameters; the user relation has 3 parameters; the tag relation and the item relation have 4 parameters each). OptRank_UTI achieves nearly the same top-1 precision on the cross validation set and the test set.

Training Size. The required training size is really small compared with tensor factorization: OptRank_EN reaches its best performance after 1600 training instances, and OptRank_UTI after 1200 training instances.

Running Time.
The experiments were conducted on a single PC with a 2-core 3.2GHz CPU and 2G main memory. We implemented the algorithm in Matlab with full vectorization. When all the relations are combined, each training instance takes nearly 3.5 seconds, and most of that time is spent on computing the gradients; prediction takes around 0.1 seconds per instance. Training with 5000 instances would thus take 4.8 hours at most; however, all the algorithms in our experiments achieve their best performance within 2000 training instances.

6. RELATED WORK

There are mainly three approaches to personalized recommendation in social tagging systems: (1) The graph-based approach [4, 5, 17]. (2) Tensor decomposition [12, 13, 15]: the annotation relation is modeled as a cube with many

unknown entries; after performing tensor decomposition, we can predict the unknown entries by low-rank approximations. (3) User/item-based collaborative filtering [8, 11, 18]: the original user-item matrix is extended with tag information so that user/item-based collaborative filtering methods can be applied.

Besides annotation behaviors, the user space, tag space and item space have also been explored. [9] studied trust networks and proposed a factor analysis approach based on probabilistic matrix factorization. [6] incorporates the social network for item recommendation, but fails to improve the performance significantly. [14] links social tags from Flickr into WordNet. [7] introduces item taxonomies into recommender systems.

This paper is mainly inspired by two recent works on graph-based learning [1] and semi-supervised learning [3]. [1] proposes supervised random walks to learn the edge weights for link prediction in a homogeneous graph; this paper extends [1] with multi-type edges and nodes. [3] proposed a similar idea of learning edge weights and node weights in a homogeneous graph, within a transductive learning framework; since a recommender should have the ability to predict future events, our framework differs from [3] in that ours belongs to inductive learning.

7. CONCLUSION AND FUTURE WORK

In this paper, we propose an optimization-based graph method for personalized tag recommendation. To alleviate data sparsity, different sources of information are incorporated into the optimization framework. Some problems remain for future work: (1) Reducing the graph size: since the random walker frequently restarts at u and i, nodes that are far away from u and i may be pruned without influencing the final ranking. (2) Comparing with tensor factorization methods under a suitable experimental setting. (3) Exploring more features to further improve the results, such as temporal factors.

8. ACKNOWLEDGMENTS

This work was supported in part by the National Basic Research Program of China (973 Program) under Grant No. 2011CB302206, and the National Natural Science Foundation of China.

9. REFERENCES

[1] L. Backstrom and J. Leskovec. Supervised random walks: predicting and recommending links in social networks. In WSDM, 2011.
[2] I. Cantador, P. Brusilovsky, and T. Kuflik. Second workshop on information heterogeneity and fusion in recommender systems (HetRec 2011). In RecSys. ACM, 2011.
[3] B. Gao, T.-Y. Liu, W. Wei, T. Wang, and H. Li. Semi-supervised ranking on very large graphs with rich metadata. In KDD, pages 96-104, 2011.
[4] Z. Guan, J. Bu, Q. Mei, C. Chen, and C. Wang. Personalized tag recommendation using graph-based ranking on multi-type interrelated objects. In SIGIR, 2009.
[5] R. Jäschke, L. B. Marinho, A. Hotho, L. Schmidt-Thieme, and G. Stumme. Tag recommendations in folksonomies. In PKDD, 2007.
[6] I. Konstas, V. Stathopoulos, and J. M. Jose. On social networks and collaborative recommendation. In SIGIR, 2009.
[7] H. Liang, Y. Xu, Y. Li, and R. Nayak. Personalized recommender system based on item taxonomy and folksonomy. In CIKM, 2010.
[8] H. Liang, Y. Xu, Y. Li, R. Nayak, and X. Tao. Connecting users and items with weighted tags for personalized item recommendations. In HT. ACM, 2010.
[9] H. Ma, T. C. Zhou, M. R. Lyu, and I. King. Improving recommender systems by incorporating social contextual information. ACM Trans. Inf. Syst., 29(2):9:1-9:23, Apr. 2011.
[10] D. Milne. An open-source toolkit for mining Wikipedia.
[11] J. Peng, D. D. Zeng, H. Zhao, and F.-Y. Wang.
Collaborative filtering in social tagging systems based on joint item-tag recommendations. In CIKM. ACM, 2010.
[12] S. Rendle, L. B. Marinho, A. Nanopoulos, and L. Schmidt-Thieme. Learning optimal ranking with tensor factorization for tag recommendation. In KDD, 2009.
[13] S. Rendle and L. Schmidt-Thieme. Pairwise interaction tensor factorization for personalized tag recommendation. In WSDM, pages 81-90, 2010.
[14] B. Sigurbjörnsson and R. van Zwol. Flickr tag recommendation based on collective knowledge. In WWW. ACM, 2008.
[15] P. Symeonidis, A. Nanopoulos, and Y. Manolopoulos. Tag recommendations based on tensor dimensionality reduction. In RecSys, pages 43-50, 2008.
[16] P. Symeonidis, A. Nanopoulos, and Y. Manolopoulos. A unified framework for providing recommendations in social tagging systems based on ternary semantic analysis. TKDE, 22(2):179-192, 2010.
[17] H. Yildirim and M. S. Krishnamoorthy. A random walk method for alleviating the sparsity problem in collaborative filtering. In RecSys, pages 131-138, 2008.
[18] Y. Zhen, W.-J. Li, and D.-Y. Yeung. TagiCoFi: tag informed collaborative filtering. In RecSys. ACM, 2009.

APPENDIX

We prove the convergence of Equations 26 and 33. Both equations can be rewritten in a more general form:

p^{t+1} = λ Â p^{t} + μ q

where 0 < λ < 1 and 0 < μ ≤ 1, Â is a transition matrix with each column summing to 1, and q can be any vector with the same dimension as p. Suppose p^{0} = π. Then p^{1} = λ Â π + μ q, p^{2} = λ² Â² π + λ Â μ q + μ q, ..., p^{n} = λⁿ Âⁿ π + Σ_{k=0}^{n-1} λᵏ Âᵏ μ q. Since 0 < λ < 1 and the eigenvalues of the transition matrix Â lie in [-1, 1], we have lim_{n→∞} λⁿ Âⁿ = 0 and lim_{n→∞} Σ_{k=0}^{n-1} λᵏ Âᵏ = (I - λ Â)^{-1}. So p^{n} finally converges to p = (I - λ Â)^{-1} μ q.
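The closed form can be checked numerically (a sketch under a random toy setup of our own; any column-stochastic Â works):

    import numpy as np

    rng = np.random.default_rng(0)
    n, lam, mu = 5, 0.8, 0.2
    A = rng.random((n, n))
    A /= A.sum(axis=0)                    # column-stochastic transition matrix
    q = rng.random(n)

    p = rng.random(n)                     # arbitrary start pi
    for _ in range(500):
        p = lam * A @ p + mu * q          # the iteration above

    closed = mu * np.linalg.solve(np.eye(n) - lam * A, q)
    print(np.allclose(p, closed))         # -> True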


More information

Learning Query and Document Similarities from Click-through Bipartite Graph with Metadata

Learning Query and Document Similarities from Click-through Bipartite Graph with Metadata Learning Query and Document Similarities from Click-through Bipartite Graph with Metadata Wei Wu a, Hang Li b, Jun Xu b a Department of Probability and Statistics, Peking University b Microsoft Research

More information

Web Structure Mining Nodes, Links and Influence

Web Structure Mining Nodes, Links and Influence Web Structure Mining Nodes, Links and Influence 1 Outline 1. Importance of nodes 1. Centrality 2. Prestige 3. Page Rank 4. Hubs and Authority 5. Metrics comparison 2. Link analysis 3. Influence model 1.

More information

Learning Topical Transition Probabilities in Click Through Data with Regression Models

Learning Topical Transition Probabilities in Click Through Data with Regression Models Learning Topical Transition Probabilities in Click Through Data with Regression Xiao Zhang, Prasenjit Mitra Department of Computer Science and Engineering College of Information Sciences and Technology

More information

Sparse vectors recap. ANLP Lecture 22 Lexical Semantics with Dense Vectors. Before density, another approach to normalisation.

Sparse vectors recap. ANLP Lecture 22 Lexical Semantics with Dense Vectors. Before density, another approach to normalisation. ANLP Lecture 22 Lexical Semantics with Dense Vectors Henry S. Thompson Based on slides by Jurafsky & Martin, some via Dorota Glowacka 5 November 2018 Previous lectures: Sparse vectors recap How to represent

More information

ANLP Lecture 22 Lexical Semantics with Dense Vectors

ANLP Lecture 22 Lexical Semantics with Dense Vectors ANLP Lecture 22 Lexical Semantics with Dense Vectors Henry S. Thompson Based on slides by Jurafsky & Martin, some via Dorota Glowacka 5 November 2018 Henry S. Thompson ANLP Lecture 22 5 November 2018 Previous

More information

Intelligent Systems (AI-2)

Intelligent Systems (AI-2) Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 19 Oct, 24, 2016 Slide Sources Raymond J. Mooney University of Texas at Austin D. Koller, Stanford CS - Probabilistic Graphical Models D. Page,

More information

Leverage Sparse Information in Predictive Modeling

Leverage Sparse Information in Predictive Modeling Leverage Sparse Information in Predictive Modeling Liang Xie Countrywide Home Loans, Countrywide Bank, FSB August 29, 2008 Abstract This paper examines an innovative method to leverage information from

More information

Computer science research seminar: VideoLectures.Net recommender system challenge: presentation of baseline solution

Computer science research seminar: VideoLectures.Net recommender system challenge: presentation of baseline solution Computer science research seminar: VideoLectures.Net recommender system challenge: presentation of baseline solution Nino Antulov-Fantulin 1, Mentors: Tomislav Šmuc 1 and Mile Šikić 2 3 1 Institute Rudjer

More information

Measuring Similarity in Large-scale Folksonomies

Measuring Similarity in Large-scale Folksonomies Measuring Similarity in Large-scale Folksonomies Giovanni Quattrone, Emilio Ferrara, Pasquale De Meo, Licia Capra Abstract Social (or folksonomic) tagging has become a very popular way to describe content

More information

Location Regularization-Based POI Recommendation in Location-Based Social Networks

Location Regularization-Based POI Recommendation in Location-Based Social Networks information Article Location Regularization-Based POI Recommendation in Location-Based Social Networks Lei Guo 1,2, * ID, Haoran Jiang 3 and Xinhua Wang 4 1 Postdoctoral Research Station of Management

More information

Link Analysis Information Retrieval and Data Mining. Prof. Matteo Matteucci

Link Analysis Information Retrieval and Data Mining. Prof. Matteo Matteucci Link Analysis Information Retrieval and Data Mining Prof. Matteo Matteucci Hyperlinks for Indexing and Ranking 2 Page A Hyperlink Page B Intuitions The anchor text might describe the target page B Anchor

More information

Data Mining Techniques

Data Mining Techniques Data Mining Techniques CS 622 - Section 2 - Spring 27 Pre-final Review Jan-Willem van de Meent Feedback Feedback https://goo.gl/er7eo8 (also posted on Piazza) Also, please fill out your TRACE evaluations!

More information

Multi-Relational Matrix Factorization using Bayesian Personalized Ranking for Social Network Data

Multi-Relational Matrix Factorization using Bayesian Personalized Ranking for Social Network Data Multi-Relational Matrix Factorization using Bayesian Personalized Ranking for Social Network Data Artus Krohn-Grimberghe, Lucas Drumond, Christoph Freudenthaler, and Lars Schmidt-Thieme Information Systems

More information

Analysis of the recommendation systems based on the tensor factorization techniques, experiments and the proposals

Analysis of the recommendation systems based on the tensor factorization techniques, experiments and the proposals Aalborg University Project Report Analysis of the recommendation systems based on the tensor factorization techniques, experiments and the proposals Group: d519a Authors: Martin Leginus Valdas Žemaitis

More information

A Study of the Dirichlet Priors for Term Frequency Normalisation

A Study of the Dirichlet Priors for Term Frequency Normalisation A Study of the Dirichlet Priors for Term Frequency Normalisation ABSTRACT Ben He Department of Computing Science University of Glasgow Glasgow, United Kingdom ben@dcs.gla.ac.uk In Information Retrieval

More information

On Top-k Structural. Similarity Search. Pei Lee, Laks V.S. Lakshmanan University of British Columbia Vancouver, BC, Canada

On Top-k Structural. Similarity Search. Pei Lee, Laks V.S. Lakshmanan University of British Columbia Vancouver, BC, Canada On Top-k Structural 1 Similarity Search Pei Lee, Laks V.S. Lakshmanan University of British Columbia Vancouver, BC, Canada Jeffrey Xu Yu Chinese University of Hong Kong Hong Kong, China 2014/10/14 Pei

More information

Intelligent Systems (AI-2)

Intelligent Systems (AI-2) Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 19 Oct, 23, 2015 Slide Sources Raymond J. Mooney University of Texas at Austin D. Koller, Stanford CS - Probabilistic Graphical Models D. Page,

More information

Comparative Document Analysis for Large Text Corpora

Comparative Document Analysis for Large Text Corpora Comparative Document Analysis for Large Text Corpora Xiang Ren Yuanhua Lv Kuansan Wang Jiawei Han University of Illinois at Urbana-Champaign, Urbana, IL, USA Microsoft Research, Redmond, WA, USA {xren7,

More information

Variations of Logistic Regression with Stochastic Gradient Descent

Variations of Logistic Regression with Stochastic Gradient Descent Variations of Logistic Regression with Stochastic Gradient Descent Panqu Wang(pawang@ucsd.edu) Phuc Xuan Nguyen(pxn002@ucsd.edu) January 26, 2012 Abstract In this paper, we extend the traditional logistic

More information

Star-Structured High-Order Heterogeneous Data Co-clustering based on Consistent Information Theory

Star-Structured High-Order Heterogeneous Data Co-clustering based on Consistent Information Theory Star-Structured High-Order Heterogeneous Data Co-clustering based on Consistent Information Theory Bin Gao Tie-an Liu Wei-ing Ma Microsoft Research Asia 4F Sigma Center No. 49 hichun Road Beijing 00080

More information

Relational Stacked Denoising Autoencoder for Tag Recommendation. Hao Wang

Relational Stacked Denoising Autoencoder for Tag Recommendation. Hao Wang Relational Stacked Denoising Autoencoder for Tag Recommendation Hao Wang Dept. of Computer Science and Engineering Hong Kong University of Science and Technology Joint work with Xingjian Shi and Dit-Yan

More information

Learning the Semantic Correlation: An Alternative Way to Gain from Unlabeled Text

Learning the Semantic Correlation: An Alternative Way to Gain from Unlabeled Text Learning the Semantic Correlation: An Alternative Way to Gain from Unlabeled Text Yi Zhang Machine Learning Department Carnegie Mellon University yizhang1@cs.cmu.edu Jeff Schneider The Robotics Institute

More information

A Comparative Study of Matrix Factorization and Random Walk with Restart in Recommender Systems

A Comparative Study of Matrix Factorization and Random Walk with Restart in Recommender Systems A Comparative Study of Matrix Factorization and Random Walk with Restart in Recommender Systems Haekyu Park Computer Science and Engineering Seoul National University Seoul, Republic of Korea Email: hkpark627@snu.ac.kr

More information

Data Mining Techniques

Data Mining Techniques Data Mining Techniques CS 6220 - Section 3 - Fall 2016 Lecture 12 Jan-Willem van de Meent (credit: Yijun Zhao, Percy Liang) DIMENSIONALITY REDUCTION Borrowing from: Percy Liang (Stanford) Linear Dimensionality

More information

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University http://cs224w.stanford.edu How to organize/navigate it? First try: Human curated Web directories Yahoo, DMOZ, LookSmart

More information

Semantics with Dense Vectors. Reference: D. Jurafsky and J. Martin, Speech and Language Processing

Semantics with Dense Vectors. Reference: D. Jurafsky and J. Martin, Speech and Language Processing Semantics with Dense Vectors Reference: D. Jurafsky and J. Martin, Speech and Language Processing 1 Semantics with Dense Vectors We saw how to represent a word as a sparse vector with dimensions corresponding

More information

Notes on Markov Networks

Notes on Markov Networks Notes on Markov Networks Lili Mou moull12@sei.pku.edu.cn December, 2014 This note covers basic topics in Markov networks. We mainly talk about the formal definition, Gibbs sampling for inference, and maximum

More information

Link Analysis. Reference: Introduction to Information Retrieval by C. Manning, P. Raghavan, H. Schutze

Link Analysis. Reference: Introduction to Information Retrieval by C. Manning, P. Raghavan, H. Schutze Link Analysis Reference: Introduction to Information Retrieval by C. Manning, P. Raghavan, H. Schutze 1 The Web as a Directed Graph Page A Anchor hyperlink Page B Assumption 1: A hyperlink between pages

More information

Restricted Boltzmann Machines for Collaborative Filtering

Restricted Boltzmann Machines for Collaborative Filtering Restricted Boltzmann Machines for Collaborative Filtering Authors: Ruslan Salakhutdinov Andriy Mnih Geoffrey Hinton Benjamin Schwehn Presentation by: Ioan Stanculescu 1 Overview The Netflix prize problem

More information

Outline for today. Information Retrieval. Cosine similarity between query and document. tf-idf weighting

Outline for today. Information Retrieval. Cosine similarity between query and document. tf-idf weighting Outline for today Information Retrieval Efficient Scoring and Ranking Recap on ranked retrieval Jörg Tiedemann jorg.tiedemann@lingfil.uu.se Department of Linguistics and Philology Uppsala University Efficient

More information

Matrix Factorization Techniques for Recommender Systems

Matrix Factorization Techniques for Recommender Systems Matrix Factorization Techniques for Recommender Systems By Yehuda Koren Robert Bell Chris Volinsky Presented by Peng Xu Supervised by Prof. Michel Desmarais 1 Contents 1. Introduction 4. A Basic Matrix

More information

A Gradient-based Adaptive Learning Framework for Efficient Personal Recommendation

A Gradient-based Adaptive Learning Framework for Efficient Personal Recommendation A Gradient-based Adaptive Learning Framework for Efficient Personal Recommendation Yue Ning 1 Yue Shi 2 Liangjie Hong 2 Huzefa Rangwala 3 Naren Ramakrishnan 1 1 Virginia Tech 2 Yahoo Research. Yue Shi

More information

Collaborative Filtering with Aspect-based Opinion Mining: A Tensor Factorization Approach

Collaborative Filtering with Aspect-based Opinion Mining: A Tensor Factorization Approach 2012 IEEE 12th International Conference on Data Mining Collaborative Filtering with Aspect-based Opinion Mining: A Tensor Factorization Approach Yuanhong Wang,Yang Liu, Xiaohui Yu School of Computer Science

More information

Learning Query and Document Similarities from Click-through Bipartite Graph with Metadata

Learning Query and Document Similarities from Click-through Bipartite Graph with Metadata Learning Query and Document Similarities from Click-through Bipartite Graph with Metadata ABSTRACT Wei Wu Microsoft Research Asia No 5, Danling Street, Haidian District Beiing, China, 100080 wuwei@microsoft.com

More information

Trinity: Walking on a User-Object-Tag Heterogeneous Network for Personalised Recommendations

Trinity: Walking on a User-Object-Tag Heterogeneous Network for Personalised Recommendations Gan MX, Sun L, Jiang R. : Walking on a user-object-tag heterogeneous network for personalised recommendations. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY 31(3): 577 594 May 2016. DOI 10.1007/s11390-016-1648-0

More information

Contextual Bandits in A Collaborative Environment

Contextual Bandits in A Collaborative Environment Contextual Bandits in A Collaborative Environment Qingyun Wu 1, Huazheng Wang 1, Quanquan Gu 2, Hongning Wang 1 1 Department of Computer Science 2 Department of Systems and Information Engineering University

More information

A Matrix Factorization Technique with Trust Propagation for Recommendation in Social Networks

A Matrix Factorization Technique with Trust Propagation for Recommendation in Social Networks A Matrix Factorization Technique with Trust Propagation for Recommendation in Social Networks ABSTRACT Mohsen Jamali School of Computing Science Simon Fraser University Burnaby, BC, Canada mohsen_jamali@cs.sfu.ca

More information

CSC 411: Lecture 04: Logistic Regression

CSC 411: Lecture 04: Logistic Regression CSC 411: Lecture 04: Logistic Regression Raquel Urtasun & Rich Zemel University of Toronto Sep 23, 2015 Urtasun & Zemel (UofT) CSC 411: 04-Prob Classif Sep 23, 2015 1 / 16 Today Key Concepts: Logistic

More information

Unsupervised Machine Learning and Data Mining. DS 5230 / DS Fall Lecture 7. Jan-Willem van de Meent

Unsupervised Machine Learning and Data Mining. DS 5230 / DS Fall Lecture 7. Jan-Willem van de Meent Unsupervised Machine Learning and Data Mining DS 5230 / DS 4420 - Fall 2018 Lecture 7 Jan-Willem van de Meent DIMENSIONALITY REDUCTION Borrowing from: Percy Liang (Stanford) Dimensionality Reduction Goal:

More information

A Randomized Approach for Crowdsourcing in the Presence of Multiple Views

A Randomized Approach for Crowdsourcing in the Presence of Multiple Views A Randomized Approach for Crowdsourcing in the Presence of Multiple Views Presenter: Yao Zhou joint work with: Jingrui He - 1 - Roadmap Motivation Proposed framework: M2VW Experimental results Conclusion

More information

Introduction to Logistic Regression

Introduction to Logistic Regression Introduction to Logistic Regression Guy Lebanon Binary Classification Binary classification is the most basic task in machine learning, and yet the most frequent. Binary classifiers often serve as the

More information

Test Generation for Designs with Multiple Clocks

Test Generation for Designs with Multiple Clocks 39.1 Test Generation for Designs with Multiple Clocks Xijiang Lin and Rob Thompson Mentor Graphics Corp. 8005 SW Boeckman Rd. Wilsonville, OR 97070 Abstract To improve the system performance, designs with

More information

Matrix Factorization with Content Relationships for Media Personalization

Matrix Factorization with Content Relationships for Media Personalization Association for Information Systems AIS Electronic Library (AISeL) Wirtschaftsinformatik Proceedings 013 Wirtschaftsinformatik 013 Matrix Factorization with Content Relationships for Media Personalization

More information

Recurrent Latent Variable Networks for Session-Based Recommendation

Recurrent Latent Variable Networks for Session-Based Recommendation Recurrent Latent Variable Networks for Session-Based Recommendation Panayiotis Christodoulou Cyprus University of Technology paa.christodoulou@edu.cut.ac.cy 27/8/2017 Panayiotis Christodoulou (C.U.T.)

More information

Data Mining Recitation Notes Week 3

Data Mining Recitation Notes Week 3 Data Mining Recitation Notes Week 3 Jack Rae January 28, 2013 1 Information Retrieval Given a set of documents, pull the (k) most similar document(s) to a given query. 1.1 Setup Say we have D documents

More information

Node similarity and classification

Node similarity and classification Node similarity and classification Davide Mottin, Anton Tsitsulin HassoPlattner Institute Graph Mining course Winter Semester 2017 Acknowledgements Some part of this lecture is taken from: http://web.eecs.umich.edu/~dkoutra/tut/icdm14.html

More information

Context-aware Ensemble of Multifaceted Factorization Models for Recommendation Prediction in Social Networks

Context-aware Ensemble of Multifaceted Factorization Models for Recommendation Prediction in Social Networks Context-aware Ensemble of Multifaceted Factorization Models for Recommendation Prediction in Social Networks Yunwen Chen kddchen@gmail.com Yingwei Xin xinyingwei@gmail.com Lu Yao luyao.2013@gmail.com Zuotao

More information

Discovering Geographical Topics in Twitter

Discovering Geographical Topics in Twitter Discovering Geographical Topics in Twitter Liangjie Hong, Lehigh University Amr Ahmed, Yahoo! Research Alexander J. Smola, Yahoo! Research Siva Gurumurthy, Twitter Kostas Tsioutsiouliklis, Twitter Overview

More information

Recommender Systems EE448, Big Data Mining, Lecture 10. Weinan Zhang Shanghai Jiao Tong University

Recommender Systems EE448, Big Data Mining, Lecture 10. Weinan Zhang Shanghai Jiao Tong University 2018 EE448, Big Data Mining, Lecture 10 Recommender Systems Weinan Zhang Shanghai Jiao Tong University http://wnzhang.net http://wnzhang.net/teaching/ee448/index.html Content of This Course Overview of

More information

Service Selection based on Similarity Measurement for Conditional Qualitative Preference

Service Selection based on Similarity Measurement for Conditional Qualitative Preference Service Selection based on Similarity Measurement for Conditional Qualitative Preference Hongbing Wang, Jie Zhang, Hualan Wang, Yangyu Tang, and Guibing Guo School of Computer Science and Engineering,

More information

ECE521 Lecture7. Logistic Regression

ECE521 Lecture7. Logistic Regression ECE521 Lecture7 Logistic Regression Outline Review of decision theory Logistic regression A single neuron Multi-class classification 2 Outline Decision theory is conceptually easy and computationally hard

More information