Node similarity and classification


1 Node similarity and classification Davide Mottin, Anton Tsitsulin Hasso Plattner Institute Graph Mining course Winter Semester 2017

2 Acknowledgements Some parts of this lecture are taken from external material. Other adapted content is from Social Network Data Analytics (Springer), Ed. Charu Aggarwal.

3 SimRank Structural context. Idea: two objects are similar if they are referenced by similar objects.
s(a, b) = C / (|I(a)| |I(b)|) · Σ_{i ∈ I(a)} Σ_{j ∈ I(b)} s(i, j),   with s(a, a) = 1
where C is a decay factor in [0, 1], I(a) are the in-neighbors of a, and |I(a)| |I(b)| is the total number of in-neighbor pairs: s(a, b) is the average similarity between the in-neighbors of a and the in-neighbors of b, scaled by C.
Glen Jeh and Jennifer Widom. SimRank: a measure of structural-context similarity. SIGKDD, 2002.

4 SimRank: Intuition Structural context. Intuition: computing SimRank is like propagating on the graph G^2 = (V^2, E^2) of node-node pairs. 1. The source of similarity is the self-vertices, like (Univ, Univ). 2. Similarity propagates along pair-paths in G^2, away from the sources. 4

5 SimRank for Bipartite Graphs Example: a bipartite graph of people and music videos. For two people A and B, s(A, B) is the average similarity between the out-neighbors (videos) of A and the out-neighbors of B; for two videos c and d, s(c, d) is the average similarity between the in-neighbors (people) of c and the in-neighbors of d. [Jeh, Widom 02; Improvements: Antonellis+ 08 SimRank++, C. Li, Han+ 10, Y. Zhang 13, P. Li+ 14] 5
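
To make the recursion concrete, here is a minimal Python sketch (not from the lecture) of the naive iterative SimRank computation; the toy graph, the decay factor C, and the fixed number of iterations are illustrative assumptions.

```python
import itertools

# Toy directed graph as adjacency lists (edge u -> v); values are assumptions for illustration.
edges = {"Univ": ["ProfA", "ProfB"], "ProfA": ["StudentA"], "ProfB": ["StudentB"],
         "StudentA": ["Univ"], "StudentB": ["ProfB"]}
nodes = sorted(set(edges) | {v for vs in edges.values() for v in vs})
in_nbrs = {n: [u for u, vs in edges.items() if n in vs] for n in nodes}

C = 0.8  # decay factor in [0, 1]
sim = {(a, b): 1.0 if a == b else 0.0 for a, b in itertools.product(nodes, nodes)}

for _ in range(10):                        # fixed number of iterations for simplicity
    new_sim = {}
    for a, b in itertools.product(nodes, nodes):
        if a == b:
            new_sim[(a, b)] = 1.0          # self-similarity is the source of similarity
            continue
        Ia, Ib = in_nbrs[a], in_nbrs[b]
        if not Ia or not Ib:
            new_sim[(a, b)] = 0.0          # no in-neighbors -> similarity 0
            continue
        # average pairwise similarity between the in-neighbors of a and b, scaled by C
        total = sum(sim[(i, j)] for i in Ia for j in Ib)
        new_sim[(a, b)] = C * total / (len(Ia) * len(Ib))
    sim = new_sim

print(sim[("StudentA", "StudentB")])
```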

6 Lecture road Similarity-based Iterative classification Label propagation 6

7 Iterative classification methods Idea: use features that take the neighboring nodes into account and repeat the classification several times until nothing changes. Suppose for each node we have two features: 1. Number of neighbors with class A 2. Number of neighbors without a class. [Figure: learn a classifier on the labeled nodes, apply it to the unlabeled nodes, recompute labels and features, repeat until convergence] Neville, J. and Jensen, D., Iterative classification in relational data. AAAI. 7

8 Iterative Classification Algorithm (ICA) Train a classifier using the labeled instances. Until convergence: apply the classifier to the unlabeled nodes; update the feature vectors of the unlabeled nodes. Return the labels for the unlabeled nodes. Neville, J. and Jensen, D., Iterative classification in relational data. AAAI. 8
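
A minimal sketch of the ICA loop, using the two relational features from the previous slide and scikit-learn's logistic regression as the base classifier; the toy graph, the known labels, and the iteration cap are assumptions added for illustration, not the lecture's code.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy undirected graph and partial labels (assumptions for illustration).
neighbors = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
labels = {0: "A", 1: "B"}                      # known labels; nodes 2-4 are unlabeled
unlabeled = [n for n in neighbors if n not in labels]

def features(node, current):
    """Relational features: # neighbors labeled A, # neighbors with no label yet."""
    nbr_labels = [current.get(v) for v in neighbors[node]]
    return [sum(l == "A" for l in nbr_labels), sum(l is None for l in nbr_labels)]

current = dict(labels)
for _ in range(10):                            # iterate until labels stop changing (bounded)
    X_train = [features(n, current) for n in labels]
    y_train = [labels[n] for n in labels]
    clf = LogisticRegression().fit(X_train, y_train)
    new = dict(labels)
    for n in unlabeled:                        # re-predict unlabeled nodes with fresh features
        new[n] = clf.predict([features(n, current)])[0]
    if new == current:
        break
    current = new

print({n: current[n] for n in unlabeled})
```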

9 Extension to multiple labels Each node has a distribution over the labels. To avoid noise, keep only the top-k labels for each unlabeled node, sorted in descending order of confidence. Intuition: remove the least confident labels. Lu, Q. and Getoor, L., Link-based classification. In ICML. 9

10 Lecture road Similarity-based Iterative classification Label propagation 10

11 Guilt-by-Association Techniques Given: a graph and a few labeled nodes. Find: the class (red/green) for the rest of the nodes. Assuming: a network effect (homophily/heterophily). [Figure: example network with fraudster (red), accomplice, and honest (green) nodes] 11

12 Personalized Random Walk with Restarts (RWR) Idea: Propagate labels from a set of nodes to the rest of the graph [Brin+ 98; Haveliwala 03; Tong+ 06; Minkov, Cohen 07] 12

13 PageRank: A kind of random walk [Figure: John with incoming links from Andrew, Bill, and Laura] The importance of John is high if Laura, Andrew, and Bill are also important:
PR(John) = Σ_{v ∈ in-neighbors(John)} PR(v) / out-degree(v) 13

14 Random Walk In a random walk you do the opposite: you assume that the walker moves randomly and chooses one of the neighbors to visit. [Figure: Tom connected to Andrew and Laura, each with transition probability 1/2] 1. Tom chooses Andrew or Laura, each with probability 1/2. 2. Once he chooses one, the number of visits to that node increases. 3. Continue the process until nothing changes anymore (at a probabilistic level). John will receive many visits, since many nodes are connected to him. 14
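
A small simulation of the plain random walk described above (illustrative only; the friendship graph, the seed, and the number of steps are assumptions): the walker repeatedly jumps to a uniformly random neighbor and the visit counts approximate each node's importance.

```python
import random
from collections import Counter

# Toy friendship graph (assumption): John has several in-links, so he should collect many visits.
neighbors = {"Tom": ["Andrew", "Laura"], "Andrew": ["John", "Tom"],
             "Laura": ["John", "Tom"], "John": ["Andrew", "Laura"]}

random.seed(0)
current = "Tom"
visits = Counter()
for _ in range(100_000):                           # long walk so relative frequencies stabilize
    current = random.choice(neighbors[current])    # move to a uniformly random neighbor
    visits[current] += 1

total = sum(visits.values())
print({node: round(count / total, 3) for node, count in visits.items()})
```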

15 Personalized RWR Now assume that with probability c you perform another move and with probability (1 - c) you jump back to Tom. [Figure: Tom connected to Andrew and Laura] Therefore the probability that the walker is at Tom's node will be
Prob(Tom, t) = c · ( Prob(Andrew, t-1) / |N(Andrew)| + Prob(Laura, t-1) / |N(Laura)| ) + (1 - c)
where Prob(Andrew, t-1) is the probability of visiting Andrew at time (t-1), |N(·)| is the number of Andrew's and Laura's neighbors, and (1 - c) is the probability of jumping back to Tom. 15

16 Comparing two nodes Start two random walks, one from each of the two nodes you want to compare. Compare the final scores you obtain for each node in the graph using some vector comparison (e.g., cosine similarity, KL-divergence). Example over [Tom, Andrew, Alice, John, ..., Paul]: Vector for Tom = [0.2, 0.2, 0.1, 0.3, ..., 0.01]; Vector for John = [0.05, 0.15, 0.2, 0.2, ..., 0.2]. Compare the two vectors (e.g., subtract). 16
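
A sketch of this comparison under assumptions (toy graph, restart parameter c, number of iterations): compute a personalized RWR vector for each of the two seed nodes by iterating the lecture's fixed-point equation, then compare the two vectors with cosine similarity.

```python
import numpy as np

# Toy undirected graph over [Tom, Andrew, Alice, John, Paul] (adjacency is an assumption).
names = ["Tom", "Andrew", "Alice", "John", "Paul"]
A = np.array([[0, 1, 0, 1, 0],
              [1, 0, 1, 1, 0],
              [0, 1, 0, 1, 1],
              [1, 1, 1, 0, 1],
              [0, 0, 1, 1, 0]], dtype=float)
W = A / A.sum(axis=1, keepdims=True)       # D^{-1} A, as in the lecture's equation

def rwr(seed, c=0.85, iters=100):
    """Iterate the lecture's fixed point x = c * D^{-1} A x + (1 - c) * y."""
    y = np.zeros(len(names))
    y[names.index(seed)] = 1.0             # restart at the seed node
    x = y.copy()
    for _ in range(iters):
        x = c * W @ x + (1 - c) * y
    return x

v_tom, v_john = rwr("Tom"), rwr("John")
cosine = v_tom @ v_john / (np.linalg.norm(v_tom) * np.linalg.norm(v_john))
print(np.round(v_tom, 3), np.round(v_john, 3), round(float(cosine), 3))
```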

17 Personalized RWR Now assume that with probability c you perform another move and with probability (1 - c) you jump back to Tom (the restart probability).
[I - c D^{-1} A] x = (1 - c) y
where D^{-1} A encodes the graph structure, x is the relevance vector, and y is the starting vector. [Figure: small example graph with its transition matrix] 17

18 Personalized RWR Personalized RWR is defined as the probability that a random surfer reaches a node n after wandering in the graph for a long time. After some time the value for n (and for all the other nodes) does not change anymore; technically speaking, the random walk converges to a stationary distribution. Let A be the adjacency matrix of the graph G = (V, E):
x = c D^{-1} A x + (1 - c) y
where D is a diagonal matrix with the node degrees on the diagonal, y is a vector containing the probability of starting from each of the nodes, and x is the probability of reaching each node in the graph; dim(x) = dim(y) = |V|. What is D^{-1} A? Recall that the inverse of a diagonal matrix is a diagonal matrix containing the reciprocals of the elements on the diagonal. We need to find x:
I x = c D^{-1} A x + (1 - c) y
I x - c D^{-1} A x = (1 - c) y
(I - c D^{-1} A) x = (1 - c) y
x = (I - c D^{-1} A)^{-1} (1 - c) y 18
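
The closed form derived on this slide can be evaluated directly with one linear solve; below is a hedged sketch on an assumed toy graph that also checks the solution against the fixed-point equation.

```python
import numpy as np

# Toy undirected graph (assumption); c is the probability of continuing the walk.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)
D_inv = np.diag(1.0 / A.sum(axis=1))       # inverse of the degree matrix D
c = 0.85
y = np.array([1.0, 0.0, 0.0, 0.0])         # restart (starting) distribution: node 0

n = len(y)
# Closed form from the slide: x = (I - c D^{-1} A)^{-1} (1 - c) y
x = np.linalg.solve(np.eye(n) - c * D_inv @ A, (1 - c) * y)

# Sanity check: x is a fixed point of x = c D^{-1} A x + (1 - c) y
assert np.allclose(x, c * D_inv @ A @ x + (1 - c) * y)
print(np.round(x, 3))
```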

19 Another way of looking at the RW x = c D^{-1} A x + (1 - c) y Suppose the surfer starts from the beginning: it chooses one node at random according to y. In each step it will either, with probability (1 - c), jump to another starting node, or, with probability c, move to one of the neighbors. Why? Think about the process without restart, setting W = D^{-1} A:
x_1 = W x_0
x_2 = W x_1 = W W x_0 = W^2 x_0
...
x_n = W^n x_0
That means that (W^n)_{ij} contains the probability of reaching j starting from i in n steps. 19

20 Semi-Supervised Learning (SSL) Idea: if you have few labeled nodes, exploit the structure and the homophily (similarity) between nodes. Graph-based setting: few labeled nodes; edges encode the similarity between nodes. Inference: exploit neighborhood information. [Figure: two-step example propagating labels to neighboring nodes] Ji, M., Sun, Y., Danilevsky, M., Han, J. and Gao, J., Graph regularized transductive classification on heterogeneous information networks. ECML/PKDD. 20

21 SSL Equation
[I + a(D - A)] x = y
where D - A is the graph Laplacian (graph structure), x are the final labels, y are the known labels, and a is the homophily strength of the neighbors (~ stiffness of a spring). [Figure: spring analogy with example label values] 21

22 SSL Equation What does it compute? [I + a(D - A)] x = y Hard? Let's unroll it!
x = -a(D - A) x + y = a A x - a D x + y
x_i = a Σ_j A_ij x_j + (y_i - a D_ii x_i)
The first term sums the labels from the neighbors; the second term is the difference between the learned label at the node and the input label. 22

23 When is RWR = SSL? RWR: [I - c D^{-1} A] x = (1 - c) y. SSL: [I + a(D - A)] x = y. The two solutions coincide for every y when
(1 - c) [I - c D^{-1} A]^{-1} = [I + a(D - A)]^{-1}
1/(1 - c) [I - c D^{-1} A] = I + a(D - A)
1/(1 - c) I - c/(1 - c) D^{-1} A = I + a(D - A)
c/(1 - c) I - c/(1 - c) D^{-1} A = a(D - A)
c/(1 - c) (I - D^{-1} A) = a(D - A)
c/(1 - c) (I - (1/d) A) = a(D - A)     (*all nodes have degree d, so D = d I)
c/(1 - c) (D - A) = a d (D - A)
c / (d (1 - c)) = a
We assume the graph is d-regular*. 23
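
To see the claimed equivalence concretely, the sketch below builds a small d-regular graph (a 5-node cycle, so d = 2), sets a = c / (d(1 - c)) as derived above, and checks numerically that the RWR and SSL solutions coincide; the graph, c, and the label vector are assumptions.

```python
import numpy as np

# 5-node cycle: every node has degree d = 2, so the graph is d-regular.
n, d = 5, 2
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[i, (i - 1) % n] = 1.0
D = np.diag(A.sum(axis=1))

c = 0.7                                    # RWR restart parameter (assumption)
a = c / (d * (1 - c))                      # the value derived on the slide
y = np.array([1.0, 0.0, 0.0, 1.0, 0.0])   # known labels / restart vector (assumption)

x_rwr = np.linalg.solve(np.eye(n) - c * np.linalg.inv(D) @ A, (1 - c) * y)
x_ssl = np.linalg.solve(np.eye(n) + a * (D - A), y)

print(np.round(x_rwr, 4))
print(np.round(x_ssl, 4))
assert np.allclose(x_rwr, x_ssl)           # identical solutions on a d-regular graph
```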

24 Belief Propagation Iterative message-based method. Propagation matrix ψ: entry ψ(class of sender, class of receiver) encodes how likely a node of the sender's class is to have a neighbor of the receiver's class (homophily: large values on the diagonal). Messages are exchanged round after round until a stop criterion is fulfilled.
- Pearl, J., Reverend Bayes on inference engines: A distributed hierarchical approach. In AAAI.
- Yedidia, J.S., Freeman, W.T. and Weiss, Y., Understanding belief propagation and its generalizations. Exploring artificial intelligence in the new millennium. 24

25 Belief Propagation Iterative message-based method. Propagation matrix ψ(class of sender, class of receiver): under homophily the diagonal entries are large (neighbors tend to share the same class); under heterophily the off-diagonal entries are large (neighbors tend to have different classes).

26 Belief Propagation Equations message(i → j) ∝ belief(i) × homophily strength: node i sends to its neighbor j its current belief, weighted by the propagation (homophily) strength. 26

27 Belief Propagation Equations belief of i:
b_i(x_i) ∝ φ_i(x_i) Π_{j ∈ N(i)} m_{ji}(x_i)
where φ_i(x_i) is the prior belief of node i and the m_{ji}(x_i) are the messages from its neighbors. 27

28 Fast Belief Propagation Original BP [Yedidia+]:
m_{ij}(x_j) = Σ_{x_i} φ_i(x_i) ψ_{ij}(x_i, x_j) Π_{n ∈ N(i)\j} m_{ni}(x_i)
b_i(x_i) = η φ_i(x_i) Π_{j ∈ N(i)} m_{ji}(x_i)
(non-linear)
FaBP [Koutra+]: Linearized BP. BP is approximated by the linear system
[I + a D - c' A] b_h = φ_h
where φ_h are the prior beliefs and b_h the final beliefs (linear).
Koutra, D., Ke, T.Y., Kang, U., Chau, D.H.P., Pao, H.K.K. and Faloutsos, C., Unifying guilt-by-association approaches: Theorems and fast algorithms. PKDD 28

29 Wait, how? Unrolling the FaBP linear system recovers a message-passing view:
[I + a D - c' A] b_h = φ_h
I b_h + a D b_h - c' A b_h = φ_h
b_h = -(a D - c' A) b_h + φ_h
b_h(i) = c' Σ_j A_ij b_h(j) - a D_ii b_h(i) + φ_h(i)
The term c' Σ_j A_ij b_h(j) plays the role of the exchanged messages. 29
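
A minimal sketch of the FaBP idea: one linear solve of [I + a D - c' A] b_h = φ_h replaces all message passing. The toy graph, the "about-half" priors, and the parameter values are assumptions; the formulas for a and c' follow the parameterization I recall from the FaBP paper (derived from a homophily factor h_h), so treat them as assumptions too.

```python
import numpy as np

# Toy undirected graph (assumption): nodes 0-4 with a few edges.
A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)
D = np.diag(A.sum(axis=1))
n = A.shape[0]

# "About-half" priors: +0.05 leans class 1, -0.05 leans class 2, 0 = unknown (assumption).
phi = np.array([0.05, 0.0, 0.0, 0.0, -0.05])

# Parameters derived from an assumed homophily factor hh; the exact formulas are my
# recollection of the FaBP parameterization and should be treated as an assumption.
hh = 0.15
a = 4 * hh ** 2 / (1 - 4 * hh ** 2)
c_prime = 2 * hh / (1 - 4 * hh ** 2)

# One linear solve replaces all message passing: [I + a D - c' A] b = phi
b = np.linalg.solve(np.eye(n) + a * D - c_prime * A, phi)
print(np.round(b, 4))       # the sign of b[i] gives the predicted class of node i
```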

30 Qualitative Comparison of GBA Methods

Method | Heterophily | Scalability | Convergence
RWR    | no          | yes         | yes
SSL    | no          | yes         | yes
BP     | yes         | yes         | ?
FABP   | yes         | yes         | yes

30

31 Correspondence of Methods (RWR = Random Walk with Restart, SSL = Semi-supervised Learning, BP/FABP = Belief Propagation)

Method | Matrix           | unknown | known
RWR    | [I - c D^{-1} A] | x       | (1 - c) y
SSL    | [I + a (D - A)]  | x       | y
FABP   | [I + a D - c' A] | b_h     | φ_h

31

32 Questions? 32

33 References
Lorrain, F. and White, H.C. Structural equivalence of individuals in social networks. The Journal of Mathematical Sociology.
Borgatti, S.P. and Everett, M.G. Notions of position in social network analysis. Sociological Methodology.
Borgatti, S.P. and Everett, M.G. Regular blockmodels of multiway, multimode matrices. Social Networks.
Henderson, K., Gallagher, B., Eliassi-Rad, T., Tong, H., Basu, S., Akoglu, L., Koutra, D., Faloutsos, C. and Li, L. RolX: structural role extraction & mining in large graphs. SIGKDD.
Henderson, K., Gallagher, B., Li, L., Akoglu, L., Eliassi-Rad, T., Tong, H. and Faloutsos, C. It's who you know: graph mining using recursive structural features. SIGKDD.
Tong, H., Koren, Y. and Faloutsos, C. Fast direction-aware proximity for graph mining. SIGKDD.
Jeh, G. and Widom, J. SimRank: a measure of structural-context similarity. SIGKDD, 2002.
Ji, M., Sun, Y., Danilevsky, M., Han, J. and Gao, J. Graph regularized transductive classification on heterogeneous information networks. ECML/PKDD.
Pearl, J. Reverend Bayes on inference engines: A distributed hierarchical approach. AAAI.
Yedidia, J.S., Freeman, W.T. and Weiss, Y. Understanding belief propagation and its generalizations. Exploring Artificial Intelligence in the New Millennium.
Koutra, D., Ke, T.Y., Kang, U., Chau, D.H.P., Pao, H.K.K. and Faloutsos, C. Unifying guilt-by-association approaches: Theorems and fast algorithms. PKDD.
Neville, J. and Jensen, D. Iterative classification in relational data. AAAI.
Lu, Q. and Getoor, L. Link-based classification. ICML.
33
