Global and Local Feature Learning for Ego-Network Analysis

Size: px

Start display at page:

Download "Global and Local Feature Learning for Ego-Network Analysis"

Wilfrid Hensley
6 years ago
Views:

1 Global and Local Feature Learning for Ego-Network Analysis Fatemeh Salehi Rizi Michael Granitzer and Konstantin Ziegler TIR Workshop 29 August 2017 (TIR Workshop) University of Passau 29 August / 28

2 Outline 1 Introduction Graph Embedding Ego-Network Analysis 2 Contributions 3 State-of-the-art 4 Approach 5 Experimental Results 6 Future Work 7 References (TIR Workshop) University of Passau 29 August / 28

3 Outline 1 Introduction Graph Embedding Ego-Network Analysis 2 Contributions 3 State-of-the-art 4 Approach 5 Experimental Results 6 Future Work 7 References (TIR Workshop) University of Passau 29 August / 28

4 representation. Adjacency Matrix Graph Embedding Bryan Perozzi Latent Dimensions V d << V Anomaly Detection Attribute Prediction Clustering Link Prediction... DeepWalk: Online Learning of Social Representations Given graph G with a set of nodes V = {v1,..., vn } f : vi 7 yi Rd, d V Methods based on eigen-decomposition of the Adjacency Matrix Methods inspired by NLP and Deep Learning (TIR Workshop) University of Passau 29 August / 28

representation. Adjacency Matrix Graph Embedding Bryan Perozzi Latent Dimensions V d << V Anomaly Detection Attribute Prediction Clustering Link Prediction.

5 representation. Adjacency Matrix Graph Embedding Bryan Perozzi Latent Dimensions V d << V Anomaly Detection Attribute Prediction Clustering Link Prediction... DeepWalk: Online Learning of Social Representations Given graph G with a set of nodes V = {v1,..., vn } f : vi 7 yi Rd, d V Methods based on eigen-decomposition of the Adjacency Matrix Methods inspired by NLP and Deep Learning (TIR Workshop) University of Passau 29 August / 28

6 What is an Ego Network? Social graphs have been divided to several subgraphs (ego-networks) [1] extracting features for nodes detecting distinct neighborhood patterns study social relationships Ego-network [1] ego alters social circles Sport teammates alter ego Colleagues Friends Family members An ego-network with four social circles (TIR Workshop) University of Passau 29 August / 28

Local Neighborhood Analysis Neighborhood around each ego has a different pattern [2] (a) Linked neighbors (b) Strongly linked (c) Dense (d) Complete (e) Powerful ego node (f) Strong ego neighbor (g)

7 Local Neighborhood Analysis Neighborhood around each ego has a different pattern [2] (a) Linked neighbors (b) Strongly linked (c) Dense (d) Complete (e) Powerful ego node (f) Strong ego neighbor (g) Less cohesive star (h) Star Figure 2: Clustering results of ego graphs, depicting the ego node in the middle (red color), with connections to first and second-degree neighbors. This study focuses on the automatic categorization of such ego graphs according to their graph structure. All in all, there exist eight distinct trends. We label each graph according to its characteristics. Cluster(f) (Strong ego neighbors): The ego node has Network C(a) C(b) C(c) C(d) C(e) C(f) C(g) C(h) M Nokia few immediate neighbors and the network is not densely Nokia (TIR Workshop) University of Passau +* * 29 August / 28 BT +*! +*! +! +*! GPS +*! +*! +*! +*! Fri& Fam +*! * +*! +*!

8 Local Neighborhood Analysis Neighborhood around each ego has a different pattern [2] (a) Linked neighbors (b) Strongly linked (c) Dense (d) Complete (e) Powerful ego node (f) Strong ego neighbor (g) Less cohesive star (h) Star Figure 2: Clustering results of ego graphs, depicting the ego node in the middle (red color), with connections to first and second-degree neighbors. This study focuses on the automatic categorization of such ego graphs according to their graph structure. All in all, there exist eight distinct trends. We label each graph according to its characteristics. Finding a vector representation for each ego-network Social circle detection and prediction Cluster(f) (Strong ego neighbors): The ego node has Network C(a) C(b) C(c) C(d) C(e) C(f) C(g) C(h) M Nokia few immediate neighbors and the network is not densely Nokia (TIR Workshop) University of Passau +* * 29 August / 28 BT +*! +*! +! +*! GPS +*! +*! +*! +*! Fri& Fam +*! * +*! +*!

9 Social Circle Prediction Predicting the social circle for a new added alter to the ego-network (TIR Workshop) University of Passau 29 August / 28

10 Outline 1 Introduction Graph Embedding Ego-Network Analysis 2 Contributions 3 State-of-the-art 4 Approach 5 Experimental Results 6 Future Work 7 References (TIR Workshop) University of Passau 29 August / 28

11 Our Contributions We introduce local vector representations for egos to capture neighborhood structures We apply local vectors to the circle prediction problem We replace global representations by local to improve the performance (TIR Workshop) University of Passau 29 August / 28

12 Outline 1 Introduction Graph Embedding Ego-Network Analysis 2 Contributions 3 State-of-the-art 4 Approach 5 Experimental Results 6 Future Work 7 References (TIR Workshop) University of Passau 29 August / 28

13 Vector Representation for Social Graphs DeepWalk [3] walks globally over the graph and samples sequences of nodes treats all these sequences as an artificial corpus feeds the corpus to a Skip-Gram based Word2Vec [5] (TIR Workshop) University of Passau 29 August / 28

14 Vector Representation for Social Graphs DeepWalk [3] walks globally over the graph and samples sequences of nodes treats all these sequences as an artificial corpus feeds the corpus to a Skip-Gram based Word2Vec [5] Word2Vec: Having the sequence of words {w 1, w 2,..., w t 1, w t, w t+1..., w n }, language models aims to maximize P (w t w 1,..., w t 1 ). Input layer Hidden layer Output layer x 1 x 2 x 3 x k h 1 h 2 h i W V N ={w ki } W' N V ={w' ij } h N y 1 y 2 y 3 y j x V y V Figure 1: A simple CBOW model with only one word in the context (TIR Workshop) University of Passau 29 August / 28

15 Vector Representation for Social Graphs DeepWalk [3] walks globally over the graph and samples sequences of nodes treats all these sequences as an artificial corpus feeds the corpus to a Skip-Gram based Word2Vec [5] Word2Vec: Having the sequence of words {w1, w2,..., wt 1, wt, wt+1..., wn }, language models aims to maximize P (wt w1,..., wt 1 ). given sequence nodes {v1, v2,..., vt 1, vt, vt+1,..., vn } it Visual Example Pnof log P r(v vt+c,..., vt1, vt+1,..., vt c ) maximizes: t=1 On Zachary s KaratetGraph: glo : V Rd Input Output Zachary s karate club embedding [2] Bryan Perozzi (TIR Workshop) DeepWalk: Online Learning of Social Representations University of Passau 29 August / 28

16 Vector Representation for Social Graphs node2vec [4] similar to DeepWalk with two additional parameters hyper-parameters p R + and q R + control random walks q > 1 and p < min(q, 1) walk locally (BFS) p > 1 and q < min(q, 1)) walk explorative (DFS) belong to (i.e., hod be based on the structural equivawe observe nodes mmunity of nodes, munities share the etworks commonly s essential to allow sentations obeying that embed nodes ether, as well as to ilar roles have simning algorithms to prediction tasks. s 1 s 3 u s 2 s 4 (TIR Workshop) University of Passau 29 August / 28 s 7 s 5 s 6 s 8 s 9 BFS DFS Figure 1: BFS and DFS search strategies from node u (k = 3). principles in network science, providing flexibility in discovering representations conforming to different equivalences. 3. We extend node2vec and other feature learning methods based

17 Vector Representation for Social Graphs node2vec [4] similar to DeepWalk with two additional parameters hyper-parameters p R + and q R + control random walks q > 1 and p < min(q, 1) walk locally (BFS) p > 1 and q < min(q, 1)) walk explorative (DFS) belong to (i.e., hod be based on the structural equivawe observe nodes mmunity of nodes, munities share the etworks commonly s essential to allow sentations obeying that embed nodes ether, as well as to ilar roles have simning algorithms to prediction tasks. s 1 s 3 u Even the local walk can exceed the ego-network s 2 s 4 (TIR Workshop) University of Passau 29 August / 28 s 7 s 5 s 6 s 8 s 9 BFS DFS Figure 1: BFS and DFS search strategies from node u (k = 3). principles in network science, providing flexibility in discovering representations conforming to different equivalences. 3. We extend node2vec and other feature learning methods based

18 4:12 J. McAuley and J. Leskovec Social Circle Prediction by et al. [1] ALGORITHM 2: Update memberships node x and circle k. Data: nodex whose membership to circle Ck is to be updated Result: updated membership for node x initialize l k x (0) := 0, lk x (1) := 0; construct a dummy node x0 with the communities and features of x but with x / Ck; construct a dummy node x1 with the communities and features of x but with x Ck; for (c, f ) dom(types) do // c = community string, f = feature string n := types(c, f ); // n = number of nodes of this type if S(x) = c Q(x) = f then // avoid including a self-loop on x n := n 1; end construct a dummy node y with community memberships c and features f ; // first compute probabilities assuming all pairs (x, y) are non-edges l k x (0) := lk x (0) + nlog p((x0, y) / E); l k x (1) := lk x (1) + nlog p((x1, y) / E); end for (x, y) E do // correct for edges incident on x l k x (0) := lk x (0) log p((x0, y) / E) + log p((x0, y) E); l k x (1) := lk x (1) log p((x1, y) / E) + log p((x1, y) E); end // update membership to circle k types(s(x), Q(x)) := types(s(x), Q(x)) 1; z U(0, 1); if z < exp {T (l k x (1) lk x (0))} then S(x)[k] := 1 else S(x)[k] := 0 end types(s(x), Q(x)) := types(s(x), Q(x)) + 1; 4:14 J. McAuley and J. Leskove Fig. 3. Feature construction. Profiles are tree structured, and we construct features by comparing paths those trees.examples of trees for two users x (blue) and y (pink) are shown at top. Two schemes for construc ing feature vectors from these profiles are shown at bottom. (1) (bottom left) We construct binary indicato measuring the difference between leaves in the two trees for example, work position Cryptanalyst a pears in both trees. (2) (bottom right) We sum over the leaf nodes in the first scheme, maintaining the fa that the two users worked at the same institution but discarding the identity of that institution. A Probabilistic Classifier Interms Time of Complexity Big-O notation, O(n 3 ) our MCMC algorithm computes updates in worst case O( N 2 7. CONSTRUCTING FEATURES FROM USER PROFILES ) per iteration, whereas our pseudo-boolean optimization algorithm of Section 3 (which is (TIR based Workshop) on maximum flow) requires O( N 3 University ). These are of Passau both loose, worst-case 29 August / 28 Profile information in all of our datasets can be represented as a tree where each lev encodes increasingly specific information (Figure 3, left). In other words, user profile are organized into increasingly specific categories. For example, a user s profile migh

19 Outline 1 Introduction Graph Embedding Ego-Network Analysis 2 Contributions 3 State-of-the-art 4 Approach 5 Experimental Results 6 Future Work 7 References (TIR Workshop) University of Passau 29 August / 28

20 Local Representations using Paragraph Vector walking locally over an ego-network to generate sequence of nodes treating this sequence as an artificial paragraph applying Paragraph Vector [6] to learn vector representation (TIR Workshop) University of Passau 29 August / 28

21 Local Representations using Paragraph Vector walking locally over an ego-network to generate sequence of nodes treating this sequence as an artificial paragraph applying Paragraph Vector [6] to learn vector representation given an artificial paragraph v 1, v 2, v 3,..., v t,..., v l for ego u i, it maximizes the average log probability: loc: U R d l log P r(v t u i, v t+c,..., v t c ) t=1 (TIR Workshop) University of Passau 29 August / 28

22 Social Circle Prediction Setting social network G = (V, E) with egos U and alters V \ U profile information (v. feat 1,..., v. feat f ) for every v V Input/Output predict social circles c: V \ U {C 1,..., C k } given several samples Approach Feature selection (users profile information, graph embeddings) A Neural Network Classifier (TIR Workshop) University of Passau 29 August / 28

23 Incorporating Profile Information Similarity of ego s and alter s profile as a feature ego s profile feature: u. feat 1,..., u. feat f alter s profile feature: v. feat 1,..., v. feat f sim(u, v) = (b 1,..., b f ), where { 1 if u. feat i = v. feat i, b i = 0 otherwise. (1) (TIR Workshop) University of Passau 29 August / 28

24 Social Circle Prediction A feed-forward neural network classifier Input layer Hidden layer Output layer Predicting social circle for alter v which belongs to ego-network of ego u Input layer: locglo: loc(u) glo(v) gloglo: glo(u) glo(v) locgloglo: loc(u) glo(u) glo(v) locglosim: loc(u) glo(v sim(u, v) gloglosim: glo(u) glo(v) sim(u, v) locgloglosim: loc(u) glo(u) glo(v) sim(u, v) Hidden layer: a single dense layer with ReLU activation units Output layer: softmax units (same number as circles) Ground-truth alter s circle label (family, colleagues, etc) (TIR Workshop) University of Passau 29 August / 28

25 Outline 1 Introduction Graph Embedding Ego-Network Analysis 2 Contributions 3 State-of-the-art 4 Approach 5 Experimental Results 6 Future Work 7 References (TIR Workshop) University of Passau 29 August / 28

26 Dataset Table 1: Statistics of Social Network Datasets [7] Facebook Twitter Google+ nodes V 4,039 81, ,614 edges E 88,234 1,768,149 13,673,453 egos U circles C features f 576 2,271 4,122 (TIR Workshop) University of Passau 29 August / 28

27 Experimental Results Table 2: Performance (F 1 -score) of different embeddings for circle prediction on three dataset. Standard deviation is less than 0.02 for all experiments. Approach Facebook Twitter Google+ gloglo locglo locgloglo gloglosim locglosim locgloglosim Φ 1, McAuley & Leskovec [1] (TIR Workshop) University of Passau 29 August / 28

28 Outline 1 Introduction Graph Embedding Ego-Network Analysis 2 Contributions 3 State-of-the-art 4 Approach 5 Experimental Results 6 Future Work 7 References (TIR Workshop) University of Passau 29 August / 28

29 Future Work Using embeddings to approximate more complex measures (e.g. shortest-path distance) Using embedding to find similar egos Learning embedding for directed graphs (TIR Workshop) University of Passau 29 August / 28

30 Outline 1 Introduction Graph Embedding Ego-Network Analysis 2 Contributions 3 State-of-the-art 4 Approach 5 Experimental Results 6 Future Work 7 References (TIR Workshop) University of Passau 29 August / 28

31 References McAuley, Julian J., and Jure Leskovec. Learning to Discover Social Circles in Ego Networks. In NIPS, vol. 2012, pp Muhammad, Syed Agha, and Kristof Van Laerhoven. DUKE: A Solution for Discovering Neighborhood Patterns in Ego Networks. In ICWSM, pp Perozzi, Bryan, Rami Al-Rfou, and Steven Skiena. DeepWalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, pp ACM, Grover, Aditya, and Jure Leskovec. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp ACM, T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean. Distributed representations of words and phrases and their compositionality. In NIPS, 2013 Le, Quoc V., and Tomas Mikolov. Distributed Representations of Sentences and Documents. In ICML, vol. 14, pp (TIR Workshop) University of Passau 29 August / 28

32 Word2vec [5] Language Modeling Distributional Hypothesis in the natural languages: semantically similar words dispose to appear in similar word neighborhoods Having the sequence of words {w 1, w 2,..., w t 1, w t,..., w n }, language models aims to maximize P (w t w 1,..., w t 1 ). In word2vec [2], they defined a fix context length surrounding each word with length context c and the sequence of words {w 1, w 2,..., w t 1, w t,..., w n } the goal is to word2vec is to maximize: n t=1 log P (w t w t+c,..., w t c ) The neural network which learns word representations: One hidden layer The number of input layer entries is equal to the vocabulary size of the text The number of units in the hidden layer determines dimensionality of the vectors (TIR Workshop) University of Passau 29 August / 28

33 Word2vec [5] Having a sequence of words {w 1, w 2,..., w t 1, w t,..., w n } and context window with length one Predict one target word, given one context word P (w t w t 1 ). Input layer Hidden layer Output layer x 1 x 2 x 3 x k h 1 h 2 h i W V N ={w ki } W' h N V ={w' ij } N y 1 y 2 y 3 y j x V y V Figure 1: A simple CBOW model with only one word in the context Each input is a one-hot encoding vector layers are fully connected. The input is a one-hot encoded vector, which means for a given input context word, only one out of V units, {x1,, xv }, will be 1, and all other units are 0. The weights between the input layer and the hidden layer can be represented by a V N matrix W. Each row of W is the N-dimension vector representation vw of the associated word of the input layer. Formally, row i of W is vw. T Given a context (a word), assuming xk = 1 and xk = 0 for k k, we have V h = W T x = W(k, ) T := vt, i=1 eu j (1) wi The probability is computed using the softmax function: h = W T x, u = W T e h, P (w t w t 1 ) = y t = u t After several iterations matrix W will not change (TIR Workshop) which is essentially copying the k-th row of W to h. vwi is the vector representation of the input word wi. This implies that the link (activation) function of the hidden layer units is simply linear (i.e., directly passing its weighted sum of inputs to the next layer). From the hidden layer to the output layer, there is a different weight matrix W = {w ij }, which is an N V matrix. University Using these of weights, Passau we can compute a score uj for each word 29 August / 28

34 t al., 2013d). It is also posanslate words and phrases l., 2013b). Paragraph Vector ctors attractive for many s such as language modolov, 2012), natural lant & Weston, 2008; Zhila translation (Mikolov et al., ge understanding (Frome tion (Socher et al., 2013a). Given a sequence theof softmax paragraphs weights, are p 1 fixed., p 2,..., p q and training words w 1, w 2, w 3,..., Suppose w t,... wthat n, there the idea are Nof paragraphs Paragraph in the Vector corpus, [6] M is to maximize p(w t p j, w t c..., w t+c )) For example consider 2 paragraphs and window size of 3 ibuted memory model raph vectors is inspired by d vectors. The inspiration to contribute to a predicthe sentence. So despite initialized randomly, they as an indirect result of the idea in our paragraph vecaragraph vectors are also tion task of the next word m the paragraph. ork (see Figure 2), every e vector, represented by a word is also mapped to a column in matrix W. The rs are averaged or concate- Figure 2. A framework for learning paragraph vector. This framework (TIR Workshop) is similar to the University framework of Passau presented in Figure 1; the only29 August / 28 our model. At prediction time, one needs to perform an inference step to compute the paragraph vector for a new paragraph. This is also obtained by gradient descent. In this step, the parameters for the rest of the model, the word vectors W and words in the vocabulary, and we want to learn paragraph vectors such that each paragraph is mapped to p dimensions and each word is mapped to q dimensions, then the model has the total of N p + M q parameters (excluding the softmax parameters). Even though the number of parameters can be large when N is large, the updates during training are typically sparse and thus efficient. P 1 : The cat sat on the mat P 2 : I ate potato crisps for evening snack p(on P 1, T he, cat, sat)

RaRE: Social Rank Regulated Large-scale Network Embedding

RaRE: Social Rank Regulated Large-scale Network Embedding Authors: Yupeng Gu 1, Yizhou Sun 1, Yanen Li 2, Yang Yang 3 04/26/2018 The Web Conference, 2018 1 University of California, Los Angeles 2 Snapchat