Analysis of Multiview Legislative Networks with Structured Matrix Factorization: Does Twitter Influence Translate to the Real World?



Shawn Mankad, The University of Maryland
Joint work with: George Michailidis


Motivation
There is a growing literature that attempts to understand and exploit social networking platforms for resource optimization and marketing.
We develop new methodology for identifying important accounts by studying networks generated from Twitter, which has over 20 million active accounts each month as of September.

Twitter platform
- Twitter allows accounts to broadcast short messages, referred to as tweets.
- A tweet that is a copy of another account's tweet is called a retweet.
- Within a tweet, an account can mention another account by referring to its account name with the @ symbol as a prefix.
- Accounts also declare the other accounts they are interested in following, which means the follower receives a notification whenever a new tweet is posted by the followed account.
Each of the three actions defines a network. Collectively, they define a multiview network.


Example of Multiview Networks
Twitter networks from 418 Members of Parliament (MPs) in the United Kingdom.
[Figure panels: Retweet Network, Mentions Network, Follows Network]
- 12 Conservative MPs
- 18 Labour
- 43 Liberal Democrats
- 5 MPs representing the Scottish National Party (SNP)
- 11 MPs belonging to other parties

Motivating Question
Can we use the network structures in Twitter to create an influence measure that is a surrogate for real-life MP influence?
There are many ways to combine network structure (communities) with network statistics (PageRank, HITS, etc.) for the identification of influential nodes (e.g., MPs), but it remains unclear which is the preferred method.
We integrate both steps together through matrix factorization to address this issue.

Outline
- Motivation
- Non-negative Matrix Factorization for Network Analysis
- Structured NMF for Network Analysis
- Extension to Multiview Networks
- Application to the Data

Non-negative Matrix Factorization for Network Analysis: Non-negative Matrix Factorization
Let $Y$ be an observed $n \times p$ matrix that is non-negative. NMF expresses
$$Y \approx UV^T,$$
where $U \in \mathbb{R}^{n \times K}_+$ and $V \in \mathbb{R}^{p \times K}_+$.
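The factorization above can be sketched with the classical multiplicative updates of Lee and Seung. The slides do not say which solver is used, so the update rule, the function name `nmf`, the iteration count, and the toy block matrix are all assumptions made for illustration.

```python
import numpy as np

def nmf(Y, K, n_iter=500, eps=1e-9, seed=0):
    """Approximate a non-negative Y (n x p) by U @ V.T with U, V >= 0."""
    rng = np.random.default_rng(seed)
    n, p = Y.shape
    U = rng.random((n, K)) + eps
    V = rng.random((p, K)) + eps
    for _ in range(n_iter):
        # Multiplicative updates: ratios of non-negative terms, so the
        # factors never leave the non-negative orthant.
        U *= (Y @ V) / (U @ (V.T @ V) + eps)
        V *= (Y.T @ U) / (V @ (U.T @ U) + eps)
    return U, V

# Hypothetical toy matrix with two obvious blocks (not the MP data).
Y = np.kron(np.eye(2), np.ones((3, 3)))
U, V = nmf(Y, K=2)
rel_err = np.linalg.norm(Y - U @ V.T) / np.linalg.norm(Y)
```

Each column of `U` and `V` then plays the role of one community, which is the interpretation the later slides build on.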

Why NMF?
Better interpretability.
[Figure: NMF compared with SVD. Images modified from Xu, W., Liu, X., & Gong, Y. (2003, July). Document clustering based on non-negative matrix factorization. In Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM.]
Networks and other data from the social sciences are typically non-negative.

Interpretations of NMF
$$Y \approx UV^T = \sum_{k=1}^{K} U_k V_k^T \quad \text{s.t.} \quad \sum_k V_{jk} = 1,$$
where $U_k$ is the mean of cluster $k$ in $\mathbb{R}^p_+$ and $V_k = [P(\text{Obs. } 1 \in \text{group } k), \ldots, P(\text{Obs. } n \in \text{group } k)]$. Ding et al. (2009) show NMF equivalence with relaxed K-means.
$$Y_{ij} = (UDV^T)_{ij} \quad \text{s.t.} \quad \sum_{i,j} Y_{ij} = 1, \quad \sum_i U_{ik} = 1, \quad \sum_j V_{jk} = 1,$$
$$P(w_i, d_j) = \sum_k P(w_i \mid z_k)\, P(z_k)\, P(d_j \mid z_k).$$
Ding et al. (2008) show NMF equivalence with PLSI.

Edge Assignment and Overlapping Communities
$$Y_{ij} \approx U_{i1}V_{j1} + \cdots + U_{iK}V_{jK},$$
where $U_{ik}V_{jk}$ measures the contribution of community $k$ to edge $Y_{ij}$.
[Figure: rank-3 NMF compared with SVD (spectral clustering)]
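As a small illustration of the edge decomposition, the per-community contributions $U_{ik}V_{jk}$ can be stored in a three-way array. The factor values below are hypothetical and chosen only so the arithmetic is easy to follow. Assigning each edge to its largest contributor is one possible convention, not something the slides prescribe.

```python
import numpy as np

# Hypothetical factors for 3 nodes and K = 2 communities.
U = np.array([[1.0, 0.0],
              [0.5, 0.5],
              [0.0, 2.0]])
V = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0]])

# contrib[i, j, k] = U[i, k] * V[j, k]: share of community k in edge (i, j).
contrib = np.einsum("ik,jk->ijk", U, V)

# Summing over communities recovers the fitted edge weights U @ V.T.
Y_hat = contrib.sum(axis=2)

# Community with the largest contribution to each edge (ties go to the lowest index).
edge_community = contrib.argmax(axis=2)
```

Because a node can have non-zero entries in several columns, its edges can be split across communities, which is what makes the communities overlapping.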


Structured NMF for Network Analysis: Structured Semi-NMF
We propose
$$\min_{\Lambda;\, V \ge 0} \|Y - S\Lambda V^T\|_F^2,$$
where $S \in \mathbb{R}^{n \times d}$, $\Lambda \in \mathbb{R}^{d \times K}$, and $V \in \mathbb{R}^{n \times K}_+$.
Each column of $S$ is a node-level network statistic that is calculated a priori, e.g.,
$$S = \begin{pmatrix} c_1 & b_1 \\ c_2 & b_2 \\ \vdots & \vdots \\ c_n & b_n \end{pmatrix}.$$
The columns of $S$ are covariates that guide the matrix factorization to more interpretable solutions. Then $V$ can be used to rank nodes within each community.


Centrality Measures
If S is specified, then nodes with different types of local topologies will be emphasized in the factorizations. For instance, in each of the following networks, X has higher centrality than Y according to a particular measure.

Analysis Procedure
1. Specify $S$ (node-level statistics) and $K$ (number of communities).
2. Perform the matrix factorization.
3. Node $i$ has importance $I_i = \sum_k V_{ik}$.
4. Rank nodes according to $I$.
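The four steps can be sketched as follows for the objective $\min_{\Lambda; V \ge 0} \|Y - S\Lambda V^T\|_F^2$. This is a minimal sketch, not the authors' implementation: it assumes a plain alternating least-squares scheme with projection onto a small positive constant, and the function name `structured_semi_nmf`, the toy network, and the choice of $S$ (degree plus a constant column) are hypothetical.

```python
import numpy as np

def structured_semi_nmf(Y, S, K, n_iter=200, eps=1e-8, seed=0):
    """Alternate least-squares updates of Lam and V, keeping V positive."""
    rng = np.random.default_rng(seed)
    n = Y.shape[0]
    V = rng.random((n, K)) + eps
    S_pinv = np.linalg.pinv(S)
    for _ in range(n_iter):
        # Least-squares solution for Lam with V held fixed.
        Lam = S_pinv @ Y @ V @ np.linalg.pinv(V.T @ V)
        # Least-squares solution for V with Lam held fixed ...
        B = S @ Lam
        V = Y.T @ B @ np.linalg.pinv(B.T @ B)
        # ... projected so that no entry falls below a small constant.
        V = np.maximum(V, eps)
    return Lam, V

# Step 1: toy network (two triangles joined by edge 2-3) and statistics S.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
n = 6
Y = np.zeros((n, n))
for i, j in edges:
    Y[i, j] = Y[j, i] = 1.0
S = np.column_stack([Y.sum(axis=1), np.ones(n)])

# Step 2: factorize.  Steps 3 and 4: importance scores and ranking.
Lam, V = structured_semi_nmf(Y, S, K=2)
importance = V.sum(axis=1)          # I_i = sum_k V_ik
ranking = np.argsort(-importance)   # most important node first
```

Pseudo-inverses are used instead of explicit inverses so that the sketch still runs when $S$ or $V$ is rank deficient.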

Semi-NMF
If $S = I$, then the problem becomes
$$\min_{\Lambda;\, V \ge 0} \|Y - \Lambda V^T\|_F^2,$$
which is similar to the standard NMF model. Thus, if $S$ is not specified, we obtain the usual results.

[Figure panels: PageRank; Structured Semi-NMF with S = I; Structured Semi-NMF with S = [Clustering Coefficient]; Structured Semi-NMF with S = [Clustering Coefficient, Betweenness, Closeness, Degree]]


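Extension to Multiview Networks: New Objective Function
Each column of $S_m$ is a node-level network statistic, e.g.,
$$S_m = \begin{pmatrix} c_1 & b_1 \\ c_2 & b_2 \\ \vdots & \vdots \\ c_n & b_n \end{pmatrix}.$$
Then we propose
$$\min_{\Lambda_m,\, \Theta \ge 0,\, V_m \ge 0} \sum_m \|Y_m - S_m\Lambda_m(\Theta + V_m)^T\|_F^2,$$
where $S_m \in \mathbb{R}^{n \times d}$, $\Lambda_m \in \mathbb{R}^{d \times K}$, and $\Theta, V_m \in \mathbb{R}^{n \times K}_+$.
Rows of $\Theta$ reveal the overall importance of a node to each community.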

Analysis Procedure
1. Specify $S_m$ (node-level statistics) and $K$ (number of communities).
2. Perform the matrix factorization.
3. Node $i$ has importance $I_i = \sum_k \Theta_{ik}$.
4. Rank nodes according to $I$.

Approximate Alternating Least Squares
$$\Lambda_m = (S_m^T S_m)^{-1} S_m^T A_m (\Theta + V_m)\big((\Theta + V_m)^T(\Theta + V_m)\big)^{-1}$$
$$V_m = A_m^T S_m \Lambda_m (\Lambda_m^T S_m^T S_m \Lambda_m)^{-1}$$
$$\Theta = \sum_m A_m^T S_m \Lambda_m (\Lambda_m^T S_m^T S_m \Lambda_m)^{-1}$$
To overcome numerical instabilities that occur when too many elements are exactly zero, and to maintain non-negativity of $\Theta$ and $V_m$, we project to a small constant.
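A rough sketch of an alternating scheme for the multiview objective is given below. It is not the slide's exact update rule. It assumes that $A_m$ is the same observed network written $Y_m$ in the objective. It also assumes a particular way of splitting the per-view least-squares solution $G_m$ into a shared part $\Theta$ (the element-wise minimum across views) and a view-specific remainder $V_m$. That split, the function name `multiview_als`, and the random toy networks are all hypothetical choices, made so that the scale of the factors stays stable.

```python
import numpy as np

def multiview_als(A_list, S_list, K, n_iter=100, eps=1e-8, seed=0):
    """Alternating least squares with projection onto a small constant."""
    rng = np.random.default_rng(seed)
    n = A_list[0].shape[0]
    Theta = rng.random((n, K)) + eps
    V_list = [rng.random((n, K)) + eps for _ in A_list]
    Lam_list = [None] * len(A_list)
    for _ in range(n_iter):
        G_list = []
        for m, (A, S) in enumerate(zip(A_list, S_list)):
            W = Theta + V_list[m]
            # Least-squares update of Lam_m with Theta + V_m held fixed.
            Lam_list[m] = np.linalg.pinv(S) @ A @ W @ np.linalg.pinv(W.T @ W)
            B = S @ Lam_list[m]
            # Least-squares target for Theta + V_m, projected to stay positive.
            G = A.T @ B @ np.linalg.pinv(B.T @ B)
            G_list.append(np.maximum(G, eps))
        # Assumed split: shared part = element-wise minimum over the views.
        Theta = np.minimum.reduce(G_list)
        V_list = [np.maximum(G - Theta, eps) for G in G_list]
    return Lam_list, Theta, V_list

# Three hypothetical views on the same 8 nodes (random symmetric networks).
rng = np.random.default_rng(1)
n, K = 8, 2
A_list, S_list = [], []
for _ in range(3):
    upper = np.triu(rng.random((n, n)) < 0.4, k=1).astype(float)
    A = upper + upper.T
    A_list.append(A)
    S_list.append(np.column_stack([A.sum(axis=1), np.ones(n)]))

Lam_list, Theta, V_list = multiview_als(A_list, S_list, K)
importance = Theta.sum(axis=1)      # I_i = sum_k Theta_ik
ranking = np.argsort(-importance)
```

As written, the objective does not by itself pin down how $\Theta + V_m$ divides into its two terms, so any implementation has to fix that split somehow. The minimum rule here is only one candidate.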


Application to the Data: Specifying $S_m$
$S_m$ = (Betweenness, Clustering Coefficient, Closeness, Degree)
- Clustering coefficient for a given node quantifies how close its neighbors are to being a complete graph. A higher measure of clustering coefficient could result from an MP creating buzz.
- Betweenness quantifies the control of a node on the communication between other nodes in a social network, and is computed as the number of shortest paths going through a given node.
- Closeness is a related centrality measure that quantifies the length of time it would take for information to spread from a given node to all other nodes.
- Degree, the number of connections a node has obtained, ensures that active MPs are emphasized in the factorization.
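To make the construction of $S_m$ concrete, the sketch below computes three of the four columns for an unweighted, undirected, connected graph using only the standard library and NumPy. The slides do not give formulas, so it assumes the textbook definitions: closeness as $(n-1)$ divided by the sum of hop distances, and betweenness via Brandes' algorithm. The clustering coefficient column is left out for brevity, and the path graph is a hypothetical example.

```python
from collections import deque
import numpy as np

def bfs_distances(adj, s):
    """Hop distances from s; unreachable nodes keep distance -1."""
    dist = [-1] * len(adj)
    dist[s] = 0
    queue = deque([s])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if dist[w] < 0:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def closeness(adj):
    """(n - 1) / sum of distances; assumes the graph is connected."""
    n = len(adj)
    return [(n - 1) / sum(bfs_distances(adj, s)) for s in range(n)]

def betweenness(adj):
    """Brandes' algorithm for unweighted, undirected graphs."""
    n = len(adj)
    score = [0.0] * n
    for s in range(n):
        order, preds = [], [[] for _ in range(n)]
        sigma = [0] * n            # number of shortest paths from s
        sigma[s] = 1
        dist = [-1] * n
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = [0.0] * n
        for w in reversed(order):  # back-propagate path dependencies
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return [b / 2 for b in score]  # each pair was counted from both ends

# Hypothetical path graph 0 - 1 - 2 given as adjacency lists.
adj = [[1], [0, 2], [1]]
degree = [len(nbrs) for nbrs in adj]
S = np.column_stack([betweenness(adj), closeness(adj), degree])
```

In this toy graph the middle node lies on the only shortest path between the two end nodes, so it receives the highest betweenness, closeness, and degree.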

[Figure: % variance explained versus estimated rank of $\Theta$, $V_m$; panels for rank-2, rank-3, and rank-4 $S_m$]
We set $K = 6$ and rank of $S_m$ =

Results: Ranking by Twitter influence

| Rank | Structured Semi-NMF | Semi-NMF | PageRank | HITS |
|------|---------------------|----------|----------|------|
| 1 | Ed Miliband (L, 248) | Ed Miliband (L, 248) | Ian Austin (L, 3) | Michael Dugher (L, 120) |
| 2 | Ed Balls (L, 580) | Ed Balls (L, 580) | William Hague (C, 1) | Ed Miliband (L, 248) |
| 3 | Tom Watson (L, 253) | Michael Dugher (L, 120) | Hugo Swire (C, 5) | Ed Balls (L, 580) |
| 4 | Michael Dugher (L, 120) | Tom Watson (L, 253) | Tom Watson (L, 253) | Chuka Umunna (L, 203) |
| 5 | Chuka Umunna (L, 203) | Chuka Umunna (L, 203) | Ed Balls (L, 580) | Andy Burnham (L, 125) |
| 6 | Rachel Reeves (L, 54) | Rachel Reeves (L, 54) | Michael Dugher (L, 120) | Tom Watson (L, 253) |
| 7 | Stella Creasy (L, 18) | Chris Bryant (L, 164) | Pat McFadden (L, 1) | Rachel Reeves (L, 54) |
| 8 | Chris Bryant (L, 164) | Stella Creasy (L, 18) | Ed Miliband (L, 248) | Chris Bryant (L, 164) |
| 9 | Tom Harris (L, 113) | Luciana Berger (L, 133) | Stella Creasy (L, 18) | Diana Johnson (L, 105) |
| 10 | David Miliband (L, 489) | Andy Burnham (L, 125) | Matthew Hancock (C, 32) | Tom Harris (L, 113) |

Results: Twitter influence does translate to the real world
Predicting future newspaper coverage with Poisson regression and various influence measures $I$:
$$\text{HeadlineCount} = F(\alpha + \beta I + \gamma\,\text{Controls}),$$
where Controls includes
- Age
- Gender
- Constituency Size
- Political Party
- Indicator variable denoting whether each MP represents a constituency within the city of London.
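The regression step can be sketched as below. The slide does not define $F$, so the sketch assumes the standard Poisson regression with a log link, $E[\text{HeadlineCount}] = \exp(\alpha + \beta I + \gamma\,\text{Controls})$, fitted by iteratively reweighted least squares. The function name `poisson_irls`, the single control variable, and the simulated data are hypothetical; none of it comes from the MP data.

```python
import numpy as np

def poisson_irls(X, y, n_iter=50):
    """Fit log E[y] = X @ coef by iteratively reweighted least squares."""
    coef = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ coef
        mu = np.exp(eta)
        z = eta + (y - mu) / mu          # working response
        WX = X * mu[:, None]             # rows of X scaled by weights mu
        coef = np.linalg.solve(X.T @ WX, X.T @ (mu * z))
    return coef

# Simulated stand-in data: intercept, influence score I, one control.
rng = np.random.default_rng(0)
n = 2000
influence = rng.random(n)
control = rng.random(n)
X = np.column_stack([np.ones(n), influence, control])
true_coef = np.array([0.5, 1.0, -0.5])   # alpha, beta, gamma (made up)
y = rng.poisson(np.exp(X @ true_coef))

coef = poisson_irls(X, y)
fitted = np.exp(X @ coef)
rmse = np.sqrt(np.mean((y - fitted) ** 2))
```

Fitting the model once per influence measure and comparing out-of-sample RMSE is one way to compare the measures.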

[Figure: RMSE by method (Structured Semi-NMF, Semi-NMF, HITS, PageRank, None); panels: UK, UK without D. Cameron, Irish]

Using $\Theta$ and $V_m$ to identify interesting substructure:
[Figure panels: (a) Retweet Network, (b) Mentions Network, (c) Follows Network]

Wrap up
Key idea: Use network statistics to guide the factorization to better solutions.
1. If we can identify the right local topology, then we can overcome not having dynamic data for certain tasks.
2. The data is exclusively link meta-data. Content analysis can potentially be avoided with network analysis tools for identifying influential users. Important for applications in marketing and intelligence gathering.
Thank you!

Betweenness Centrality
In marketing theory, nodes are classified into these types:
1. Bridge Node
2. Gateway Node
3. Creation Node
4. Consumption Node
Viral marketing depends heavily on high-betweenness bridge nodes!

Clustering Coefficient
The clustering coefficient for node B asks: if A and B are connected and B and C are connected, are A and C connected?
The clustering coefficient for a given node is defined as the ratio of closed triads to total possible closed triads.
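The ratio of closed triads to possible closed triads can be computed directly from an adjacency matrix. The sketch below assumes a simple undirected graph (symmetric 0/1 matrix, no self-loops) and assigns a coefficient of 0 to nodes with fewer than two neighbors. Both the convention and the small example graph are assumptions made for illustration.

```python
import numpy as np

def clustering_coefficient(A):
    """Local clustering coefficient of every node of an undirected graph."""
    A = np.asarray(A, dtype=float)
    deg = A.sum(axis=1)
    # (A^3)_ii counts closed walks of length 3; each triangle is seen twice.
    closed = np.diag(A @ A @ A) / 2.0
    # Number of neighbor pairs, i.e. triads that could be closed.
    possible = deg * (deg - 1) / 2.0
    out = np.zeros_like(deg)
    np.divide(closed, possible, out=out, where=possible > 0)
    return out

# Hypothetical graph: triangle A-B-C plus a node D attached only to B.
# Row/column order is A, B, C, D.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 0],
              [0, 1, 0, 0]])
cc = clustering_coefficient(A)   # A: 1, B: 1/3, C: 1, D: 0
```

Node B has three neighbors and therefore three possible triads, of which only the A-C pair is closed, giving 1/3.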
