Pre-Processing of Dynamic Networks * Its Impact on Observed Communities and Other nalytics Sofus. Macskassy Data Scientist, Facebook Unifying Theory and Experiment for Large-Scale Networks Simons Institute for the Theory of Computing, UC Berkeley November 0, 0 * Work done while at USC/ISI
One problem with dynamic networks t t t t t 5 What does the network look like at time t 5? Unifying Theory and Experiment for Large-Scale Networks - Simons Institute - 0
One problem with dynamic networks t t t t t 5 What does the network look like at time t 5? Latest snapshot? Unifying Theory and Experiment for Large-Scale Networks - Simons Institute - 0
One problem with dynamic networks t t t t t 5 What does the network look like at time t 5? Latest snapshot? Recent past? Unifying Theory and Experiment for Large-Scale Networks - Simons Institute - 0
One problem with dynamic networks t t t t t 5 What does the network look like at time t 5? Latest snapshot? Recent past? Complete past? Unifying Theory and Experiment for Large-Scale Networks - Simons Institute - 0
One problem with dynamic networks t t t t t 5 What does the network look like at time t 5? Latest snapshot? Recent past? Complete past? Unifying Theory and Experiment for Large-Scale Networks - Simons Institute - 0 Time Decay? 5
One problem with dynamic networks We have seen all types of aggregation in the literature. t Rarely has the method been justified. t t t t 5 What does However, the network it has deep look impact like at on time the t 5? outcome of downstream analytics. Latest snapshot? Recent past? Complete past? Unifying Theory and Experiment for Large-Scale Networks - Simons Institute - 0 Time Decay? 6
How does aggregation affect analysis? We here explore four particular questions:. What does the network look like at time t?. What are the communities and how do they evolve?. How do nodes change (centrality/membership)?. What is the impact on analytics? (e.g., on analytics such as machine learning) Unifying Theory and Experiment for Large-Scale Networks - Simons Institute - 0 7
Generating Network at Time t What is a current network? t any given t ±, may be few, if any, edges Solution: aggregate over a time window However, past edges are also informative Edges from prior window may still have some influence dd edges from prior network with decay parameter Prune edges with low weight (below ) Unifying Theory and Experiment for Large-Scale Networks - Simons Institute - 0 8
Generating Network at Time t Network at time t, then can be defined as djacency matrix at time t: 0 t Final network at time t: = { t' e ij (t δ) t ' t} t = 0 t +α (t δ ) G t = ( V t, E t ) E t = { t e ij a t t ij η}, a ij t V t = { v i j( e ij E t e ji E t )} Normalize t and to result in snapshots G G T Unifying Theory and Experiment for Large-Scale Networks - Simons Institute - 0 9
Tracking Communities Given G t, use modularity clustering to identify k communities (using weighted or unweighted edges) Identify communities from C t- which are also in C t Categorize community actions into four major events Continue: c i (t ) c j t C t = ( c t t,, c k ), c t i = v j v j V t > 0.5* c i (t ) { } and c i (t ) c j t 0.65* c j t Merge: c i (t ) c j t > 0.5* c i (t ) and c i (t ) c j t < 0.65* c j t Split: Significant portions (>0%) of or more communities in C t c i (t ) move into two Death: None of the above Unifying Theory and Experiment for Large-Scale Networks - Simons Institute - 0 0
Tracking Nodes From community actions, we can track nodes Split node movement into three major events Stay: Community continues/merges and node stays with community Leave: Community continues/merges but node goes to another community Other: Community splits or dies Unifying Theory and Experiment for Large-Scale Networks - Simons Institute - 0
Experimental Study Research question: What is the effect of varying graph- extraction parameters? Methodology:. Select data sets. Vary parameters and extract communities. Track communities over time. Downstream analytics: machine learning Unifying Theory and Experiment for Large-Scale Networks - Simons Institute - 0
Data Sets used The Enron email data set (http://www.cs.cmu.edu/~enron/) 5 nodes (Enron employees) communicating with each other over years We set = month for our study Edge-weight = # emails between people in a given month 60K emails, containing 9K links over years World trade flows (WTF) (http://www.nber.org/data/) 0 nodes (countries) of trades between countries from 96 through 000 We set = year as that is the granularity of the data Edge-weight of ià j is normalized across all ià k (keep top-0) K links Unifying Theory and Experiment for Large-Scale Networks - Simons Institute - 0
Varying parameters We performed an empirical study looking at the effects of changing and ={0.5, 0.75, 0.9,.0} ={0.05, 5.0, 0.0} [enron] ={0.0, 0.05, 0.0, 0.5, 0.5, 0.75,.0} [world trade flows] We tested when using weighted and unweighted edges Used to generate attribute values and identifying communities Unifying Theory and Experiment for Large-Scale Networks - Simons Institute - 0
What does the network look like? World trade flows (99) ( =.0; =0.05) ( =0.5; =0.75) Unifying Theory and Experiment for Large-Scale Networks - Simons Institute - 0 5
Snapshot of evolution ( =0.5; =0.05) World Trade Flows Enron
Snapshot of evolution (WTF: =0.5; =0.05)
Effect on communities stability WTF:'α=.0;'η=0.05;'weighted'edges' WTF:'α=0.5;'η=0.05;'weighted'edges'' C W N N C M O N N C Q N C M P Q C Q N T W V C M Q C Q Q N X C M Q C Q Q N BB Z Y BC BD BE BG B C M Q Q BC BI BH BD BE C M Q Q C M Q Q C M Q Q C M Q Q C M R Legend:' new$(any'filled'shape)' con'nues$in$next$step$ merges$in$next$step$ Q splits$in$next$step$ dissolves$a5er$this$ BC BD BJ BH BE BK B BC BD BJ BK BO BM BC BD BS BK BQ BO BP BC BD BU BS BV BP BR BT BC BD BS BX BZ BY BR BW BC BD BS CB BZ C BC BD BS BZ CF C
Effect on communities stability Enron: =.0; =5.0; weighted edges Enron: =0.5; =5.0; weighted edges O R Q F E P S P Q J E R S K O R Q T F E P S P Q J E R S K O R Q T F E P S P V R U K O R Q U F E P S P W V R U U K O R Q F E P S P W V R X Y O R Q V F E P S P W Z B O R Q F E P S BE W BD BC BB B Legend: BG W W BF BC BB B B new (any filled shape) continues in next step BH.6 BC BB B merges in next step BH BK BJ BL BI B splits in next step dissolves after this
Effect on community sizes World Trade Flow Enron Unifying Theory and Experiment for Large-Scale Networks - Simons Institute - 0 0
Effect on longevity of communities WTF: =.0; =0.05 Group Group Group Group 8 8 8 7 UK Canada Japan Syria Ireland US sian NES Italy Cyprus Colombia China HK Jordan Mauri@us Ecuador Lao ustria Malta Mexico Malaysia Iraq Bermuda Costa Rica Singapore Czechoslovak Fiji El Salvador Thailand Germany Samoa Guatemala China Bulgaria New Zealand Honduras Korea Hungary Kenya Dominican R Vietnam Fm. Yugoslav Hai@ US NES Turkey Trinidad Myanmar Lebanon Jamaica Cambodia Saudi rabia Peru Indonesia lbania Venezuela Philippines Romania Bahamas Taiwan Somalia St. Pierre Poland Fr. Guiana Guyana Suriname WTF: =0.5; =0.05 Group Group Group 8 Canada Japan UK US sian NES Ireland Colombia China HK Cyprus Ecuador Lao Mauri@us Mexico Malaysia Costa Rica Singapore El Salvador Thailand Malawi (9) Guatemala China Fiji (8) Honduras Korea Kenya (8) Dominican R Vietnam Hai@ Trinidad US NES (7) Myanmar (5) Jamaica (6) Kiriba@ (6) Venezuela () Papua N. Guin (6) Bahamas () Peru ()
Effect on Betweenness centrality (world trade flow) =0.5; =0.5 =0.9; =0.5 US US Fm. W. Germany Germany Fm. W. Germany Germany Japan France/Monaco Japan France/Monaco UK
Downstream nalytics: Machine Learning Classification problems Given G G (t-) predict changes in communities going into C t Given G G (t-) predict changes in nodes going into C t ttributes used are purely Community: Density, inter- and intra-link ratio, size, number of triangles, average closeness centrality,... Nodes: Number of triangles, inter- vs intra-link ratio, size of communities linked to, We performed 5x CV Various off-the-shelf ML methods used Logistic regression, decision trees, naïve bayes Unifying Theory and Experiment for Large-Scale Networks - Simons Institute - 0
Class Distribution: Enron Enron communities C / M / S C / M / S C / M / S α \ η 0.05 5.00 0.00 0.50 5 / 5 / / 55 / 0 / 56 / 0.75 76 / 0 / 79 / / 80 / / 0.90 80 / / 87 / 5 / 96 / 5 /.00 00 / 6 / 0 97 / 6 / 0 / 7 / Enron nodes L / S L / S L / S α \ η 0.05 5.00 0.00 0.50 77 / 59 660 / 908 85 / 65 0.75 505 / 6 55 / 96 6 / 6 0.90 9 / 7 06 / / 8.00 0 / 8 6 / 6 66 / 9 Unifying Theory and Experiment for Large-Scale Networks - Simons Institute - 0
Class Distribution: WTF World trade flows communities C / M / S C / M / S C / M / S C / M / S C / M / S C / M / S C / M / S α \ η 0.0 0.05 0.0 0.5 0.50 0.75.00 0.50 87 / 60 / 9 / 69 / 0 0 / 79 / 8 6 / 86 / 7 5 / 59 / 6 / 0 / 0 / 0 / 0 0.75 05 / 7 / 0 / / 5 5 / 5 / 6 9 / 66 / 7 75 / 67 / 7 89 / 55 / 65 / / 0 0.90 / 0 / 06 / 0 / 5 9 / 5 / 6 / 6 / 5 / 59 / 6 0 / 58 / 69 / 9 /.00 / 7 / 0 / 8 / 05 / / / 0 / 08 / 7 / / / 8 / 9 / World trade flows nodes L / S L / S L / S L / S L / S L / S L / S α \ η 0.0 0.05 0.0 0.5 0.50 0.75.00 0.50 9 / 795 69 / 768 6 / 65 7 / 709 585 / 976 7 / 56 / 78 0.75 906 / 560 980 / 57 906 / 5 08 / 95 89 / 797 60 / 970 86 / 78 0.90 80 / 580 806 / 50 86 / 59 8 / 56 96 / 986 805 / 887 60 / 765.00 576 / 560 599 / 5578 60 / 5555 6 / 576 70 / 566 69 / 50 68 / 98 Unifying Theory and Experiment for Large-Scale Networks - Simons Institute - 0 5
Effect on downstream analytics Unifying Theory and Experiment for Large-Scale Networks - Simons Institute - 0 6
Effect on downstream analytics Unifying Theory and Experiment for Large-Scale Networks - Simons Institute - 0 7
Conclusion. There are many ways to pre-process dynamic data. Introduced principled parameterized framework. Explored how parameters affected various analytics Take-aways:. Varying parameters can uncover structure. Different parameters needed to answer different questions. Exploring parameters crucial to understand data. Need to make explicit what parameters were used in study and why Unifying Theory and Experiment for Large-Scale Networks - Simons Institute - 0 8
Thank you Sofus. Macskassy Data Scientist, Facebook Unifying Theory and Experiment for Large-Scale Networks - Simons Institute - 0 9