Pre-Processing of Dynamic Networks * Its Impact on Observed Communities and Other Analytics

Similar documents
About the Authors Geography and Tourism: The Attraction of Place p. 1 The Elements of Geography p. 2 Themes of Geography p. 4 Location: The Where of

DISTILLED SPIRITS - EXPORTS BY VALUE DECEMBER 2017

Supplementary Appendix for. Version: February 3, 2014

Canadian Imports of Honey

Situation on the death penalty in the world. UNGA Vote 2012 Resolutio n 67/176. UNGA Vote 2010 Resolutio n 65/206. UNGA Vote 2008 Resolutio n 63/168

2017 Source of Foreign Income Earned By Fund

READY TO SCRAP: HOW MANY VESSELS AT DEMOLITION VALUE?

Country of Citizenship, College-Wide - All Students, Fall 2014

Appendices. Please note that Internet resources are of a time-sensitive nature and URL addresses may often change or be deleted.

DISTILLED SPIRITS - IMPORTS BY VALUE DECEMBER 2017

DISTILLED SPIRITS - IMPORTS BY VOLUME DECEMBER 2017

ICC Rev August 2010 Original: English. Agreement. International Coffee Council 105 th Session September 2010 London, England

International Student Enrollment Fall 2018 By CIP Code, Country of Citizenship, and Education Level Harpur College of Arts and Sciences

PRECURSORS. Pseudoephedrine preparations 3,4-MDP-2-P a P-2-P b. Ephedrine

PROPOSED BUDGET FOR THE PROGRAMME OF WORK OF THE CONVENTION ON BIOLOGICAL DIVERSITY FOR THE BIENNIUM Corrigendum

Grand Total Baccalaureate Post-Baccalaureate Masters Doctorate Professional Post-Professional

Governments that have requested pre-export notifications pursuant to article 12, paragraph 10 (a), of the 1988 Convention

Office of Budget & Planning 311 Thomas Boyd Hall Baton Rouge, LA Telephone 225/ Fax 225/

Fall International Student Enrollment Statistics

USDA Dairy Import License Circular for 2018

2001 Environmental Sustainability Index

Does socio-economic indicator influent ICT variable? II. Method of data collection, Objective and data gathered

TMM UPDATE TRANS DAY OF REMEMBRANCE 2017

Spring 2007 International Student Enrollment by Country, Educational Level, and Gender

Mexico, Central America and the Caribbean South America

Most Recent Periodic Report Initial State Report. Next Periodic Accession/Ratification. Report Publication Publication. Report Due

Official Journal of the European Union L 312/19 COMMISSION

USDA Dairy Import License Circular for 2018 Commodity/

Demography, Time and Space

SUGAR YEAR BOOK INTERNATIONAL SUGAR ORGANIZATION 1 CANADA SQUARE, CANARY WHARF, LONDON, E14 5AA.

GINA Children. II Global Index for humanitarian Needs Assessment (GINA 2004) Sheet N V V VI VIII IX X XI XII XII HDR2003 HDR 2003 UNDP

PROPOSED BUDGET FOR THE PROGRAMME OF WORK OF THE CARTAGENA PROTOCOL ON BIOSAFETY FOR THE BIENNIUM Corrigendum

Grand Total Baccalaureate Post-Baccalaureate Masters Doctorate Professional Post-Professional

MOCK EXAMINATION 1. Name Class Date INSTRUCTIONS

Asia. JigsawGeo. Free Printable Maps for Geography Education. Try our geography games for the ipod Touch or iphone.

Solow model: Convergence

USDA Dairy Import License Circular for 2018

Patent Cooperation Treaty (PCT) Working Group

Programme budget for the biennium Programme budget for the biennium

Do Policy-Related Shocks Affect Real Exchange Rates? An Empirical Analysis Using Sign Restrictions and a Penalty-Function Approach

Fall International Student Enrollment Statistics

Fall International Student Enrollment & Scholar Statistics

AT&T Phone. International Calling Rates for Phone International Plus, Phone 200 and Phone Unlimited North America

How Well Are Recessions and Recoveries Forecast? Prakash Loungani, Herman Stekler and Natalia Tamirisa

Office of Budget & Planning 311 Thomas Boyd Hall Baton Rouge, LA Telephone 225/ Fax 225/

Appendix A. ICT Core Indicators: Definitions

Swaziland Posts and Telecommunications Corporation (SPTC)---International Call Charges

The Chemical Weapons Convention, Biological and Toxin Weapons Convention, Geneva Protocol

AP Human Geography Summer Assignment

Does Corruption Persist In Sub-Saharan Africa?

INTERNATIONAL S T U D E N T E N R O L L M E N T

CMY. SCIENCE (CHEMISTRY) See C. P. B.Sc. Hons., Dip.Ed. A. B. Terence B.Sc., M.Ed., M.A. (IDT), PGDE

Velocity Virtual Rate Card 2018

Radiation Protection Procedures

Chapter 8 - Appendixes

Research Exercise 1: Instructions

Kernel Wt. 593,190,150 1,218,046,237 1,811,236, ,364, ,826, ,191, Crop Year

Bilateral Labour Agreements, 2004

Bayesian Poisson Tensor Factorization for Inferring Multilateral Relations from Sparse Dyadic Event Counts

04 June Dim A W V Total. Total Laser Met

Additional VEX Worlds 2019 Spot Allocations

International legal instruments related to the prevention and suppression of international terrorism

page 1 Total ( )

CONTINENT WISE ANALYSIS OF ZOOLOGICAL SCIENCE PERIODICALS: A SCIENTOMETRIC STUDY

ProxiWorld tariffs & zones 2016

Government Size and Economic Growth: A new Framework and Some Evidence from Cross-Section and Time-Series Data

1. Impacts of Natural Disasters by Region, 2008

2,152,283. Japan 6,350,859 32,301 6,383,160 58,239, ,790 58,464,425 6,091, ,091,085 52,565, ,420 52,768,905

Immigrant Status and Period of Immigration Newfoundland and Labrador 2001 Census

University of Oklahoma, Norman Campus International Student Report Fall 2014

Delegations School GA Opening Speech 1 SPC Opening Speech 2 SC Total Amnesty International Agora Sant Cugat Botswana Agora Sant Cugat 1 Y 1 Y

Forecast Million Lbs. % Change 1. Carryin August 1, ,677, ,001, % 45.0

International Institute of Seismology and Earthquake Engineering

Dimensionality Reduction and Visualization

Countries in Order of Increasing Per Capita Income, 2000

Export Destinations and Input Prices. Appendix A

Bahrain, Israel, Jordan, Kuwait, Saudi Arabia, United Arab Emirates, Azerbaijan, Iraq, Qatar and Sudan.

Florida's Refugee Population Statistical Report

Human resources: update

Report by the Secretariat

Table of Contents. Alumni. Introduction

PTV South America City Map 2018 (Standardmap)

Climate variability and international migration: an empirical analysis

Nigerian Capital Importation QUARTER THREE 2016

Natural Resource Management Indicators for the Least Developed Countries

Fertility and population policy

Lecture 5: Political Institutions and Policy Outcomes

Yodekoo Business Pro Tariff (Including Quickstart Out Of Bundle)

Connectivity in the LAC region

2014/CTI/WKSP7/008 Experiences in Implementation of Safeguards and Transitional Safeguards from Business Perspective

INTERNATIONAL TELECOMMUNICATION UNION SERIES T: TERMINALS FOR TELEMATIC SERVICES

W o r l d O i l a n d G a s R e v i e w

November 2014 CL 150/LIM 2 COUNCIL. Hundred and Fiftieth Session. Rome, 1-5 December 2014

PTV South America City Map 2017 (Standardmap)

Hundred and Fifty-sixth Session. Rome, 3-7 November Status of Current Assessments and Arrears as at 30 June 2014

Erratum to: Policies against human trafficking: the role of religion and political institutions

AP Human Geography Summer Assignment (2014)

Marketing Report: Traffic Demographics (Monthly Comprehensive)

Publication Date: 15 Jan 2015 Effective Date: 12 Jan 2015 Addendum 6 to the CRI Technical Report (Version: 2014, Update 1)

Figure S1. Log (rig activity) and log(real oil price). US

Transcription:

Pre-Processing of Dynamic Networks * Its Impact on Observed Communities and Other nalytics Sofus. Macskassy Data Scientist, Facebook Unifying Theory and Experiment for Large-Scale Networks Simons Institute for the Theory of Computing, UC Berkeley November 0, 0 * Work done while at USC/ISI

One problem with dynamic networks t t t t t 5 What does the network look like at time t 5? Unifying Theory and Experiment for Large-Scale Networks - Simons Institute - 0

One problem with dynamic networks t t t t t 5 What does the network look like at time t 5? Latest snapshot? Unifying Theory and Experiment for Large-Scale Networks - Simons Institute - 0

One problem with dynamic networks t t t t t 5 What does the network look like at time t 5? Latest snapshot? Recent past? Unifying Theory and Experiment for Large-Scale Networks - Simons Institute - 0

One problem with dynamic networks t t t t t 5 What does the network look like at time t 5? Latest snapshot? Recent past? Complete past? Unifying Theory and Experiment for Large-Scale Networks - Simons Institute - 0

One problem with dynamic networks t t t t t 5 What does the network look like at time t 5? Latest snapshot? Recent past? Complete past? Unifying Theory and Experiment for Large-Scale Networks - Simons Institute - 0 Time Decay? 5

One problem with dynamic networks We have seen all types of aggregation in the literature. t Rarely has the method been justified. t t t t 5 What does However, the network it has deep look impact like at on time the t 5? outcome of downstream analytics. Latest snapshot? Recent past? Complete past? Unifying Theory and Experiment for Large-Scale Networks - Simons Institute - 0 Time Decay? 6

How does aggregation affect analysis? We here explore four particular questions:. What does the network look like at time t?. What are the communities and how do they evolve?. How do nodes change (centrality/membership)?. What is the impact on analytics? (e.g., on analytics such as machine learning) Unifying Theory and Experiment for Large-Scale Networks - Simons Institute - 0 7

Generating Network at Time t What is a current network? t any given t ±, may be few, if any, edges Solution: aggregate over a time window However, past edges are also informative Edges from prior window may still have some influence dd edges from prior network with decay parameter Prune edges with low weight (below ) Unifying Theory and Experiment for Large-Scale Networks - Simons Institute - 0 8

Generating Network at Time t Network at time t, then can be defined as djacency matrix at time t: 0 t Final network at time t: = { t' e ij (t δ) t ' t} t = 0 t +α (t δ ) G t = ( V t, E t ) E t = { t e ij a t t ij η}, a ij t V t = { v i j( e ij E t e ji E t )} Normalize t and to result in snapshots G G T Unifying Theory and Experiment for Large-Scale Networks - Simons Institute - 0 9

Tracking Communities Given G t, use modularity clustering to identify k communities (using weighted or unweighted edges) Identify communities from C t- which are also in C t Categorize community actions into four major events Continue: c i (t ) c j t C t = ( c t t,, c k ), c t i = v j v j V t > 0.5* c i (t ) { } and c i (t ) c j t 0.65* c j t Merge: c i (t ) c j t > 0.5* c i (t ) and c i (t ) c j t < 0.65* c j t Split: Significant portions (>0%) of or more communities in C t c i (t ) move into two Death: None of the above Unifying Theory and Experiment for Large-Scale Networks - Simons Institute - 0 0

Tracking Nodes From community actions, we can track nodes Split node movement into three major events Stay: Community continues/merges and node stays with community Leave: Community continues/merges but node goes to another community Other: Community splits or dies Unifying Theory and Experiment for Large-Scale Networks - Simons Institute - 0

Experimental Study Research question: What is the effect of varying graph- extraction parameters? Methodology:. Select data sets. Vary parameters and extract communities. Track communities over time. Downstream analytics: machine learning Unifying Theory and Experiment for Large-Scale Networks - Simons Institute - 0

Data Sets used The Enron email data set (http://www.cs.cmu.edu/~enron/) 5 nodes (Enron employees) communicating with each other over years We set = month for our study Edge-weight = # emails between people in a given month 60K emails, containing 9K links over years World trade flows (WTF) (http://www.nber.org/data/) 0 nodes (countries) of trades between countries from 96 through 000 We set = year as that is the granularity of the data Edge-weight of ià j is normalized across all ià k (keep top-0) K links Unifying Theory and Experiment for Large-Scale Networks - Simons Institute - 0

Varying parameters We performed an empirical study looking at the effects of changing and ={0.5, 0.75, 0.9,.0} ={0.05, 5.0, 0.0} [enron] ={0.0, 0.05, 0.0, 0.5, 0.5, 0.75,.0} [world trade flows] We tested when using weighted and unweighted edges Used to generate attribute values and identifying communities Unifying Theory and Experiment for Large-Scale Networks - Simons Institute - 0

What does the network look like? World trade flows (99) ( =.0; =0.05) ( =0.5; =0.75) Unifying Theory and Experiment for Large-Scale Networks - Simons Institute - 0 5

Snapshot of evolution ( =0.5; =0.05) World Trade Flows Enron

Snapshot of evolution (WTF: =0.5; =0.05)

Effect on communities stability WTF:'α=.0;'η=0.05;'weighted'edges' WTF:'α=0.5;'η=0.05;'weighted'edges'' C W N N C M O N N C Q N C M P Q C Q N T W V C M Q C Q Q N X C M Q C Q Q N BB Z Y BC BD BE BG B C M Q Q BC BI BH BD BE C M Q Q C M Q Q C M Q Q C M Q Q C M R Legend:' new$(any'filled'shape)' con'nues$in$next$step$ merges$in$next$step$ Q splits$in$next$step$ dissolves$a5er$this$ BC BD BJ BH BE BK B BC BD BJ BK BO BM BC BD BS BK BQ BO BP BC BD BU BS BV BP BR BT BC BD BS BX BZ BY BR BW BC BD BS CB BZ C BC BD BS BZ CF C

Effect on communities stability Enron: =.0; =5.0; weighted edges Enron: =0.5; =5.0; weighted edges O R Q F E P S P Q J E R S K O R Q T F E P S P Q J E R S K O R Q T F E P S P V R U K O R Q U F E P S P W V R U U K O R Q F E P S P W V R X Y O R Q V F E P S P W Z B O R Q F E P S BE W BD BC BB B Legend: BG W W BF BC BB B B new (any filled shape) continues in next step BH.6 BC BB B merges in next step BH BK BJ BL BI B splits in next step dissolves after this

Effect on community sizes World Trade Flow Enron Unifying Theory and Experiment for Large-Scale Networks - Simons Institute - 0 0

Effect on longevity of communities WTF: =.0; =0.05 Group Group Group Group 8 8 8 7 UK Canada Japan Syria Ireland US sian NES Italy Cyprus Colombia China HK Jordan Mauri@us Ecuador Lao ustria Malta Mexico Malaysia Iraq Bermuda Costa Rica Singapore Czechoslovak Fiji El Salvador Thailand Germany Samoa Guatemala China Bulgaria New Zealand Honduras Korea Hungary Kenya Dominican R Vietnam Fm. Yugoslav Hai@ US NES Turkey Trinidad Myanmar Lebanon Jamaica Cambodia Saudi rabia Peru Indonesia lbania Venezuela Philippines Romania Bahamas Taiwan Somalia St. Pierre Poland Fr. Guiana Guyana Suriname WTF: =0.5; =0.05 Group Group Group 8 Canada Japan UK US sian NES Ireland Colombia China HK Cyprus Ecuador Lao Mauri@us Mexico Malaysia Costa Rica Singapore El Salvador Thailand Malawi (9) Guatemala China Fiji (8) Honduras Korea Kenya (8) Dominican R Vietnam Hai@ Trinidad US NES (7) Myanmar (5) Jamaica (6) Kiriba@ (6) Venezuela () Papua N. Guin (6) Bahamas () Peru ()

Effect on Betweenness centrality (world trade flow) =0.5; =0.5 =0.9; =0.5 US US Fm. W. Germany Germany Fm. W. Germany Germany Japan France/Monaco Japan France/Monaco UK

Downstream nalytics: Machine Learning Classification problems Given G G (t-) predict changes in communities going into C t Given G G (t-) predict changes in nodes going into C t ttributes used are purely Community: Density, inter- and intra-link ratio, size, number of triangles, average closeness centrality,... Nodes: Number of triangles, inter- vs intra-link ratio, size of communities linked to, We performed 5x CV Various off-the-shelf ML methods used Logistic regression, decision trees, naïve bayes Unifying Theory and Experiment for Large-Scale Networks - Simons Institute - 0

Class Distribution: Enron Enron communities C / M / S C / M / S C / M / S α \ η 0.05 5.00 0.00 0.50 5 / 5 / / 55 / 0 / 56 / 0.75 76 / 0 / 79 / / 80 / / 0.90 80 / / 87 / 5 / 96 / 5 /.00 00 / 6 / 0 97 / 6 / 0 / 7 / Enron nodes L / S L / S L / S α \ η 0.05 5.00 0.00 0.50 77 / 59 660 / 908 85 / 65 0.75 505 / 6 55 / 96 6 / 6 0.90 9 / 7 06 / / 8.00 0 / 8 6 / 6 66 / 9 Unifying Theory and Experiment for Large-Scale Networks - Simons Institute - 0

Class Distribution: WTF World trade flows communities C / M / S C / M / S C / M / S C / M / S C / M / S C / M / S C / M / S α \ η 0.0 0.05 0.0 0.5 0.50 0.75.00 0.50 87 / 60 / 9 / 69 / 0 0 / 79 / 8 6 / 86 / 7 5 / 59 / 6 / 0 / 0 / 0 / 0 0.75 05 / 7 / 0 / / 5 5 / 5 / 6 9 / 66 / 7 75 / 67 / 7 89 / 55 / 65 / / 0 0.90 / 0 / 06 / 0 / 5 9 / 5 / 6 / 6 / 5 / 59 / 6 0 / 58 / 69 / 9 /.00 / 7 / 0 / 8 / 05 / / / 0 / 08 / 7 / / / 8 / 9 / World trade flows nodes L / S L / S L / S L / S L / S L / S L / S α \ η 0.0 0.05 0.0 0.5 0.50 0.75.00 0.50 9 / 795 69 / 768 6 / 65 7 / 709 585 / 976 7 / 56 / 78 0.75 906 / 560 980 / 57 906 / 5 08 / 95 89 / 797 60 / 970 86 / 78 0.90 80 / 580 806 / 50 86 / 59 8 / 56 96 / 986 805 / 887 60 / 765.00 576 / 560 599 / 5578 60 / 5555 6 / 576 70 / 566 69 / 50 68 / 98 Unifying Theory and Experiment for Large-Scale Networks - Simons Institute - 0 5

Effect on downstream analytics Unifying Theory and Experiment for Large-Scale Networks - Simons Institute - 0 6

Effect on downstream analytics Unifying Theory and Experiment for Large-Scale Networks - Simons Institute - 0 7

Conclusion. There are many ways to pre-process dynamic data. Introduced principled parameterized framework. Explored how parameters affected various analytics Take-aways:. Varying parameters can uncover structure. Different parameters needed to answer different questions. Exploring parameters crucial to understand data. Need to make explicit what parameters were used in study and why Unifying Theory and Experiment for Large-Scale Networks - Simons Institute - 0 8

Thank you Sofus. Macskassy Data Scientist, Facebook Unifying Theory and Experiment for Large-Scale Networks - Simons Institute - 0 9