Kristina Lerman USC Information Sciences Institute


Rethinking Network Structure Kristina Lerman USC Information Sciences Institute Università della Svizzera Italiana, December 16, 2011

Measuring network structure
  Central nodes
  Community structure
  Strength of ties
[Zachary, J. Anthro. Research 33 No. 4 (1977)]

Measuring network structure
SNA metrics examine network topology to measure structure:
  Centrality: degree, Katz score [Katz, 1953], betweenness [Freeman, 1977], eigenvector [Bonacich, 1987], PageRank [Brin et al., 1998], ...
  Community detection: dozens of algorithms to partition a network into groups; 400+ references in a 2010 review of community detection
  Strength of ties: neighborhood overlap to measure the strength of a tie [Granovetter, 1973]
Claim: the nature of interactions between nodes affects how we measure network structure
  Consequences for network analysis metrics and algorithms

Types of interactions
Two classes of interactions between network nodes:
  Conservative (one to one): phone calls, money transfer, web surfing. Modeled by a random walk. Transfer matrix ~ $D^{-1}A$
  Non-conservative (one to many): epidemics, information diffusion, innovation adoption. Modeled by a contact process. Transfer matrix ~ $A$
[Figure: the five-node example network (nodes 1 to 5), drawn once for each class]

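Matrix formulation
[Figure: the five-node example network (nodes 1 to 5)]
Adjacency matrix of the network:
$$A = \begin{pmatrix} 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 & 0 \end{pmatrix}$$
Out-degree matrix:
$$D = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 & 0 \\ 0 & 0 & 2 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}$$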
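The two matrices on this slide can be written down and checked in a few lines. The sketch below assumes the flattened numbers read row by row as a 5x5 adjacency matrix followed by a 5x5 diagonal out-degree matrix; the variable names are illustrative and not from the talk. It also forms the conservative transfer matrix $D^{-1}A$ by dividing each row of $A$ by its out-degree.

```python
# Adjacency matrix of the five-node example (row i lists the out-links of node i+1).
# Assumption: the flattened slide text reads row by row.
A = [
    [0, 1, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 1],
    [0, 1, 0, 0, 0],
]
n = len(A)

# Out-degree of each node is the row sum of A; D is the diagonal matrix of out-degrees.
out_degree = [sum(row) for row in A]
D = [[out_degree[i] if i == j else 0 for j in range(n)] for i in range(n)]

# Conservative transfer matrix T = D^{-1} A: each row of A divided by its out-degree.
T = [[a / out_degree[i] for a in row] for i, row in enumerate(A)]

# Non-conservative replication matrix R = A: nothing is divided.
R = A

print(out_degree)
for row in T:
    print(row)
```

Every row of T sums to 1, which is what makes the random-walk process conservative; the rows of R do not.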

Conservative interactions
At time t, each node receives amount $\Delta w(t)$
At the next time step, the node retains a fraction $(1-\alpha)$ of the amount it received and divides the rest among its neighbors
Transfer matrix $T$: amount given to each neighbor; $T = D^{-1}A$ when evenly divided among neighbors
  t=0: $\Delta w(0) = w_0$
  t=1: $\Delta w(1) = \alpha\,\Delta w(0)\,T = \alpha\, w_0 T$
  t=2: $\Delta w(2) = \alpha\,\Delta w(1)\,T = \alpha^2 w_0 T^2$
  t: $\Delta w(t) = \alpha\,\Delta w(t-1)\,T = \alpha^t w_0 T^t$
$w(t) = (1-\alpha)\sum_{k=0}^{t-1}\Delta w(k) + \Delta w(t) = (1-\alpha)\,w_0 + \alpha\, w(t-1)\,T$
[Figure: the five-node example network]

Steady state of conservative dynamic process
As $t \to \infty$: $w_\infty = (1-\alpha)\,w_0 + \alpha\, w_\infty T = (1-\alpha)\, w_0\, (I - \alpha T)^{-1}$
[Figure: the five-node example network]
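A minimal numerical sketch of this steady state, assuming the recursion $w(t) = (1-\alpha) w_0 + \alpha\, w(t-1)\, T$ with $T = D^{-1}A$ on the five-node example. The starting vector and the value $\alpha = 0.85$ are illustrative choices, not values from the talk, and the helper names are hypothetical. Iterating the recursion avoids a matrix inverse and converges because $\alpha < 1$.

```python
# Five-node example network (assumed row-by-row reading of the adjacency matrix).
A = [
    [0, 1, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 1],
    [0, 1, 0, 0, 0],
]


def row_times_matrix(v, M):
    """Row vector v times matrix M."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]


def conservative_steady_state(A, w0, alpha, steps=300):
    """Iterate w(t) = (1 - alpha) * w0 + alpha * w(t-1) T, with T = D^{-1} A."""
    # Assumes every node has at least one out-link, so no row sum is zero.
    T = [[a / sum(row) for a in row] for row in A]
    w = list(w0)
    for _ in range(steps):
        passed_on = row_times_matrix(w, T)
        w = [(1 - alpha) * w0[j] + alpha * passed_on[j] for j in range(len(w0))]
    return w


w0 = [1.0] * 5   # illustrative starting amounts
alpha = 0.85     # illustrative value, not from the slides
w = conservative_steady_state(A, w0, alpha)
print([round(x, 4) for x in w])
print(round(sum(w), 6))  # the total amount is conserved
```

Node 1 has no in-links, so it keeps only the retained share $(1-\alpha)$ of its own starting amount, while the total over all nodes stays equal to the total of $w_0$.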

Non-conservative interactions
At time t, each node receives amount $\Delta w(t)$
At the next time step, the node prints a fraction $\alpha$ of this amount for each out-neighbor
Replication matrix $R$: the additional amount produced for each neighbor; $R = A$, where $A$ is the adjacency matrix
  t=0: $\Delta w(0) = w_0$
  t=1: $\Delta w(1) = \alpha\,\Delta w(0)\,R = \alpha\, w_0 R$
  t=2: $\Delta w(2) = \alpha\,\Delta w(1)\,R = \alpha^2 w_0 R^2$
  t: $\Delta w(t) = \alpha\,\Delta w(t-1)\,R = \alpha^t w_0 R^t$
$w(t) = \sum_{k=0}^{t}\Delta w(k) = w_0 + \alpha\, w(t-1)\,R = \sum_{k=0}^{t} \alpha^k w_0 R^k$
[Figure: the five-node example network]

Steady state of non-conservative dynamic process
As $t \to \infty$: $w_\infty = w_0 + \alpha\, w_\infty R = w_0\, (I - \alpha R)^{-1}$, while $\alpha < 1/\lambda_{\max}$
[Figure: the five-node example network]
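The non-conservative steady state can be sketched the same way, assuming the recursion $w(t) = w_0 + \alpha\, w(t-1)\, R$ with $R = A$ on the five-node example. The value $\alpha = 0.3$ is an illustrative choice: the largest eigenvalue of a non-negative matrix cannot exceed its largest row sum (here 2), so $\alpha = 0.3$ is safely below $1/\lambda_{\max}$. Helper names are hypothetical.

```python
# Five-node example network (assumed row-by-row reading of the adjacency matrix).
A = [
    [0, 1, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 1],
    [0, 1, 0, 0, 0],
]


def row_times_matrix(v, M):
    """Row vector v times matrix M."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]


def non_conservative_steady_state(R, w0, alpha, steps=300):
    """Iterate w(t) = w0 + alpha * w(t-1) R; converges only if alpha < 1/lambda_max."""
    w = list(w0)
    for _ in range(steps):
        replicated = row_times_matrix(w, R)
        w = [w0[j] + alpha * replicated[j] for j in range(len(w0))]
    return w


w0 = [1.0] * 5   # illustrative starting amounts
alpha = 0.3      # illustrative value below 1/lambda_max (lambda_max <= 2 here)
w = non_conservative_steady_state(A, w0, alpha)
print([round(x, 4) for x in w])
print(round(sum(w), 4))  # larger than sum(w0): the amount is not conserved
```

Unlike the conservative case, the total grows beyond the starting total, because every node produces a new copy for each out-neighbor instead of splitting what it holds.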

Interactions and centrality
Centrality identifies important nodes in the network:
  e.g., most connected: degree centrality
  e.g., in the middle of most shortest paths: betweenness centrality
  e.g., those that are often visited by a process: the nature of the process matters
    Conservative: PageRank
    Non-conservative: Alpha-Centrality

Interactions and Centrality
Centrality identifies important nodes in the network, i.e., those that are often visited by a dynamic process
Conservative
  Random surfer: follows out-links at random with probability $\alpha$; otherwise, jumps to a random node
  Equilibrium: PageRank, $pr = (1-\alpha)\,s + \alpha\, pr\, D^{-1}A$
  Same form as the conservative steady state: $w^{c} = (1-\alpha)\,w(0) + \alpha\, w^{c}\, T$
Non-conservative
  Epidemic spread: with probability $\alpha$, transmit the disease to each out-neighbor
  Equilibrium while $\alpha < 1/\lambda_{\max}$: Alpha-Centrality, $cr = s + \alpha\, cr\, A$
  Same form as the non-conservative steady state: $w^{n} = w(0) + \alpha\, w^{n}\, R$
[Figure: the five-node example network, drawn once for each process]

Which centrality metric is right for social media?
[Figure: a submitter whose post reaches several followers]
Information flow in social media is non-conservative

Ground truth
User activity data in social media provides ground truth
Empirical measures of influence/importance:
  1. average size of the cascades a node triggers
  2. average number of re-broadcasts by followers
Rank nodes by the empirical measure: this ranking is the ground truth
Compare rankings produced by centrality metrics to the ground truth

Which centrality metric is right for social media?
Correlation between the ground truth and rankings predicted by Alpha-Centrality and PageRank
[Figure panels: Digg, Twitter]
Non-conservative Alpha-Centrality best predicts node centrality

Alpha-Centrality [Bonacich, 87]
$C(\alpha) = A + \alpha A^2 + \alpha^2 A^3 + \dots = A \sum_{k=0}^{\infty} \alpha^k A^k = A\,(I - \alpha A)^{-1}$
Measures the number of paths between nodes, each path attenuated by its length with parameter $\alpha$
Parameter $\alpha \in [0, 1/\lambda_1)$ sets the length scale of interactions ($\lambda_1$: largest eigenvalue of $A$)
  Local: for $\alpha = 0$, only short-range (local) interactions are considered. Same rankings as degree centrality
  Meso: as $\alpha$ grows, the length scale of interactions grows
  Global: as $\alpha \to 1/\lambda_1$, global interactions are considered (length diverges). Same rankings as eigenvector centrality
[Ghosh and Lerman, Parameterized Metric for Network Analysis, Physical Review E, 2011]
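The series on this slide can be summed term by term. The sketch below is one possible reading, not the authors' implementation: it assumes a node's score is the corresponding column sum of $C(\alpha)$ (attenuated paths arriving at the node, matching the row-vector convention $cr = s + \alpha\, cr\, A$), truncates the series after a fixed number of terms, and uses the five-node example network. With $\alpha = 0$ only the first term survives, so the scores reduce to in-degree.

```python
# Five-node example network (assumed row-by-row reading of the adjacency matrix).
A = [
    [0, 1, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 1],
    [0, 1, 0, 0, 0],
]


def row_times_matrix(v, M):
    """Row vector v times matrix M."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]


def alpha_centrality(A, alpha, terms=200):
    """Column sums of the truncated series A + alpha A^2 + alpha^2 A^3 + ..."""
    n = len(A)
    term = row_times_matrix([1.0] * n, A)   # k = 0 term: paths of length 1 (in-degree)
    scores = list(term)
    for _ in range(terms):
        # Next term: one step longer, attenuated by one more factor of alpha.
        term = [alpha * x for x in row_times_matrix(term, A)]
        scores = [s + x for s, x in zip(scores, term)]
    return scores


local_scores = alpha_centrality(A, 0.0)    # alpha = 0: local interactions only
longer_scores = alpha_centrality(A, 0.3)   # illustrative alpha below 1/lambda_1
print(local_scores)
print([round(x, 4) for x in longer_scores])
```

The truncation is only safe while $\alpha$ stays below $1/\lambda_1$; closer to that limit more terms are needed, and at or above it the series diverges.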

Epidemic threshold for non-conservative processes
Diverging length scale: critical phenomena
Threshold behavior in non-conservative diffusion
  Critical value of transmissibility $\alpha_c = 1/\lambda_1$ [Wang et al., 2003]
  for $\alpha < \alpha_c$, the epidemic dies out, i.e., reaches a vanishing fraction of nodes
  for $\alpha > \alpha_c$, the epidemic reaches a large fraction of nodes
[Figure: size of simulated epidemics vs. transmissibility on the Digg follower graph and a synthetic graph; marked threshold $\alpha_c = 0.006$]
[Ver Steeg, Ghosh & Lerman, What stops social epidemics? ICWSM, 2011]
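The threshold $1/\lambda_1$ only requires the largest eigenvalue of the adjacency matrix, which power iteration can estimate. The sketch below is an illustration on the five-node example network from the earlier slides, not on the Digg graph; the function name and iteration count are assumptions. It relies on the example having a single dominant eigenvalue, which holds here because its cycles have lengths 3 and 4.

```python
# Five-node example network (assumed row-by-row reading of the adjacency matrix).
A = [
    [0, 1, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 1],
    [0, 1, 0, 0, 0],
]


def largest_eigenvalue(A, iterations=500):
    """Power iteration: repeatedly apply A and renormalize so the entries sum to 1."""
    n = len(A)
    x = [1.0 / n] * n
    lam = 0.0
    for _ in range(iterations):
        y = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        lam = sum(y)              # because sum(x) == 1, this estimates lambda_1
        x = [v / lam for v in y]  # renormalize for the next step
    return lam, x


lam, x = largest_eigenvalue(A)
threshold = 1.0 / lam
print(round(lam, 4), round(threshold, 4))
```

On this small network the estimated threshold is far larger than the 0.006 quoted for Digg, which is expected: large, dense graphs have a much larger $\lambda_1$ and therefore a much lower threshold.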

Multi-scale analysis with Alpha-Centrality
Length scale parameter $\alpha$ allows for multi-scale analysis of networks
  Differentiate between local and global structures
Change in rankings with $\alpha$:
  Leaders: high influence on group members. Nodes with high centrality locally (small $\alpha$)
  Bridges: mediate communication between groups. Nodes with low centrality locally (small $\alpha$) but high centrality globally (large $\alpha$)
  Peripherals: poorly connected to everyone. Nodes with low centrality for any $\alpha$

Karate club network [Zachary, 1977]
[Figure: the karate club network, with the administrator and the instructor labeled]
[Zachary, An Information Flow Model for Conflict and Fission in Small Groups. J. Anthro. Research 33 No. 4 (1977)]

Ranking karate club members
[Figure: centrality scores of nodes vs. $\alpha$]
No need to know communities to find bridging nodes

Community detection
Divide the network into groups such that nodes within a group are more similar to each other than to other nodes
[Zachary, An Information Flow Model for Conflict and Fission in Small Groups. J. Anthro. Research 33 No. 4 (1977)]

Synchronization in complex networks
[Figure: network state after a long time]
Hierarchical community structure revealed en route to synchronization
[Arenas et al., Synchronization Reveals Topological Scales in Complex Networks, Phys. Rev. Lett. 96 (2006)]

Mathematics of synchronization
Conservative
  Kuramoto model of coupled oscillators: $\frac{d\theta_i}{dt} = \omega_i + \sum_{j \in \mathrm{neighbors}(i)} \sin(\theta_j - \theta_i)$
  Linearized model, Laplace operator: $\frac{d\theta}{dt} = \omega - (D - A)\,\theta = \omega - L\theta$
Non-conservative
  Non-conservative model: a node does not divide its coupling energy among neighbors; rather, it applies its full coupling energy to each neighbor
  Linearized model, Replicator operator: $\frac{d\theta}{dt} = \omega - (\lambda_{\max} I - A)\,\theta = \omega - R\theta$

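Steady state
$\frac{d\theta}{dt} = \omega - X\theta$, with $X = L$ or $R$
$\theta(t) = (\theta_0 - X^{-1}\omega)\, e^{-Xt} + X^{-1}\omega$
System reaches steady state iff $X$ is positive semi-definite
Time to reach the steady state ~ $1/\lambda_1$ ($\lambda_1$: smallest positive eigenvalue of $X$)
In steady state, $\theta$ ~ eigenvector corresponding to $\lambda_0$ (smallest eigenvalue of $X$)
  Conservative ($X = L$): $\theta_i(t) = \theta_i(t+1) = \theta_j(t+1)$
  Non-conservative ($X = R$): $\theta_i(t) = \theta_i(t+1)$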
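A rough numerical illustration of the conservative case, under several assumptions that are not from the talk: the oscillators are identical (so the natural-frequency term is dropped and the linearized model reduces to $d\theta/dt = -L\theta$), the graph is a hypothetical six-node network made of two triangles joined by one edge, and the equation is integrated with a simple forward-Euler step. With the Laplacian, all phases should end up equal, at the mean of the initial phases.

```python
# Hypothetical undirected graph: triangles (0,1,2) and (3,4,5) joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
n = 6
A = [[0] * n for _ in range(n)]
for i, j in edges:
    A[i][j] = A[j][i] = 1

# Laplacian L = D - A, where D holds the node degrees on the diagonal.
L = [[(sum(A[i]) if i == j else 0) - A[i][j] for j in range(n)] for i in range(n)]


def simulate(X, theta0, dt=0.01, steps=10000):
    """Forward-Euler integration of d(theta)/dt = -X theta (identical oscillators)."""
    theta = list(theta0)
    for _ in range(steps):
        flow = [sum(X[i][j] * theta[j] for j in range(len(theta))) for i in range(len(theta))]
        theta = [theta[i] - dt * flow[i] for i in range(len(theta))]
    return theta


theta0 = [0.0, 0.5, 1.0, 2.0, 2.5, 3.0]  # arbitrary initial phases
theta = simulate(L, theta0)
print([round(v, 6) for v in theta])
```

Stopping the integration early (far fewer steps) leaves each triangle internally synchronized while the two triangles still differ, which is the intermediate stage that the community-detection slides exploit.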

Synthetic graph Adjacency matrix of the graph

Eigenvalue Spectrum
Eigenvalue spectrum of the Laplacian is used to characterize graph structure:
  Number of null eigenvalues: number of disconnected components
  Smallest positive eigenvalue: equilibration time
  Gaps between consecutive eigenvalues: relative difference of time scales
  Large eigenvalues: hubs in the network
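The first item, null eigenvalues counting disconnected components, can be checked without an eigen-solver: the indicator vector of each connected component is sent to zero by the Laplacian, so each component contributes one zero eigenvalue. The sketch below uses a hypothetical five-node graph with two components; the graph and function names are illustrative only.

```python
from collections import deque

# Hypothetical undirected graph with two disconnected pieces: a triangle and a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4)]
n = 5
A = [[0] * n for _ in range(n)]
for i, j in edges:
    A[i][j] = A[j][i] = 1

# Laplacian L = D - A.
L = [[(sum(A[i]) if i == j else 0) - A[i][j] for j in range(n)] for i in range(n)]


def connected_components(A):
    """Breadth-first search over the adjacency matrix; returns sorted node lists."""
    seen, components = set(), []
    for start in range(len(A)):
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), []
        while queue:
            i = queue.popleft()
            component.append(i)
            for j, linked in enumerate(A[i]):
                if linked and j not in seen:
                    seen.add(j)
                    queue.append(j)
        components.append(sorted(component))
    return components


components = connected_components(A)
null_checks = []
for component in components:
    indicator = [1 if i in component else 0 for i in range(n)]
    # L times the indicator vector should be the zero vector.
    null_checks.append([sum(L[i][j] * indicator[j] for j in range(n)) for i in range(n)])

print(components)
print(null_checks)
```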

Synchronization matrix (T=1500)
[Figure panels: Conservative, Non-conservative]

Zachary karate club Adjacency matrix of the graph

Eigenvalue spectrum
Eigenvalue spectrum of the Laplacian is used to characterize graph structure:
  Number of null eigenvalues: number of disconnected components
  Gaps between consecutive eigenvalues: relative difference of time scales
  Large eigenvalues: hubs in the network
  Cheeger bounds, graph partitioning criteria, conductance, ...

Synchronization matrix of the Karate Club Network
[Figure panels: Laplacian (T=1000), Replicator (T=1000); color scale from less synchronization to more synchronization]

Hierarchical clustering: emerging communities
[Figure panels: Non-conservative, Conservative; snapshots at t=10, t=1000, t=3000, t=3899]

Community structure
[Figure panels: Conservative, Non-conservative]
Hierarchical agglomerative clustering on the synchronization matrix at time=3899
  Non-conservative: clustering reveals the ground-truth community structure
  Conservative: two nodes mis-assigned

Community structure of Digg social network
[Figure labels: Whiskers, Core]
No further structure in the core [Leskovec et al., 2008]

Onion-like structure of the core
[Figure panels: Non-conservative, Conservative]
Digg mutual follower network with ~40K nodes, ~360K edges
Each core has its own core-and-whiskers structure
Little overlap between the cores discovered by the two models
[Ghosh & Lerman, Role of Dynamic Interactions in Multi-scale Analysis of Community Structure, submitted to WWW]

Long-tailed size distribution of whiskers
[Figure panels: Non-conservative, Conservative; series: whiskers in a sub-core, disconnected components in the mutual follower graph]
Clustering nodes in the core reveals many small communities (whiskers) with a long-tailed size distribution
[Ghosh & Lerman, Role of Dynamic Interactions in Multi-scale Analysis of Community Structure, submitted to WWW]

Strength of ties
Social ties and proximity
  People receive novel information (e.g., new jobs) not through close friends (strong ties) but through acquaintances (weak ties) [Granovetter, 1973]
  Proposed neighborhood overlap as a measure of tie strength
Tie strength ~ proximity in networks
  Empirical correlation between proximity (neighborhood overlap) and tie strength (frequency of calls) in a mobile call graph [Onnela et al., 2007]
Link prediction
  Proximity predicts future links in networks, e.g., future collaborations between scientists [Liben-Nowell & Kleinberg, 2003]
  Tested many proximity metrics

Measuring proximity
Variety of metrics proposed to measure proximity in a graph:
  CN: number of common neighbors
  JA: fraction of common neighbors (Jaccard)
  AA: Adamic-Adar metric [Adamic & Adar, 1998] weighs each common neighbor by 1/log(degree); best metric for predicting future collaborations [Liben-Nowell & Kleinberg, 2003]
    $AA_{uv} = \sum_{z \in \mathrm{Neighbors}(u) \cap \mathrm{Neighbors}(v)} \frac{1}{\log(d_z)}$
  Effective conductance [Koren et al., 2006]
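The three neighborhood-based metrics are short enough to sketch directly. The graph below is a made-up undirected example stored as a dictionary of neighbor sets, and the function names are illustrative. Jaccard is taken as shared neighbors divided by the union of both neighborhoods, and Adamic-Adar uses the natural logarithm, which is an assumption since the slide does not state the base.

```python
import math

# Hypothetical undirected graph, stored as node -> set of neighbors.
graph = {
    "u": {"a", "b", "d"},
    "v": {"a", "b"},
    "a": {"u", "v", "c"},
    "b": {"u", "v"},
    "c": {"a"},
    "d": {"u"},
}


def common_neighbors(g, u, v):
    """CN: number of neighbors shared by u and v."""
    return len(g[u] & g[v])


def jaccard(g, u, v):
    """JA: shared neighbors as a fraction of all neighbors of u or v."""
    union = g[u] | g[v]
    return len(g[u] & g[v]) / len(union) if union else 0.0


def adamic_adar(g, u, v):
    """AA: each shared neighbor z contributes 1 / log(degree of z)."""
    # A shared neighbor has degree >= 2, so the logarithm is never zero.
    return sum(1.0 / math.log(len(g[z])) for z in g[u] & g[v])


cn = common_neighbors(graph, "u", "v")
ja = jaccard(graph, "u", "v")
aa = adamic_adar(graph, "u", "v")
print(cn, round(ja, 4), round(aa, 4))
```

In this example the low-degree shared neighbor b contributes more to AA than the higher-degree a, which is the point of the weighting: a rarely connected common neighbor is stronger evidence of proximity.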

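Interactions and proximity
Proximity between u and v = likelihood that a message will get from u to v, or vice versa
[Figure panels: Conservative, Non-conservative; nodes u and v connected through shared neighbors]
Each metric is a sum over the common neighbors z of u and v:
  CO (conservative): $\frac{1}{2} \sum_{z} \left[ \frac{1}{d_u d_z} + \frac{1}{d_v d_z} \right]$
  NC (non-conservative): $\frac{1}{2} \sum_{z} (1 + 1) = CN$
  Attention-limited variants CO_AL and NC_AL: the same sums with additional inverse-degree factors in $d_z$, $d_v$ and $d_u$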

Activity prediction in social media
What posts will a user retweet?
Social media users tend to be similar to their friends, i.e., they retweet the same posts as their friends do (or vote for the same stories on Digg [Lerman, 2007])
But they tend to be more similar to closer friends
  Closeness based on proximity in the follower graph
  Which proximity metric is better?
[Lerman et al., Using proximity to predict activity in social networks, submitted to WWW]

Prediction experiment
[Figure: a user surrounded by friends; precision (Pr) and recall (Re) of the posts p predicted for user u]
Measure how well each proximity metric predicts activity
[Lerman et al., Using proximity to predict activity in social networks, submitted to WWW]

Prediction results: Digg
Baseline = all friends contribute equally to the user's activity
Lift = percent change over baseline
[Figure: bar chart of lift (%) in precision and recall for CN/NC, JA, AA, CS, CS_AL, NC_AL]
[Lerman et al., Using proximity to predict activity in social networks, submitted to WWW]

Prediction results: Twitter
Baseline = all friends contribute equally to the user's activity
Lift = percent change over baseline
[Figure: bar chart of lift (%) in precision and recall for CN/NC, JA, AA, CS, CS_AL, NC_AL]
[Lerman et al., Using proximity to predict activity in social networks, submitted to WWW]

Conclusion
How we measure network structure depends on the nature of interactions between nodes
Centrality
  Conservative interactions: PageRank
  Non-conservative: Alpha-Centrality
  Alpha-Centrality better predicts influential users on Digg, Twitter
Community structure
  Conservative: use the Laplacian to probe structure
  Non-conservative: use the Replicator operator
  Communities synchronize faster in non-conservative interactions
Social ties
  A principled way to measure proximity in graphs
  Attention-limited proximity better predicts user activity on Digg, Twitter