Algebraic Representation of Networks

Similar documents
Math 304 Handout: Linear algebra, graphs, and networks.

Data Mining and Matrices

A Note on Google s PageRank

The Second Eigenvalue of the Google Matrix

Spectral Graph Theory and You: Matrix Tree Theorem and Centrality Metrics

Eigenvalue Problems Computation and Applications

Calculating Web Page Authority Using the PageRank Algorithm

1 Searching the World Wide Web

eigenvalues, markov matrices, and the power method

Graph Models The PageRank Algorithm

Lab 8: Measuring Graph Centrality - PageRank. Monday, November 5 CompSci 531, Fall 2018

Data Mining and Analysis: Fundamental Concepts and Algorithms

Uncertainty and Randomization

An Introduction to Spectral Graph Theory

Recitation 8: Graphs and Adjacency Matrices

Applications of The Perron-Frobenius Theorem

DATA MINING LECTURE 13. Link Analysis Ranking PageRank -- Random walks HITS

CS 277: Data Mining. Mining Web Link Structure. CS 277: Data Mining Lectures Analyzing Web Link Structure Padhraic Smyth, UC Irvine

Graph fundamentals. Matrices associated with a graph

CSI 445/660 Part 6 (Centrality Measures for Networks) 6 1 / 68

Link Analysis. Leonid E. Zhukov

PageRank: The Math-y Version (Or, What To Do When You Can t Tear Up Little Pieces of Paper)

LAPLACIAN MATRIX AND APPLICATIONS

Markov Chains and Spectral Clustering

Markov Chains, Random Walks on Graphs, and the Laplacian

ECEN 689 Special Topics in Data Science for Communications Networks

Spectral Graph Theory and its Applications. Daniel A. Spielman Dept. of Computer Science Program in Applied Mathematics Yale Unviersity

Lecture 10. Lecturer: Aleksander Mądry Scribes: Mani Bastani Parizi and Christos Kalaitzis

1998: enter Link Analysis

Information Propagation Analysis of Social Network Using the Universality of Random Matrix

CS 246 Review of Linear Algebra 01/17/19

Lecture 13: Spectral Graph Theory

CS6220: DATA MINING TECHNIQUES

Inf 2B: Ranking Queries on the WWW

Lecture: Local Spectral Methods (1 of 4)

Link Analysis Ranking

Spectral Analysis of Directed Complex Networks. Tetsuro Murai

Online Social Networks and Media. Link Analysis and Web Search

6.207/14.15: Networks Lectures 4, 5 & 6: Linear Dynamics, Markov Chains, Centralities

Markov Chains Handout for Stat 110

CS6220: DATA MINING TECHNIQUES

0.1 Naive formulation of PageRank

SUPPLEMENTARY MATERIALS TO THE PAPER: ON THE LIMITING BEHAVIOR OF PARAMETER-DEPENDENT NETWORK CENTRALITY MEASURES

This section is an introduction to the basic themes of the course.

The Google Markov Chain: convergence speed and eigenvalues

Lecture 5: Random Walks and Markov Chain

Online Social Networks and Media. Link Analysis and Web Search

Data Mining and Analysis: Fundamental Concepts and Algorithms

CS249: ADVANCED DATA MINING

Lecture 14: Random Walks, Local Graph Clustering, Linear Programming

1 9/5 Matrices, vectors, and their applications

The non-backtracking operator

Stochastic processes. MAS275 Probability Modelling. Introduction and Markov chains. Continuous time. Markov property

arxiv:cond-mat/ v1 3 Sep 2004

Google Page Rank Project Linear Algebra Summer 2012

Introduction to Search Engine Technology Introduction to Link Structure Analysis. Ronny Lempel Yahoo Labs, Haifa

Math 443/543 Graph Theory Notes 5: Graphs as matrices, spectral graph theory, and PageRank

Definition A finite Markov chain is a memoryless homogeneous discrete stochastic process with a finite number of states.

A New Spectral Technique Using Normalized Adjacency Matrices for Graph Matching 1

Diffusion and random walks on graphs

Google Matrix, dynamical attractors and Ulam networks Dima Shepelyansky (CNRS, Toulouse)

LINK ANALYSIS. Dr. Gjergji Kasneci Introduction to Information Retrieval WS

Krylov Subspace Methods to Calculate PageRank

Lecture 1: Graphs, Adjacency Matrices, Graph Laplacian

Learning from Sensor Data: Set II. Behnaam Aazhang J.S. Abercombie Professor Electrical and Computer Engineering Rice University

Link Analysis Information Retrieval and Data Mining. Prof. Matteo Matteucci

Data Analysis and Manifold Learning Lecture 3: Graphs, Graph Matrices, and Graph Embeddings

Lecture 15 Perron-Frobenius Theory

On the mathematical background of Google PageRank algorithm

Spectra of Adjacency and Laplacian Matrices


Random Walks on Graphs. One Concrete Example of a random walk Motivation applications

Spectral Graph Theory Tools. Analysis of Complex Networks

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

Basic graph theory 18.S995 - L26.

Link Analysis. Stony Brook University CSE545, Fall 2016

CS246: Mining Massive Datasets Jure Leskovec, Stanford University.

EE263 Review Session 1

Data Mining Recitation Notes Week 3

Communities, Spectral Clustering, and Random Walks

Computational Economics and Finance

MAE 298, Lecture 8 Feb 4, Web search and decentralized search on small-worlds

Google PageRank. Francesco Ricci Faculty of Computer Science Free University of Bozen-Bolzano

Linear algebra and applications to graphs Part 1

Lecture 1 and 2: Introduction and Graph theory basics. Spring EE 194, Networked estimation and control (Prof. Khan) January 23, 2012

Page rank computation HPC course project a.y

1.10 Matrix Representation of Graphs

Graphs, Vectors, and Matrices Daniel A. Spielman Yale University. AMS Josiah Willard Gibbs Lecture January 6, 2016

Web Structure Mining Nodes, Links and Influence

MAT 1302B Mathematical Methods II

TMA Calculus 3. Lecture 21, April 3. Toke Meier Carlsen Norwegian University of Science and Technology Spring 2013

MAS275 Probability Modelling Exercises

18.312: Algebraic Combinatorics Lionel Levine. Lecture 19

Quick Tour of Linear Algebra and Graph Theory

Kristina Lerman USC Information Sciences Institute

Node Centrality and Ranking on Networks

Mining of Massive Datasets Jure Leskovec, AnandRajaraman, Jeff Ullman Stanford University

The Push Algorithm for Spectral Ranking

IR: Information Retrieval

arxiv: v2 [cond-mat.dis-nn] 21 Jun 2016

Transcription:

Algebraic Representation of Networks 0 1 2 1 1 0 0 1 2 0 0 1 1 1 1 1 Hiroki Sayama sayama@binghamton.edu

Describing networks with matrices (1) Adjacency matrix A matrix with rows and columns labeled by nodes, where a ij represents the number of edges between node i and node j (must be symmetric for undirected graph) Incidence matrix (not discussed much) A matrix with rows labeled by nodes and columns labeled by edges, where a ij indicates whether edge j is connected to node i (1) or not (0) 2

Describing networks with matrices (2) Transition probability matrix A matrix with rows and columns labeled by states (nodes), where a ij represents the probability of transition from state (node) i to state (node) j Laplacian matrix A matrix with rows and columns labeled by nodes, where a ij represents node degree if i = j, or is -1 if node i and node j are connected 3

Exercise Write adjacency and incidence matrices of the (multi-)graph below u 1 e 1 u 2 e 6 e 2 e 3 e 4 u 3 e 5 u 4 e 7 4

Exercise Write an adjacency matrix of the (multi-)graph below u 1 u 2 u 3 u 4 5

Exercise Think about which node would be most suitable to be a source or a sink in a network represented by the adjacency matrix on the right Find the maximal flow of this network 0 0 3 0 0 0 0 0 0 0 0 0 0 2 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 5 0 0 6 0 4 0 0 0 5 0 0 4 0 0 0 0 0 2 0 7 0 0 0 0 0 2 0 2 2 0 0 0 0 6

Arithmetic Operations Applied to Adjacency Matrices

Sum and difference of adjacency matrices One can calculate a sum and a difference of adjacency matrices if the two graphs have the same number of nodes. Adjacency matrix of graph A Adjacency matrix of graph B A + B Sum of the two adjacency matrices 8

Exercise Calculate the sum of and the difference between the adjacency matrices of the following two graphs, and draw the actual shape of the resultant graphs u 1 u 1 u 2 u 4 u 2 u 4 u 3 u 3 9

Product of adjacency matrices Similarly, one can calculate a product of two adjacency matrices (multiplication is not commutative) Adjacency matrix of graph A Adjacency matrix of graph B Adjacency matrix of graph B Adjacency matrix of graph A A B Product of the two adjacent matrices (1) not equal in general B A Product of the two adjacent matrices (2) 10

Exercise Calculate two different products of the adjacency matrices of the following two graphs, and draw the actual shape of the results (Note: such multiplication may create directed graphs) Then think about what the product means A u 1 u 6 u 1 u 6 B u 2 u 5 u 2 u 5 u 3 u 4 u 3 u 4 11

Answer Product X Y indicates a directed graph that maps each node to a set of possible destinations that may be reached by a two-step move, first following Y and then X A B u 1 u 6 u 1 u 6 B A u 2 u 5 u 2 u 5 u 3 u 4 u 3 u 4 12

Power of Adjacency Matrices

What does a power of an adjacency matrix mean? u 1 u 3 u 2 u 4 A = 0 1 2 1 1 0 0 1 2 0 0 1 1 1 1 1 A n x? 0 1 0 0 Try to calculate Ax, A 2 x, A 3 x, etc. 14

What does a power of an adjacency matrix mean? u 1 u 3 u 2 u 4 A = 0 1 2 1 1 0 0 1 2 0 0 1 1 1 1 1 A n x? 0 1 0 0 This formula gives a set of nodes that can be reached in n steps from node u 2 (and the # of such walks) 15

What does a power of an adjacency matrix mean? u 1 u 3 u 2 u 4 A = 0 1 2 1 1 0 0 1 2 0 0 1 1 1 1 1 A n Arranging all the I = An results starting from 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 every node gives a power of adjacency matrix A 16

A theorem on the power of adjacency matrix In adjacency matrix A raised to the power of n, (A n ) ij gives the number of different walks of length n that starts at node j and ends at node i (This applies to both undirected and directed graphs; proof can be easily obtained by using mathematical induction with n) 17

Exercise Calculate how many walks of length two exist between u 1 and every other node in the graph below u 1 u 2 u 3 u 4 18

Exercise Using the power of an adjacency matrix, count the number of triangles included in: (a) A complete graph made of 20 nodes (b) An Erdos-Renyi random network made of 1000 nodes with connection probability 0.01 19

Determining graph connectivity A n gives the number of different walks of length n between every pair of nodes C n = Σ k=1~n A k gives the number of different walks of length n or shorter between every pair of nodes 20

Determining graph connectivity C n = Σ k=1~n A k gives the number of different walks of length n or shorter between every pair of nodes In C (# of nodes - 1), every possible path in that graph should be counted Because a path must not visit the same node more than once 21

Determining graph connectivity C (# of nodes - 1) = Σ k=1~(# of nodes - 1) A k If (C (# of nodes - 1) ) ij > 0 for all i j, then there is a path between any pair of nodes (and vice versa) The original graph is *numerically* determined to be a (strongly) connected graph 22

Exercise Show the strong connectivity of the graph below by calculating the sum of powers of its adjacency matrix u1 u 2 u 3 u 4 23

Exercise An alternative method is just to calculate (A + I) (# of nodes -1) and check if all elements have positive values Those values no longer show # of paths, but still tell us whether there are paths between each pair of nodes Why does this work? 24

Transitive closure Transitive closure of a graph is a graph that contains edge <u, v> whenever there is a path from node u to node v in the original graph Obtained by making all diagonal components 0 and all non-diagonal nonzero components 1 in C (# of nodes - 1) Describes accessibility between nodes Is a complete graph if the original graph is (strongly) connected 25

Transition Probability Matrix

Transition probability matrix An adjacency matrix of a directed graph with normalized weights (i.e., sum of all weights of outgoing links is always 1 for every node) Considers each node as a state, and a directed link as a stochastic state transition : Representing a Markov chain Can be constructed from a unweighted directed graph by assigning normalized weights 27

Exercise Create a TPM of the following graph 1 2 3 4 5 28

Properties of TPMs A product of two TPMs is also a TPM Always has eigenvalue 1 λ <= 1 for all eigenvalues If the original network is strongly connected (with some additional conditions), the TPM has one and only one eigenvalue 1 (no degeneration) 29

TPM and asymptotic probability distribution λ <= 1 for all eigenvalues If the original network is strongly connected (with some additional conditions), the TPM has one and only one eigenvalue 1 (no degeneration) This is a unique dominant eigenvalue; the probability vector will converge to its corresponding eigenvector 30

Exercise Obtain eigenvalues and eigenvectors of the TPM created in the previous exercise Calculate T n (1/5,1/5,1/5,1/5,1/5) T for large n and see what you will get 31

Application: Google s PageRank Lawrence Page, Sergey Brin, Rajeev Motwani, Terry Winograd, 'The PageRank Citation Ranking: Bringing Order to the Web' (1998): http://www-db.stanford.edu/~backrub/pageranksub.ps Node: Web pages link: Web links State: Temporary importance of that node Its coefficient matrix is a transition probability matrix that can be obtained by dividing each column of the adjacency matrix by the number of 1 s in that column. 32

Example s 1 s 2 s 5 s 3 s 4 0 0.5 0 0 1 0 0 0 0.5 0 1 0.5 0 0 0 0 0 0.5 0 0 0 0 0.5 0.5 0 * PageRank is actually calculated by forcedly assigning positive non-zero weights to all pairs of nodes in order to make the entire network strongly connected 33

Interpreting the PageRank network as a stochastic system s 1 s 2 s 5 s 3 s 4 State of each node can be viewed as a relative population that are visiting the webpage at t At next timestep, the population will distribute to other webpages linked from that page evenly 34

PageRank calculation Just one dominant eigenvector of the TPM of a strongly connected network always exists, with λ = 1 This shows the equilibrium distribution of the population over WWW So, just solve x = Ax and you will get the PageRank for all the web pages on the World Wide Web 35

Exercise s 1 s 2 s 5 s 3 s 4 0 0.5 0 0 1 0 0 0 0.5 0 1 0.5 0 0 0 0 0 0.5 0 0 0 0 0.5 0.5 0 Calculate the PageRank of each node in the above network (the network is already strongly connected so you can directly calculate its dominant eigenvector; but also try using the NetworkX built-in function for PageRank) 36

A note on PageRank PageRank algorithm gives non-trivial results only for asymmetric networks If links are symmetric (undirected), the PageRank values will be the same as node degrees Prove this 37

Laplacian Matrix

Laplacian matrix A matrix with rows and columns labeled by nodes, where a ij represents node degree if i = j, or is -1 if node i and node j are connected L = D A D: degree matrix (diagonal elements are node degrees; all 0 elsewhere) 39

Exercise Write a Laplacian matrix of the graph below 1 2 3 4 5 40

Relationship with Laplacians in vector calculus Related to Laplacian in vector calculus/pdes It is a negative, discrete version of it Similar to a second-order derivative, defined on a network E.g. diffusion on a network: x(t+1) = x(t) d L x(t) 41

Relationship with Laplacians in vector calculus Laplacian discretized over 2-D space: 2 f = 2 f/ x 2 + 2 f/ y 2 ~ ( f t (x+ x,y)+f t (x- x,y) 2f t (x,y) ) / x 2 +( f t (x,y+ y)+f t (x,y- y) 2f t (x,y) ) / y 2 = ( f t (x+ k,y) + f t (x- k,y) + f t (x,y+ k) + f t (x,y- k) 4f t (x,y) ) / k 2 Laplacian (graph) ~ - Laplacian (vector calc.) 42

Properties of a Laplacian Has (1, 1, 1,, 1) as an eigenvector Because each row/column adds up to 0 The corresponding eigenvalue is 0 All eigenvalues >= 0 # of zero eigenvalues = # of connected components in a graph 2nd smallest ev.: algebraic connectivity Smallest non-zero ev.: spectral gap Shows how quickly the network can suppress non-homogeneous states and synchronize 43

Exercise Create an Erdos-Renyi random network made of 100 nodes with connection probability 0.03 Obtain its Laplacian matrix and calculate its eigenvalues See what you find Visualize the network and compare the results 44

Exercise Generate the following network topologies w/ similar size and density: random graph barbell graph ring-shaped graph (i.e., degree-2 regular graph) Measure their spectral gaps and see how topologies quantitatively affect their values 45

Graph Spectrum

Degree distribution and graph spectrum Structural characteristics of a large complex network can be studied by analyzing these distributions Similar networks often have similar degree distributions and graph spectra Degree distribution is structural, intuitive and very easy to obtain Graph spectrum has strong connection to both structure and dynamical behavior 47

Graph spectrum Distribution of eigenvalues of the adjacency matrix of the network 0 1 1 1 1 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 Spectrum -2, 0, 0, 0, 2 Degenerate eigenvalues Undirected graphs have symmetric adj. matrices all real eigenvalues 48

Graph spectral analysis Plotting an eigenvalue distribution (i.e., histogram) Especially effective for visualizing complex network data obtained experimentally Computing power may be needed to obtain these plots for large networks Wigner s semi-circle law de Aguiar & Bar-Yam, Phys. Rev. E 71: 016106 (2005) 49

Exercise Obtain spectra of networks made of 1,000 nodes each Random Scale-free Based on some data Plot their density distributions 50

Exercise Obtain the spectrum of the Supreme Court Citation network Can you do this?? If you can t, make a subgraph induced by randomly selected 1,000 nodes, and conduct the same analysis Crude random sampling technique 51

What eigenvalues and eigenvectors can tell us An eigenvalue tells whether a particular state of the network (specified by its corresponding eigenvectors) grows or shrinks by interactions between nodes over edges Re(λ) > 0 growing Re(λ) < 0 shrinking 52

Laplacian spectrum Distribution of eigenvalues of the Laplacian matrix of the network 4-1 -1-1 -1-1 1 0 0 0-1 0 1 0 0-1 0 0 1 0-1 0 0 0 1 Spectrum 0, 1, 1, 1, 5 Degenerate eigenvalues 53

Review of Laplacian spectrum At least one λ is zero All the other λs are zero or positive # of zero λs corresponds to # of connected components in the graph 2nd smallest λ: algebraic connectivity Smallest non-zero λ: spectral gap Algebraic connectivity 0 = λ 1 = λ 2 = = λ k-1 < λ k λ k+1 λ N As many 0 s as # of CC s Spectral gap 54

Spectral gap Determines how easily a dynamical network can get synchronized The larger it is (relatively to the largest λ N ), the easier the synchronization is (Barahona & Pecora, Phys. Rev. Lett. 89: 054101. 2002) I. ER random II. NW small-world III.WS small-world IV. BA scale-free (Zhan, Chen & Yeung, Physica A 389: 1779-1788, 2010) 55

Exercise Create a small-world network of 1,000 nodes with varying p Obtain Laplacian spectra of the network and find its spectral gap λ 2 Plot λ 2 over p and see how it changes as random rewiring rate increases 56