Lecture III: Matrix Functions in Network Science, Part 2


1 Lecture III: Matrix Functions in Network Science, Part 2 Michele Benzi Department of Mathematics and Computer Science Emory University Atlanta, Georgia, USA Summer School on Theory and Computation of Matrix Functions Dobbiaco, June,

2 Outline 1 Diffusion on networks 2 Hub and authority rankings in digraphs 3 Numerical methods of network analysis 4 Numerical examples 5 Bibliography

4 Dynamics on networks: diffusion The diffusion of contagious diseases or computer viruses, the adoption of new products, the spread of rumors, beliefs, fads, consensus phenomena etc. can be modeled in terms of diffusion processes on graphs. There exist a number of models to describe diffusion phenomena on networks, both deterministic and stochastic in nature. Moreover, it is possible to investigate a number of important structural properties of networks by studying the behavior of solutions of evolution problems, such as diffusion or wave propagation, taking place on the network. An increasingly important model in this direction is given by quantum graphs, discussed in Lecture VI.

5 Dynamics on networks: diffusion (cont.) The simplest (semi-deterministic) model of diffusion on a graph is probably the initial value problem for the heat equation on G:

$\frac{dx}{dt} = -Lx$, $\quad t > 0$, $\quad x(0) = x_0$.

Here $L = D - A$ denotes the graph Laplacian, $x = x(t)$ is the "temperature" (or "concentration") at time $t$, and $x_0$ the initial distribution; e.g., if $x_0 = e_i$, the "substance" that is diffusing on G is initially all concentrated on node $v_i$. As a rule, $x_0 \ge 0$. The solution of the initial value problem is then given by

$x(t) = e^{-tL} x_0$, $\quad t > 0$.

Here we assume that G is connected.
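As a quick illustration, here is a minimal Matlab sketch of this solution formula; the 5-node cycle graph is made up for the example and is not one of the networks discussed later:

    % Heat diffusion on a small undirected graph (example graph made up).
    A = [0 1 0 0 1;
         1 0 1 0 0;
         0 1 0 1 0;
         0 0 1 0 1;
         1 0 0 1 0];             % adjacency matrix of a 5-cycle
    L = diag(sum(A,2)) - A;      % graph Laplacian L = D - A
    x0 = [1; 0; 0; 0; 0];        % all "substance" initially on node v_1
    for t = [0 0.5 1 5]
        x = expm(-t*L)*x0;       % x(t) = e^{-tL} x_0
        fprintf('t = %4.1f:  x = [%s ]\n', t, sprintf(' %.3f', x));
    end
    % As t grows, x(t) approaches the uniform vector (all entries 1/5).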

6 Dynamics on networks: diffusion (cont.) We note that the heat kernel $\{e^{-tL}\}_{t\ge 0}$ forms a family (semigroup) of doubly stochastic matrices. This means that each matrix $e^{-tL}$ is nonnegative and has all its row and column sums equal to 1. The diffusion process on G is therefore a continuous-time Markov process, which converges for $t \to \infty$ to the uniform distribution, regardless of the (nonzero) initial state:

$\lim_{t\to\infty} x(t) = c\,\mathbf{1}$, $\quad c > 0$,

where c is the average of the entries of the initial vector $x_0$.

7 Dynamics on networks: diffusion (cont.) The rate at which the system reaches the steady state (equilibrium) is governed by the smallest nonzero eigenvalue of L, which in turn is determined by the topology of the graph. In general, the small-world property leads to extremely fast diffusion ("mixing") in complex networks. This can help explain the very rapid global spread of certain epidemics (like the flu) but also of fads, rumors, financial crises, etc. Note that this rapid convergence to equilibrium is in contrast with the situation for diffusion on a regular grid obtained from the discretization of a (bounded) domain $\Omega \subset \mathbb{R}^d$ (d = 1, 2, 3).

8 Dynamics on networks: diffusion (cont.) More in detail, let

$L = Q \Lambda Q^T = \sum_{k=1}^{N} \lambda_k\, q_k q_k^T$

be the eigendecomposition of the Laplacian of the (connected) graph G. Recalling that the eigenvalues of L satisfy $0 = \lambda_1 < \lambda_2 \le \cdots \le \lambda_N$ and that the (normalized) eigenvector associated with the simple zero eigenvalue is $q_1 = \frac{1}{\sqrt{N}}\mathbf{1}$, we obtain

$e^{-tL} = \frac{1}{N}\mathbf{1}\mathbf{1}^T + \sum_{k=2}^{N} e^{-t\lambda_k}\, q_k q_k^T$.

Therefore we have

$e^{-tL} x_0 = \frac{1}{N}(\mathbf{1}^T x_0)\,\mathbf{1} + \sum_{k=2}^{N} e^{-t\lambda_k} (q_k^T x_0)\, q_k \to c\,\mathbf{1}$ for $t \to \infty$,

where $c = \frac{1}{N}(\mathbf{1}^T x_0)$ is positive if the initial (nonzero) vector $x_0$ is nonnegative. It is clear that the rate of convergence to equilibrium ("mixing rate") depends on the smallest nonzero eigenvalue $\lambda_2$ of L (also called the spectral gap of L).

9 Dynamics on networks: diffusion (cont.) As an extreme example, consider the complete graph on N nodes, in which every node is connected to every other node. The Laplacian of this graph has only two distinct eigenvalues, 0 and N, and equilibrium is reached extremely fast. However, there also exist very sparse graphs that exhibit a sizable gap and therefore fast mixing rates. An important example is given by expander graphs. These graphs are highly sparse, yet they are highly connected and have numerous applications in the design of communication and infrastructure networks. Another important property of such graphs is that they are very robust, meaning that many edges have to be deleted to break the network into two (large) disconnected subnetworks. The problem of constructing sparse graphs with good expansion properties is an active area of research.

10 Dynamics on networks: the Schrödinger equation Also of interest in Network Science is the Schrödinger equation on a graph:

$i\,\frac{d\psi}{dt} = L\psi$, $\quad t \in \mathbb{R}$, $\quad \psi(0) = \psi_0$,

where $\psi_0 \in \mathbb{C}^N$ is a prescribed initial state with $\|\psi_0\|_2 = 1$. The solution is given explicitly by $\psi(t) = e^{-itL}\psi_0$, for all $t \in \mathbb{R}$; note that since $-itL$ is skew-Hermitian, the propagator $U(t) = e^{-itL}$ is a unitary matrix, which guarantees that the solution has unit norm for all t:

$\|\psi(t)\|_2 = \|U(t)\psi_0\|_2 = \|\psi_0\|_2 = 1$, $\quad t \in \mathbb{R}$.

The amplitudes $|[U(t)]_{ij}|^2$ express the probability to find the system in state $e_i$ at time t, given that the initial state was $e_j$. The Schrödinger equation has recently been proposed as a tool to analyze properties of complex graphs in P. Suau, E. R. Hancock, & F. Escolano, Graph characteristics from the Schrödinger operator, LNCS 7877 (2013).
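The analogous computation for the Schrödinger dynamics, again a minimal sketch on the same made-up 5-cycle, checks that the propagator is unitary:

    % Quantum evolution on a small graph; U(t) = expm(-1i*t*L) is unitary.
    A = [0 1 0 0 1; 1 0 1 0 0; 0 1 0 1 0; 0 0 1 0 1; 1 0 0 1 0];
    L = diag(sum(A,2)) - A;
    psi0 = [1; 0; 0; 0; 0];              % initial state e_1
    U = expm(-1i*2.0*L);                 % propagator at t = 2
    psi = U*psi0;
    fprintf('norm(psi(t)) = %.15f\n', norm(psi));  % stays equal to 1
    prob = abs(U(:,1)).^2;               % |[U(t)]_{i1}|^2 = probability of
    disp(prob.')                         % finding the system in state e_i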

11 Outline 1 Diffusion on networks 2 Hub and authority rankings in digraphs 3 Numerical methods of network analysis 4 Numerical examples 5 Bibliography

12 Hubs and authorities in digraphs Returning to the topic of centrality measures, suppose that G = (V, E) is a directed graph. In such graphs each node can play two different roles: that of hub, and that of authority. For example, we may be interested in analyzing citation patterns among authors in a given field, or prey-predator relations in a food web. The Hyperlink-Induced Topic Search (HITS) algorithm (Kleinberg, 1999) provides a way of ranking hubs and authorities in a digraph. The original application area was WWW information retrieval. In a nutshell:

A hub is a webpage that points to many good authorities;
An authority is a webpage that is pointed to by many good hubs;
Every webpage i is given an authority score $x_i$ and a hub score $y_i$, based on an iterative scheme.

13 Calculation of HITS scores The HITS ranking scheme is an iterative method converging to a stationary solution. Initially, each $x_i$ and $y_i$ is set equal to 1. Weights are updated in the following way:

$x_i^{(k)} = \sum_{j:(j,i)\in E} y_j^{(k-1)}$ and $y_i^{(k)} = \sum_{j:(i,j)\in E} x_j^{(k)}$, for $k = 1, 2, 3, \ldots$

Weights are normalized so that $\sum_j (x_j^{(k)})^2 = 1$ and $\sum_j (y_j^{(k)})^2 = 1$. Under mild conditions, the sequences of vectors $\{x^{(k)}\}$ and $\{y^{(k)}\}$ converge as $k \to \infty$.
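A minimal Matlab sketch of this scheme; the 4-node digraph is made up for the example and is not one of the data sets used later:

    % Minimal sketch of the HITS iteration (the 4-node digraph is made up
    % for illustration; A(i,j) = 1 iff there is an edge from i to j).
    A = [0 1 1 0;
         0 0 1 0;
         1 0 0 0;
         0 1 1 0];
    n = size(A,1);
    x = ones(n,1); y = ones(n,1);   % initial authority and hub scores
    for k = 1:100
        x = A'*y; x = x/norm(x);    % authority update, then normalize
        y = A*x;  y = y/norm(y);    % hub update, then normalize
    end
    % x and y converge to the dominant eigenvectors of A'*A and A*A'.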

17 HITS as the power method When written in terms of matrices, the HITS process becomes

$x^{(k)} = A^T y^{(k-1)}$ and $y^{(k)} = A\, x^{(k)}$.

This can be rewritten as

$x^{(k)} = A^T A\, x^{(k-1)}$ and $y^{(k)} = A A^T y^{(k-1)}$.

HITS is just the power method applied to compute the dominant eigenvectors of $A^T A$ and $A A^T$. $A^T A$ is the authority matrix; $A A^T$ is the hub matrix. Note that the Perron-Frobenius Theorem applies to both matrices.

21 HITS reformulation Letting

$\mathcal{A} = \begin{bmatrix} 0 & A \\ A^T & 0 \end{bmatrix}$,

we obtain a symmetric matrix. Taking powers of this matrix, we get:

$\mathcal{A}^{2k} = \begin{bmatrix} (AA^T)^k & 0 \\ 0 & (A^TA)^k \end{bmatrix}$, $\quad \mathcal{A}^{2k+1} = \begin{bmatrix} 0 & A(A^TA)^k \\ (A^TA)^k A^T & 0 \end{bmatrix}$.

Additionally, $\mathcal{A}^T\mathcal{A} = \mathcal{A}\mathcal{A}^T = \mathcal{A}^2$.

22 HITS reformulation (cont.) Let $u^{(k)} = \begin{bmatrix} y^{(k)} \\ x^{(k)} \end{bmatrix}$. The HITS iteration becomes

$u^{(k)} = \mathcal{A}^2 u^{(k-1)} = \begin{bmatrix} AA^T & 0 \\ 0 & A^TA \end{bmatrix} u^{(k-1)}$.

The first N entries of $u^{(k)}$ are the hub rankings of the nodes; the last N are the authority rankings. The HITS rankings use only the information contained in the dominant eigenvector of $\mathcal{A}$. Hence, HITS is just eigenvector centrality for $\mathcal{A}$! Can we do better than HITS?

28 Subgraph centrality for digraphs Recall that the subgraph centrality of a node $v_i$ in an undirected graph is $SC(i) = [e^A]_{ii}$, and that the communicability between two nodes $v_i$, $v_j$ is $C(i,j) = [e^A]_{ij}$. The notion of subgraph centrality applies primarily to undirected graphs, since it cannot distinguish between the two roles of hub and authority. In analogy with HITS, we can calculate these quantities for the augmented symmetric matrix

$\mathcal{A} = \begin{bmatrix} 0 & A \\ A^T & 0 \end{bmatrix}$.

29 Subgraph centrality for digraphs (cont.) As we saw in Lecture II, this matrix is the adjacency matrix of the bipartite graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ associated with the original directed graph G. The network behind $\mathcal{A}$ can be thought of as follows:

Take the vertices from the original network and make two copies, $V'$ and $V''$. Let $\mathcal{V} = V' \cup V''$.

Undirected edges exist between the two sets based on the following rule: $\mathcal{E} = \{(v_i', v_j'')\,:\,$ there is a directed edge from $v_i$ to $v_j$ in the original network$\}$.

This creates a graph with 2N nodes. The nodes $v_i' \in V'$ represent the original nodes in their role as hubs, while the nodes $v_i'' \in V''$ represent the original nodes in their role as authorities.

34 Subgraph centrality for digraphs (cont.) Using the SVD of A, we can show that

$e^{\mathcal{A}} = \begin{bmatrix} \cosh(\sqrt{AA^T}) & A\,(\sqrt{A^TA})^\dagger \sinh(\sqrt{A^TA}) \\ \sinh(\sqrt{A^TA})\,(\sqrt{A^TA})^\dagger A^T & \cosh(\sqrt{A^TA}) \end{bmatrix}$.

Here $X^\dagger$ denotes the pseudoinverse of $X$. The diagonal entries of the top left block, $\cosh(\sqrt{AA^T})$, can be used to rank the nodes of G in their role as hubs; those of the bottom right block, $\cosh(\sqrt{A^TA})$, can be used to rank the nodes in their role as authorities. The interpretation of the off-diagonal entries is a bit more involved, but they too provide useful information on the network.
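For small graphs this identity can be checked directly in Matlab; a sketch, with a made-up 4-node digraph:

    % Hub/authority centralities via the SVD, for a small made-up digraph.
    % With A = U*S*V', cosh(sqrt(A*A')) = U*cosh(S)*U'.
    A = [0 1 1 0;
         0 0 1 0;
         1 0 0 0;
         0 1 1 0];
    [U, S, V] = svd(full(A));
    hub  = diag(U*diag(cosh(diag(S)))*U');  % diag of cosh(sqrt(A A^T))
    auth = diag(V*diag(cosh(diag(S)))*V');  % diag of cosh(sqrt(A^T A))
    % Cross-check against the exponential of the augmented matrix:
    n = size(A,1);
    E = expm([zeros(n) A; A' zeros(n)]);
    disp([hub, diag(E(1:n,1:n)), auth, diag(E(n+1:end,n+1:end))])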

35 Interpretation In a directed network, each node has two roles: being a hub and being an authority. Nodes $1, 2, \ldots, n$ of $\mathcal{V}$ represent the original nodes in their role as hubs. Nodes $n+1, n+2, \ldots, 2n$ of $\mathcal{V}$ represent the original nodes in their role as authorities.

38 Alternating walks An alternating walk of length k, starting with an out-edge, from $v_1$ to $v_{k+1}$ is

$v_1 \rightarrow v_2 \leftarrow v_3 \rightarrow \cdots$

An alternating walk of length k, starting with an in-edge, from node $v_1$ to $v_{k+1}$ is

$v_1 \leftarrow v_2 \rightarrow v_3 \leftarrow \cdots$

$[AA^TA\cdots]_{ij}$ (k matrices multiplied) counts the number of alternating walks of length k, starting with an out-edge, from node i to node j. $[A^TAA^T\cdots]_{ij}$ (k matrices multiplied) counts the number of alternating walks of length k, starting with an in-edge, from node i to node j.

42 Interpretation of diagonal entries $[(AA^T)^k]_{ij}$ and $[(A^TA)^k]_{ij}$ count the number of alternating walks of length 2k starting, respectively, with an out-edge and an in-edge. If node i in the original network is a good hub, it will point to many good authorities. These authorities will be pointed at by many good hubs, etc. Thus, if node i is a good hub, it will be included in the sets of hubs described above many times. This means that there should be many even-length alternating walks, starting with an out-edge, from node i to itself.

46 Interpretation of diagonal entries (cont.) The number of such walks, scaling a walk of length 2k by $\frac{1}{(2k)!}$, can be counted by the (i,i) entry of:

$I + \frac{AA^T}{2!} + \frac{(AA^T)^2}{4!} + \cdots + \frac{(AA^T)^k}{(2k)!} + \cdots$

Letting $A = U\Sigma V^T$ be the SVD of A, this becomes:

$U\left(I + \frac{\Sigma^2}{2!} + \frac{\Sigma^4}{4!} + \cdots + \frac{\Sigma^{2k}}{(2k)!} + \cdots\right)U^T = U\cosh(\Sigma)\,U^T = \cosh(\sqrt{AA^T})$.

48 Interpretation of diagonal entries (cont.) The hub centrality of node i (in the original network) is given by $[e^{\mathcal{A}}]_{ii} = [\cosh(\sqrt{AA^T})]_{ii}$. It measures how well node i transmits information to the authoritative nodes in the network. Using a similar analysis, it can be seen that the authority centrality of node i is given by $[e^{\mathcal{A}}]_{i+n,i+n} = [\cosh(\sqrt{A^TA})]_{ii}$. It measures how well node i receives information from the hubs in the network. These measures use information from all of the eigenvectors of the matrix $\mathcal{A}$.

51 The influence of the spectral gap When the gap between the largest and second largest singular value of A is large enough, the resulting rankings tend to agree with those obtained by HITS. However, if the gap is relatively small, the new rankings can be significantly different from, and arguably more meaningful than, those obtained with HITS, since more singular value/vector information is taken into account. This method is an alternative to the Katz-like approach based on the row and column sums of f(A). We illustrate this on three examples. M. Benzi, E. Estrada, & C. Klymko, Ranking hubs and authorities using matrix functions, Linear Algebra Appl., 438 (2013).

52 Example: ranking of websites on specific topics Next, we compare our approach with HITS. The following data sets are small web graphs on various issues. The data is provided by P. Tsaparas and can be found on his website tsap/experiments/www10-experiments/index.html 26

53 Abortion dataset The abortion dataset contains n = 2293 nodes and m = 9644 directed edges. The maximum eigenvalue of A is λ N = and the second largest eigenvalue is λ N 1 =

54 Hub ranking Table: Top 10 hubs of the abortion data set, ranked using $[e^{\mathcal{A}}]_{ii}$ and HITS.

55 Authority ranking Table: Top 10 authorities of the abortion data set, ranked using $[e^{\mathcal{A}}]_{ii}$ and HITS.

56 Computational complexity dataset The computational complexity dataset contains n = 884 nodes and m = 1616 directed edges. The maximum eigenvalue of A is λ N = and the second largest eigenvalue is λ N 1 =

57 Hub ranking Table: Top 10 hubs of the computational complexity data set, ranked using $[e^{\mathcal{A}}]_{ii}$ and HITS.

58 Authority ranking Table: Top 10 authorities of the computational complexity data set, ranked using $[e^{\mathcal{A}}]_{ii}$ and HITS.

59 Death penalty dataset The death penalty dataset contains n = 1850 nodes and m = 7363 directed edges. The maximum eigenvalue of A is λ N = and the second largest eigenvalue λ N 1 =

60 Hub ranking Table: Top 10 hubs of the death penalty data set, ranked using $[e^{\mathcal{A}}]_{ii}$ and HITS.

61 Authority ranking Table: Top 10 authorities of the death penalty data set, ranked using $[e^{\mathcal{A}}]_{ii}$ and HITS.

62 Outline 1 Diffusion on networks 2 Hub and authority rankings in digraphs 3 Numerical methods of network analysis 4 Numerical examples 5 Bibliography

63 Computational tasks We saw that many problems in network analysis lead to solving sparse linear systems (Katz), computing eigenvectors or singular vectors (eigenvector centrality, PageRank, HITS), and evaluating matrix functions f(A) (or f(L)), or their action on a prescribed vector. Typically, the diagonal entries of f(A) are wanted, and for large graphs some global averages over subsets of entries of f(A), often expressed in terms of traces or row/column sums. High accuracy is not always required. For example, in the case of centralities it is often enough to be able to identify just the top-ranked nodes. In many cases, bounds are sufficient. Remark: When only row and/or column sums of f(A) are required, there is no need to explicitly compute any of the entries of f(A).

64 Computational tasks (cont.) Many of these computations can be formulated as evaluation of expressions of the form $u^T f(A)\, v$ for suitably chosen vectors $u, v \in \mathbb{R}^N$ and for suitable functions f(x):

Subgraph centrality: $SC(i) = e_i^T \exp(A)\, e_i$
Communicability: $C(i,j) = e_i^T \exp(A)\, e_j$
Average communicability: $C(i) = \frac{1}{N-1}\left[\mathbf{1}^T \exp(A)\, e_i - e_i^T \exp(A)\, e_i\right]$
Hub scores: $SH(i) = e_i^T \cosh(\sqrt{AA^T})\, e_i$
Authority scores: $SA(i) = e_i^T \cosh(\sqrt{A^TA})\, e_i$
Total network communicability: $TC(G) = \frac{1}{N}\mathbf{1}^T \exp(A)\,\mathbf{1}$
Analogous resolvent-based, Katz-like quantities

Question: How do we compute these quantities efficiently?
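For a small graph all of these can be evaluated directly with expm; a minimal sketch (made-up 5-cycle again), useful mainly as a reference implementation against which the fast methods below can be checked:

    % Direct evaluation of the bilinear forms above, feasible only for
    % small graphs (the 5-node example graph is made up for illustration).
    A = [0 1 0 0 1; 1 0 1 0 0; 0 1 0 1 0; 0 0 1 0 1; 1 0 0 1 0];
    N = size(A,1);
    e = @(i) full(sparse(i,1,1,N,1));   % i-th canonical basis vector
    one = ones(N,1);
    E = expm(A);
    SC  = @(i)   e(i)'*E*e(i);                       % subgraph centrality
    C   = @(i,j) e(i)'*E*e(j);                       % communicability
    Cav = @(i)  (one'*E*e(i) - e(i)'*E*e(i))/(N-1);  % average communicability
    TC  = one'*E*one/N;                              % total communicability
    fprintf('SC(1) = %.4f, C(1,3) = %.4f, TC = %.4f\n', SC(1), C(1,3), TC);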

65 Golub & Meurant to the rescue! Gene Golub and Gérard Meurant (Oxford, July 2007)

67 Quadrature-based bounds Explicit computation of f(A) is prohibitive for large graphs. Also, many real-world networks are noisy. For example, edges in biological networks have many false positives. Gaussian quadrature rules can be used to obtain bounds on the entries of certain functions of symmetric matrices: Golub & Meurant (1994, 2010), Benzi & Golub (1999). Upper and lower bounds are available for the bilinear form

$h(u,v) = \langle f(A)u, v\rangle = u^T f(A)\, v$, $\quad u, v \in \mathbb{R}^N$

(taking $u = e_i$, $v = e_j$ yields $[f(A)]_{ij}$), with $A \in \mathbb{R}^{N\times N}$ symmetric and f(x) strictly completely monotonic on an interval containing the spectrum of A.

68 Quadrature-based bounds (cont.) Definition: A real function f(x) is strictly completely monotonic on an interval $I \subset \mathbb{R}$ if $f^{(2j)}(x) > 0$ and $f^{(2j+1)}(x) < 0$ on I for all $j \ge 0$. Examples: $f(x) = 1/x$ is s.c.m. on $(0, \infty)$; $f(x) = e^x$ is not s.c.m.; $f(x) = e^{-x}$ is s.c.m. on $\mathbb{R}$. In particular, we can compute:

bounds on $e^A = e^{-(-A)}$,
bounds on $B^{-1}$ if $B = I - \alpha A$ is positive definite (that is, if $0 < \alpha < \frac{1}{\lambda_{\max}(A)}$).

69 Quadrature-based bounds (cont.) Consider the spectral decompositions $A = Q\Lambda Q^T$, $f(A) = Q f(\Lambda) Q^T$. For $u, v \in \mathbb{R}^N$ we have

$u^T f(A)\, v = u^T Q f(\Lambda) Q^T v = p^T f(\Lambda)\, q = \sum_{i=1}^{N} f(\lambda_i)\, p_i q_i$,

where $p = Q^T u$ and $q = Q^T v$. Rewrite this as a Riemann-Stieltjes integral:

$u^T f(A)\, v = \int_a^b f(\lambda)\, d\mu(\lambda)$, $\quad \mu(\lambda) = \begin{cases} 0 & \lambda < a = \lambda_1 \\ \sum_{j=1}^{i} p_j q_j & \lambda_i \le \lambda < \lambda_{i+1} \\ \sum_{j=1}^{N} p_j q_j & b = \lambda_N \le \lambda. \end{cases}$

Caveat: Note the numbering of the eigenvalues of A here.

70 Gauss quadrature The general Gauss-type quadrature rule is

$\int_a^b f(\lambda)\, d\mu(\lambda) = \sum_{j=1}^{n} w_j f(t_j) + \sum_{k=1}^{M} v_k f(z_k) + R[f]$,

where the nodes $\{z_k\}$ are prescribed. Gauss: M = 0; Gauss-Radau: M = 1, $z_1 = a$ or $z_1 = b$; Gauss-Lobatto: M = 2, $z_1 = a$ and $z_2 = b$. The evaluation of these quadrature rules is reduced to computation of orthogonal polynomials via three-term recurrence, or, equivalently, computation of entries and spectral information of the corresponding tridiagonal matrix (Lanczos).

71 Gauss quadrature (cont.) For instance, we have for the Gauss rule:

$\int_a^b f(\lambda)\, d\mu(\lambda) = \sum_{j=1}^{n} w_j f(t_j) + R[f]$, with $R[f] = \frac{f^{(2n+M)}(\eta)}{(2n+M)!} \int_a^b \left[\prod_{j=1}^{n} (\lambda - t_j)\right]^2 d\mu(\lambda)$

for some $a < \eta < b$. Theorem (Golub-Meurant):

$\sum_{j=1}^{n} w_j f(t_j) = e_1^T f(J_n)\, e_1 = [f(J_n)]_{11}$.

72 Gauss quadrature (cont.) The tridiagonal matrix $J_n$ corresponds to the three-term recurrence relationship satisfied by the set of polynomials orthonormal with respect to $d\mu$:

$J_n = \begin{bmatrix} \omega_1 & \gamma_1 & & & \\ \gamma_1 & \omega_2 & \gamma_2 & & \\ & \ddots & \ddots & \ddots & \\ & & \gamma_{n-2} & \omega_{n-1} & \gamma_{n-1} \\ & & & \gamma_{n-1} & \omega_n \end{bmatrix}$

The eigenvalues of $J_n$ are the Gauss nodes, whereas the Gauss weights are given by the squares of the first entries of the normalized eigenvectors of $J_n$. The quadrature rule is computed with the Golub-Welsch QR algorithm. Alternatively, the (1,1) entry of $f(J_n)$ can be computed via explicit diagonalization of $J_n$.

73 Gauss quadrature (cont.) Consider the case $u = v = e_i$ (corresponding to the (i,i) entry of f(A)). The entries of $J_n$ are computed using the symmetric Lanczos algorithm:

$\gamma_j x_j = r_j = (A - \omega_j I)\, x_{j-1} - \gamma_{j-1} x_{j-2}$, $\quad j = 1, 2, \ldots$
$\omega_j = x_{j-1}^T A\, x_{j-1}$, $\quad \gamma_j = \|r_j\|_2$,

with initial vectors $x_{-1} = 0$ and $x_0 = e_i$. Two approaches for computing bounds: Explicit, a priori bounds are obtained by taking a single Lanczos step. Alternatively, one may explicitly carry out a certain number of Lanczos iterations (MMQ Matlab toolbox by G. Meurant). Each additional Lanczos step amounts to adding another node to the Gauss-type quadrature rule, resulting in tighter and tighter bounds.
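A minimal Matlab sketch of this procedure for the Gauss rule only; the function name gauss_expA_diag is ours, no reorthogonalization is performed, and the Radau/Lobatto modifications that give guaranteed two-sided bounds are omitted:

    % Sketch: estimate [e^A]_{ii} by the Gauss rule [f(J_n)]_{11}, built
    % from n steps of symmetric Lanczos started at e_i.
    function est = gauss_expA_diag(A, i, n)
      N = size(A,1);
      x_prev = zeros(N,1);
      x = zeros(N,1); x(i) = 1;            % x_0 = e_i
      omega = zeros(n,1); gamma = zeros(max(n-1,1),1);
      for j = 1:n
          omega(j) = x'*(A*x);
          r = A*x - omega(j)*x;
          if j > 1, r = r - gamma(j-1)*x_prev; end
          if j == n, break, end
          gamma(j) = norm(r);
          x_prev = x; x = r/gamma(j);
      end
      Jn = diag(omega) + diag(gamma(1:n-1),1) + diag(gamma(1:n-1),-1);
      F = expm(Jn);
      est = F(1,1);                        % Gauss estimate of [e^A]_{ii}
    end

For example, gauss_expA_diag(A, i, 5) approximates the subgraph centrality $[e^A]_{ii}$ with five quadrature nodes.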

74 Gauss quadrature (cont.) For f(a) = e A and f(a) = (I αa) 1 we obtain: bounds on [f(a)] ii from symmetric Lanczos, bounds on [f(a)] ij using the identity e T i f(a)e j = 1 4 [(e i + e j ) T f(a)(e i + e j ) (e i e j ) T f(a)(e i e j )], or, bounds on [f(a)] ii + [f(a)] ij using nonsymmetric Lanczos, lower bounds from the Gauss and the Gauss-Radau rules, upper bounds from the Gauss-Radau and Gauss-Lobatto rules. In computations one can use the simple (Geršgorin) estimates λ min max 1 i N {deg(i)}, where deg(i) is the degree of node v i. λ max max 1 i N {deg(i)}. Using more accurate estimates of the extreme eigenvalues of A generally leads to improved results, especially for scale-free graphs. 48

75 Explicit bounds for SC(i) and EE(G) Carrying out explicitly ("by hand") a single Lanczos step, we obtain the following bounds for centrality:

Gauss: $[e^A]_{ii} \ge e^{T_i/d_i}\left[\cosh\left(\frac{\sqrt{4T_i^2+4d_i^3}}{2d_i}\right) - \frac{2T_i}{\sqrt{4T_i^2+4d_i^3}}\,\sinh\left(\frac{\sqrt{4T_i^2+4d_i^3}}{2d_i}\right)\right]$

Gauss-Radau: $\frac{b^2 e^{-d_i/b} + d_i e^{b}}{b^2 + d_i} \le [e^A]_{ii} \le \frac{a^2 e^{-d_i/a} + d_i e^{a}}{a^2 + d_i}$

Gauss-Lobatto: $[e^A]_{ii} \le \frac{a e^{b} - b e^{a}}{a - b}$.

Bounds on the Estrada index are easily computed by summing over i. In these formulas, $T_i$ denotes the number of triangles (closed walks of length three) in G with a vertex at node i.

76 Explicit bounds for communicability The Gauss-Radau rule gives the following bounds:

$\frac{b^2 e^{-T_{ij}/b} + T_{ij} e^{b}}{b^2 + T_{ij}} - \frac{a^2 e^{-d_i/a} + d_i e^{a}}{a^2 + d_i} \le [e^A]_{ij} \le \frac{a^2 e^{-T_{ij}/a} + T_{ij} e^{a}}{a^2 + T_{ij}} - \frac{b^2 e^{-d_i/b} + d_i e^{b}}{b^2 + d_i}$,

where $T_{ij} := \sum_{k\ne i} A_{ki}(A_{ki} + A_{kj})$.

77 Explicit bounds for resolvent centrality Let $B = I - A/(N-1)$. Note that B is a symmetric M-matrix. We have:

(Gauss) $[B^{-1}]_{ii} \ge \dfrac{\sum_{k\ne i}\sum_{l\ne i} B_{ki}B_{kl}B_{li}}{\sum_{k\ne i}\sum_{l\ne i} B_{ki}B_{kl}B_{li} - d_i^2/(N-1)^4}$,

(Radau) $\dfrac{1 - b + \frac{d_i}{b(N-1)^2}}{1 - b + \frac{d_i}{(N-1)^2}} \le [B^{-1}]_{ii} \le \dfrac{1 - a + \frac{d_i}{a(N-1)^2}}{1 - a + \frac{d_i}{(N-1)^2}}$,

(Lobatto) $[B^{-1}]_{ii} \le \dfrac{a + b - 1}{ab}$.

M. Benzi & P. Boito, Quadrature rule-based bounds for functions of adjacency matrices, Linear Algebra Appl., 433 (2010).

78 MMQ (iteratively computed) bounds This refers to the (increasingly tight) bounds obtained by adding quadrature nodes (= performing additional Lanczos steps) to estimate entries of f(A). Computations are performed using G. Meurant's MMQ package, suitably adapted to handle sparse matrices. This gives fast and accurate computation of upper and lower bounds for centrality and communicability. Complexity is at most O(N) per iteration (much less for the first 2-3 iterations). For graphs with bounded max degree, the number of iterations does not depend on the graph size N (see below). Parallelization is possible on shared memory machines.

79 Low-rank approximations A major drawback of subgraph centrality (and communicability) in the case of large networks is cost. The approximate $O(N^2)$ scaling to evaluate the subgraph centrality of each node for a network with N nodes is unacceptably high for networks with millions of nodes (or more). The cost can be reduced by working with low-rank approximations of A. Estimating a small number $k \ll N$ of the largest eigenpairs $(\lambda_i, x_i)$ and using the approximation

$e^A \approx \sum_{i=1}^{k} e^{\lambda_i}\, x_i x_i^T$

can lead to significant savings. The accuracy is typically good due to rapid decay in the eigenvalues of A for many networks.
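A sketch of this idea in Matlab, assuming a sparse symmetric adjacency matrix A is already in the workspace; k = 10 is an arbitrary illustrative choice:

    % Low-rank approximation of all subgraph centralities at once.
    k = 10;
    [X, D] = eigs(A, k, 'largestreal');   % k leading eigenpairs of A
    sc = (X.^2) * exp(diag(D));           % sc(i) ~ [e^A]_{ii} = SC(i),
                                          % since diag(x_i x_i') = x_i.^2
    [~, idx] = sort(sc, 'descend');       % approximate centrality ranking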

80 Low-rank approximations (cont.) Additional savings can be obtained if one only wants to identify the top k nodes, where $k \ll N$. Similarly, in the case of digraphs A can be approximated in terms of a few dominant singular triplets. Other improvements can be obtained when computing communicabilities by using the block Lanczos algorithm. For details, we refer to the following works:

C. Fenu, D. Martin, L. Reichel, & G. Rodriguez, Network analysis via partial spectral factorization and Gauss quadrature, SIAM J. Sci. Comput., 35 (2013).
C. Fenu, D. Martin, L. Reichel, & G. Rodriguez, Block Gauss and anti-Gauss quadrature with application to networks, SIAM J. Matrix Anal. Appl., 34 (2013).
J. Baglama, C. Fenu, L. Reichel, & G. Rodriguez, Analysis of directed networks via partial singular value decomposition and Gauss quadrature, Linear Algebra Appl., 456 (2014).

81 Computing row and column sums Recall that the computation of row/column sums of matrix functions amounts to computing the action of f(A) or $f(A^T) = f(A)^T$ on the vector $\mathbf{1}$ of all ones. For instance, computing the total communicability of a graph with adjacency matrix A,

$TC(G) = \frac{1}{N}\mathbf{1}^T e^A\,\mathbf{1}$,

amounts to taking the arithmetic mean of the entries of the vector $e^A\mathbf{1}$. This can be accomplished very fast using Krylov subspace methods for evaluating the action of the matrix exponential on a vector. A very efficient Matlab toolbox has been developed by Stefan Güttel: guettels/funm_kryl

82 Krylov subspace methods Krylov subspace methods are the algorithms of choice for solving many linear algebra problems, including:

Large-scale linear systems of equations $Ax = b$
Large-scale (generalized) eigenvalue problems $Ax = \lambda Bx$
Computing $f(A)v$ where A is a large matrix, v is a vector and f a given function (e.g., $f(A) = e^{tA}$, $t > 0$)
Solving matrix equations (e.g., $AX + XA + B = 0$)

An attractive feature of Krylov methods in some applications is that the matrix A is not needed explicitly: only the ability to multiply A times a given vector is required (action, or operator principle).

83 Krylov subspace methods (cont.) The main idea behind Krylov methods is the following: A nested sequence of suitable low-dimensional subspaces (the Krylov subspaces) is generated; the original problem is projected onto these subspaces; the (small) projected problems are solved "exactly"; the approximate solution is expanded back to the original N-dimensional space, once sufficient accuracy has been attained. Krylov subspace methods are examples of polynomial approximation methods, where f(A)v is approximated by p(A)v with p a (low-degree) polynomial. Since every matrix function f(A) is a polynomial in A, this is appropriate.

84 Krylov subspace methods (cont.) The kth Krylov subspace of $A \in \mathbb{C}^{N\times N}$ and a nonzero vector $v \in \mathbb{C}^N$ is defined by

$K_k(A, v) = \mathrm{span}\{v, Av, \ldots, A^{k-1}v\}$,

and it can be written as

$K_k(A, v) = \{q(A)v \,:\, q$ is a polynomial of degree $\le k-1\}$.

Obviously,

$K_1(A, v) \subseteq K_2(A, v) \subseteq \cdots \subseteq K_d(A, v) = \cdots = K_N(A, v)$.

Here d is the degree of the minimum polynomial of A with respect to v.

85 Krylov subspace methods (cont.) As is well known, computing projections onto a subspace is greatly facilitated if an orthonormal basis for the subspace is known. This is also desirable for numerical stability reasons. An orthonormal basis for a Krylov subspace can be efficiently constructed using the Arnoldi process; in the Hermitian case, this is known as the Lanczos process (Arnoldi, 1951; Lanczos, 1952). Both of these are efficient implementations of the classical Gram-Schmidt process. In Arnoldi's method, the projected matrix has upper Hessenberg structure, which can be exploited in the computation. In the Hermitian case it is tridiagonal.

86 Arnoldi's process For arbitrary $A \in \mathbb{C}^{N\times N}$ and $v \in \mathbb{C}^N$, $v \ne 0$, the Arnoldi process is:

Set $q_1 = v/\|v\|_2$;
For $j = 1, \ldots, m$ do:
  $h_{i,j} = \langle Aq_j, q_i\rangle$ for $i = 1, 2, \ldots, j$
  $u_j = Aq_j - \sum_{i=1}^{j} h_{i,j}\, q_i$
  $h_{j+1,j} = \|u_j\|_2$
  If $h_{j+1,j} = 0$ then STOP;
  $q_{j+1} = u_j/\|u_j\|_2$

Remarks: (i) If the algorithm does not stop before the mth step, the Arnoldi vectors $\{q_1, \ldots, q_m\}$ form an ONB for the Krylov subspace $K_m(A, v)$. (ii) At each step j, the Arnoldi process requires one matrix-vector product, j + 1 inner products, and j linked triads. (iii) At each step the algorithm computes $Aq_j$ and then orthonormalizes it against all previously computed $q_i$'s.
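A direct Matlab transcription of this process, as a sketch (the name arnoldi is ours; this is the modified Gram-Schmidt variant, with no restarting or breakdown recovery):

    % Arnoldi process: returns Q (N x (m+1)) and H ((m+1) x m) such that
    % A*Q(:,1:m) = Q*H, with H upper Hessenberg.
    function [Q, H] = arnoldi(A, v, m)
      N = length(v);
      Q = zeros(N, m+1); H = zeros(m+1, m);
      Q(:,1) = v/norm(v);
      for j = 1:m
          u = A*Q(:,j);
          for i = 1:j
              H(i,j) = Q(:,i)'*u;       % h_{i,j} = <A q_j, q_i>
              u = u - H(i,j)*Q(:,i);    % orthogonalize against q_i
          end
          H(j+1,j) = norm(u);
          if H(j+1,j) == 0, break, end  % breakdown: invariant subspace found
          Q(:,j+1) = u/H(j+1,j);
      end
    end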

87 Arnoldi's process (cont.) Define $Q_m = [q_1, \ldots, q_m] \in \mathbb{C}^{N\times m}$. Introducing the $(m+1)\times m$ matrix $\hat{H}_m = [h_{ij}]$ and the $m\times m$ upper Hessenberg matrix $H_m$ obtained by deleting the last row of $\hat{H}_m$, the following Arnoldi relations hold:

$AQ_m = Q_m H_m + u_m e_m^T = Q_{m+1}\hat{H}_m$, $\quad Q_m^* A Q_m = H_m$.

Hence, the $m\times m$ matrix $H_m$ is precisely the projected matrix $Q_m^* A Q_m$. If $A^* = A$, then $H_m = H_m^* = T_m$ is a tridiagonal matrix, and the Arnoldi process reduces to the Lanczos process, which is much cheaper in terms of both operations and storage. Indeed, the Lanczos process consists of a three-term recurrence, with constant operation count and storage costs per step, whereas the Arnoldi process has increasing costs for increasing j.

88 Krylov subspace approximation to f(a)v To approximate f(a)v we take q 1 = v/ v 2 and for an appropriate k we compute f k := v 2 Q k f(h k )e 1 = Q k f(h k )Q k v. The vector f k is the kth approximation to f(a)v. Typically, k N and computing f(h k ) is inexpensive, and can be carried out in a number of ways. For instance, when H k = H k = T k, it can be computed via explicit diagonalization of T k. Deciding when this approximation f k is sufficiently close to f(a)v to stop the process is a non-trivial problem, and an area of active research. For the case of the matrix exponential, Saad (1992) suggested the error estimate e A v Q k e H k Q k v 2 v 2 e T k eh k e 1. Note that e T k eh ke 1 is the (k, 1) entry of e H k. 62

89 Outline 1 Diffusion on networks 2 Hub and authority rankings in digraphs 3 Numerical methods of network analysis 4 Numerical examples 5 Bibliography

90 Summary of results We begin with subgraph centrality and communicability calculations. Numerical experiments were performed on a number of networks, both synthetic (generated with the CONTEST Toolbox for Matlab, Taylor & Higham 2010) and from real-world applications, including biological, social, collaboration, and transportation networks.

A few (3-5) Lanczos steps per estimate are usually enough
Independent of N for graphs with bounded max degree
Almost as good for power-law degree distributions
Very efficient approach for computing global averages (like the total communicability)

M. Benzi and P. Boito, Quadrature rule-based bounds for functions of adjacency matrices, Linear Algebra Appl., 433 (2010). M. Benzi and C. Klymko, Total communicability as a centrality measure, J. Complex Networks, 1(2) (2013).

91 Numerical examples Experiments are performed on "random" matrices, "small world" matrices and "range-dependent" matrices, generated using the CONTEST toolbox for Matlab ([Taylor & Higham 2010]). We test a priori and a posteriori bounds and compare them to previous results. 65

92 Small-world networks (Watts Strogatz model) A=smallw(N,k,p)

N positive integer (number of nodes)
k positive integer (number of connected neighbors)
0 ≤ p ≤ 1 (probability of a shortcut)

The Watts Strogatz small-world model interpolates between a regular lattice and a random graph.

93 Small-world networks (Watts Strogatz model) Begin with a ring... and add random shortcuts.

97 Small-world networks (Watts Strogatz model) Here is the corresponding adjacency matrix (N = 15, k = 1, p = 0.4); its spy plot has nz = 40 nonzeros.

98 Range-dependent networks A=renga(N,α,λ)

N positive integer (number of nodes)
α > 0
0 ≤ λ ≤ 1

Choose an ordering of the nodes (i = 1, 2, ..., N). Nodes i and j have a probability $\alpha\lambda^{|i-j|-1}$ of being connected. Good model for protein-protein interaction.

99 Range-dependent networks Sparsity pattern for N = 100, α = 1 and λ = 0.85.

100 Erdös-Rényi (random) networks A=erdrey(N,m) N positive integer (number of nodes) m positive integer (number of edges) An (N, m)-erdös-rényi graph is a graph chosen uniformly at random in the set of all (suitable) graphs with N nodes and m edges. 74

101 Erdös-Rényi (random) networks Sparsity pattern for N = 100, m = 300.

102 A priori bounds for EE(G) More bounds for EE(G) are found in the literature (de la Peña et al., LAA (2007)):

$\sqrt{N^2 + 4m} \le EE(G) \le N - 1 + e^{\sqrt{2m}}$.

We compare lower bounds (de la Peña, Radau, Gauss) and upper bounds (Radau, Lobatto, de la Peña) on EE(G) for a range-dependent test matrix.

103 A priori bounds for EE(G) Logarithmic plot of bounds on the Estrada index for small world matrices of increasing size: Radau (lower), de la Peña (lower), Radau (upper), de la Peña (upper), as functions of N.

104 MMQ bounds for EE(G) and communicability A is a range-dependent matrix. Table: MMQ approximations of the Estrada index EE(G) per iteration count (columns: # it, Gauss, Radau (lower), Radau (upper), Lobatto). Table: MMQ approximations of $[e^A]_{1,5}$ (columns: # it, Radau (lower), Radau (upper)).

105 MMQ bounds w.r.t. matrix size Table: relative error for the Estrada index of small world matrices of parameters (4, 0.1), averaged over 10 matrices, after 5 iterations (columns: N, error (upper bound), error (lower bound)).

106 MMQ bounds w.r.t. matrix size (cont.) Table: bounds for the Estrada index of small world matrices of increasing size; Watts Strogatz model with parameters (4, 0.1); five Lanczos iterations (columns: N, Gauss, Gauss-Radau (lower), Gauss-Radau (upper), Gauss-Lobatto). Five iterations suffice to reach a small relative error, independent of N.

107 MMQ bounds for resolvent centrality Table: relative errors for MMQ bounds on the resolvent centrality of node 10 for Erdös-Rényi matrices associated with graphs with N vertices and 4N edges; averaged over 10 matrices; 2 iterations (columns: N, Gauss, Radau (lower), Radau (upper), Lobatto). In this case the error decreases because $I - A/(N-1)$ approaches the identity matrix I as $N \to \infty$.

108 Asymptotic independence of convergence rate on N Theorem: Let $\{A_N\}$ be the adjacency matrices associated with a sequence of graphs $\{G_N\}$ of size N. Assume that the degree $d_i$ of any node in $G_N$ satisfies $d_i \le D$ for all N, with D constant. Let f be an analytic function defined on a region containing the interval $[-D, D]$. Then, for any given $\varepsilon > 0$, the number of Lanczos steps (or, equivalently, of quadrature nodes) needed to approximate any entry $[f(A)]_{ij}$ with an error $< \varepsilon$ is bounded independently of N.

The assumption that $d_i \le D$ for all N may be restrictive for certain networks (e.g., Twitter). More generally, we have proved that the number of Lanczos steps grows at worst like $O(D_N)$, where $D_N$ is the max degree of any node in $G_N$. The result is a consequence of the fact that the coefficients of an analytic function in the Chebyshev expansion decay at least exponentially fast.

109 Some timings results in Matlab Timings (in seconds) for computing the diagonal of the exponential of small-world matrices of increasing order N. Blue: time for the Matlab expm function. Black: time using eig. Red: using 5 iterations of the Lanczos algorithm for each node. 83

110 Some timings results in Matlab (cont.) Plot of the time in seconds for computing the diagonal entries of the exponential of small world matrices of size N = 1000, 5000, ..., using 5 iterations of the Lanczos algorithm for each node.

111 Results for scale-free networks Experiments were performed on scale-free networks (Barabási Albert model) of increasing size, to see the effect of increasing maximum degree. In this case, the Geršgorin estimate $\lambda_{\max} \le d_{\max}$ gives poor results. Indeed, whereas the max degree increases with N, the spread of the eigenvalues tends to grow much more slowly (roughly, $\lambda_{\max} = O(\sqrt{d_{\max}})$ for $N \to \infty$). A better strategy is to estimate $\lambda_{\min}$, $\lambda_{\max}$ using Lanczos. The number of Lanczos steps per estimate (either diagonal elements or off-diagonal ones) increases very slowly with N, for a fixed accuracy. Typically, 9-11 Lanczos steps suffice to achieve small relative errors.

112 Results for real world networks Table: characteristics (N, NNZ, λ1, λ2) of selected real world networks: Zachary Karate Club, Drug Users, Yeast PPI, four Pajek/Erdos graphs, SNAP/ca-GrQc, SNAP/ca-HepTh, SNAP/as, Gleich/Minnesota. All networks are undirected.

113 Results for real world networks (cont.) Table: timings (in seconds) to compute centrality rankings based on the diagonal and row sums of $e^A$ for the test problems above, using different codes (columns: expm, mmq, funm_kryl). expm: Matlab's built-in matrix exponential; mmq: Meurant's code, modified; funm_kryl: Güttel's code.

114 A large example: the Wikipedia graph The Wikipedia graph is a directed graph with N = 4,189,503 nodes and 67,197,636 edges (as of June 6, 2011). The rows/columns of the exponential of the adjacency matrix can be used to rank the hubs and authorities in Wikipedia in order of importance. Using the Arnoldi-based code funm_kryl we can compute the hub and authority rankings for all the nodes in about 216s to high accuracy on a parallel system comprising 24 Intel(R) Xeon(R) CPUs.

115 Summary of results (cont.) Computing Katz centrality scores requires the solution of large and sparse linear systems of the form $(I - \alpha A)x = \mathbf{1}$. For this task, iterative methods should be used. Note that for an undirected graph the condition number of $I - \alpha A$ is bounded by

$\kappa \le \frac{2}{1 - \alpha\lambda_1}$.

For α not too close to $1/\lambda_1$, convergence is usually fast; for example, conjugate gradient, Richardson and Chebyshev iterations have all been found to perform well. Often, simple preconditioners give the best results. On the other hand, for large networks ($N = O(10^6)$ or larger) one would like to parallelize the computation. It turns out that for most complex networks this is very difficult, due to the fact that generalized nested dissection and similar orderings result in huge separator sets and therefore very high communication costs.
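A minimal sketch of such a solve in Matlab (A is assumed to be a sparse, undirected adjacency matrix in the workspace; the choice α = 0.85/λ1 is arbitrary and only for illustration):

    % Katz scores by diagonally preconditioned CG on (I - alpha*A)x = 1.
    lam1  = eigs(A, 1, 'largestreal');   % dominant eigenvalue of A
    alpha = 0.85/lam1;                   % inside (0, 1/lambda_1), so SPD
    N = size(A,1);
    M = speye(N) - alpha*A;              % sparse SPD Katz matrix
    P = spdiags(diag(M), 0, N, N);       % diagonal (Jacobi) preconditioner
    [x, flag, relres, iter] = pcg(M, ones(N,1), 1e-8, 200, P);
    fprintf('pcg: flag = %d, relres = %.2e, iters = %d\n', flag, relres, iter);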

116 Summary of results (cont.) For the same reasons, sparse direct solvers generate enormous amounts of fill-in and are generally unsuitable for solving complex network problems. Example: GeneRank linear system of order N = 152,520 from computational genomics, 791,768 nonzeros, SPD.

Matlab's backslash: 42.3 sec
CG with no preconditioning: 1.46 sec
CG with diagonal preconditioning: 0.42 sec
Chebyshev with diagonal preconditioning: 0.18 sec

With AMD reordering, the Cholesky factor contains 211,825,948 nonzeros! M. Benzi & V. Kuhlemann, Chebyshev acceleration of the GeneRank algorithm, Electr. Trans. Numer. Anal., 40 (2013).

117 Summary of results (cont.) In many problems involving the matrix exponential, convergence is usually very fast; one should also keep in mind that high accuracy is seldom needed in this context. As already mentioned, for very large problems, low rank approximations of the adjacency matrix can be used to reduce cost. For example, centrality calculations can often be performed working with just the 5-10 leading eigenpairs (or singular triplets) of the adjacency matrix. The presence of a few dominant eigenvalues well-separated from the rest of the spectrum is of great help. This is another aspect in which complex network problems behave very differently from discrete PDE problems. 91

118 Concluding remarks Network Science is an exciting multidisciplinary field undergoing rapid development. Research opportunities abound, and applied mathematicians can make a number of important contributions on all fronts: theory, modeling, and computation. Current emphasis is on dynamic processes involving time-evolving networks. The notions of centrality, communicability, community, diffusion etc. are being adapted to this setting, but much more work remains to be done, including (especially?) on the algorithmic aspects. 92

119 Outline 1 Diffusion on networks 2 Hub and authority rankings in digraphs 3 Numerical methods of network analysis 4 Numerical examples 5 Bibliography

120 Monographs
G. H. Golub and G. Meurant, Matrices, Moments and Quadrature with Applications, Princeton University Press, 2010.
J. Liesen and Z. Strakos, Krylov Subspace Methods: Principles and Analysis, Oxford University Press, 2013.

121 Software
Complex Networks Package for Matlab:
CONTEST Matlab Toolbox:
G. Meurant's MMQ package:
Stefan Güttel's package: guettels/funm_kryl


More information

Golub-Kahan iterative bidiagonalization and determining the noise level in the data

Golub-Kahan iterative bidiagonalization and determining the noise level in the data Golub-Kahan iterative bidiagonalization and determining the noise level in the data Iveta Hnětynková,, Martin Plešinger,, Zdeněk Strakoš, * Charles University, Prague ** Academy of Sciences of the Czech

More information

Iterative Methods for Sparse Linear Systems

Iterative Methods for Sparse Linear Systems Iterative Methods for Sparse Linear Systems Luca Bergamaschi e-mail: berga@dmsa.unipd.it - http://www.dmsa.unipd.it/ berga Department of Mathematical Methods and Models for Scientific Applications University

More information

Course Notes: Week 1

Course Notes: Week 1 Course Notes: Week 1 Math 270C: Applied Numerical Linear Algebra 1 Lecture 1: Introduction (3/28/11) We will focus on iterative methods for solving linear systems of equations (and some discussion of eigenvalues

More information

Matrices, Moments and Quadrature, cont d

Matrices, Moments and Quadrature, cont d Jim Lambers CME 335 Spring Quarter 2010-11 Lecture 4 Notes Matrices, Moments and Quadrature, cont d Estimation of the Regularization Parameter Consider the least squares problem of finding x such that

More information

DATA MINING LECTURE 13. Link Analysis Ranking PageRank -- Random walks HITS

DATA MINING LECTURE 13. Link Analysis Ranking PageRank -- Random walks HITS DATA MINING LECTURE 3 Link Analysis Ranking PageRank -- Random walks HITS How to organize the web First try: Manually curated Web Directories How to organize the web Second try: Web Search Information

More information

Data Mining and Analysis: Fundamental Concepts and Algorithms

Data Mining and Analysis: Fundamental Concepts and Algorithms Data Mining and Analysis: Fundamental Concepts and Algorithms dataminingbook.info Mohammed J. Zaki 1 Wagner Meira Jr. 2 1 Department of Computer Science Rensselaer Polytechnic Institute, Troy, NY, USA

More information

MATH 5524 MATRIX THEORY Pledged Problem Set 2

MATH 5524 MATRIX THEORY Pledged Problem Set 2 MATH 5524 MATRIX THEORY Pledged Problem Set 2 Posted Tuesday 25 April 207 Due by 5pm on Wednesday 3 May 207 Complete any four problems, 25 points each You are welcome to complete more problems if you like,

More information

Last Time. Social Network Graphs Betweenness. Graph Laplacian. Girvan-Newman Algorithm. Spectral Bisection

Last Time. Social Network Graphs Betweenness. Graph Laplacian. Girvan-Newman Algorithm. Spectral Bisection Eigenvalue Problems Last Time Social Network Graphs Betweenness Girvan-Newman Algorithm Graph Laplacian Spectral Bisection λ 2, w 2 Today Small deviation into eigenvalue problems Formulation Standard eigenvalue

More information

AMS526: Numerical Analysis I (Numerical Linear Algebra)

AMS526: Numerical Analysis I (Numerical Linear Algebra) AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 1: Course Overview & Matrix-Vector Multiplication Xiangmin Jiao SUNY Stony Brook Xiangmin Jiao Numerical Analysis I 1 / 20 Outline 1 Course

More information

Iterative Methods for Solving A x = b

Iterative Methods for Solving A x = b Iterative Methods for Solving A x = b A good (free) online source for iterative methods for solving A x = b is given in the description of a set of iterative solvers called templates found at netlib: http

More information

Notes on PCG for Sparse Linear Systems

Notes on PCG for Sparse Linear Systems Notes on PCG for Sparse Linear Systems Luca Bergamaschi Department of Civil Environmental and Architectural Engineering University of Padova e-mail luca.bergamaschi@unipd.it webpage www.dmsa.unipd.it/

More information

1 Extrapolation: A Hint of Things to Come

1 Extrapolation: A Hint of Things to Come Notes for 2017-03-24 1 Extrapolation: A Hint of Things to Come Stationary iterations are simple. Methods like Jacobi or Gauss-Seidel are easy to program, and it s (relatively) easy to analyze their convergence.

More information

Moments, Model Reduction and Nonlinearity in Solving Linear Algebraic Problems

Moments, Model Reduction and Nonlinearity in Solving Linear Algebraic Problems Moments, Model Reduction and Nonlinearity in Solving Linear Algebraic Problems Zdeněk Strakoš Charles University, Prague http://www.karlin.mff.cuni.cz/ strakos 16th ILAS Meeting, Pisa, June 2010. Thanks

More information

The Kernel Trick, Gram Matrices, and Feature Extraction. CS6787 Lecture 4 Fall 2017

The Kernel Trick, Gram Matrices, and Feature Extraction. CS6787 Lecture 4 Fall 2017 The Kernel Trick, Gram Matrices, and Feature Extraction CS6787 Lecture 4 Fall 2017 Momentum for Principle Component Analysis CS6787 Lecture 3.1 Fall 2017 Principle Component Analysis Setting: find the

More information

Online Social Networks and Media. Link Analysis and Web Search

Online Social Networks and Media. Link Analysis and Web Search Online Social Networks and Media Link Analysis and Web Search How to Organize the Web First try: Human curated Web directories Yahoo, DMOZ, LookSmart How to organize the web Second try: Web Search Information

More information

Total least squares. Gérard MEURANT. October, 2008

Total least squares. Gérard MEURANT. October, 2008 Total least squares Gérard MEURANT October, 2008 1 Introduction to total least squares 2 Approximation of the TLS secular equation 3 Numerical experiments Introduction to total least squares In least squares

More information

Lab 1: Iterative Methods for Solving Linear Systems

Lab 1: Iterative Methods for Solving Linear Systems Lab 1: Iterative Methods for Solving Linear Systems January 22, 2017 Introduction Many real world applications require the solution to very large and sparse linear systems where direct methods such as

More information

Incomplete Cholesky preconditioners that exploit the low-rank property

Incomplete Cholesky preconditioners that exploit the low-rank property anapov@ulb.ac.be ; http://homepages.ulb.ac.be/ anapov/ 1 / 35 Incomplete Cholesky preconditioners that exploit the low-rank property (theory and practice) Artem Napov Service de Métrologie Nucléaire, Université

More information

Linear algebra and applications to graphs Part 1

Linear algebra and applications to graphs Part 1 Linear algebra and applications to graphs Part 1 Written up by Mikhail Belkin and Moon Duchin Instructor: Laszlo Babai June 17, 2001 1 Basic Linear Algebra Exercise 1.1 Let V and W be linear subspaces

More information

Krylov Subspace Methods that Are Based on the Minimization of the Residual

Krylov Subspace Methods that Are Based on the Minimization of the Residual Chapter 5 Krylov Subspace Methods that Are Based on the Minimization of the Residual Remark 51 Goal he goal of these methods consists in determining x k x 0 +K k r 0,A such that the corresponding Euclidean

More information

Link Analysis Ranking

Link Analysis Ranking Link Analysis Ranking How do search engines decide how to rank your query results? Guess why Google ranks the query results the way it does How would you do it? Naïve ranking of query results Given query

More information

Estimating the Largest Elements of a Matrix

Estimating the Largest Elements of a Matrix Estimating the Largest Elements of a Matrix Samuel Relton samuel.relton@manchester.ac.uk @sdrelton samrelton.com blog.samrelton.com Joint work with Nick Higham nick.higham@manchester.ac.uk May 12th, 2016

More information

Iterative methods for Linear System

Iterative methods for Linear System Iterative methods for Linear System JASS 2009 Student: Rishi Patil Advisor: Prof. Thomas Huckle Outline Basics: Matrices and their properties Eigenvalues, Condition Number Iterative Methods Direct and

More information

Numerical Methods - Numerical Linear Algebra

Numerical Methods - Numerical Linear Algebra Numerical Methods - Numerical Linear Algebra Y. K. Goh Universiti Tunku Abdul Rahman 2013 Y. K. Goh (UTAR) Numerical Methods - Numerical Linear Algebra I 2013 1 / 62 Outline 1 Motivation 2 Solving Linear

More information

Linear Algebra Massoud Malek

Linear Algebra Massoud Malek CSUEB Linear Algebra Massoud Malek Inner Product and Normed Space In all that follows, the n n identity matrix is denoted by I n, the n n zero matrix by Z n, and the zero vector by θ n An inner product

More information

Lecture 11: CMSC 878R/AMSC698R. Iterative Methods An introduction. Outline. Inverse, LU decomposition, Cholesky, SVD, etc.

Lecture 11: CMSC 878R/AMSC698R. Iterative Methods An introduction. Outline. Inverse, LU decomposition, Cholesky, SVD, etc. Lecture 11: CMSC 878R/AMSC698R Iterative Methods An introduction Outline Direct Solution of Linear Systems Inverse, LU decomposition, Cholesky, SVD, etc. Iterative methods for linear systems Why? Matrix

More information

Principal Component Analysis

Principal Component Analysis Machine Learning Michaelmas 2017 James Worrell Principal Component Analysis 1 Introduction 1.1 Goals of PCA Principal components analysis (PCA) is a dimensionality reduction technique that can be used

More information

A Note on Google s PageRank

A Note on Google s PageRank A Note on Google s PageRank According to Google, google-search on a given topic results in a listing of most relevant web pages related to the topic. Google ranks the importance of webpages according to

More information

Arnoldi Methods in SLEPc

Arnoldi Methods in SLEPc Scalable Library for Eigenvalue Problem Computations SLEPc Technical Report STR-4 Available at http://slepc.upv.es Arnoldi Methods in SLEPc V. Hernández J. E. Román A. Tomás V. Vidal Last update: October,

More information

Link Analysis. Leonid E. Zhukov

Link Analysis. Leonid E. Zhukov Link Analysis Leonid E. Zhukov School of Data Analysis and Artificial Intelligence Department of Computer Science National Research University Higher School of Economics Structural Analysis and Visualization

More information

Contribution of Wo¹niakowski, Strako²,... The conjugate gradient method in nite precision computa

Contribution of Wo¹niakowski, Strako²,... The conjugate gradient method in nite precision computa Contribution of Wo¹niakowski, Strako²,... The conjugate gradient method in nite precision computations ªaw University of Technology Institute of Mathematics and Computer Science Warsaw, October 7, 2006

More information

Lecture 2: Computing functions of dense matrices

Lecture 2: Computing functions of dense matrices Lecture 2: Computing functions of dense matrices Paola Boito and Federico Poloni Università di Pisa Pisa - Hokkaido - Roma2 Summer School Pisa, August 27 - September 8, 2018 Introduction In this lecture

More information

B553 Lecture 5: Matrix Algebra Review

B553 Lecture 5: Matrix Algebra Review B553 Lecture 5: Matrix Algebra Review Kris Hauser January 19, 2012 We have seen in prior lectures how vectors represent points in R n and gradients of functions. Matrices represent linear transformations

More information

SVD, Power method, and Planted Graph problems (+ eigenvalues of random matrices)

SVD, Power method, and Planted Graph problems (+ eigenvalues of random matrices) Chapter 14 SVD, Power method, and Planted Graph problems (+ eigenvalues of random matrices) Today we continue the topic of low-dimensional approximation to datasets and matrices. Last time we saw the singular

More information

Algebraic Representation of Networks

Algebraic Representation of Networks Algebraic Representation of Networks 0 1 2 1 1 0 0 1 2 0 0 1 1 1 1 1 Hiroki Sayama sayama@binghamton.edu Describing networks with matrices (1) Adjacency matrix A matrix with rows and columns labeled by

More information

c 2016 Society for Industrial and Applied Mathematics

c 2016 Society for Industrial and Applied Mathematics SIAM J. MATRIX ANAL. APPL. Vol. 37, No., pp. 443 468 c 06 Society for Industrial and Applied Mathematics EDGE MODIFIATION RITERIA FOR ENHANING THE OMMUNIABILITY OF DIGRAPHS FRANESA ARRIGO AND MIHELE BENZI

More information

Preliminary Examination, Numerical Analysis, August 2016

Preliminary Examination, Numerical Analysis, August 2016 Preliminary Examination, Numerical Analysis, August 2016 Instructions: This exam is closed books and notes. The time allowed is three hours and you need to work on any three out of questions 1-4 and any

More information

Preface to the Second Edition. Preface to the First Edition

Preface to the Second Edition. Preface to the First Edition n page v Preface to the Second Edition Preface to the First Edition xiii xvii 1 Background in Linear Algebra 1 1.1 Matrices................................. 1 1.2 Square Matrices and Eigenvalues....................

More information

6.207/14.15: Networks Lectures 4, 5 & 6: Linear Dynamics, Markov Chains, Centralities

6.207/14.15: Networks Lectures 4, 5 & 6: Linear Dynamics, Markov Chains, Centralities 6.207/14.15: Networks Lectures 4, 5 & 6: Linear Dynamics, Markov Chains, Centralities 1 Outline Outline Dynamical systems. Linear and Non-linear. Convergence. Linear algebra and Lyapunov functions. Markov

More information

AMS526: Numerical Analysis I (Numerical Linear Algebra for Computational and Data Sciences)

AMS526: Numerical Analysis I (Numerical Linear Algebra for Computational and Data Sciences) AMS526: Numerical Analysis I (Numerical Linear Algebra for Computational and Data Sciences) Lecture 1: Course Overview; Matrix Multiplication Xiangmin Jiao Stony Brook University Xiangmin Jiao Numerical

More information

Numerical Simulation of Spin Dynamics

Numerical Simulation of Spin Dynamics Numerical Simulation of Spin Dynamics Marie Kubinova MATH 789R: Advanced Numerical Linear Algebra Methods with Applications November 18, 2014 Introduction Discretization in time Computing the subpropagators

More information

c 2013 Society for Industrial and Applied Mathematics

c 2013 Society for Industrial and Applied Mathematics SIAM J. SCI. COMPUT. Vol. 35, No., pp. A6 A68 c 13 Society for Industrial and Applied Mathematics NETWORK ANALYSIS VIA PARTIAL SPECTRAL FACTORIZATION AND GAUSS QUADRATURE C. FENU, D. MARTIN, L. REICHEL,

More information

BOUNDS FOR THE ENTRIES OF MATRIX FUNCTIONS WITH APPLICATIONS TO PRECONDITIONING

BOUNDS FOR THE ENTRIES OF MATRIX FUNCTIONS WITH APPLICATIONS TO PRECONDITIONING BIT 0006-3835/99/3903-0417 $1500 1999, Vol 39, No 3, pp 417 438 c Swets & Zeitlinger BOUNDS FOR THE ENTRIES OF MATRIX FUNCTIONS WITH APPLICATIONS TO PRECONDITIONING MICHELE BENZI 1 and GENE H GOLUB 2 Abstract

More information

COMP 558 lecture 18 Nov. 15, 2010

COMP 558 lecture 18 Nov. 15, 2010 Least squares We have seen several least squares problems thus far, and we will see more in the upcoming lectures. For this reason it is good to have a more general picture of these problems and how to

More information

Iterative solvers for linear equations

Iterative solvers for linear equations Spectral Graph Theory Lecture 23 Iterative solvers for linear equations Daniel A. Spielman November 26, 2018 23.1 Overview In this and the next lecture, I will discuss iterative algorithms for solving

More information

Numerical Analysis of Quantum Graphs

Numerical Analysis of Quantum Graphs Numerical Analysis of Quantum Graphs Michele Benzi Department of Mathematics and Computer Science Emory University Atlanta, Georgia, USA Householder Symposium XIX Spa, Belgium 8-13 June, 2014 1 Outline

More information

Lecture: Local Spectral Methods (1 of 4)

Lecture: Local Spectral Methods (1 of 4) Stat260/CS294: Spectral Graph Methods Lecture 18-03/31/2015 Lecture: Local Spectral Methods (1 of 4) Lecturer: Michael Mahoney Scribe: Michael Mahoney Warning: these notes are still very rough. They provide

More information

1 Singular Value Decomposition and Principal Component

1 Singular Value Decomposition and Principal Component Singular Value Decomposition and Principal Component Analysis In these lectures we discuss the SVD and the PCA, two of the most widely used tools in machine learning. Principal Component Analysis (PCA)

More information

Data Mining and Matrices

Data Mining and Matrices Data Mining and Matrices 10 Graphs II Rainer Gemulla, Pauli Miettinen Jul 4, 2013 Link analysis The web as a directed graph Set of web pages with associated textual content Hyperlinks between webpages

More information

MAA507, Power method, QR-method and sparse matrix representation.

MAA507, Power method, QR-method and sparse matrix representation. ,, and representation. February 11, 2014 Lecture 7: Overview, Today we will look at:.. If time: A look at representation and fill in. Why do we need numerical s? I think everyone have seen how time consuming

More information

A Spectral Time-Domain Method for Computational Electrodynamics

A Spectral Time-Domain Method for Computational Electrodynamics A Spectral Time-Domain Method for Computational Electrodynamics James V. Lambers Abstract Block Krylov subspace spectral (KSS) methods have previously been applied to the variable-coefficient heat equation

More information

Computation of eigenvalues and singular values Recall that your solutions to these questions will not be collected or evaluated.

Computation of eigenvalues and singular values Recall that your solutions to these questions will not be collected or evaluated. Math 504, Homework 5 Computation of eigenvalues and singular values Recall that your solutions to these questions will not be collected or evaluated 1 Find the eigenvalues and the associated eigenspaces

More information

Iterative methods for Linear System of Equations. Joint Advanced Student School (JASS-2009)

Iterative methods for Linear System of Equations. Joint Advanced Student School (JASS-2009) Iterative methods for Linear System of Equations Joint Advanced Student School (JASS-2009) Course #2: Numerical Simulation - from Models to Software Introduction In numerical simulation, Partial Differential

More information

APPLIED NUMERICAL LINEAR ALGEBRA

APPLIED NUMERICAL LINEAR ALGEBRA APPLIED NUMERICAL LINEAR ALGEBRA James W. Demmel University of California Berkeley, California Society for Industrial and Applied Mathematics Philadelphia Contents Preface 1 Introduction 1 1.1 Basic Notation

More information

Lecture 10: October 27, 2016

Lecture 10: October 27, 2016 Mathematical Toolkit Autumn 206 Lecturer: Madhur Tulsiani Lecture 0: October 27, 206 The conjugate gradient method In the last lecture we saw the steepest descent or gradient descent method for finding

More information

A Method for Constructing Diagonally Dominant Preconditioners based on Jacobi Rotations

A Method for Constructing Diagonally Dominant Preconditioners based on Jacobi Rotations A Method for Constructing Diagonally Dominant Preconditioners based on Jacobi Rotations Jin Yun Yuan Plamen Y. Yalamov Abstract A method is presented to make a given matrix strictly diagonally dominant

More information

Normalized power iterations for the computation of SVD

Normalized power iterations for the computation of SVD Normalized power iterations for the computation of SVD Per-Gunnar Martinsson Department of Applied Mathematics University of Colorado Boulder, Co. Per-gunnar.Martinsson@Colorado.edu Arthur Szlam Courant

More information

Numerical Linear Algebra Primer. Ryan Tibshirani Convex Optimization /36-725

Numerical Linear Algebra Primer. Ryan Tibshirani Convex Optimization /36-725 Numerical Linear Algebra Primer Ryan Tibshirani Convex Optimization 10-725/36-725 Last time: proximal gradient descent Consider the problem min g(x) + h(x) with g, h convex, g differentiable, and h simple

More information

LARGE SPARSE EIGENVALUE PROBLEMS. General Tools for Solving Large Eigen-Problems

LARGE SPARSE EIGENVALUE PROBLEMS. General Tools for Solving Large Eigen-Problems LARGE SPARSE EIGENVALUE PROBLEMS Projection methods The subspace iteration Krylov subspace methods: Arnoldi and Lanczos Golub-Kahan-Lanczos bidiagonalization General Tools for Solving Large Eigen-Problems

More information

Stabilization and Acceleration of Algebraic Multigrid Method

Stabilization and Acceleration of Algebraic Multigrid Method Stabilization and Acceleration of Algebraic Multigrid Method Recursive Projection Algorithm A. Jemcov J.P. Maruszewski Fluent Inc. October 24, 2006 Outline 1 Need for Algorithm Stabilization and Acceleration

More information

FINITE-DIMENSIONAL LINEAR ALGEBRA

FINITE-DIMENSIONAL LINEAR ALGEBRA DISCRETE MATHEMATICS AND ITS APPLICATIONS Series Editor KENNETH H ROSEN FINITE-DIMENSIONAL LINEAR ALGEBRA Mark S Gockenbach Michigan Technological University Houghton, USA CRC Press Taylor & Francis Croup

More information

The amount of work to construct each new guess from the previous one should be a small multiple of the number of nonzeros in A.

The amount of work to construct each new guess from the previous one should be a small multiple of the number of nonzeros in A. AMSC/CMSC 661 Scientific Computing II Spring 2005 Solution of Sparse Linear Systems Part 2: Iterative methods Dianne P. O Leary c 2005 Solving Sparse Linear Systems: Iterative methods The plan: Iterative

More information