Node and Link Analysis Leonid E. Zhukov School of Applied Mathematics and Information Science National Research University Higher School of Economics 10.02.2014 Leonid E. Zhukov (HSE) Lecture 5 10.02.2014 1 / 25
Centrality Measures Sociology. Linton Freeman, 1979. Most "important"actors: actor location in the social network Actor centrality - involvment with other actors, many ties, source or recepient Actor prestige - recipient (object) of many ties, ties directed to an actor Three graphs: Star graph Circle graph Line graph Leonid E. Zhukov (HSE) Lecture 5 10.02.2014 2 / 25
Three graphs Leonid E. Zhukov (HSE) Lecture 5 10.02.2014 3 / 25
Degree Centrality Degree centrality C D (i) = k(i) = j A ij = j A ji Normalized degree centrality C D (i) = 1 n 1 C D(i) High centrality degree -direct contact with many other actors Low degree - not active, peripherial Leonid E. Zhukov (HSE) Lecture 5 10.02.2014 4 / 25
Closeness Centrality Closeness centrality How close an actor to all the other actors in network C C (i) = d(i, j) j 1 Normalized closenss centrality C C (i) = (n 1)C C (i) Actor in the center can quickly interact with all others, short communication path to others, minimal number of steps to reach others Leonid E. Zhukov (HSE) Lecture 5 10.02.2014 5 / 25
Betweenness Centrality Betweenness Centrality Number of shortest paths going through the actor σ st (i) C B (i) = Normalized betweenness centrality s t i σ st (i) σ st C B (i) = 2 (n 1)(n 2) C B(i) Probability that a communication from s to t will go through i (geodesics) Edge betweenness Leonid E. Zhukov (HSE) Lecture 5 10.02.2014 6 / 25
Degree Prestige Degree prestige P D (i) = k in (i) = j A ji Normalized degree prestige P D (i) = 1 n 1 P D(i) Prestigious actors recieve many nominations Leonid E. Zhukov (HSE) Lecture 5 10.02.2014 7 / 25
Proximity Prestige Proximity prestige Influence domain - set of actors that can reach i directly and indirectly. I i - size of influence domain. Average distance j d(j, i)/i i P p (i) = I i/(n 1) j d(j, i)/i i Leonid E. Zhukov (HSE) Lecture 5 10.02.2014 8 / 25
Centralization Centralization (network measure) - how central the most central node in the network in relation to all other nodes. C x - one of the centrality measures N i [C x (p ) C x (p i )] C x = max N i [C x (p ) C x (p i )] Leonid E. Zhukov (HSE) Lecture 5 10.02.2014 9 / 25
Link Analysis Graph G(n, m) and adjacency matrix A ij, edge i j Leonid E. Zhukov (HSE) Lecture 5 10.02.2014 10 / 25
Status or Rank Prestige Leo Katz, 1953. Take into account status (prestige) of directly connected actors p i j N(i) p j = j A ji p j p i = j A ji p j p = A T p (I A T )p = 0 Nontrivial solution only if det(i A T ) = 0. Need to constraint matrix Leonid E. Zhukov (HSE) Lecture 5 10.02.2014 11 / 25
Status or Rank Prestige Leo Katz, 1953 An actor gives equal parts of its prestige to all nearest neighbours p i j A ji p j k out (j) = j A ji k out (j) p j p = (D 1 A) T p where D ii = max(k i, 1) D 1 A - stochastic matrix, j (D 1 A) ij = 1, guaranteed λ max = 1 (D 1 A) T p = λp Leonid E. Zhukov (HSE) Lecture 5 10.02.2014 12 / 25
Eigenvector Centrality Phillip Bonacich, 1972. c i j A ji c j c i = 1 A ji c j κ κc = A T c (κi A T )p = 0 Nontrivial solution only when det(κi A T ) = 0. Eigenvalue problem. Choose eigenvector corresponding do maximum eigenvalue: κ max = κ 1, c = c 1 j Leonid E. Zhukov (HSE) Lecture 5 10.02.2014 13 / 25
Status or Rank Prestige. Eigenvector Centrality Phillip Bonacich, 1987. Parametrized centrality measure c(α, β) c i = j (α + βc j )A ji c = αa T e + βa T c c = α(i βa T ) 1 A T e α - found from normalizaion c 2 = c 2 i = 1 β - parameter, degree and direction of dependence on others Leonid E. Zhukov (HSE) Lecture 5 10.02.2014 14 / 25
Link Analysis Graph G(n, m) zero out degree nodes, k out (i) = 0 zero in degree nodes, k in (i) = 0 Leonid E. Zhukov (HSE) Lecture 5 10.02.2014 15 / 25
Real World Bow tie structure of the web Leonid E. Zhukov (HSE) Lecture 5 10.02.2014 16 / 25
Perron-Frobenius Theorem Oscar Perron, 1907, Georg Frobenius,1912. Eigenvalue problem: Pp = λp Perron-Frobenius theorem: Real square matrix with positive entries stochastic (non-negative and rows sum up to one) irreducible (strongly connected graph) aperiodic then unique largest eigenvalue λ max = 1, with positive left eigenvector and power iterations coneverges to it. Solution satisfies p 1 = 1 Stationary distribution of Markov chain Leonid E. Zhukov (HSE) Lecture 5 10.02.2014 17 / 25
PageRank Sergey Brin and Larry Page, 1998 Transition matrix: P = D 1 A Stochastic matrix: P = P + det n PageRank matrix: P = αp + (1 α) eet n Eigenvalue problem (choose solution with λ = 1): P T p = λp Notations: e - unit column vector, d - absorbing nodes indicator vector (column) Leonid E. Zhukov (HSE) Lecture 5 10.02.2014 18 / 25
PageRank computations Eigenvalue problem [ ) ] α ((D 1 A) T + edt + (1 α) eet p = λp n n Power iterations p α(d 1 A) T p + α e n (dt p) + (1 α) e n (et p) p p p Sparse linear system (λ = 1, p 1 = 1) [ )] I α ((D 1 A) T + edt p = (1 α) e n n Leonid E. Zhukov (HSE) Lecture 5 10.02.2014 19 / 25
Hubs and Authorities HITS, Jon Kleinberg, 1999 Citation networks. Reviews vs original research (authoritative) papers authorities, contain useful information, a i hubs, contains links to authorities, h i Mutual recursion Good authorities reffered by good hubs a i j A ji h j Good hubs point to good authorities h i j A ij a j Leonid E. Zhukov (HSE) Lecture 5 10.02.2014 20 / 25
HITS System of linear equations a = αa T h h = βaa Symmetric eigenvalue problem (A T A)a = λa (AA T )h = λh where eigenvalue λ = (αβ) 1 Leonid E. Zhukov (HSE) Lecture 5 10.02.2014 21 / 25
Centrality and Prestige of Florentine Families The Medici family marriage network Leonid E. Zhukov (HSE) Lecture 5 10.02.2014 22 / 25
Ranking comparison The Kendall tau rank distance is a metric that counts the number of pairwise disagreements between two ranking lists Kendall rank correlation coefficient, commonly referred to as Kendall s tau coefficient τ = n c n d n(n 1)/2 n c - number of concordant pairs, n d - number of discordant pairs 1 τ 1, perfect agreement τ = 1, reversed τ = 1 Example Rank 1 A B C D E Rank 2 C D A B E τ = 6 4 5(5 1)/2 = 0.2 Leonid E. Zhukov (HSE) Lecture 5 10.02.2014 23 / 25
References Centrality in Social Networks. Conceptual Clarification, Linton C. Freeman, Social Networks, 1, 215-239, 1979 Power and Centrality: A Family of Measures, Phillip Bonacich, The American Journal of Sociology, Vol. 92, No. 5 (Mar., 1987), 1170-1182. Leonid E. Zhukov (HSE) Lecture 5 10.02.2014 24 / 25
References A new status index dervided from sociometric analysis, L. Katz, Psychometrika, 19, 39-43, 1953. Power and centrality: A family of measures, P. Bonacich, American Journal of Sociology, 92,1170-1182,1987. Graph structure in the Web, Andrei Broder et all, Procs of the 9th international World Wide Web conference on Computer networks, 2000. The PageRank Citation Ranknig: Bringing Order to the Web, S. Brin, L. Page, R. Motwany, T. Winograd, Stanford Digital Library Technologies Project, 1998 Authoritative Sources in a Hyperlinked Environment, Jon M. Kleinberg, Proc. 9th ACM-SIAM Symposium on Discrete Algorithms, Leonid E. Zhukov (HSE) Lecture 5 10.02.2014 25 / 25