Randomization and Gossiping in Techno-Social Networks

1 Randomization and Gossiping in Techno-Social Networks Roberto Tempo CNR-IEIIT Consiglio Nazionale delle Ricerche Politecnico di Torino

2 CPSN: cyber-physical social networks Social Network Layer (humans) Physical Layer (GPS)

3 Techno-Social Networks Social networks (opinion dynamics) Centrality measures Technological networks (PageRank) Tools: randomization and gossiping Properties: ergodicity

4 Opinion Dynamics in Social Networks

5 Model of Opinion x is a numerical value representing the opinion that each agent (human) has about a specific topic Example: How much do you like soccer? Agents discuss the topic and exchange information with other agents

6 Stubborn and Open-Minded Agents Some agents are very stubborn Others are open-minded and willing to change their opinions Opinions oscillate around a mean value

7 Time Average Opinions Time average opinions do not show oscillations

8 Aggregation and Partial Consensus Each agent reaches a stable opinion which is not a global consensus Some agents aggregate into opinion clusters, others don't Need to model the opinions: bounded confidence models don't explain persistent disagreement

9 Friedkin and Johnsen Model of Opinions - 1 Discrete time model of opinions x(k+1) = ΛW x(k) + (I-Λ) v, x(0) = v x is the belief or opinion (state) v is the vector of prejudices (input) W interpersonal influences between agents Λ (diag) sensitivity to opinion of other agents (weights) - W is row stochastic (W1 = 1)

10 Friedkin and Johnsen Model of Opinions - 2 Discrete time model of opinions x(k+1) = ΛW x(k) + (I-Λ) v, x(0) = v The term ΛW x(k) acts endogenously, the term (I-Λ) v exogenously W interpersonal influences between agents Λ (diag) sensitivity to the opinion of other agents - W is row stochastic (W1 = 1) - Λ = I - diag(W)

11 Opinion Profile The opinion profile of agents is given by $x(k) = (\Lambda W)^k v + \sum_{j=0}^{k-1} (\Lambda W)^j (I - \Lambda) v$ Question: Do the opinions converge to a stable opinion profile for $k \to \infty$?

12 Convergence of Opinion Dynamics Assumption (stubbornness): For any i, the i-th agent is either stubborn or is influenced (indirectly) by a stubborn agent This is necessary and sufficient (N&S) to establish convergence of opinions for $k \to \infty$: $x_{\mathrm{opd}} = \lim_{k \to \infty} x(k) = (I - \Lambda W)^{-1} (I - \Lambda) v$
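
The convergence statement can be checked numerically on a small case. The sketch below is only an illustration: the 3-agent matrix W and the prejudices v are assumed values (not the example from the slides), Λ is taken as I - diag(W), and the helper names `fj_step` and `fj_limit` are hypothetical. It iterates the Friedkin and Johnsen recursion from x(0) = v and checks that the result is a fixed point.

```python
def fj_step(x, v, W, lam):
    """One synchronous step x(k+1) = Lambda W x(k) + (I - Lambda) v."""
    n = len(x)
    return [
        lam[i] * sum(W[i][j] * x[j] for j in range(n)) + (1.0 - lam[i]) * v[i]
        for i in range(n)
    ]


def fj_limit(v, W, lam, steps=500):
    """Iterate from x(0) = v; the result approximates x_opd."""
    x = list(v)
    for _ in range(steps):
        x = fj_step(x, v, W, lam)
    return x


# Illustrative data (assumed, not the slide's example). W is row stochastic.
W = [[0.6, 0.4, 0.0],
     [0.3, 0.5, 0.2],
     [0.0, 0.0, 1.0]]  # agent 3 only listens to itself: fully stubborn
lam = [1.0 - W[i][i] for i in range(3)]  # Lambda = I - diag(W)
v = [0.2, 0.5, 0.9]

x_opd = fj_limit(v, W, lam)
# Fixed-point residual of x = Lambda W x + (I - Lambda) v
residual = max(abs(a - b) for a, b in zip(fj_step(x_opd, v, W, lam), x_opd))
print(x_opd, residual)
```

Agents 1 and 2 are influenced, directly or through agent 2, by the stubborn agent 3, so the stubbornness assumption holds for this choice of W.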

13 Example (Friedkin and Johnsen) - 1 [Numerical values of v, W, Λ and x_opd not recovered] v = x(0): prejudices W: strength of interactions (agent 3 is stubborn) Λ: sensitivity x_opd: final opinion

14 Example (Friedkin and Johnsen) - 2 Study opinion profile Red (stubborn) and cyan (open minded) agents reach a consensus Two distinct opinion clusters are formed Global consensus is not achieved

15 Model of Interpersonal Influences "this model of social influence will be imperfect at some level ... it is obvious that interpersonal influences do not occur in the simultaneous way and that there are complex sequences of interpersonal influences in a group" N.E. Friedkin and E.C. Johnsen (1999)

16 Global vs Local Information Interpersonal influences do not occur simultaneously Simultaneous access to the entire graph of opinions is not realistic No global exchange of information Agents discuss the topic within small groups (e.g. in pairs or in triples) Example: When a human needs to take a difficult decision about health (surgery or medical treatment), he/she discusses the matter within the family or friends

17 Key Point: Models for Information Exchange Consider directed graphs G = (V, E) Synchronous model where all the agents (nodes) simultaneously exchange information through links Asynchronous model based on a local communication protocol (two agents)

18 Communications between Humans are becoming Increasingly Asynchronous Examples of asynchronous communications: text-based messages, bulletin boards, blogs, forums They are delivered via web technology and they are independent of time and place Examples of synchronous communications: phone and conference calls, which require humans to agree on a common time

19 Randomized Algorithm Randomized Algorithm (RA): An algorithm that makes random choices during its execution to produce a result (it is an algorithm that may fail to provide the correct answer, but the probability of this event can be made arbitrarily small) set_r = 1:0.01:3; for k = 1:length(set_r), if rand > 0.5, a_opt(k) = hel(k); else, a_opt(k) = 3.7; end, end

20 Randomization in Sociology Jon Elster: Randomization in individual and social decisions Importance of randomization for designing experiments Example: Decide which patients may be selected to receive a standard or a new treatment for a disease

21 Key Ingredient 1: Randomized Gossip Protocol Gossip protocol based on (uniform) edge randomization Let $\theta(k) \in E$ be a sequence of independent identically distributed random variables (clock) Can we recover the global solution using only local information? Need to establish convergence properties of this protocol

22 Randomized Algorithm based on Local Opinions - 1 Gossip interaction: at time k a directed link $(i,j) \in E$ is randomly sampled according to a (uniform) distribution in E

23 Randomized Algorithm based on Local Opinions - 2 At time k agents i and j exchange information Agent i updates its opinion based on its previous opinion, the opinion of agent j and its initial prejudice $v_i$

24 Randomized Algorithm based on Local Opinions - 3 Agent i changes opinion based on interactions with j $x_i(k+1) = h_i\big((1-\gamma_{ij})\, x_i(k) + \gamma_{ij}\, x_j(k)\big) + (1-h_i)\, v_i$ where $h_i \in [0,1]$ and $\gamma_{ij} \in [0,1]$ are given coefficients The new opinion is a convex combination of opinions, $(1-\gamma_{ij})\, x_i(k) + \gamma_{ij}\, x_j(k)$, and of prejudices, $h_i\big((1-\gamma_{ij})\, x_i(k) + \gamma_{ij}\, x_j(k)\big) + (1-h_i)\, v_i$

25 Randomized Algorithm based on Local Opinions - 3 Agent i changes opinion based on interactions with j $x_i(k+1) = h_i\big((1-\gamma_{ij})\, x_i(k) + \gamma_{ij}\, x_j(k)\big) + (1-h_i)\, v_i$ where $h_i \in [0,1]$ and $\gamma_{ij} \in [0,1]$ are given coefficients The new opinion is a convex combination of opinions $(1-\gamma_{ij})\, x_i(k) + \gamma_{ij}\, x_j(k)$

26 Randomized Algorithm based on Local Opinions - 3 Agent i changes opinion based on interactions with j $x_i(k+1) = h_i\big((1-\gamma_{ij})\, x_i(k) + \gamma_{ij}\, x_j(k)\big) + (1-h_i)\, v_i$ where $h_i \in [0,1]$ and $\gamma_{ij} \in [0,1]$ are given coefficients The new opinion is a convex combination of opinions and of prejudices $h_i\big((1-\gamma_{ij})\, x_i(k) + \gamma_{ij}\, x_j(k)\big) + (1-h_i)\, v_i$

27 Randomized Algorithm based on Local Opinions - 4 Agent i changes opinion based on interactions with j $x_i(k+1) = h_i\big((1-\gamma_{ij})\, x_i(k) + \gamma_{ij}\, x_j(k)\big) + (1-h_i)\, v_i$ The other agents $l$ ($l \neq i$) do not change opinion: $x_l(k+1) = x_l(k)$ Asymmetric update of information between i and j

28 Weighting Coefficients $h_i \in [0,1]$ The weighting coefficients are given by $h_i = 1 - (1 - \lambda_i)/d_i$ if $d_i \neq 1$, and $h_i = 0$ otherwise, where - $d_i$ is the degree of the vertex i (sum of # incoming edges to the node i and # outgoing edges from node i, also counting self loops) - $\lambda_i$ is the i-th entry of the sensitivity matrix Λ

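29 Weighting Coefficients $\gamma_{ij} \in [0,1]$ The weighting coefficients $\gamma_{ij}$ are given by $\gamma_{ij} = \begin{cases} \dfrac{d_i (1 - h_i) + h_i - (1 - \lambda_i w_{ii})}{h_i} & \text{if } i = j,\ d_i \neq 1 \\ \dfrac{\lambda_i w_{ij}}{h_i} & \text{if } i \neq j,\ d_i \neq 1 \\ 1 & \text{if } i = j,\ d_i = 1 \\ 0 & \text{if } i \neq j,\ d_i = 1 \end{cases}$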

30 Undesired Oscillations The dynamics of the randomized gossip protocol x(k) oscillates and there is no convergence of the protocol!

31 Key Ingredient 2: Time Averaging Time averaging was introduced in the seventies to accelerate convergence of stochastic approximation algorithms

32 Time Average Gossip Opinion With time average we remove oscillations $y(k) = \frac{1}{k+1} \sum_{i=0}^{k} x(i)$

33 Ergodicity and Limiting Behavior Theorem (convergence properties) Let the stubbornness assumption hold The time average local opinions y(k) are mean-square ergodic and converge to $x_{\mathrm{opd}}$: $\lim_{k \to \infty} E\big[\|y(k) - x_{\mathrm{opd}}\|_2^2\big] = 0$, where $x_{\mathrm{opd}} = (I - \Lambda W)^{-1} (I - \Lambda) v$ P. Frasca, C. Ravazzi, R. Tempo, H. Ishii (2015)
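
A simulation can make the ergodicity statement concrete. The following is a minimal sketch under stated assumptions, not the authors' implementation: W and v are illustrative values, Λ = I - diag(W), every agent is assumed to have a self-loop, the degree d_i is read as the number of links (i, j) in E that start at i (self-loop included), and the cases of the weighting rules are read as d_i ≠ 1 versus d_i = 1. All function names are hypothetical. The code samples one link uniformly at each step, updates only agent i, and keeps the running time average y(k).

```python
import random


def gossip_weights(W, lam):
    """Edges, h_i and gamma_ij under the reading of the weighting rules stated above."""
    n = len(W)
    edges = [(i, j) for i in range(n) for j in range(n) if W[i][j] > 0]
    # Assumption: d_i counts the links (i, j) in E that start at i, self-loop included.
    d = [sum(1 for a, _ in edges if a == i) for i in range(n)]
    h = [1 - (1 - lam[i]) / d[i] if d[i] != 1 else 0.0 for i in range(n)]
    gamma = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        if d[i] == 1:
            gamma[i][j] = 1.0 if i == j else 0.0
        elif i == j:
            gamma[i][j] = (d[i] * (1 - h[i]) + h[i] - (1 - lam[i] * W[i][i])) / h[i]
        else:
            gamma[i][j] = lam[i] * W[i][j] / h[i]
    return edges, h, gamma


def gossip_step(x, v, edge, h, gamma):
    """Only agent i of the sampled link (i, j) updates; the others keep their opinion."""
    i, j = edge
    new_x = list(x)
    mix = (1 - gamma[i][j]) * x[i] + gamma[i][j] * x[j]
    new_x[i] = h[i] * mix + (1 - h[i]) * v[i]
    return new_x


def time_averaged_gossip(v, W, lam, steps, seed=0):
    """Return y(steps), the running mean of x(0), ..., x(steps)."""
    rng = random.Random(seed)
    edges, h, gamma = gossip_weights(W, lam)
    x, y = list(v), list(v)  # y(0) = x(0) = v
    for k in range(1, steps + 1):
        x = gossip_step(x, v, rng.choice(edges), h, gamma)
        y = [yi + (xi - yi) / (k + 1) for yi, xi in zip(y, x)]
    return y


def fj_limit(v, W, lam, steps=500):
    """Synchronous Friedkin-Johnsen iteration, used here as the reference x_opd."""
    n = len(v)
    x = list(v)
    for _ in range(steps):
        x = [lam[i] * sum(W[i][j] * x[j] for j in range(n)) + (1 - lam[i]) * v[i]
             for i in range(n)]
    return x


# Illustrative data (assumed values).
W = [[0.6, 0.4, 0.0],
     [0.3, 0.5, 0.2],
     [0.0, 0.0, 1.0]]
lam = [1.0 - W[i][i] for i in range(3)]
v = [0.2, 0.5, 0.9]

y = time_averaged_gossip(v, W, lam, steps=200_000)
x_opd = fj_limit(v, W, lam)
print(y, x_opd)
```

With these weights the expected one-step gossip update leaves x_opd unchanged, which is why the time average settles near x_opd even though x(k) itself keeps oscillating.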

34 Other Convergence Properties Randomized gossip protocol enjoys convergence w.p.1 Observation: randomized gossip protocol is a Markov jump system

35 Multidimensional Model of Opinions - 1 Motivations: Agents discuss two topics (soccer and tennis) Opinions are correlated New model defined using Kronecker products of stochastic matrices $x(k+1) = (\Lambda W \otimes C)\, x(k) + ((I - \Lambda) \otimes I)\, v$, $x(0) = v$
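
A short sketch of the Kronecker-product recursion may help to see how the topics are coupled. Everything numeric here is assumed for illustration: the matrices W and C, the prejudices, and the stacking convention (the opinions of agent 1 on all topics first, then agent 2, and so on). The helper names `kron`, `matvec` and `multidim_limit` are hypothetical.

```python
def kron(A, B):
    """Kronecker product of two matrices given as lists of rows."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def multidim_limit(v, W, lam, C, steps=500):
    """Iterate x(k+1) = (Lambda W kron C) x(k) + ((I - Lambda) kron I) v from x(0) = v."""
    n, m = len(W), len(C)
    LW = [[lam[i] * W[i][j] for j in range(n)] for i in range(n)]
    I_minus_L = [[(1 - lam[i]) if i == j else 0.0 for j in range(n)] for i in range(n)]
    I_m = [[1.0 if p == q else 0.0 for q in range(m)] for p in range(m)]
    A = kron(LW, C)
    b = matvec(kron(I_minus_L, I_m), v)
    x = list(v)
    for _ in range(steps):
        x = [ax + bi for ax, bi in zip(matvec(A, x), b)]
    return x


# Illustrative data (assumed): 3 agents, 2 topics.
W = [[0.6, 0.4, 0.0],
     [0.3, 0.5, 0.2],
     [0.0, 0.0, 1.0]]
lam = [1.0 - W[i][i] for i in range(3)]
C = [[0.8, 0.2],
     [0.3, 0.7]]  # row-stochastic coupling between the two topics
# Stacked prejudices: [agent1-topic1, agent1-topic2, agent2-topic1, ...]
v = [0.2, 0.7, 0.5, 0.1, 0.9, 0.4]

x_coupled = multidim_limit(v, W, lam, C)
x_decoupled = multidim_limit(v, W, lam, [[1.0, 0.0], [0.0, 1.0]])
print(x_coupled, x_decoupled)
```

With C equal to the identity the topics decouple and each one follows the scalar model, which gives a simple sanity check for the stacking convention.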

36 Multidimensional Model of Opinions - 2 Extension of previous ergodicity results Given prejudices and final opinions, find correlation matrix C System is overdetermined Find an approximation of C, solving a convex regularized $\ell_1$ optimization problem S. Parsegov, A. Proskurnikov, R. Tempo, N. Friedkin (2015)

37 Centrality Measures in Social and Complex Networks

38 Network Centrality Measures How central is an individual in a social network? Degree Closeness Betweenness PageRank

39 Degree Centrality Degree centrality: for each node count the number of incoming links

40 Closeness Centrality Closeness centrality: a node is more central if it is closer to most of the other nodes Defined as the total distance from all the other nodes Example for node 1: 2 => 1 dist = 1; 3 => 1 dist = 2; 4 => 1 dist = 3; 5 => 1 dist = 5; 6 => 1 dist = 4; total = 15
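
The total distance can be computed with a breadth-first search. In the sketch below the graph is an assumption: an undirected path 1-2-3-4-6-5, chosen only because it reproduces the distances listed for node 1; the graph drawn on the slide may differ. The function name `distances_to` is hypothetical.

```python
from collections import deque


def distances_to(adj, target):
    """Hop distance from every reachable node to `target` in an undirected graph (BFS)."""
    dist = {target: 0}
    queue = deque([target])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


# Assumed graph: the path 1 - 2 - 3 - 4 - 6 - 5.
edges = [(1, 2), (2, 3), (3, 4), (4, 6), (6, 5)]
adj = {node: [] for node in range(1, 7)}
for a, b in edges:
    adj[a].append(b)
    adj[b].append(a)

dist = distances_to(adj, 1)
total = sum(d for node, d in dist.items() if node != 1)
print(dist, total)
```

A smaller total means the node is closer to the rest of the network, so rankings by closeness sort nodes by increasing total distance.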

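41 Betweenness Centrality $B_1 = \sum_{i \neq j \neq 1} \dfrac{\#\ \text{shortest paths } i \to j \text{ passing through } 1}{\#\ \text{shortest paths } i \to j}$ Example: 2 => 4 contributes 1/2; 2 => 5 contributes 1/3; a pair ending in node 6 contributes 0 [remaining pairs of the example not recovered]; total = 1/2 + 1/3 = 5/6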
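
Betweenness can be evaluated by counting shortest paths with breadth-first search. This is a sketch on an assumed graph (an undirected 4-cycle, not the slide's example), summing over ordered pairs (s, t) with s, t and the node of interest all distinct; the function names are hypothetical.

```python
from collections import deque


def bfs_counts(adj, s):
    """Distances from s and number of shortest paths from s to each reachable node."""
    dist, sigma = {s: 0}, {s: 1}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:
                sigma[w] += sigma[u]  # every shortest path to u extends to w
    return dist, sigma


def betweenness(adj, node):
    """Sum over ordered pairs (s, t) of the fraction of shortest s->t paths through `node`."""
    dist_v, sigma_v = bfs_counts(adj, node)
    total = 0.0
    for s in adj:
        if s == node:
            continue
        dist_s, sigma_s = bfs_counts(adj, s)
        for t in adj:
            if t in (s, node) or t not in dist_s:
                continue
            on_path = (node in dist_s and t in dist_v
                       and dist_s[node] + dist_v[t] == dist_s[t])
            if on_path:
                total += sigma_s[node] * sigma_v[t] / sigma_s[t]
    return total


# Assumed graph: undirected cycle 1 - 2 - 3 - 4 - 1.
adj = {1: [2, 4], 2: [1, 3], 3: [2, 4], 4: [3, 1]}
b1 = betweenness(adj, 1)
print(b1)
```

In the cycle, only the pairs 2 => 4 and 4 => 2 have a shortest path through node 1, and each has two shortest paths in total, so each contributes 1/2.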

42 PageRank Problem

43 PageRank for Oberwolfach PageRank is a numerical value in the interval [0,1] Using a PageRank checker we compute the PageRank of the Oberwolfach page PageRank is Google's view of the importance of this page "PageRank reflects our view of the importance of Web pages by considering more than 500 million variables and 2 billion terms. Pages that are considered important receive a higher PageRank and are more likely to appear at the top of the search results"

44 Random Surfer Model Network consisting of servers (nodes) connected by directed communication links Web surfer moves along randomly following the hyperlink structure When arriving at a page with several outgoing links, one is chosen at random, then the random surfer moves to a new page, and so on

45 Graph Representation Directed graph with nodes (pages) and links representing the web Graph is constructed using crawlers and spiders moving continuously along the web Hyperlink matrix: column substochastic

46 Hyperlink Matrix Page 5 is a Dangling Node [Hyperlink matrix A of the example graph; entries not recovered] Example: pdf file with no hyperlink: the random surfer is stuck!

47 Benchmark Benchmark: Web Lincoln University, New Zealand 3756 nodes total #outgoing links H. Ishii, R. Tempo (2014)

48 Dangling Nodes Red dots outgoing links toward dangling nodes 3255 dangling nodes (85%) Blue dots are normal links White area corresponds to no-links

49 Easy Fix: Back Button Random surfer gets stuck when visiting a pdf file In this case the back button of the browser is used Easy fix: Add new links to make the matrix stochastic

50 Easy Fix: Add New Link We add a new outgoing link from page 5 [target page and updated hyperlink matrix A not recovered] In the benchmark this fix increases the #links to 40646
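
The fix can be sketched in a few lines. The link list below is an assumed 5-page example (0-indexed, with page 4 playing the role of the pdf file), and the "back button" is modeled here as adding links from each dangling page back to the pages that point to it; this modeling choice and the function names are assumptions of the sketch.

```python
def hyperlink_matrix(n, links):
    """Column j holds 1/n_j in row i for every link j -> i (n_j = # outgoing links of j)."""
    out = [[] for _ in range(n)]
    for src, dst in links:
        out[src].append(dst)
    A = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for i in out[j]:
            A[i][j] = 1.0 / len(out[j])
    return A


def dangling_pages(n, links):
    """Pages with no outgoing link, i.e. zero columns of the hyperlink matrix."""
    sources = {src for src, _ in links}
    return [j for j in range(n) if j not in sources]


def add_back_links(n, links):
    """Assumed 'back button' fix: link each dangling page back to the pages citing it."""
    fixed = list(links)
    for j in dangling_pages(n, links):
        fixed += [(j, src) for src, dst in links if dst == j]
    return fixed


# Assumed example: page 4 has incoming links but no outgoing ones.
links = [(0, 1), (1, 0), (1, 2), (2, 3), (3, 0), (3, 4)]
A_sub = hyperlink_matrix(5, links)       # column substochastic
fixed_links = add_back_links(5, links)
A = hyperlink_matrix(5, fixed_links)     # column stochastic after the fix
column_sums = [sum(A[i][j] for i in range(5)) for j in range(5)]
print(dangling_pages(5, links), column_sums)
```

A dangling page with no incoming link would stay dangling under this rule, so the sketch assumes every dangling page is reachable from at least one other page.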

51 Assumption: No Dangling Nodes Hyperlink matrix A is a nonnegative stochastic matrix (instead of substochastic)

52 Random Surfer Model and Markov Chains Random surfer model is represented as a Markov chain $x(k+1) = A\, x(k)$ where x(k) is a probability vector, $x(k) \in [0,1]^n$ and $\sum_i x_i(k) = 1$ $x_i(k)$ represents the importance of the page i at time k

53 Convergence of the Markov Chain Question: Does the Markov chain converge to a stationary value, $x(k) \to x^*$ for $k \to \infty$, representing the probability that the pages are visited? Answer: No [Example: matrix A, initial condition x(0) and plot of $x_1(k)$ versus k; numerical values not recovered]
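
A two-page web is enough to see the lack of convergence. The matrix and initial condition below are assumed for illustration (they are not the values from the slide): each page links only to the other, so the surfer alternates between them forever.

```python
def step(A, x):
    """One step of the Markov chain x(k+1) = A x(k)."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


# Assumed example: two pages linking only to each other (A is column stochastic).
A = [[0.0, 1.0],
     [1.0, 0.0]]
x = [1.0, 0.0]  # the surfer starts on page 1

x1_history = []
for k in range(6):
    x1_history.append(x[0])
    x = step(A, x)
print(x1_history)
```

The first component keeps switching between 1 and 0, so x(k) has no limit even though A is a stochastic matrix.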

54 Teleportation Model Recall that the matrix A is a nonnegative stochastic matrix We introduce a different model Teleportation: After a while the random surfer gets bored and decides to jump to another page not directly connected to that currently visited New page may be geographically or content-based located far away

55 Convex Combination of Matrices Teleportation model is represented as a convex combination of matrices A and S/n, where $S = \mathbf{1}\mathbf{1}^T$ is a rank-one matrix and $\mathbf{1}$ is the vector with all entries equal to one Consider a matrix M defined as $M = (1 - m)\, A + \frac{m}{n}\, S$, $m \in (0,1)$, where n is the number of pages The value m = 0.15 is used at Google

56 Matrix M M is a convex combination of two nonnegative stochastic matrices and $m \in (0,1)$, hence M is a strictly positive stochastic matrix

57 Convergence of the Markov Chain Consider the Markov chain x(k+1) = M x(k) where M is a strictly positive stochastic matrix If $\sum_i x_i(0) = 1$, convergence is guaranteed by the Perron Theorem: $x(k) \to x^*$ for $k \to \infty$, with $x^* = M x^* = \big[(1 - m)\, A + \frac{m}{n}\, S\big] x^*$, $m \in (0,1)$ Corresponding graph is strongly connected

58 PageRank: Bringing Order to the Web Rank n web pages in order of importance Ranking is provided by x* PageRank x* of the hyperlink matrix M is defined as $x^* = M x^*$ where $x^* \in [0,1]^n$ and $\sum_i x_i^* = 1$ S. Brin, L. Page (1998)

59 PageRank: Bringing Order to the Web Rank n web pages in order of importance Ranking is provided by x* PageRank x* of the hyperlink matrix M is defined as $x^* = M x^*$ where $x^* \in [0,1]^n$ and $\sum_i x_i^* = 1$ x* is the stationary distribution of the Markov chain (the steady-state probability that pages are visited is x*) x* is a nonnegative unit eigenvector corresponding to the eigenvalue 1 of M

60 PageRank Computation

61 PageRank Computation with Power Method PageRank is computed with the power method x(k+1) = M x(k) PageRank computation requires iterations (40 in the benchmark) This computation takes about a week and it is performed centrally at Google once a month
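
The power iteration can be sketched without forming M explicitly: for a probability vector x, S x is the all-ones vector, so M x = (1 - m) A x + m/n. The 4-page hyperlink matrix below is an illustrative assumption, as are the tolerance, the uniform starting vector and the function name `pagerank_power`; this is a small centralized sketch, not the computation run at Google.

```python
def pagerank_power(A, m=0.15, tol=1e-12, max_iter=1000):
    """Power method x(k+1) = M x(k) with M = (1 - m) A + (m/n) S, A column stochastic."""
    n = len(A)
    x = [1.0 / n] * n  # uniform probability vector as initial condition
    for it in range(1, max_iter + 1):
        Ax = [sum(a * xi for a, xi in zip(row, x)) for row in A]
        # S x = 1 (1^T x) = 1 because the entries of x sum to one.
        new_x = [(1.0 - m) * axi + m / n for axi in Ax]
        change = sum(abs(a - b) for a, b in zip(new_x, x))
        x = new_x
        if change < tol:
            return x, it
    return x, max_iter


# Assumed 4-page example; every column sums to one (no dangling nodes).
A = [[0.0, 0.0, 0.0, 1 / 3],
     [1.0, 0.0, 0.5, 1 / 3],
     [0.0, 0.5, 0.0, 1 / 3],
     [0.0, 0.5, 0.5, 0.0]]

x_star, iterations = pagerank_power(A)
ranking = sorted(range(4), key=lambda i: -x_star[i])  # pages by decreasing PageRank
print(x_star, iterations, ranking)
```

Skipping the dense rank-one term is the usual reason the iteration remains cheap on sparse hyperlink matrices: each step costs one sparse matrix-vector product plus a constant shift.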

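62 Why m = 0.15? Asymptotic rate of convergence of power method is exponential and given by $|\lambda_2(M)| / |\lambda_1(M)|$ We have $\lambda_1(M) = 1$ and $|\lambda_2(M)| \le 1 - m = 0.85$ For $N_{it} = 50$ we have $0.85^{50} \approx 2.96 \times 10^{-4}$ For $N_{it} = 100$ we have $0.85^{100} \approx 8.75 \times 10^{-8}$ Larger m implies faster convergence, but numerically unstable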
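
The arithmetic behind the choice of m is easy to reproduce. The sketch below evaluates the contraction bound (1 - m)^N and, as an added illustration, the number of iterations needed to push the bound below a tolerance; the tolerance 1e-6 and the function names are assumptions.

```python
import math


def contraction_after(m, n_iter):
    """Upper bound (1 - m)^n_iter on the error reduction after n_iter power iterations."""
    return (1.0 - m) ** n_iter


def iterations_for(m, tol):
    """Smallest N with (1 - m)^N <= tol."""
    return math.ceil(math.log(tol) / math.log(1.0 - m))


r50 = contraction_after(0.15, 50)
r100 = contraction_after(0.15, 100)
n_needed = iterations_for(0.15, 1e-6)
print(r50, r100, n_needed)
```

A smaller m keeps M closer to the hyperlink matrix A but slows the iteration, since the bound 1 - m moves toward one.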

63 PageRank Computation with Power Method [Numerical example: hyperlink matrix A, the corresponding matrix M for a given m, and the resulting PageRank vector x*; entries not recovered]

64 Size of the Web The size of M is more than 8 billion (and it is increasing)! Sparsity in the web: entries non-zero entries

65 Distributed Viewpoint More and more computing power is needed: develop distributed algorithms for PageRank computation H. Ishii and R. Tempo (IEEE TAC 2011) W.-X. Zhao, H. F. Chen, H. Fang (IEEE TAC 2013) O. Fercoq, M. Akian, M. Bouhtou, S. Gaubert (IEEE TAC 2013) H. Ishii, R. Tempo, E.-W. Bai (IEEE TAC 2013)

66 Conclusions: Ranking (Control) Journals

67 Ranking Journals: Impact Factor Impact Factor IF: $\mathrm{IF}_{2013} = \dfrac{\text{number of citations in 2013 of articles published in 2011-2012}}{\text{number of articles published in 2011-2012}}$ Census period (2013) of one year and a window period (2011-2012) of two years Remark: Impact Factor is a flat criterion (it does not take into account where the citations come from)

68 ISI Web of Knowledge

69 Ranking Journals: Eigenfactor Eigenfactor EF Ranking journals using ideas from PageRank computation in Google In Eigenfactor journals are considered influential if they are cited often by other influential journals What is the probability that a journal is cited? C. T. Bergstrom (2007)

70 Ranking of control journals
Rank   2013 Impact Factor         2013 Eigenfactor (TM)
1      IEEE CSM (3.4)             Automatica
2      IEEE TAC                   IEEE TAC
3      Automatica                 SIAM J Contr & Opt
4      Int J Rob Nonlin Contr     Syst & Contr Lett
5      IEEE TCST                  IEEE TCST
6      J Proc Contr               Int J Contr
7      Contr Eng Pract            Int J Rob Nonlin Contr
8      Syst & Contr Lett          J Proc Contr
9      SIAM J Contr & Opt         Contr Eng Pract
10     Math Contr Sign Sys        IEEE CSM
11     Int J Contr                Europ J Contr
12     Europ J Contr              Math Contr Sig Sys (0.001)
