Randomization and Gossiping in Techno-Social Networks

Roberto Tempo, CNR-IEIIT, Consiglio Nazionale delle Ricerche, Politecnico di Torino, roberto.tempo@polito.it

CPSN: cyber-physical social networks. Social network layer (humans) over a physical layer (GPS)

Techno-Social Networks Social networks (opinion dynamics) Centrality measures Technological networks (PageRank) Tools: randomization and gossiping Properties: ergodicity

Opinion Dynamics in Social Networks

Model of Opinion x is a numerical value representing the opinion that each agent (human) has about a specific topic Example: How much do you like soccer? Agents discuss the topic and exchange information with other agents

Stubborn and Open-Minded Agents Some agents are very stubborn Others are open-minded and willing to change their opinions Opinions oscillate around a mean value

Time Average Opinions Time average opinions do not show oscillations

Aggregation and Partial Consensus Each agent reaches a stable opinion which is not a global consensus Some agents aggregate into opinion clusters, others don't Need to model the opinions: bounded confidence models don't explain persistent disagreement

Friedkin and Johnsen Model of Opinions - 1 Discrete time model of opinions x(k+1) = ΛW x(k) + (I-Λ) v, x(0) = v. x is the belief or opinion (state); v is the vector of prejudices (input); W interpersonal influences between agents; Λ (diag) sensitivity to the opinions of other agents (weights) - W is row stochastic (W1 = 1)

Friedkin and Johnsen Model of Opinions - 2 Discrete time model of opinions x(k+1) = ΛW x(k) + (I-Λ) v, x(0) = v. The term ΛW x(k) acts endogenously, the term (I-Λ) v exogenously. W interpersonal influences between agents; Λ (diag) sensitivity to the opinions of other agents - W is row stochastic (W1 = 1) - Λ = I - diag(W)

Opinion Profile The opinion profile of agents is given by
\[ x(k) = (\Lambda W)^k v + \sum_{j=0}^{k-1} (\Lambda W)^j (I-\Lambda)\, v \]
Question: Do the opinions converge to a stable opinion profile for k → ∞?

Convergence of Opinion Dynamics Assumption (stubbornness): For any i, the i-th agent is either stubborn or is influenced (indirectly) by a stubborn agent. This assumption is necessary and sufficient for convergence of the opinions as k → ∞:
\[ x_{opd} = \lim_{k \to \infty} x(k) = (I - \Lambda W)^{-1} (I-\Lambda)\, v \]

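Example (Friedkin and Johnsen) - 1
\[ v = \begin{pmatrix} 25 & 25 & 75 & 85 \end{pmatrix}^T, \qquad W = \begin{pmatrix} 0.220 & 0.120 & 0.360 & 0.300 \\ 0.147 & 0.215 & 0.344 & 0.294 \\ 0 & 0 & 1 & 0 \\ 0.090 & 0.178 & 0.446 & 0.286 \end{pmatrix} \]
\[ \Lambda = \mathrm{diag}(0.780,\ 0.785,\ 0,\ 0.714), \qquad x_{opd} = \begin{pmatrix} 60 & 60 & 75 & 75 \end{pmatrix}^T \]
v = x(0) prejudices; W strength of interactions (agent 3 is stubborn); Λ sensitivity; x_opd final opinion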
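
As a rough numerical check of this example, the sketch below iterates the Friedkin and Johnsen recursion and compares it with the closed-form limit. The data are taken from the slide; the function names `fj_iterate` and `fj_limit` are illustrative and not part of the original material.

```python
import numpy as np

# Data from the Friedkin-Johnsen example on the slide.
v = np.array([25.0, 25.0, 75.0, 85.0])            # prejudices, x(0) = v
W = np.array([[0.220, 0.120, 0.360, 0.300],
              [0.147, 0.215, 0.344, 0.294],
              [0.000, 0.000, 1.000, 0.000],
              [0.090, 0.178, 0.446, 0.286]])      # row-stochastic influences
Lam = np.eye(4) - np.diag(np.diag(W))             # Lambda = I - diag(W)


def fj_iterate(W, Lam, v, steps):
    """Run x(k+1) = Lam W x(k) + (I - Lam) v starting from x(0) = v."""
    n = len(v)
    x = v.copy()
    for _ in range(steps):
        x = Lam @ W @ x + (np.eye(n) - Lam) @ v
    return x


def fj_limit(W, Lam, v):
    """Closed-form limit x_opd = (I - Lam W)^{-1} (I - Lam) v."""
    n = len(v)
    return np.linalg.solve(np.eye(n) - Lam @ W, (np.eye(n) - Lam) @ v)


x_opd = fj_limit(W, Lam, v)
x_200 = fj_iterate(W, Lam, v, steps=200)
print(np.round(x_opd, 1))   # close to the two clusters (60, 60) and (75, 75)
```

Agent 3 has sensitivity 0, so its opinion stays at its prejudice 75, while the other agents settle near 60 or 75.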

Example (Friedkin and Johnsen) - 2 Study opinion profile Red (stubborn) and cyan (open minded) agents reach a consensus Two distinct opinion clusters are formed Global consensus is not achieved

Model of Interpersonal Influences this model of social influence will be imperfect at some level it is obvious that interpersonal influences do not occur in the simultaneous way and that there are complex sequences of interpersonal influences in a group N.E. Friedkin and E.C. Johnsen (1999)

Global vs Local Information Interpersonal influences do not occur simultaneously Simultaneous access to the entire graph of opinions is not realistic No global exchange of information Agents discuss the topic within small groups (e.g. in pairs or in triples) Example: When a human needs to take a difficult decision about health (surgery or medical treatment), he/she discusses the matter within the family or friends

Key Point: Models for Information Exchange Consider directed graphs G = (V,E) Synchronous model where all the agents (nodes) simultaneously exchange information through links Asynchronous model based on a local communication protocol (two agents)

Communications between Humans are becoming Increasingly Asynchronous Examples of asynchronous communications: text-based messages, email, bulletin boards, blogs, forums. They are delivered via web technology and they are independent of time and place. Examples of synchronous communications: phone and conference calls, which require humans to agree on a common time

Randomized Algorithm Randomized Algorithm (RA): An algorithm that makes random choices during its execution to produce a result (it is an algorithm that may fail to provide the correct answer, but the probability of this event can be made arbitrarily small)
    set_r = 1:0.01:3;
    for k = 1:length(set_r)
        if rand > 0.5
            a_opt(k) = hel(k);   % hel is not defined on the slide
        else
            a_opt(k) = 3.7;
        end
    end

Randomization in Sociology Jon Elster: Randomization in individual and social decisions Importance of randomization for designing experiments Example: Decide which patients may be selected to receive a standard or a new treatment for a disease

Key Ingredient 1: Randomized Gossip Protocol Gossip protocol based on (uniform) edge randomization Let θ(k) ∈ E be a sequence of independent identically distributed random variables (clock) Can we recover the global solution using only local information? Need to establish convergence properties of this protocol

Randomized Algorithm based on Local Opinions - 1 Gossip interaction: at time k a directed link (i,j) ∈ E is randomly sampled according to a (uniform) distribution in E

Randomized Algorithm based on Local Opinions - 2 At time k agents i and j exchange information Agent i updates its opinion based on its previous opinion, the opinion of agent j and its initial prejudice v_i

Randomized Algorithm based on Local Opinions - 3 Agent i changes opinion based on interactions with j
\[ x_i(k+1) = h_i \big( (1-\gamma_{ij})\, x_i(k) + \gamma_{ij}\, x_j(k) \big) + (1-h_i)\, v_i \]
where h_i ∈ [0,1] and γ_ij ∈ [0,1] are given coefficients. The new opinion is a convex combination of opinions, $(1-\gamma_{ij})\, x_i(k) + \gamma_{ij}\, x_j(k)$, and of prejudices, $h_i \big( (1-\gamma_{ij})\, x_i(k) + \gamma_{ij}\, x_j(k) \big) + (1-h_i)\, v_i$

Randomized Algorithm based on Local Opinions - 4 Agent i changes opinion based on interactions with j
\[ x_i(k+1) = h_i \big( (1-\gamma_{ij})\, x_i(k) + \gamma_{ij}\, x_j(k) \big) + (1-h_i)\, v_i \]
The other agents l (l ≠ i) do not change opinion
\[ x_l(k+1) = x_l(k) \]
Asymmetric update of information between i and j

Weighting Coefficients h_i ∈ [0,1] The weighting coefficients h_i are given by
\[ h_i = \begin{cases} 1 - (1-\lambda_i)/d_i & \text{if } d_i \ge 1 \\ 0 & \text{otherwise} \end{cases} \]
where - d_i is the degree of the vertex i (sum of # incoming edges to the node i and # outgoing edges from node i, also counting self loops) - λ_i is the i-th diagonal entry of the sensitivity matrix Λ

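Weighting Coefficients γ_ij ∈ [0,1] The weighting coefficients γ_ij are given by
\[ \gamma_{ij} = \begin{cases} \dfrac{d_i (1-h_i) + h_i - (1 - \lambda_i w_{ii})}{h_i} & \text{if } i = j,\ d_i \ge 1 \\[2mm] \dfrac{\lambda_i w_{ij}}{h_i} & \text{if } i \ne j,\ d_i \ge 1 \\[2mm] 1 & \text{if } i = j,\ d_i < 1 \\ 0 & \text{if } i \ne j,\ d_i < 1 \end{cases} \]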

Undesired Oscillations The dynamics of the randomized gossip protocol x(k) oscillates and there is no convergence of the protocol!

Key Ingredient 2: Time Averaging Time averaging was introduced in the seventies to accelerate convergence of stochastic approximation algorithms

Time Average Gossip Opinion With time average we remove oscillations
\[ y(k) = \frac{1}{k+1} \sum_{i=0}^{k} x(i) \]

Ergodicity and Limiting Behavior Theorem (convergence properties) Let the stubbornness assumption hold. The time average local opinions y(k) are mean-square ergodic and converge to x_opd
\[ \lim_{k \to \infty} E\big[ \| y(k) - x_{opd} \|_2^2 \big] = 0, \qquad x_{opd} = (I - \Lambda W)^{-1} (I-\Lambda)\, v \]
P. Frasca, C. Ravazzi, R. Tempo, H. Ishii (2015)
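
The behaviour stated in the theorem can be explored with a small simulation. The sketch below runs the gossip update on the Friedkin and Johnsen example and keeps a running time average. Several choices are assumptions made only for illustration: the edge set is taken as all pairs (i, j) with w_ij > 0, d_i is counted as the number of sampled links leaving i (self loops included), and γ_ii is set to 0 because on a self loop the update does not depend on it. The function name `gossip_time_average` is hypothetical.

```python
import numpy as np

v = np.array([25.0, 25.0, 75.0, 85.0])
W = np.array([[0.220, 0.120, 0.360, 0.300],
              [0.147, 0.215, 0.344, 0.294],
              [0.000, 0.000, 1.000, 0.000],
              [0.090, 0.178, 0.446, 0.286]])
lam = 1.0 - np.diag(W)                      # diagonal of Lambda = I - diag(W)


def gossip_time_average(W, lam, v, steps, seed=0):
    """Simulate the randomized gossip protocol and return y(steps)."""
    rng = np.random.default_rng(seed)
    n = len(v)
    # Assumption: E contains a directed link (i, j) whenever w_ij > 0.
    edges = [(i, j) for i in range(n) for j in range(n) if W[i, j] > 0]
    # Assumption: d_i counts the links (i, j) in E, self loops included.
    d = np.array([sum(1 for a, _ in edges if a == i) for i in range(n)])
    h = np.where(d >= 1, 1.0 - (1.0 - lam) / np.maximum(d, 1), 0.0)
    picks = rng.integers(len(edges), size=steps)   # uniform edge sampling
    x = v.astype(float).copy()
    y = x.copy()                                   # y(0) = x(0)
    for k in range(1, steps + 1):
        i, j = edges[picks[k - 1]]
        # gamma_ij = lam_i w_ij / h_i for i != j; irrelevant when i == j.
        gamma = lam[i] * W[i, j] / h[i] if (i != j and h[i] > 0) else 0.0
        x[i] = h[i] * ((1 - gamma) * x[i] + gamma * x[j]) + (1 - h[i]) * v[i]
        y += (x - y) / (k + 1)                     # running mean of x(0..k)
    return y


y = gossip_time_average(W, lam, v, steps=50000)
print(np.round(y, 1))
```

With these assumptions the instantaneous opinions keep fluctuating, while the time average settles close to the limit profile of the synchronous model.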

Other Convergence Properties Randomized gossip protocol enjoys convergence w.p.1 Observation: randomized gossip protocol is a Markov jump system

Multidimensional Model of Opinions - 1 Motivations: Agents discuss two topics (soccer and tennis) Opinions are correlated New model defined using Kronecker products of stochastic matrices
\[ x(k+1) = (\Lambda W \otimes C)\, x(k) + \big( (I-\Lambda) \otimes I \big)\, v, \qquad x(0) = v \]
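
A minimal sketch of the Kronecker-product model, assuming the opinions are stacked agent by agent (the m topic opinions of agent 1 first, then agent 2, and so on). The matrix `C` and the second column of `V` are made-up numbers used only to exercise the code; `fj_multidim_limit` is a hypothetical helper name.

```python
import numpy as np

W = np.array([[0.220, 0.120, 0.360, 0.300],
              [0.147, 0.215, 0.344, 0.294],
              [0.000, 0.000, 1.000, 0.000],
              [0.090, 0.178, 0.446, 0.286]])
lam = 1.0 - np.diag(W)

# Hypothetical coupling between two topics (rows sum to one).
C = np.array([[0.8, 0.2],
              [0.3, 0.7]])
# Prejudices: column 0 is the slide's v, column 1 is made up for illustration.
V = np.array([[25.0, 80.0],
              [25.0, 20.0],
              [75.0, 50.0],
              [85.0, 40.0]])


def fj_multidim_limit(W, lam, C, V):
    """Fixed point of x(k+1) = (Lam W kron C) x(k) + ((I - Lam) kron I) v."""
    n, m = V.shape
    Lam = np.diag(lam)
    A = np.kron(Lam @ W, C)
    B = np.kron(np.eye(n) - Lam, np.eye(m))
    x = np.linalg.solve(np.eye(n * m) - A, B @ V.reshape(-1))
    return x.reshape(n, m)          # row i holds agent i's opinions


X = fj_multidim_limit(W, lam, C, V)
print(np.round(X, 1))
```

Setting `C` to the identity decouples the topics, in which case each column reduces to the scalar model.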

Multidimensional Model of Opinions - 2 Extension of previous ergodicity results Given prejudices and final opinions, find the correlation matrix C System is overdetermined Find an approximation of C by solving a convex regularized ℓ1 optimization problem S. Parsegov, A. Proskurnikov, R. Tempo, N. Friedkin (2015)

Centrality Measures in Social and Complex Networks

Network Centrality Measures How central is an individual in a social network? Degree Closeness Betweenness PageRank

Degree Centrality Degree centrality: for each node count the number of incoming links

Closeness Centrality Closeness centrality: a node is more central if it is closer to most of the other nodes Defined as the total distance from all the other nodes: 2 => 1 dist = 1; 3 => 1 dist = 2; 4 => 1 dist = 3; 5 => 1 dist = 5; 6 => 1 dist = 4; total = 15

Betweenness Centrality
\[ B_1 = \sum_{i \ne j,\ i,j \ne 1} \frac{\#\ \text{shortest paths } i \to j \text{ passing through } 1}{\#\ \text{shortest paths } i \to j} \]
2 => 3: 0; 2 => 4: 1/2; 2 => 5: 1/3; 2 => 6: 0; 3 => 4: 0; 3 => 5: 0; 3 => 6: 0; 4 => 5: 0; 4 => 6: 0; 5 => 6: 0; total = 1/2 + 1/3 = 5/6
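
The three measures can be sketched with breadth-first search. The graph below is a hypothetical undirected 4-cycle, not the six-node graph of the slides (whose figure is not available here), and the function names are illustrative. Pairs are counted once, as in the slide's list.

```python
from collections import deque
from fractions import Fraction
from itertools import combinations

# Hypothetical undirected graph: the cycle 1 - 2 - 3 - 4 - 1.
adj = {1: [2, 4], 2: [1, 3], 3: [2, 4], 4: [3, 1]}


def bfs_counts(adj, s):
    """Hop distances from s and number of shortest paths from s to each node."""
    dist, sigma = {s: 0}, {s: 1}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:                 # first time w is reached
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:        # u precedes w on a shortest path
                sigma[w] += sigma[u]
    return dist, sigma


def degree(adj, v):
    """Number of links at v (in an undirected graph, incoming = outgoing)."""
    return len(adj[v])


def total_distance(adj, v):
    """Closeness as used on the slide: sum of distances to all other nodes."""
    dist, _ = bfs_counts(adj, v)
    return sum(d for node, d in dist.items() if node != v)


def betweenness(adj, v):
    """Sum over pairs {i, j} not containing v of the fraction of shortest
    i-j paths that pass through v."""
    dist_v, sigma_v = bfs_counts(adj, v)
    total = Fraction(0)
    for i, j in combinations([n for n in adj if n != v], 2):
        dist_i, sigma_i = bfs_counts(adj, i)
        if j not in dist_i or v not in dist_i or j not in dist_v:
            continue
        if dist_i[v] + dist_v[j] == dist_i[j]:
            total += Fraction(sigma_i[v] * sigma_v[j], sigma_i[j])
    return total


print(degree(adj, 1), total_distance(adj, 1), betweenness(adj, 1))
```

In the 4-cycle only the pair (2, 4) has a shortest path through node 1, and it is one of two such paths, so that pair contributes 1/2.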

PageRank Problem

PageRank for Oberwolfach PageRank is a numerical value in the interval [0,1] Using a PageRank checker we compute the PageRank of a page. PageRank is Google's view of the importance of this page. PageRank reflects our view of the importance of Web pages by considering more than 500 million variables and 2 billion terms. Pages that are considered important receive a higher PageRank and are more likely to appear at the top of the search results

Random Surfer Model Network consisting of servers (nodes) connected by directed communication links Web surfer moves along randomly following the hyperlink structure When arriving at a page with several outgoing links, one is chosen at random, then the random surfer moves to a new page, and so on

Graph Representation (figure: directed graph with nodes 1 to 5) Directed graph with nodes (pages) and links representing the web Graph is constructed using crawlers and spiders moving continuously along the web Hyperlink matrix: column substochastic

Hyperlink Matrix Page 5 is a Dangling Node
\[ A = \begin{pmatrix} 0 & 1 & 0 & 0 & 0 \\ 1/3 & 0 & 0 & 1/2 & 0 \\ 1/3 & 0 & 0 & 1/2 & 0 \\ 1/3 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \end{pmatrix} \]
Example: pdf file with no hyperlink, the random surfer is stuck!

Benchmark Benchmark: Web Lincoln University, New Zealand 3756 nodes 31718 total #outgoing links H. Ishii, R. Tempo (2014)

Dangling Nodes Red dots outgoing links toward dangling nodes 3255 dangling nodes (85%) Blue dots are normal links White area corresponds to no-links

Easy Fix: Back Button Random surfer gets stuck when visiting a pdf file In this case the back button of the browser is used Easy fix: Add new links to make the matrix stochastic

Easy Fix: Add New Link We add a new outgoing link from page 5 to page 3
\[ A = \begin{pmatrix} 0 & 1 & 0 & 0 & 0 \\ 1/3 & 0 & 0 & 1/2 & 0 \\ 1/3 & 0 & 0 & 1/2 & 1 \\ 1/3 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \end{pmatrix} \]
In the benchmark this fix increases the #links from 31718 to 40646
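
The construction can be sketched as follows, using the link list read off the five-page example (column j holds the outgoing links of page j). The helper name `hyperlink_matrix` is illustrative, and the added link 5 -> 3 is the one chosen on the slide.

```python
import numpy as np


def hyperlink_matrix(n, links):
    """Column-(sub)stochastic hyperlink matrix from (source, target) pairs.

    Pages are numbered 1..n; column j spreads weight 1/outdegree(j)
    over the pages that j links to. Dangling pages give a zero column.
    """
    A = np.zeros((n, n))
    for src, dst in links:
        A[dst - 1, src - 1] = 1.0
    out = A.sum(axis=0)                       # out-degree of each page
    return A / np.where(out > 0, out, 1.0)    # leave zero columns untouched


links = [(1, 2), (1, 3), (1, 4), (2, 1), (3, 5), (4, 2), (4, 3)]
A = hyperlink_matrix(5, links)
dangling = [j + 1 for j in range(5) if A[:, j].sum() == 0]
print("dangling pages:", dangling)

# Easy fix from the slide: add a link from page 5 back to page 3.
A_fixed = hyperlink_matrix(5, links + [(5, 3)])
print("column sums after the fix:", A_fixed.sum(axis=0))
```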

Assumption: No Dangling Nodes Hyperlink matrix A is a nonnegative stochastic matrix (instead of substochastic)

Random Surfer Model and Markov Chains Random surfer model is represented as a Markov chain
\[ x(k+1) = A\, x(k) \]
where x(k) is a probability vector, $x(k) \in [0,1]^n$ and $\sum_i x_i(k) = 1$. x_i(k) represents the importance of the page i at time k

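Convergence of the Markov Chain Question: Does the Markov chain converge to a stationary value x(k) → x* for k → ∞ representing the probability that the pages are visited? Answer: No Example:
\[ A = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \end{pmatrix}, \qquad x(0) = \begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \end{pmatrix} \]
(plot: x_1(k) versus k)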
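
A short sketch of why this example does not converge: A is a cyclic permutation, so the probability mass simply rotates among the four pages. The initial vector is the one read from the slide; `trajectory` is an illustrative helper.

```python
import numpy as np

A = np.array([[0, 1, 0, 0],
              [0, 0, 1, 0],
              [0, 0, 0, 1],
              [1, 0, 0, 0]], dtype=float)
x0 = np.array([0.0, 1.0, 0.0, 0.0])


def trajectory(A, x0, steps):
    """Return the array [x(0), x(1), ..., x(steps)] for x(k+1) = A x(k)."""
    xs = [x0]
    for _ in range(steps):
        xs.append(A @ xs[-1])
    return np.array(xs)


xs = trajectory(A, x0, steps=8)
print(xs[:, 0])   # first component x_1(k), k = 0..8
```

The state returns to x(0) every four steps, so x(k) has no limit even though every x(k) is a valid probability vector.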

Teleportation Model Recall that the matrix A is a nonnegative stochastic matrix We introduce a different model Teleportation: After a while the random surfer gets bored and decides to jump to another page not directly connected to the one currently visited The new page may be far away, either geographically or in content

Convex Combination of Matrices Teleportation model is represented as a convex combination of matrices A and S/n, where $S = \mathbf{1}\mathbf{1}^T$ is a rank-one matrix (all entries equal to one) and $\mathbf{1}$ is the vector with all entries equal to one. Consider a matrix M defined as
\[ M = (1 - m)\, A + \frac{m}{n}\, S, \qquad m \in (0,1) \]
where n is the number of pages The value m = 0.15 is used at Google

Matrix M M is a convex combination of two nonnegative stochastic matrices and m ∈ (0,1), so M is a strictly positive stochastic matrix

Convergence of the Markov Chain Consider the Markov chain x(k+1) = M x(k) where M is a strictly positive stochastic matrix If $\sum_i x_i(0) = 1$, convergence is guaranteed by the Perron Theorem: x(k) → x* for k → ∞, with
\[ x^* = M x^* = \Big[ (1 - m)\, A + \frac{m}{n}\, S \Big] x^*, \qquad m \in (0,1) \]
Corresponding graph is strongly connected
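
To illustrate the effect of teleportation, the sketch below applies it to a four-page cyclic chain, for which the plain iteration x(k+1) = A x(k) only rotates the mass. The function name `google_matrix` is illustrative.

```python
import numpy as np


def google_matrix(A, m=0.15):
    """M = (1 - m) A + (m / n) S, with S the all-ones matrix."""
    n = A.shape[0]
    S = np.ones((n, n))
    return (1.0 - m) * A + (m / n) * S


# Cyclic permutation: column stochastic, but the chain is periodic.
A = np.array([[0, 1, 0, 0],
              [0, 0, 1, 0],
              [0, 0, 0, 1],
              [1, 0, 0, 0]], dtype=float)
M = google_matrix(A, m=0.15)

x = np.array([0.0, 1.0, 0.0, 0.0])   # any probability vector works here
for _ in range(200):
    x = M @ x
print(np.round(x, 4))
```

For this symmetric example the limit is the uniform vector, since every page plays the same role in the cycle.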

PageRank: Bringing Order to the Web Rank n web pages in order of importance Ranking is provided by x* PageRank x* of the hyperlink matrix M is defined as x* = M x* where $x^* \in [0,1]^n$ and $\sum_i x_i^* = 1$. x* is the stationary distribution of the Markov chain (the steady-state probability that pages are visited is x*) x* is a nonnegative unit eigenvector corresponding to the eigenvalue 1 of M S. Brin, L. Page (1998)

PageRank Computation

PageRank Computation with Power Method PageRank is computed with the power method x(k+1) = M x(k) PageRank computation requires 50-100 iterations (40 in the benchmark) This computation takes about a week and it is performed centrally at Google once a month

Why m = 0.15? Asymptotic rate of convergence of power method is exponential and given by $|\lambda_2(M)| / |\lambda_1(M)|$. We have $\lambda_1(M) = 1$ and $|\lambda_2(M)| \le 1 - m = 0.85$. For N_it = 50 we have $0.85^{50} \approx 2.95 \cdot 10^{-4}$. For N_it = 100 we have $0.85^{100} \approx 8.74 \cdot 10^{-8}$. Larger m implies faster convergence, but numerically unstable
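
The arithmetic on this slide can be reproduced with a few lines. The sketch treats (1 - m)^N as a bound on the error contraction after N iterations; the function names are illustrative.

```python
import math


def contraction_after(m, iterations):
    """Bound (1 - m)^N on the error reduction after N power iterations."""
    return (1.0 - m) ** iterations


def iterations_needed(m, tol):
    """Smallest N with (1 - m)^N <= tol."""
    return math.ceil(math.log(tol) / math.log(1.0 - m))


for n_it in (50, 100):
    print(n_it, f"{contraction_after(0.15, n_it):.3e}")
print("iterations for 1e-6:", iterations_needed(0.15, 1e-6))
```

The same function shows the trade-off in m: a larger m shrinks the factor 1 - m and therefore the number of iterations needed for a given tolerance.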

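PageRank Computation with Power Method (figure: directed graph with nodes 1 to 4)
\[ A = \begin{pmatrix} 0 & 0 & 0 & 1/3 \\ 1 & 0 & 1/2 & 1/3 \\ 0 & 1/2 & 0 & 1/3 \\ 0 & 1/2 & 1/2 & 0 \end{pmatrix} \]
With m = 0.15:
\[ M = \begin{pmatrix} 0.038 & 0.037 & 0.037 & 0.321 \\ 0.887 & 0.037 & 0.462 & 0.321 \\ 0.037 & 0.462 & 0.037 & 0.321 \\ 0.037 & 0.462 & 0.462 & 0.037 \end{pmatrix}, \qquad x^* = \begin{pmatrix} 0.12 & 0.33 & 0.26 & 0.29 \end{pmatrix}^T \]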
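
A minimal sketch of the power method on this four-page example. Instead of forming the dense matrix M, it uses the identity M x = (1 - m) A x + (m/n) 1, which holds whenever the entries of x sum to one. The stopping rule, the uniform starting vector and the name `pagerank_power` are choices made for this illustration.

```python
import numpy as np

A = np.array([[0, 0,   0,   1/3],
              [1, 0,   1/2, 1/3],
              [0, 1/2, 0,   1/3],
              [0, 1/2, 1/2, 0  ]])


def pagerank_power(A, m=0.15, tol=1e-10, max_iter=1000):
    """Power iteration x(k+1) = M x(k) without building M explicitly."""
    n = A.shape[0]
    x = np.full(n, 1.0 / n)                    # uniform start, sums to one
    for k in range(1, max_iter + 1):
        x_new = (1.0 - m) * (A @ x) + m / n    # equals M x since sum(x) = 1
        if np.abs(x_new - x).sum() < tol:      # 1-norm of the update
            return x_new, k
        x = x_new
    return x, max_iter


x_star, n_iter = pagerank_power(A)
print(np.round(x_star, 2), n_iter)
```

Because A is sparse in practice, this form needs only one sparse matrix-vector product per iteration, which matters at web scale.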

Size of the Web The size of M is more than 8 billion (and it is increasing)! Sparsity in the web: about 10^20 entries, of which about 10^12 are non-zero

Distributed Viewpoint More and more computing power is needed: develop distributed algorithms for PageRank computation H. Ishii and R. Tempo (IEEE TAC 2011) W.-X. Zhao, H. F. Chen, H. Fang (IEEE TAC 2013) O. Fercoq, M. Akian, M. Bouhtou, S. Gaubert (IEEE TAC 2013) H. Ishii, R. Tempo, E.-W. Bai (IEEE TAC 2013)

Conclusions: Ranking (Control) Journals

Ranking Journals: Impact Factor Impact Factor IF
\[ IF_{2013} = \frac{\text{number of citations in 2013 of articles published in 2011-2012}}{\text{number of articles published in 2011-2012}} \]
Census period (2013) of one year and a window period (2011-2012) of two years Remark: Impact Factor is a flat criterion (it does not take into account where the citations come from)

ISI Web of Knowledge

Ranking Journals: Eigenfactor Eigenfactor EF Ranking journals using ideas from PageRank computation in Google In Eigenfactor journals are considered influential if they are cited often by other influential journals What is the probability that a journal is cited? C. T. Bergstrom (2007)

2013 Impact Factor and 2013 Eigenfactor (TM)

IF    Journal (by Impact Factor)    Rank   Journal (by Eigenfactor)    Eigenfactor
3.4   IEEE CSM                      1      Automatica                  0.052
3.2   IEEE TAC                      2      IEEE TAC                    0.051
3.1   Automatica                    3      SIAM J Contr & Opt          0.016
2.6   Int J Rob Nonlin Contr        4      Syst & Contr Lett           0.013
2.5   IEEE TCST                     5      IEEE TCST                   0.013
2.2   J Proc Contr                  6      Int J Contr                 0.009
1.9   Contr Eng Pract               7      Int J Rob Nonlin Contr      0.009
1.9   Syst & Contr Lett             8      J Proc Contr                0.008
1.4   SIAM J Contr & Opt            9      Contr Eng Pract             0.008
1.2   Math Contr Sign Sys           10     IEEE CSM                    0.004
1.1   Int J Contr                   11     Europ J Contr               0.002
0.8   Europ J Contr                 12     Math Contr Sign Sys         0.001