Maximizing the Spread of Influence through a Social Network. David Kempe, Jon Kleinberg, Éva Tardos. SIGKDD '03

Influence and Social Networks
- Economics, sociology, political science, etc. have all studied and modeled behaviors arising from information
- Online, we are undoubtedly influenced by those within our social context

Why study diffusion?
- Influence models have been studied for years
  - Original mathematical models by Schelling ('70, '78) and Granovetter ('78)
  - Viral marketing strategies modeled by Domingos & Richardson ('01)
- Not just about maximizing revenue
  - Can study diseases or contagions (medicine, health, etc.)
  - The spread of beliefs and/or ideas (sociology, economics, etc.)
- On the CS side, we need fast and efficient algorithms that seek to maximize the spread of influence

Diffusion Models
- Two models: Linear Threshold and Independent Cascade
- Operation:
  - The social network G is represented as a directed graph
  - Individual nodes are active (adopters of the innovation) or inactive
  - Monotonicity: once a node is activated, it can never deactivate
- Both work under the following general framework:
  - Start with an initial set of active nodes $A_0$
  - The process unfolds in discrete steps t and ends when no more activations are possible

So what's the problem?
- Influence of a set of nodes A, denoted $\sigma(A)$: the expected number of active nodes at the end of the process
- The Influence Maximization Problem asks: for a parameter k, find a k-node set of maximum influence
  - Meaning, I give you k (i.e., a budget) and you give me the set A that maximizes $\sigma(A)$
- So we are solving a constrained maximization problem with $\sigma(A)$ as the objective function
- Determining the optimum set is NP-hard

The Linear Threshold Model
- A node v is influenced by each neighbor w according to a weight $b_{v,w}$ such that $\sum_{w \text{ neighbor of } v} b_{v,w} \le 1$
- Each node v chooses a threshold uniformly at random: $\theta_v \sim U[0,1]$
- So $\theta_v$ represents the weighted fraction of v's neighbors that must become active in order to activate v
- In other words, v becomes active once the total weight of its active neighbors reaches the threshold: $\sum_{w \text{ active neighbor of } v} b_{v,w} \ge \theta_v$
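The activation rule above can be made concrete with a small simulation. This is only an illustrative sketch, not code from the slides: the function name `simulate_linear_threshold`, the `in_weights` dictionary layout, and the synchronous update loop are all assumptions made for the example.

```python
import random


def simulate_linear_threshold(in_weights, seeds, thresholds=None, rng=None):
    """Run one Linear Threshold cascade and return the final active set.

    in_weights[v] maps each in-neighbor w of v to the weight b_{v,w}
    (hypothetical layout); weights into a node should sum to at most 1.
    """
    rng = rng or random.Random()
    nodes = set(in_weights) | set(seeds)
    for neighbors in in_weights.values():
        nodes.update(neighbors)
    if thresholds is None:
        # Each node draws its threshold theta_v uniformly at random.
        thresholds = {v: rng.random() for v in nodes}
    active = set(seeds)
    while True:
        # Synchronous step: compare every inactive node's active in-weight
        # against its threshold, using the active set from the previous step.
        newly_active = {
            v
            for v in nodes - active
            if sum(b for w, b in in_weights.get(v, {}).items() if w in active)
            >= thresholds[v]
        }
        if not newly_active:
            return active
        active |= newly_active


# Toy instance (made up): a influences b and c, b influences c.
example_weights = {"b": {"a": 0.5}, "c": {"a": 0.3, "b": 0.3}}
example_thresholds = {"a": 0.9, "b": 0.4, "c": 0.5}
print(sorted(simulate_linear_threshold(example_weights, {"a"}, example_thresholds)))
```

In the toy instance, c only activates in the second step, after b has joined a; this is the cumulative effect that distinguishes the threshold model from one-shot attempts.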

Linear Threshold Model: Example
[Figure: animated walkthrough on a six-node network (Rock, Lida, Hadi, Neal, Faez, Saba) with weighted edges; the legend marks inactive nodes, active nodes, and activation thresholds. The process stops when no further node can be activated.]

The Independent Cascade Model
- When node v becomes active, it is given a single chance to activate each currently inactive neighbor w
  - It succeeds with probability $p_{v,w}$ (a system parameter)
  - The outcome is independent of history
  - Each attempt is resolved by a biased coin flip (e.g., a draw from U[0,1] compared against $p_{v,w}$)
- If v succeeds, then w becomes active in step t+1; but whether or not v succeeds, it cannot make any further attempts to activate w in subsequent rounds
- If w has multiple newly activated neighbors, their attempts are sequenced in an arbitrary order
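The single-attempt rule above can be sketched as a frontier-based simulation. This is a minimal illustration under assumptions: the name `simulate_independent_cascade` and the `out_probs` layout are hypothetical, and attempts within a step are simply processed in iteration order.

```python
import random


def simulate_independent_cascade(out_probs, seeds, rng=None):
    """Run one Independent Cascade process and return the final active set.

    out_probs[v] maps each out-neighbor w of v to p_{v,w} (hypothetical layout).
    """
    rng = rng or random.Random()
    active = set(seeds)
    frontier = list(active)  # nodes activated in the previous step
    while frontier:
        next_frontier = []
        for v in frontier:
            for w, p in out_probs.get(v, {}).items():
                # v gets exactly one attempt on each still-inactive neighbor.
                if w not in active and rng.random() < p:
                    active.add(w)
                    next_frontier.append(w)
        # Nodes leave the frontier after one step, so no attempt is repeated.
        frontier = next_frontier
    return active


# Toy instance (made up): probabilities of 1.0 and 0.0 make the outcome deterministic.
example_probs = {"a": {"b": 1.0}, "b": {"c": 1.0}, "c": {"d": 0.0}}
print(sorted(simulate_independent_cascade(example_probs, {"a"})))
```

Because a node is on the frontier for exactly one step, the "no further attempts" rule needs no extra bookkeeping.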

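Independent Cascade Model: Example
[Figure: animated walkthrough on the same six-node network (Rock, Lida, Hadi, Neal, Faez, Saba) with edge probabilities; the legend marks inactive nodes, newly active nodes, permanently active nodes, successful rolls, and failed rolls. The process stops when no further attempts remain.]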

Let's begin!
- Theorem 2.1: For a non-negative, monotone, submodular function f, let S be a set of size k obtained by selecting elements one at a time, each time choosing an element that provides the largest marginal increase in the function value. Let $S^*$ be a set that maximizes the value of f over all k-element sets. Then $f(S) \ge (1 - 1/e) \cdot f(S^*)$; in other words, S provides a $(1 - 1/e)$-approximation.
- In short, f needs to have the following properties:
  - Non-negative
  - Monotone: $f(S \cup \{v\}) \ge f(S)$
  - Submodular
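The selection rule in Theorem 2.1 can be sketched generically, for any set function passed in as a callable. The names `greedy_maximize` and `coverage`, and the toy coverage instance, are illustrative assumptions rather than anything from the slides.

```python
def greedy_maximize(f, ground_set, k):
    """Pick k elements one at a time, each with the largest marginal gain under f."""
    chosen = []
    for _ in range(k):
        current = frozenset(chosen)
        base = f(current)
        best, best_gain = None, float("-inf")
        for v in sorted(ground_set):  # sorted only to make tie-breaking deterministic
            if v in current:
                continue
            gain = f(current | {v}) - base
            if gain > best_gain:
                best, best_gain = v, gain
        if best is None:  # ground set exhausted
            break
        chosen.append(best)
    return chosen


# Toy coverage function (made up): f(S) = number of items covered by the chosen sets.
# Coverage is non-negative, monotone, and submodular, so Theorem 2.1 applies.
SETS = {"s1": {1, 2, 3, 4}, "s2": {3, 4, 5}, "s3": {5, 6}, "s4": {1, 2}}


def coverage(selection):
    covered = set()
    for name in selection:
        covered |= SETS[name]
    return len(covered)


picked = greedy_maximize(coverage, SETS.keys(), 2)
print(picked, coverage(picked))
```

On this instance greedy happens to find the optimum; the theorem only promises a value within a factor $(1 - 1/e)$ of it.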

Let's talk about submodularity
- A function f is submodular if it satisfies a natural diminishing-returns property: the marginal gain from adding an element to a set S is at least as high as the marginal gain from adding the same element to a superset of S
- Or more formally: $f(S \cup \{v\}) - f(S) \ge f(T \cup \{v\}) - f(T)$ for all elements v and all sets $S \subseteq T$
- For our case, even though the problem remains NP-hard, we will see how a greedy algorithm yields a solution within a factor $(1 - 1/e)$ of the optimum
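On a tiny ground set, the diminishing-returns inequality can be checked by brute force. The sketch below is illustrative only: `is_submodular` enumerates all pairs $S \subseteq T$, so it is exponential and meant for toy examples; the two test functions are made up.

```python
from itertools import combinations


def is_submodular(f, ground_set):
    """Check f(S+v) - f(S) >= f(T+v) - f(T) for every S subset of T and every v not in T."""
    items = list(ground_set)
    subsets = [
        frozenset(combo)
        for size in range(len(items) + 1)
        for combo in combinations(items, size)
    ]
    for small in subsets:
        for large in subsets:
            if not small <= large:
                continue
            for v in items:
                if v in large:
                    continue
                gain_small = f(small | {v}) - f(small)
                gain_large = f(large | {v}) - f(large)
                if gain_small < gain_large:
                    return False
    return True


# Made-up examples: a coverage function (submodular) and |S|^2 (not submodular).
ROOMS = {"x": {1, 2}, "y": {2, 3}, "z": {3}}


def covered_rooms(selection):
    return len(set().union(*(ROOMS[name] for name in selection)))


def squared_size(selection):
    return len(selection) ** 2


print(is_submodular(covered_rooms, ROOMS), is_submodular(squared_size, ROOMS))
```

The squared-size function fails because each added element is worth more than the previous one, which is the opposite of diminishing returns.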

alt+tab
- Refer to: Tutorial on Submodularity in Machine Learning and Computer Vision by Stefanie Jegelka and Andreas Krause
- More (great) references available at www.submodularity.org
- We will look at a short example about placing sensors around a house (and marginal yield)

Proving Submodularity for I.C. Model
- Theorem 2.2: For an arbitrary instance of the Independent Cascade Model, the resulting influence function $\sigma(\cdot)$ is submodular.
- Problems:
  - In essence, what increase do we get in the expected number of overall activations when we add v to the set A?
  - This gain is very difficult to analyze because of the form $\sigma(A)$ takes
  - The I.C. model is underspecified: there is no defined order in which nodes newly activated in step t will attempt to activate their neighbors

[Figure: example network with edge probabilities; live edges drawn in blue, blocked edges in red]
- In the original I.C. model, we flip a coin to determine whether the edge from v to each of its neighbors w should be taken
- But note that the outcome of this flip does not depend on anything else in the model
- Idea: why not pre-flip all coins from the start and store the outcomes, to be revealed in the event that v is activated (while w is still inactive)?
  - View blue arrows as live
  - View red arrows as blocked
- Claim 2.3: a node ends up active if and only if it is reachable from the initial active set via a path of live edges

Proving Submodularity for I.C. Model
- Let X be a collection of coin-flip outcomes on the edges, and R(v, X) be the set of all nodes that can be reached from v on a path consisting entirely of live edges
- We can then count the nodes reachable from any node in A: $\sigma_X(A) = \left| \bigcup_{v \in A} R(v, X) \right|$
- Assume $S \subseteq T$ (two sets of nodes) and consider $\sigma_X(S \cup \{v\}) - \sigma_X(S)$
  - This equals the number of elements in R(v, X) that aren't already in $\bigcup_{u \in S} R(u, X)$
  - Therefore, it is at least as large as the number of elements in R(v, X) that aren't in the larger union $\bigcup_{u \in T} R(u, X)$
- This gives: $\sigma_X(S \cup \{v\}) - \sigma_X(S) \ge \sigma_X(T \cup \{v\}) - \sigma_X(T)$
- Finally, $\sigma(A) = \sum_{\text{outcomes } X} \text{Prob}[X] \cdot \sigma_X(A)$
- Note: a non-negative linear combination of submodular functions is also submodular, which is why $\sigma$ is submodular
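The pre-flipped-coin view can be sketched directly: sample which edges are live, then compute $\sigma_X$ by graph reachability. This is an illustrative sketch; the helper names (`sample_live_edges`, `reachable`, `sigma_x`) and the fixed live-edge graph used in the example are assumptions.

```python
import random
from collections import deque


def sample_live_edges(out_probs, rng):
    """Pre-flip every coin: keep edge (v, w) as live with probability p_{v,w}."""
    return {
        v: [w for w, p in neighbors.items() if rng.random() < p]
        for v, neighbors in out_probs.items()
    }


def reachable(live, seeds):
    """Breadth-first search over live edges only."""
    seen = set(seeds)
    queue = deque(seen)
    while queue:
        v = queue.popleft()
        for w in live.get(v, []):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen


def sigma_x(live, seeds):
    """Number of nodes activated under one fixed outcome X of the coin flips."""
    return len(reachable(live, seeds))


# One fixed, made-up outcome X, written out as adjacency lists of live edges.
live = {"a": ["b"], "b": ["c"], "d": ["c", "e"], "f": ["e", "g"]}
small, large = {"a"}, {"a", "d"}
gain_small = sigma_x(live, small | {"f"}) - sigma_x(live, small)
gain_large = sigma_x(live, large | {"f"}) - sigma_x(live, large)
print(gain_small, gain_large)
```

Adding f to the smaller set gains more because node e is not yet covered there, which is exactly the counting argument in the proof.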

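- Fix a blue (live-edge) graph G; G(S) is the set of nodes reachable from S in G
- By submodularity: for $S \subseteq T$, $G(T \cup \{v\}) - G(T) \subseteq G(S \cup \{v\}) - G(S)$
  - $G(S \cup \{v\}) - G(S)$: nodes reachable from $S \cup \{v\}$ but NOT from S
- We see the submodularity criterion is satisfied, therefore $g_G(S) = |G(S)|$ is submodular
- $\sigma(S) = \sum_G \text{Prob}[G \text{ is the blue graph}] \cdot g_G(S)$
[Figure: nested regions G(S) and G(T) inside graph G for seed sets S and T]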

Proving Submodularity for the L.T. Model
- Theorem 2.5: For an arbitrary instance of the Linear Threshold Model, the resulting influence function $\sigma(\cdot)$ is submodular.
- For I.C., we constructed an equivalent process by resolving the outcomes of some random choices in advance
- However, if the L.T. thresholds are fixed in advance, the number of activated nodes is not (in general) a submodular function of the targeted set
- Idea: have each node v pick at most one incoming edge, choosing the edge from w with probability equal to its weight $b_{v,w}$ (and no edge with the remaining probability)
  - This gives a live-edge view of the L.T. model, analogous to the one for I.C.
  - For this fixed graph, we can re-apply the reachability concept (same as I.C.)
- Most of the proof goes into establishing this equivalence rather than submodularity itself
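The edge-picking idea can be sketched as follows: each node selects at most one incoming edge, and a node ends up active exactly when its chain of selected edges leads back to a seed. The helper names (`sample_lt_live_edges`, `lt_active_set`) and the `in_weights` layout are assumptions made for illustration.

```python
import random


def sample_lt_live_edges(in_weights, rng):
    """For each node v, pick in-neighbor w with probability b_{v,w}, or no one.

    Returns a dict mapping v to its chosen in-neighbor (or None).
    """
    choice = {}
    for v, neighbors in in_weights.items():
        draw = rng.random()
        cumulative = 0.0
        picked = None
        for w, weight in neighbors.items():
            cumulative += weight
            if draw < cumulative:
                picked = w
                break
        # If the weights sum to less than 1, the leftover mass means "no live edge".
        choice[v] = picked
    return choice


def lt_active_set(choice, seeds):
    """Nodes reachable from the seeds along the chosen (live) edges."""
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for v, w in choice.items():
            if v not in active and w in active:
                active.add(v)
                changed = True
    return active


# Toy instance (made up): weights of 1.0 make every choice deterministic.
example_in_weights = {"b": {"a": 1.0}, "c": {"b": 1.0}, "d": {}}
example_choice = sample_lt_live_edges(example_in_weights, random.Random(0))
print(example_choice, sorted(lt_active_set(example_choice, {"a"})))
```

Averaging `len(lt_active_set(...))` over many sampled choices would estimate $\sigma$ for the seed set, under the equivalence that the proof establishes.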

What about $\sigma(A)$?
- We know $\sigma$ is non-negative, monotone, and submodular
- So we can use a greedy hill-climbing strategy
  - Start with an empty set, and repeatedly add the element that gives the maximum marginal gain
- Simulating the process and sampling the resulting active sets yields estimates close to the true $\sigma(A)$
- A generalization of the algorithm that works with such approximate values still provides an approximation close to $1 - 1/e$
- Better techniques left for you to discover!
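Putting the pieces together, greedy hill-climbing with a simulation-based estimate of $\sigma$ might look like the sketch below. It assumes the Independent Cascade model; the function names, the default of 100 runs per estimate, and the toy graph are illustrative assumptions, not values from the slides.

```python
import random


def run_cascade(out_probs, seeds, rng):
    """One Independent Cascade run; returns the set of active nodes."""
    active = set(seeds)
    frontier = list(active)
    while frontier:
        next_frontier = []
        for v in frontier:
            for w, p in out_probs.get(v, {}).items():
                if w not in active and rng.random() < p:
                    active.add(w)
                    next_frontier.append(w)
        frontier = next_frontier
    return active


def estimate_sigma(out_probs, seeds, runs, rng):
    """Monte Carlo estimate of sigma(seeds): average cascade size over `runs` runs."""
    return sum(len(run_cascade(out_probs, seeds, rng)) for _ in range(runs)) / runs


def greedy_seeds(out_probs, nodes, k, runs=100, seed=0):
    """Greedy hill-climbing: repeatedly add the node with the best estimated influence."""
    rng = random.Random(seed)
    chosen = []
    for _ in range(k):
        best, best_value = None, -1.0
        for v in nodes:
            if v in chosen:
                continue
            value = estimate_sigma(out_probs, chosen + [v], runs, rng)
            if value > best_value:
                best, best_value = v, value
        if best is None:  # fewer than k nodes available
            break
        chosen.append(best)
    return chosen


# Toy graph (made up): two hubs whose edges always succeed.
toy_probs = {"hub1": {"a": 1.0, "b": 1.0, "c": 1.0}, "hub2": {"d": 1.0, "e": 1.0}}
toy_nodes = ["a", "b", "c", "d", "e", "hub1", "hub2"]
print(greedy_seeds(toy_probs, toy_nodes, 2, runs=5))
```

Each greedy step costs one batch of simulations per candidate node, which is why the slide hints that better techniques exist.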

Experiments
- The network data
  - Collaboration graph obtained from co-authorships in papers from arXiv's high-energy physics theory section
  - Claim: co-authorship networks capture many key features of social networks
  - Simple settings of the influence parameters
- For each paper with 2 or more authors, an edge was placed between each pair of co-authors
- The resulting graph has 10,748 nodes, with edges between ~53,000 pairs of nodes
- This also produced numerous parallel edges, which were kept to simulate stronger social ties

Experiments - Models
- Use the number of parallel edges to determine edge weights:
  - L.T.: edge(u,v) = $c_{u,v}/d_v$ and edge(v,u) = $c_{u,v}/d_u$
  - Independent Cascade Model, trial 1: for nodes u, v, u has a total probability of $1 - (1 - p)^{c_{u,v}}$ of activating v (for p = 1% and 10%)
  - Weighted cascade: each edge from u to v is assigned probability $1/d_v$ of activating v
- Compare the greedy algorithm with: distance centrality, degree centrality, and random nodes
- Simulate the process 10,000 times for each targeted set, re-choosing thresholds or edge outcomes pseudo-randomly from [0, 1] every time
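The L.T. weights and the trial-1 cascade probabilities above can be computed from a list of co-authorship pairs roughly as follows. This is a sketch under assumptions: the function name and input format are hypothetical, self-loops are not handled, and the degree $d_v$ is taken to count parallel edges (which makes the L.T. weights into each node sum to 1).

```python
from collections import Counter


def influence_parameters(coauthor_pairs, p=0.01):
    """Return (lt_weight, ic_prob) dicts keyed by directed edge (source, target).

    coauthor_pairs lists one (u, v) entry per co-authored paper, so repeated
    pairs represent parallel edges. Assumes u != v in every pair.
    """
    multiplicity = Counter(frozenset(pair) for pair in coauthor_pairs)  # c_{u,v}
    degree = Counter()  # d_v, counting parallel edges
    for pair, count in multiplicity.items():
        for node in pair:
            degree[node] += count
    lt_weight, ic_prob = {}, {}
    for pair, count in multiplicity.items():
        u, v = tuple(pair)
        for source, target in ((u, v), (v, u)):
            # Linear Threshold: influence of source on target is c_{u,v} / d_target.
            lt_weight[(source, target)] = count / degree[target]
            # Independent Cascade: c_{u,v} independent tries, each with probability p.
            ic_prob[(source, target)] = 1 - (1 - p) ** count
    return lt_weight, ic_prob


# Made-up data: a and b wrote two papers together, b and c wrote one.
lt, ic = influence_parameters([("a", "b"), ("a", "b"), ("b", "c")], p=0.1)
print(lt[("a", "b")], lt[("c", "b")], ic[("a", "b")])
```

In the toy data, b's two incoming weights are 2/3 and 1/3, so they sum to 1 as the L.T. model requires.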

Results: Models side-by-side
[Figure: two result plots, one for the Linear Threshold Model and one for the Independent Cascade Model]

Independent Cascade Model (sensitivity)
[Figure: two result plots for the Independent Cascade Model, one with probability = 1% and one with probability = 10%]

Addendum
- Influence maximization is NP-hard under both models (Theorems 2.4, 2.7)
  - Shown via the Set Cover (I.C.) and Vertex Cover (L.T.) problems
- General models
  - L.T. model: same as discussed, with S = v's active neighbors and $f_v(S) = \sum_{u \in S} b_{v,u}$
  - I.C. model: same as discussed, with S = v's neighbors that have already tried (and failed) to activate v; so we have a probability $p_v(u, S)$ instead
  - There is a method to convert between them
- Non-progressive process
  - What if nodes can deactivate?
  - At each step t, each node v chooses a new value $\theta_v^{(t)} \sim U[0,1]$
  - Node v will be active in step t if $f_v(S) \ge \theta_v^{(t)}$, where again S = the set of v's neighbors active in step t-1
- General marketing strategies
  - For a set M of available marketing actions, each action $m \in M$ can affect some subset of nodes
  - An action increases the likelihood of activating a node without deterministically ensuring activation

(Some) Related Works
- Mining the Network Value of Customers (Domingos & Richardson)
- Efficient Influence Maximization in Social Networks (Chen et al.)
- Maximizing Social Influence in Nearly Optimal Time (Borgs et al.)
- How to Influence People with Partial Incentives (Demaine, Hajiaghayi, et al.)

Thank You!