Greedy Maximization Framework for Graph-based Influence Functions

Similar documents
Diffusion of Innovation and Influence Maximization

Distance-Based Influence in Networks: Computation and Maximization

CSCI 3210: Computational Game Theory. Cascading Behavior in Networks Ref: [AGT] Ch 24

9. Submodular function optimization

Maximizing the Spread of Influence through a Social Network. David Kempe, Jon Kleinberg, Éva Tardos SIGKDD 03

CS 322: (Social and Information) Network Analysis Jure Leskovec Stanford University

Monotone Estimation Framework and Applications for Scalable Analytics of Large Data Sets

Submodular Functions Properties Algorithms Machine Learning

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University

Diffusion of Innovation

CS599: Convex and Combinatorial Optimization Fall 2013 Lecture 24: Introduction to Submodular Functions. Instructor: Shaddin Dughmi

arxiv: v4 [cs.si] 21 May 2017

Submodular Functions and Their Applications

Minimizing Seed Set Selection with Probabilistic Coverage Guarantee in a Social Network

A Note on Maximizing the Spread of Influence in Social Networks

Influence Maximization in Continuous Time Diffusion Networks

Monotone Submodular Maximization over a Matroid

Symmetry and hardness of submodular maximization problems

Profit Maximization for Viral Marketing in Online Social Networks

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University

Web Structure Mining Nodes, Links and Influence

Probability Models of Information Exchange on Networks Lecture 6

Query and Computational Complexity of Combinatorial Auctions

Time-Critical Influence Maximization in Social Networks with Time-Delayed Diffusion Process

Performance Evaluation. Analyzing competitive influence maximization problems with partial information: An approximation algorithmic framework

Jure Leskovec Stanford University

Continuous Influence Maximization: What Discounts Should We Offer to Social Network Users?

Robust Influence Maximization

Approximability of Adaptive Seeding under Knapsack Constraints

Welfare Maximization with Friends-of-Friends Network Externalities

More Approximation Algorithms

On Multiset Selection with Size Constraints

Submodularity in Machine Learning

6.207/14.15: Networks Lecture 7: Search on Networks: Navigation and Web Search

1 Submodular functions

In Search of Influential Event Organizers in Online Social Networks

On the Submodularity of Influence in Social Networks

Subset Selection under Noise

Time sensitive influence maximization in social networks

Combinatorial Multi-Armed Bandit: General Framework, Results and Applications

An Online Algorithm for Maximizing Submodular Functions

Adaptive Submodularity: A New Approach to Active Learning and Stochastic Optimization

Multi-Round Influence Maximization

Applications of Submodular Functions in Speech and NLP

Learning Influence Functions from Incomplete Observations

1 Maximizing a Submodular Function

Restricted Strong Convexity Implies Weak Submodularity

Maximizing the Spread of Influence through a Social Network

Time Constrained Influence Maximization in Social Networks

Modeling, Analysis, and Control of Information Propagation in Multi-layer and Multiplex Networks. Osman Yağan

Influence Blocking Maximization in Social Networks under the Competitive Linear Threshold Model

Influence Maximization in Dynamic Social Networks

Computing Competing Cascades on Signed Networks

Clustering Perturbation Resilient

A Submodular Framework for Graph Comparison

ON SPATIAL GOSSIP ALGORITHMS FOR AVERAGE CONSENSUS. Michael G. Rabbat

Optimization of Submodular Functions Tutorial - lecture I

Discrete Optimization in Machine Learning. Colorado Reed

ECS 253 / MAE 253, Lecture 13 May 15, Diffusion, Cascades and Influence Mathematical models & generating functions

Analytically tractable processes on networks

Gross Substitutes Tutorial

Subset Selection under Noise

Simultaneous Influencing and Mapping for Health Interventions

Online Social Networks and Media. Link Analysis and Web Search

Locally Adaptive Optimization: Adaptive Seeding for Monotone Submodular Functions

Influence Spreading Path and its Application to the Time Constrained Social Influence Maximization Problem and Beyond

Active Learning and Optimized Information Gathering

Online Social Networks and Media. Link Analysis and Web Search

Semi-Markov/Graph Cuts

Scalable Influence Estimation in Continuous-Time Diffusion Networks

Monitoring Stealthy Diffusion

CSE541 Class 22. Jeremy Buhler. November 22, Today: how to generalize some well-known approximation results

Topical Sequence Profiling

Efficient Subgraph Matching by Postponing Cartesian Products. Matthias Barkowsky

Heat Kernel Based Community Detection

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University

Planning and Model Selection in Data Driven Markov models

WITH the popularity of online social networks, viral. Boosting Information Spread: An Algorithmic Approach. arxiv: v3 [cs.

A Fast Approximation for Influence Maximization in Large Social Networks

Minimum-sized Positive Influential Node Set Selection for Social Networks: Considering Both Positive and Negative Influences

Epidemics and information spreading

Cost and Preference in Recommender Systems Junhua Chen LESS IS MORE

Viral Marketing and the Diffusion of Trends on Social Networks

On the Complexity of Computing an Equilibrium in Combinatorial Auctions

MAE 298, Lecture 8 Feb 4, Web search and decentralized search on small-worlds

ACO Comprehensive Exam March 20 and 21, Computability, Complexity and Algorithms

Learning to Predict Opinion Share in Social Networks

MobiHoc 2014 MINIMUM-SIZED INFLUENTIAL NODE SET SELECTION FOR SOCIAL NETWORKS UNDER THE INDEPENDENT CASCADE MODEL

Optimizing Roadside Unit (RSU) Placement in Vehicular CPS

The Ties that Bind Characterizing Classes by Attributes and Social Ties

Information Dissemination in Social Networks under the Linear Threshold Model

Robust Optimization for Non-Convex Objectives

Adaptive Submodularity: Theory and Applications in Active Learning and Stochastic Optimization

Influence Maximization with ε-almost Submodular Threshold Functions

Optimal Budget Allocation: Theoretical Guarantee and Efficient Algorithm

In-core computation of distance distributions and geometric centralities with HyperBall: A hundred billion nodes and beyond

From Competition to Complementarity: Comparative Influence Diffusion and Maximization

All-Distances Sketches, Revisited: HIP Estimators for Massive Graphs Analysis

DATA MINING LECTURE 13. Link Analysis Ranking PageRank -- Random walks HITS

CSC 373: Algorithm Design and Analysis Lecture 12

Transcription:

Greedy Maximization Framework for Graph-based Influence Functions Edith Cohen Google Research Tel Aviv University HotWeb '16 1

Large Graphs Model relations/interactions (edges) between entities (nodes) Explicit: Call detail, email exchanges, Web links, social networks (friend, follow, like), commercial transactions, video views, Implicit: Images, search queries, (edges are shared features or close embedding vectors) Some nodes are more central than others

Diffusion in Networks Edges model direct connections between entities Diffusion: Contagion, information (news, opinions), can spread from seed nodes through edges to nodes multiple hops away Influence: A measure of the combined power/ importance/ coverage of a set of seed nodes. (according to the diffusion process)

Influence in Networks Applications : Inf S = quality of a seed entities S as: anchors, representatives, cluster centers, hubs in distribution networks, candidates for active learning of labels/properties, seeds for viral marketing Computational Problems: Influence queries: Given seed set S, compute (approximate) Inf(S) Influence maximization arg max Inf S * *,- Find a set of entities with maximum influence for its size. or With a budget s, who should we select?

Overview of contributions A unified model of graph-based influence functions: Includes functions proposed in previous work and extends to allow general submodular aggregations. A meta-algorithm for influence maximization: Modular design, near linear computation, statistical worst-case guarantees on approximation quality HotWeb '16 5

Unified model: Pairwise utility to influence Graph structure, diffusion process pairwise utility u 56 of node i to node j The utility of a seed set S to a node j: u *6 = aggregate u <= 5 * e.g. aggregate = max The influence of seed set S is the sum over j of the utility of S to j Inf S =? u *6 =? aggregate 6 6 5 * Centrality: Influence of a single node Inf i = u 56 6 u <=

Utility of S to j Aggregation functions u *6 = aggregate u <= 5 * Max: value equal to that the utility of the most useful seed node u *6 = 5 Sum: The more the merrier u *6 = 5 + 4 + 2 Top-2: sum of top two seed utility values u *6 = 5 + 4 (limited capacity) S = {,, } +diminishing return u *6 = 5 + H I 4 5 2 4 Submodular top-l : Up to top l seeds contribute, non-increasing marginal contribution Influence function is submodular and monotone When S = 1 (centrality): All aggregate are the same u *6

Pairwise utility from graph structure Shorter paths, more paths, stronger edges on paths higher u ij Ways to define utility u ij from graph structure: Reachability u 56 = 1 i j [Kempe Kleinberg Tardos 2003]++ Distance u 56 = α(d 56 ) with decaying α [Bavelas 1948]++ [CK 2004] [Bloch Jackson 2007]++ Threshold: u 56 = 1 d 56 T [Gomez Rodriguez et al ICML 11] [Du et al NIPS 13]++.. Reverse-rank u 56 = α(π 65 ) [Korn Muthu 01, Buchnik C 16] Survival time [ C 16] (inspired by survival analysis) + randomized models to generate edge lengths/presence [Kempe Kleinberg Tardos KDD 2003, Gomez Rogriguez et al ICML 11, Abraho et al KDD13, Cohen et al COSN 13, Du et al NIPS 13]

Simplest Model: Reachability Utility: u 56 = I 6 5 Inf S =? u *6 aggregate=max: u *6 = max 6 * u 56 = I 6 * -.`.6 5 = i j S, j i 6 = #nodes reachable from at least one node in S. Inf = 5 HotWeb 2016 9

Simplest Model: Reachability + max aggregation Inf abc = 9 Submodular and monotone! HotWeb 2016 10

Reachability + top-l submodular aggregation Utility: u 56 = I 6 5 Inf S =? u *6 6 u *6 = f(#(j S s. t. j i) ) f monotone concave Inf HotWeb 2016 11

Randomized edge presence Utility u 56 should decrease with path length and increase with path multiplicity. Independent Cascade (IC) model [KKT 03]: Edge e active with probability p i (independent) deterministic = randomized

Distance-based Influence Max aggregate Utility: u <= = e qr st u *6 = e qu vw Inf S = u *6 6 Inf = 1 + x y + H y z + H y { + H y

Distance-based Influence Max aggregate Utility: u <= = e qr st u *6 = e qu vw Inf S = u *6 6 Inf = 2 + } y + I y z + H y { + H y

Randomized edge lengths Utility u 56 should decrease with path length and increase with path multiplicity. Randomized: Edge lengths Exp[w y ] drawn independently from Weibull/ Exponential distribution deterministic = randomized

Reverse-rank Influence (special case) reverse NN Max aggregate Utility: u <= = I 5, (6) u *6 = I = Inf S = 6 u *6 Inf =2

Reverse-rank Influence π 65 : position of i in increasing distances from j Utility: u <= = α(π 65 ) π 65 : rank of by j with u <= = H ˆw : 4 4 2 3 3 Inf = 1 + 1 2 + 3 1 3 + +4 1 4 + 1 5 + 2 1 8 4 4 8 8 5 3 1

Survival time Influence Each edge has weight that corresponds to its lifetime. t 56 maximum t such that j i when using edges with lifetime t (connectivity survival time) 1 1 5 5 Utility: u <= = t <= 5 Inf = 4 5 + 4 1 1 0 5 1 2 5 5 2 2 1 5 1 0 2 5 5 0 0

Overview of contributions A unified model of graph-based influence functions: Includes functions proposed in previous work and extends to allow general submodular aggregations. A meta-algorithm for influence maximization: Modular design, near linear computation, statistical worst-case guarantees on approximation quality HotWeb '16 19

Influence maximization Given s, find a set of seed nodes S of size s with maximum influence arg max Inf S *,- NP hard, even to approximate [Feige 98] Influence function is submodular and monotone Greedy algorithm is polynomial with approximation ratio 1 1 H - H > 1 of opt [NWF 78] - y Greedy iteration: Select i S with maximum Inf S {i} Inf S S S {i} Greedy sequence approximates the full size/quality tradeoff But Exact greedy too slow - need near-linear algorithms

Meta-SKIM Sketch Based Influence Maximization Computes an approximate greedy sequence. Key property: Approximate the marginal influence of nodes to identify approximate maximizer at each iteration. Scales by maintaining and updating weighted samples (sketches) of marginal influence sets.

Meta-SKIM influence maximization SKIM: reachability+max [CDPW ICDM 14] General decay + distances +max [CDPW 15] Threshold-decay+ reverse-rank + max [BC Sigmetrics 16] Scales to networks with 10 edges, actual approximation within few percent Meta-SKIM: unifies and generalizes previous work General decay (with distance, reverse rank, ) Reachability, distances, reverse ranks, survival threshold Submodular top-l aggregations Inherits scalability, statistical guarantees, computation bounds HotWeb 2016 22

Meta-SKIM Maintain weighted samples of marginal influence sets of nodes. Repeat: Sample until estimates are accurate for near-maximizers of marginal influence. Add the approximate maximizer to seed set. Update residual problem Approximation ratio guarantee: 1 1 H - - ε times Opt Near linear worst-case bound on computation!!

Meta-SKIM Maintain weighted samples of marginal influence sets of nodes. Repeat: Sample until estimates are accurate for near-maximizers of marginal influence. Add the approximate maximizer to seed set. Update residual problem Modular: Specified through oracle access to utility matrix Sampling uses reverse sorted access oracle: from j returns nodes in order of non-increasing utility u 56. Implemented as graph search Updates use forward search oracle from a new seed. Returns all nodes with updated marginal utility. Also a graph search.

Meta-SKIM Maintain weighted samples of marginal influence sets of nodes. Repeat: Sample until estimates are accurate for near-maximizers of marginal influence. Add the approximate maximizer to seed set. Update residual problem Randomization handled using multiple MC simulations and optimizing for the average over simulations

Influence vs. #seeds: full approx greedy sequence IC Model: Reach+ max aggregation + randomization [CDPW ICDM 2014] data sets from SNAP Implementation by T. Pajor (available) 26

Influence vs. #seeds: full approx greedy sequence Distance utility with harmonic or exponential decay, max aggregation +randomization [CDPW 2015] data sets from SNAP Implementation by T. Pajor HotWeb 2016 27

Influence vs. #seeds: full approx greedy sequence Reverse Rank Utility, threshold decay (with different T)!(!"#'!"#$%&'()$'*+"+,!"#&!"#%!"#$ 45("" 45(""" 45(""""!"!(!("!(""!("""!(""""!("""""!()("! *+,-./!01!2..3!*03.2 [BuchnikC Sigmetrics 2016] Live Journal data set from SNAP Implementation by E. Buchnik (available) 28

Summary of contributions Unified model of graph-based influence functions Influence functions specified by pairwise utility values (reach/distance/reverserank/survival; decay function ; randomized generation) Submodular aggregation function of seeds utility Meta-SKIM Algorithm: Compute an approximate greedy maximizing sequence using near linear computation for all functions Follow up: Applications (seed selection for active learning,...) modular implementation HotWeb 2016 29

HotWeb 2016 30