CS224W: Analysis of Networks Jure Leskovec, Stanford University


Announcements: Please fill out the HW survey. Weekend office hours start this weekend (Hangout only). Proposal: you can use 1 late period.
CS224W: Analysis of Networks, Jure Leskovec, Stanford University, http://cs224w.stanford.edu

10/18/17 Jure Leskovec, Stanford CS224W: Analysis of Networks, http://cs224w.stanford.edu
So far: Decision-based models. They are utility based, deterministic, and node centric: a node observes the decisions of its neighbors and makes its own decision. They require us to know too much about the data.
Next: Probabilistic models. They let us build models by observing data, but we lose the explanation of why people do things.

Simple probabilistic model of cascades where we will learn about the reproductive number

Epidemic model based on random trees (a variant of a branching process):
A patient meets d other people. With probability q > 0 she infects each of them.
Q: For which values of d and q does the epidemic run forever?
Run forever: $\lim_{h \to \infty} P[\text{at least one infected node at depth } h] > 0$
Die out: $\lim_{h \to \infty} P[\text{at least one infected node at depth } h] = 0$
[Figure: random tree with the root node (patient 0, start of the epidemic) and its d subtrees.]

$p_h$ = prob. there is an infected node at depth h.
We need: $\lim_{h \to \infty} p_h = ?$ (based on q and d)
We are reasoning about the behavior at the root of the tree. Once we go one level down, we are left with the identical problem at depth h-1, so we need a recurrence for $p_h$:
$p_h = 1 - (1 - q \cdot p_{h-1})^d$
where $(1 - q \cdot p_{h-1})^d$ is the prob. of no infected node at depth h from the root (over the d subtrees).
$\lim_{h \to \infty} p_h$ is the result of iterating $f(x) = 1 - (1 - q \cdot x)^d$, starting at the root with x = 1 (since $p_1 = 1$).
We iterate: $x_1 = f(1)$, $x_2 = f(x_1)$, $x_3 = f(x_2)$, ...
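
The iteration of the recurrence $p_h = 1 - (1 - q \cdot p_{h-1})^d$ can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture; the function name `infection_prob_at_depth` and the example values of q and d are assumptions chosen for demonstration.

```python
def infection_prob_at_depth(q, d, depth):
    """Iterate p_h = 1 - (1 - q * p_{h-1})**d, starting from p_1 = 1."""
    p = 1.0  # p_1 = 1: the root (patient 0) is infected
    for _ in range(depth - 1):
        p = 1.0 - (1.0 - q * p) ** d
    return p


# q * d = 0.6: the probability shrinks toward 0 as the depth grows
print(infection_prob_at_depth(q=0.2, d=3, depth=200))
# q * d = 1.8: the iteration settles at a positive fixed point of f
print(infection_prob_at_depth(q=0.6, d=3, depth=200))
```

Running it with a few different (q, d) pairs shows the two regimes of the iteration: convergence to 0 or to a positive fixed point.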

[Figure: plot of y = f(x) and the line y = x on [0, 1]. x is the prob. there is an infected node at level h-1, f(x) is the prob. there is an infected node at level h, q is the infection prob., d is the degree.]
Fixed point: f(x) = x. This means that the prob. there is an infected node at depth h is constant (> 0).
We start at x = 1 because $p_1 = 1$ and iterate $x_1 = f(1)$, $x_2 = f(x_1)$, $x_3 = f(x_2)$, ..., going to the first fixed point.
If we want the epidemic to die out, then iterating f(x) must go to zero. So f(x) must be below y = x.

What do we know about the shape of f(x)?
$f(0) = 0$
$f(1) = 1 - (1 - q)^d < 1$
$f'(x) = q \cdot d \cdot (1 - qx)^{d-1}$
$f'(0) = q \cdot d$
f(x) is monotone: if g'(y) > 0 for all y then g(y) is monotone. In our case, 0 ≤ x, q ≤ 1 and d > 1, so f'(x) ≥ 0 and f(x) is monotone.
f'(x) is non-increasing: the term $(1 - qx)^{d-1}$ in f'(x) decreases as x increases.
So f'(x) is monotone non-increasing on [0, 1]!

For the epidemic to die out we need f(x) to be below y = x! Since f(0) = 0 and f'(x) is non-increasing, f(x) ≤ f'(0) · x on [0, 1], so it is enough that $f'(0) = q \cdot d < 1$:
$\lim_{h \to \infty} p_h = 0$ when $q \cdot d < 1$.
$q \cdot d$ = expected # of people that get infected.
Reproductive number $R_0 = q \cdot d$: there is an epidemic if $R_0 \ge 1$.

Reproductive number $R_0 = q \cdot d$: there is an epidemic if $R_0 \ge 1$.
Only $R_0$ matters:
$R_0 \ge 1$: the epidemic never dies and the number of infected people increases exponentially.
$R_0 < 1$: the epidemic dies out exponentially quickly.
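
The role of $R_0 = q \cdot d$ can also be checked by direct simulation of the random-tree model. The following is a rough Monte Carlo sketch under stated assumptions: the names `survives` and `survival_fraction`, the depth of 30, the number of trials, and the `cap` shortcut (an outbreak that exceeds `cap` infected people in one generation is counted as surviving) are illustrative choices, not part of the lecture.

```python
import random


def survives(q, d, depth, rng, cap=1000):
    """Simulate one random-tree epidemic; True if someone is infected at `depth`."""
    infected = 1  # patient 0
    for _ in range(depth):
        # each infected person meets d people and infects each with prob. q
        infected = sum(1 for _ in range(infected * d) if rng.random() < q)
        if infected == 0:
            return False
        if infected > cap:
            # assumption: an outbreak this large is treated as not dying out
            return True
    return True


def survival_fraction(q, d, depth=30, trials=500, seed=0):
    """Fraction of simulated epidemics still alive at the given depth."""
    rng = random.Random(seed)
    return sum(survives(q, d, depth, rng) for _ in range(trials)) / trials


print(survival_fraction(q=0.2, d=3))  # R_0 = 0.6
print(survival_fraction(q=0.6, d=3))  # R_0 = 1.8
```

With $R_0 < 1$ essentially no simulated epidemic reaches depth 30, while with $R_0 > 1$ a large fraction does.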

We will learn about the epidemic threshold

Virus propagation: 2 parameters:
(Virus) birth rate β: probability that an infected neighbor attacks.
(Virus) death rate δ: probability that an infected node heals.
[Figure: node N and its neighbors N1, N2, N3, with healthy and infected nodes marked and transitions labeled Prob. β and Prob. δ.]

General scheme for epidemic models: each node can go through phases, and the transition probabilities are governed by the model parameters.
S: susceptible; E: exposed; I: infected; R: recovered; Z: immune.

SIR model: a node goes through the phases Susceptible → Infected → Recovered, with transition probabilities β (S to I) and δ (I to R).
Models chickenpox or plague: once you heal, you can never get infected again.
Assuming perfect mixing (the network is a complete graph), the model dynamics are:
$\frac{dS}{dt} = -\beta S I$
$\frac{dI}{dt} = \beta S I - \delta I$
$\frac{dR}{dt} = \delta I$
[Figure: number of nodes in S(t), I(t), R(t) over time.]
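
To see how the SIR dynamics behave, the three equations can be integrated numerically. The sketch below uses a simple forward-Euler step and treats S, I, R as fractions of the population that sum to 1. The function name `simulate_sir`, the step size, and the parameter values are assumptions for illustration only.

```python
def simulate_sir(beta, delta, s0, i0, dt=0.1, steps=2000):
    """Forward-Euler integration of dS/dt = -beta*S*I,
    dI/dt = beta*S*I - delta*I, dR/dt = delta*I (S, I, R as fractions)."""
    s, i, r = s0, i0, 1.0 - s0 - i0
    history = [(s, i, r)]
    for _ in range(steps):
        new_infections = beta * s * i * dt
        new_recoveries = delta * i * dt
        s, i, r = s - new_infections, i + new_infections - new_recoveries, r + new_recoveries
        history.append((s, i, r))
    return history


history = simulate_sir(beta=0.5, delta=0.1, s0=0.99, i0=0.01)
print(history[-1])  # final (S, I, R) fractions
```

Plotting the three columns of `history` against time gives curves of the kind shown on the slide: S falls, I rises and then decays, and R grows.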

Susceptible-Infective-Susceptible (SIS) model: cured nodes immediately become susceptible.
Virus "strength": s = β / δ
Node state transition diagram: Susceptible → Infective (infected by a neighbor with prob. β); Infective → Susceptible (cured with prob. δ).

Models flu: a susceptible node becomes infected; the node then heals and becomes susceptible again.
Assuming perfect mixing (complete graph):
$\frac{dS}{dt} = -\beta S I + \delta I$
$\frac{dI}{dt} = \beta S I - \delta I$
[Figure: number of susceptible nodes S(t) and infected nodes I(t) over time.]
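
The SIS equations can be explored the same way. The sketch below again assumes normalized fractions (S = 1 - I) and a forward-Euler step; `simulate_sis` and its parameter values are hypothetical names and numbers chosen for the example. Under this normalization, setting dI/dt = 0 gives the nonzero equilibrium I* = 1 - δ/β when β > δ.

```python
def simulate_sis(beta, delta, i0, dt=0.1, steps=2000):
    """Forward-Euler integration of dI/dt = beta*S*I - delta*I with S = 1 - I."""
    i = i0
    for _ in range(steps):
        s = 1.0 - i
        i += (beta * s * i - delta * i) * dt
    return i


# beta > delta: the infected fraction approaches 1 - delta/beta = 0.8
print(simulate_sis(beta=0.5, delta=0.1, i0=0.01))
# beta < delta: the infection dies out
print(simulate_sis(beta=0.05, delta=0.1, i0=0.01))
```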

SIS model: the epidemic threshold of an arbitrary graph G is τ, such that: if the virus strength s = β / δ < τ, the epidemic cannot happen (it eventually dies out).
Given a graph, what is its epidemic threshold?

[Wang et al. 2003]
Fact: we have no epidemic if
$\beta / \delta < \tau = 1 / \lambda_{1,A}$
where β is the (virus) birth rate, δ is the (virus) death rate, τ is the epidemic threshold, and $\lambda_{1,A}$ is the largest eigenvalue of the adjacency matrix A of G.
$\lambda_{1,A}$ alone captures the property of the graph!
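
For a small undirected graph, the threshold τ = 1/λ₁ can be computed with plain power iteration. This is a sketch under assumptions: the graph is given as adjacency lists indexed 0..n-1 and has at least one edge, the function names are hypothetical, and the iteration runs on A + I (which has the same eigenvectors as A) so that it also converges on bipartite graphs such as stars.

```python
import math


def largest_eigenvalue(adj, iters=500):
    """Estimate the largest adjacency eigenvalue of an undirected graph
    given as adjacency lists, using power iteration on A + I."""
    n = len(adj)
    x = [1.0] * n
    lam = 0.0
    for _ in range(iters):
        # y = (A + I) x; the shift by I avoids oscillation on bipartite graphs
        y = [x[u] + sum(x[v] for v in adj[u]) for u in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
        # Rayleigh quotient x^T A x for the unit vector x
        lam = sum(x[u] * sum(x[v] for v in adj[u]) for u in range(n))
    return lam


def epidemic_threshold(adj):
    """tau = 1 / lambda_1 for the given graph."""
    return 1.0 / largest_eigenvalue(adj)


star = [[1, 2, 3, 4], [0], [0], [0], [0]]  # hub 0 with four leaves
print(epidemic_threshold(star))
```

A virus with strength s = β/δ below the printed value would be below the threshold on this example graph.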

[Wang et al. 2003]
[Figure: number of infected nodes (0 to 500) vs. time (0 to 1000) on the Oregon graph (10,900 nodes, 31,180 edges) with β = 0.001 and δ = 0.05, 0.06, 0.07. The three curves are labeled s = β/δ > τ (above threshold), s = β/δ = τ (at the threshold), and s = β/δ < τ (below threshold).]

Does it matter how many people are initially infected?

[Gomes et al., Assessing the International Spreading Risk Associated with the 2014 West African Ebola Outbreak, PLOS Currents Outbreaks, 2014]

[Gomes et al., 2014] Read an article about how to estimate $R_0$ of Ebola.

[Gomes et al., 2014]

Independent cascade model:
Initially some nodes S are active. Each edge (u, v) has a probability (weight) $p_{uv}$.
When node u becomes active/infected, it activates each out-neighbor v with prob. $p_{uv}$.
Activations spread through the network!
[Figure: example directed graph on nodes a to i with edge weights between 0.2 and 0.4.]
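
One run of this process can be sketched as follows. The graph representation (a dict mapping each node to its out-neighbors and edge probabilities), the function name `independent_cascade`, and the small example graph are assumptions for illustration; the example is not the graph from the figure.

```python
import random


def independent_cascade(graph, seeds, rng):
    """Run one independent cascade; returns the set of nodes that end up active.

    graph: dict mapping node u -> dict of out-neighbor v -> probability p_uv.
    """
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            # a newly active node gets one chance to activate each out-neighbor
            for v, p_uv in graph.get(u, {}).items():
                if v not in active and rng.random() < p_uv:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active


# hypothetical example graph
example = {"a": {"b": 0.4, "c": 0.3}, "b": {"d": 0.2}, "c": {"d": 0.4}, "d": {}}
print(independent_cascade(example, seeds=["a"], rng=random.Random(0)))
```

Averaging the size of the returned set over many runs gives an estimate of the expected spread from a given seed set.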

The independent cascade model is simple but requires many parameters! Estimating them from data is very hard [Goyal et al. 2010].
Solution: make all edges have the same weight (which brings us back to the SIR model). Simple, but too simple.
Can we do something better?

[KDD 12] From exposures to adoptions:
Exposure: a node's neighbor exposes the node to the contagion.
Adoption: the node acts on the contagion.

Exposure curve: the probability of adopting a new behavior depends on the total number of friends who have already adopted. What's the dependence?
Diminishing returns (viruses, information): [Figure: prob. of adoption vs. k = number of friends adopting.]
Critical mass (decision making): [Figure: prob. of adoption vs. k = number of friends adopting.]

[KDD 12] From exposures to adoptions:
Exposure: a node's neighbor exposes the node to information.
Adoption: the node acts on the information.
[Figure: adoption curve, Prob(infection) vs. # exposures, with regions labeled "probability of infection ever increases" and "nodes build resistance".]

A marketing agency would like you to adopt/buy product X. They estimate the adoption curve. Should they expose you to X three times? Or is it better to expose you to X, then Y, and then X again?

[Leskovec et al., TWEB 07]
Senders and followers of recommendations receive discounts on products (10% credit, 10% off).
Data: incentivized viral marketing program with 16 million recommendations, 4 million people, 500k products.

[Leskovec et al., TWEB 07]
[Figure: probability of purchasing (0 to 0.1) vs. # recommendations received (0 to 40) for DVD recommendations (8.2 million observations).]

[Backstrom et al., KDD 06]
Group memberships spread over the network. [Figure: red circles represent existing group members; yellow squares may join.]
Question: how does the prob. of joining a group depend on the number of friends already in the group?

[Backstrom et al., KDD 06]
[Figure: LiveJournal group membership; prob. of joining vs. k (number of friends in the group).]

Twitter [Romero et al. 11]: Aug 09 to Jan 10, 3B tweets, 60M users.
Avg. exposure curve for the top 500 hashtags.
What are the most important aspects of the shape of exposure curves? The curve reaches its peak fast and decreases after!

Persistence of P is the ratio of the area under the curve P and the area of the rectangle of height max(P) and width max(D(P)), where D(P) is the domain of P. Persistence measures the decay of exposure curves.
Stickiness of P is max(P). Stickiness is the probability of usage at the most effective exposure.
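
Both measures are easy to compute for a sampled exposure curve. The sketch below assumes the curve is given as a dict mapping the number of exposures k to the adoption probability P(k), and approximates the area under the curve with the trapezoid rule; the function names and the sample curve are hypothetical, and the discretization is an assumption rather than the exact procedure of the cited study.

```python
def stickiness(curve):
    """Stickiness: the maximum of the exposure curve."""
    return max(curve.values())


def persistence(curve):
    """Persistence: area under the curve (trapezoid rule) divided by the
    area of the rectangle of height max(P) and width max(D(P))."""
    ks = sorted(curve)
    area = sum((curve[a] + curve[b]) / 2.0 * (b - a) for a, b in zip(ks, ks[1:]))
    return area / (max(curve.values()) * max(ks))


# hypothetical exposure curve: peaks at k = 1 and decays afterwards
curve = {0: 0.0, 1: 0.04, 2: 0.02, 3: 0.01, 4: 0.0}
print(stickiness(curve), persistence(curve))
```

A curve that stays near its peak has persistence close to 1, while a curve that spikes and decays quickly has low persistence.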

Manually identify 8 broad categories with at least 20 HTs (hashtags) in each.
[Figure: persistence per category, true value vs. random subset.]
Idioms and Music have lower persistence than that of a random subset of hashtags of the same size.
Politics and Sports have higher persistence than that of a random subset of hashtags of the same size.

Technology and Movies have lower stickiness than that of a random subset of hashtags (of the same size).
Music has higher stickiness than that of a random subset of hashtags (of the same size).

So far we considered pieces of information as propagating independently. Do pieces of information interact?
Did the 1st cat video decrease the adoption probability of the 2nd cat video?
Did cat videos increase the adoption probability of the dog video?

Goal: model the interaction between many pieces of information. Some pieces of information may help each other in adoption. Others may compete for attention.

[Figure: the user and their neighbors, with contagions c0, c1, c2, c3 arriving in turn: P(adopt c1 | exposed to c0), P(adopt c2 | exposed to c1, c0), P(adopt c3 | exposed to c2, c1, c0).]

You are reading posts on Twitter: you examine posts one by one, and currently you are examining X.
How does your probability of reposting X depend on what you have seen in the past?
[Figure: timeline of contagions adopted by neighbors (c5, c4, c3, c2, ...), to which the user is exposed in order; the earlier exposures are Y2 and Y1, the current one is X. Adopt?]

We assume the K most recent exposures affect a user's adoption:
$P(\text{adopt } X = c_0 \mid \text{exposed } Y_1 = c_1, Y_2 = c_2, \ldots, Y_K = c_K)$
X is the contagion the user is viewing now; $Y_1, \ldots, Y_K$ are the contagions the user previously viewed.
[Figure: timeline of contagions adopted by neighbors (c5, c4, c3, c2, c1, c0). Adopt?]


We want to estimate: $P(X \mid Y_1, \ldots, Y_5)$.
What's the problem? What's the size of the probability table $P(X \mid Y_1, \ldots, Y_5)$? It is $(\text{Num. contagions})^5 \approx 1.9 \times 10^{21}$.
Simplification: assume $Y_i$ is independent of $Y_j$. Then
$P(X \mid Y_1, \ldots, Y_K) = \frac{1}{P(X)^{K-1}} \prod_{k=1}^{K} P(X \mid Y_k)$
We apply Bayes' theorem twice and use the independence assumption.
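
The two applications of Bayes' theorem can be written out step by step. The derivation below is a reconstruction, and it assumes that the $Y_k$ are independent both marginally and conditionally on X, which is the form of the independence assumption needed for the product to factor.

```latex
\begin{align*}
P(X \mid Y_1, \ldots, Y_K)
  &= \frac{P(Y_1, \ldots, Y_K \mid X)\, P(X)}{P(Y_1, \ldots, Y_K)}
     && \text{(Bayes' theorem)} \\
  &= P(X) \prod_{k=1}^{K} \frac{P(Y_k \mid X)}{P(Y_k)}
     && \text{(independence of the } Y_k\text{)} \\
  &= P(X) \prod_{k=1}^{K} \frac{P(X \mid Y_k)}{P(X)}
     && \text{(Bayes' theorem again)} \\
  &= \frac{1}{P(X)^{K-1}} \prod_{k=1}^{K} P(X \mid Y_k)
\end{align*}
```

The table of size (Num. contagions)^K is thus replaced by K pairwise tables $P(X \mid Y_k)$.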

Goal: model $P(\text{adopt } X \mid Y_1, \ldots, Y_K)$.
Assume that $P(X \mid Y_k)$ combines a prior infection prob. $P(X)$ with an interaction term $\Delta_{cont}^{(k)}(u_i, u_j)$. Next, assume "topics".
$\Delta_{cont}^{(k)}(u_i, u_j)$ models the change in the infection prob. of $u_j$ given that the exposure k steps ago was $u_i$.
We estimate $P(X)$ and $\Delta^{(k)}(u_i, u_j)$ by simply counting:
$P(X)$: fraction of people exposed to X who got infected by X.
$\Delta^{(k)}(u_i, u_j)$ (together with $P(X)$): fraction of people first exposed to $u_i$ and then to $u_j$ who then got infected by $u_j$.

Data from Twitter: complete data from Jan 2011, 3 billion tweets. All URLs tweeted by at least 50 users: 191k.
Task: predict whether a user will post URL X.
What do we learn from the model?

How does $P(\text{post } u_2 \mid \text{exp. } u_1)$ change if:
$u_2$ and $u_1$ are similar/different in content? LCS (low content similarity), HCS (high content similarity).
$u_1$ is highly viral?
[Figure: relative change in infection prob. vs. prob. of infection P(u), for LCS and HCS.]
Observations: if $u_1$ is not viral, this boosts $u_2$. If $u_1$ is highly viral, this kills $u_2$. BUT: only if $u_1$ and $u_2$ are of low content similarity (LCS); else, $u_1$ helps $u_2$.

Modeling contagion interactions: 71% of the adoption probability comes from the topic interactions!

$R_0$ (reproductive number): epidemics die out if $R_0 < 1$.
Epidemic threshold: if the virus strength s = β / δ < τ, the epidemic cannot happen (it eventually dies out).
Shape of the adoption curve.
Modeling interactions between contagions.