CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University
1 CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University
2 Intro sessions to SNAP C++ and SNAP.PY: SNAP.PY: Friday 9/27, 4:15-5:30pm in Gates B03; SNAP C++: Thursday 10/3, 4:15-5:30pm in Gates B03. Sessions will be recorded and available via SCPD. About the software libraries: TAs support SNAP C++ (Justin, Bell) and SNAP.PY (Christie, Yoni). You can use other libraries (NetworkX, JUNG, Boost, R); they will do the job, but we don't offer support for them. Start early on HW0, since these packages are new to you, complex, and non-trivial to use! Review sessions: Probability: Friday 10/4, 4:15-5:30pm in Gates B03; Linear algebra: Tuesday 10/8, 2:15-3:30pm in Gates B03. 9/25/2013 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, 2
3 Observations / Models / Algorithms (entries are listed in matching row order):
Observations: Small diameter, edge clustering; Patterns of signed edge creation; Viral marketing, blogosphere, memetracking; Scale-free; Densification power law, shrinking diameters; Strength of weak ties, core-periphery.
Models: Erdös-Renyi model, small-world model; Structural balance, theory of status; Independent cascade model, game-theoretic model; Preferential attachment, copying model; Microscopic model of evolving networks; Kronecker graphs.
Algorithms: Decentralized search; Models for predicting edge signs; Influence maximization, outbreak detection, LIM; PageRank, hubs and authorities; Link prediction, supervised random walks; Community detection: Girvan-Newman, modularity.
4 For example, last time we talked about Observations and Models for the Web graph: 1) We took a real system: the Web. 2) We represented it as a directed graph. 3) We used the language of graph theory: Strongly Connected Components. 4) We designed a computational experiment: find the In and Out components of a given node v. 5) We learned something about the structure of the Web: BOWTIE!
5 Undirected graphs: links are undirected (symmetrical, reciprocal relations). Examples of undirected links: collaborations, friendship on Facebook. Directed graphs: links are directed (asymmetrical relations). Examples of directed links: phone calls, following on Twitter.
6 Adjacency matrix: $A_{ij} = 1$ if there is a link from node i to node j, $A_{ij} = 0$ otherwise. Note that for a directed graph the matrix is not symmetric.
7 Undirected: Node degree $k_i$: the number of edges adjacent to node i (e.g., $k_A = 4$). Avg. degree: $\bar{k} = \langle k \rangle = \frac{1}{N}\sum_{i=1}^{N} k_i = \frac{2E}{N}$. Directed: In directed networks we define an in-degree and an out-degree. The (total) degree of a node is the sum of its in- and out-degrees (e.g., $k_C^{in} = 2$, $k_C^{out} = 1$, $k_C = 3$). Avg. degree: $\bar{k} = \frac{E}{N}$, with $\bar{k}^{in} = \bar{k}^{out}$. Source: node with $k^{in} = 0$. Sink: node with $k^{out} = 0$.
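The degree definitions above can be checked on a tiny example. The following is a minimal sketch, assuming graphs are stored as 0/1 adjacency matrices (lists of lists); the two 4-node graphs and the helper names `degrees_undirected` and `degrees_directed` are hypothetical and not part of the lecture.

```python
def degrees_undirected(A):
    """Degree of each node = row sum of a symmetric 0/1 adjacency matrix."""
    return [sum(row) for row in A]


def degrees_directed(A):
    """Return (in_degrees, out_degrees); A[i][j] = 1 means a link i -> j."""
    n = len(A)
    out_deg = [sum(A[i]) for i in range(n)]
    in_deg = [sum(A[i][j] for i in range(n)) for j in range(n)]
    return in_deg, out_deg


# Hypothetical undirected graph with edges 0-1, 0-2, 0-3, 1-2.
A_und = [[0, 1, 1, 1],
         [1, 0, 1, 0],
         [1, 1, 0, 0],
         [1, 0, 0, 0]]
k = degrees_undirected(A_und)
E_und = sum(k) // 2             # every edge is counted twice in the degree sum
avg_k_und = 2 * E_und / len(k)  # average degree 2E/N

# Hypothetical directed graph with links 0->1, 0->2, 1->2, 2->3.
A_dir = [[0, 1, 1, 0],
         [0, 0, 1, 0],
         [0, 0, 0, 1],
         [0, 0, 0, 0]]
in_deg, out_deg = degrees_directed(A_dir)
sources = [i for i, d in enumerate(in_deg) if d == 0]   # nodes with k_in = 0
sinks = [i for i, d in enumerate(out_deg) if d == 0]    # nodes with k_out = 0
avg_in = sum(in_deg) / len(in_deg)
avg_out = sum(out_deg) / len(out_deg)
```

In the directed example the average in-degree and the average out-degree coincide, since every link contributes exactly one to each sum.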
8 The maximum number of edges in an undirected graph on N nodes is $E_{max} = \binom{N}{2} = \frac{N(N-1)}{2}$. A graph with the number of edges $E = E_{max}$ is a complete graph, and its average degree is $N-1$.
9 Most real-world networks are sparse: $E \ll E_{max}$ (or $\bar{k} \ll N-1$).
WWW (Stanford-Berkeley): N = 319,717, $\bar{k}$ = 9.65
Social networks (LinkedIn): N = 6,946,668, $\bar{k}$ = 8.87
Communication (MSN IM): N = 242,720,596, $\bar{k}$ = 11.1
Coauthorships (DBLP): N = 317,080, $\bar{k}$ = 6.62
Internet (AS-Skitter): N = 1,719,037, $\bar{k}$ = 14.91
Roads (California): N = 1,957,027, $\bar{k}$ = 2.82
Proteins (S. Cerevisiae): N = 1,870, $\bar{k}$ = 2.39
(Source: Leskovec et al., Internet Mathematics, 2009)
Consequence: the adjacency matrix is filled with zeros! (Density of the matrix, $E/N^2$: WWW = $1.51 \times 10^{-5}$, MSN IM = [value lost].)
10 Unweighted (undirected): $A_{ii} = 0$, $A_{ij} = A_{ji}$, $E = \frac{1}{2}\sum_{i,j=1}^{N} A_{ij}$, $\bar{k} = \frac{2E}{N}$. Examples: Friendship, Hyperlink.
Weighted (undirected): $A_{ii} = 0$, $A_{ij} = A_{ji}$, $E = \frac{1}{2}\sum_{i,j=1}^{N} \mathrm{nonzero}(A_{ij})$, $\bar{k} = \frac{2E}{N}$. Examples: Collaboration, Internet, Roads.
11 Self-edges (self-loops) (undirected): $A_{ii} \neq 0$, $A_{ij} = A_{ji}$, $E = \frac{1}{2}\sum_{i,j=1,\, i \neq j}^{N} A_{ij} + \sum_{i=1}^{N} A_{ii}$. Examples: Proteins, Hyperlinks.
Multigraph (undirected): $A_{ii} = 0$, $A_{ij} = A_{ji}$, $E = \frac{1}{2}\sum_{i,j=1}^{N} \mathrm{nonzero}(A_{ij})$, $\bar{k} = \frac{2E}{N}$. Examples: Communication, Collaboration.
12 Which graph type fits which network?
WWW: directed multigraph with self-edges
Facebook friendships: undirected, unweighted
Citation networks: unweighted, directed, acyclic
Collaboration networks: undirected multigraph or weighted graph
Mobile phone calls: directed, (weighted?) multigraph
Protein interactions: undirected, unweighted with self-interactions
13 A bipartite graph is a graph whose nodes can be divided into two disjoint sets U and V such that every link connects a node in U to one in V; that is, U and V are independent sets. Examples: authors to papers (they authored), actors to movies (they appeared in), users to movies (they rated). "Folded" networks: author collaboration networks, movie co-rating networks. [Figure: a bipartite graph on node sets U and V, and the folded version of that graph.]
15 Degree distribution P(k): probability that a randomly chosen node has degree k. $N_k$ = # nodes with degree k. Normalized histogram: $P(k) = N_k / N$. [Plot: P(k) versus k.]
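The normalized histogram $P(k) = N_k / N$ is straightforward to compute from an edge list. This is a minimal sketch, assuming an undirected graph given as a list of node-index pairs; the function name `degree_distribution` and the 4-node example are hypothetical.

```python
from collections import Counter


def degree_distribution(edges, n):
    """Return {k: P(k)} with P(k) = N_k / N for an undirected edge list."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    counts = Counter(deg)  # N_k: number of nodes with degree k
    return {k: counts[k] / n for k in sorted(counts)}


# Hypothetical 4-node graph; node degrees are 3, 2, 2, 1.
edges = [(0, 1), (0, 2), (0, 3), (1, 2)]
P = degree_distribution(edges, 4)
```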
16 A path is a sequence of nodes in which each node is linked to the next one: $P_n = \{i_0, i_1, i_2, \ldots, i_n\}$, or, as a sequence of edges, $P_n = \{(i_0, i_1), (i_1, i_2), (i_2, i_3), \ldots, (i_{n-1}, i_n)\}$. A path can intersect itself and pass through the same edge multiple times, e.g., ACBDCDEG. In a directed graph a path can only follow the direction of the "arrow".
17 Number of paths between nodes u and v: Length h = 1: if there is a link between u and v, $A_{uv} = 1$, else $A_{uv} = 0$. Length h = 2: if there is a path of length two between u and v through k, then $A_{uk} A_{kv} = 1$, else $A_{uk} A_{kv} = 0$, so $H^{(2)}_{uv} = \sum_{k=1}^{N} A_{uk} A_{kv} = [A^2]_{uv}$. Length h: if there is a path of length h between u and v, then $A_{uk} \cdots A_{kv} = 1$, else $A_{uk} \cdots A_{kv} = 0$. So, the number of paths of length h between u and v is $H^{(h)}_{uv} = [A^h]_{uv}$ (holds for both directed and undirected graphs).
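The identity $H^{(h)}_{uv} = [A^h]_{uv}$ can be tried out on a small graph. The sketch below uses plain Python lists to avoid any library dependency; `mat_mul`, `count_paths`, and the triangle graph are hypothetical illustrations. As in the definition of a path above, these counts include paths that revisit nodes and edges.

```python
def mat_mul(X, Y):
    """Multiply two square matrices given as lists of lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def count_paths(A, h):
    """Return A^h; entry [u][v] is the number of length-h paths from u to v."""
    n = len(A)
    H = [[int(i == j) for j in range(n)] for i in range(n)]  # identity = A^0
    for _ in range(h):
        H = mat_mul(H, A)
    return H


# Hypothetical example: a triangle on nodes 0, 1, 2.
A = [[0, 1, 1],
     [1, 0, 1],
     [1, 1, 0]]
H2 = count_paths(A, 2)
H3 = count_paths(A, 3)
```

For the triangle, `H2[0][0]` is 2 (out to either neighbor and back) and `H2[0][1]` is 1 (the detour through node 2).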
18 Distance (shortest path, geodesic) between a pair of nodes is defined as the number of edges along the shortest path connecting the nodes, e.g., $h_{B,D} = 2$. If the two nodes are disconnected, the distance is usually defined as infinite. In directed graphs paths need to follow the direction of the arrows, e.g., $h_{B,C} = 1$, $h_{C,B} = 2$. Consequence: distance is not symmetric: $h_{A,C} \neq h_{C,A}$.
19 Diameter: the maximum (shortest path) distance between any pair of nodes in a graph. Average path length for a connected graph (component) or a strongly connected (component of a) directed graph: $\bar{h} = \frac{1}{2 E_{max}} \sum_{i,\, j \neq i} h_{ij}$, where $h_{ij}$ is the distance from node i to node j. Many times we compute the average only over the connected pairs of nodes (we ignore "infinite" length paths).
20 Breadth-First Search: Start with node u, mark it to be at distance $h_u(u) = 0$, and add u to the queue. While the queue is not empty: take node v off the queue, put its unmarked neighbors w into the queue, and mark $h_u(w) = h_u(v) + 1$.
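The BFS procedure above translates almost line for line into code. This is a minimal sketch, assuming the graph is a dict mapping each node to a list of its out-neighbors; the name `bfs_distances` and the small directed example are hypothetical.

```python
from collections import deque


def bfs_distances(adj, u):
    """Return {node: h_u(node)} for every node reachable from u."""
    dist = {u: 0}          # mark u at distance 0
    queue = deque([u])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:            # w is still unmarked
                dist[w] = dist[v] + 1    # h_u(w) = h_u(v) + 1
                queue.append(w)
    return dist


# Hypothetical directed graph: A -> B -> C -> A and C -> D.
adj = {"A": ["B"], "B": ["C"], "C": ["A", "D"], "D": []}
from_a = bfs_distances(adj, "A")
from_b = bfs_distances(adj, "B")
from_d = bfs_distances(adj, "D")
```

Nodes missing from the returned dict are unreachable, which corresponds to an infinite distance. In this example the distance from A to B is 1 while the distance from B to A is 2, illustrating that distance in a directed graph is not symmetric.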
21 Clustering coefficient: what portion of i's neighbors are connected? For node i with degree $k_i$: $C_i = \frac{2 e_i}{k_i (k_i - 1)}$, $C_i \in [0, 1]$, where $e_i$ is the number of edges between the neighbors of node i. [Figure: three example nodes with $C_i = 0$, $C_i = 1/3$, and $C_i = 1$.] Average clustering coefficient: $C = \frac{1}{N} \sum_{i=1}^{N} C_i$.
22 Clustering coefficient: what portion of i's neighbors are connected? For node i with degree $k_i$: $C_i = \frac{2 e_i}{k_i (k_i - 1)}$, where $e_i$ is the number of edges between the neighbors of node i. Example: $k_B = 2$, $e_B = 1$, $C_B = 2/2 = 1$; $k_D = 4$, $e_D = 2$, $C_D = 4/12 = 1/3$.
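The clustering coefficient can be computed directly from neighbor sets. The sketch below assumes an undirected graph stored as a dict of neighbor sets, and it assigns $C_i = 0$ to nodes of degree below 2, which is a convention chosen here rather than something stated in the lecture; the function names and the 4-node graph are hypothetical.

```python
from itertools import combinations


def clustering(adj, i):
    """C_i = 2 e_i / (k_i (k_i - 1)); taken to be 0 when k_i < 2 (assumption)."""
    nbrs = sorted(adj[i])
    k = len(nbrs)
    if k < 2:
        return 0.0
    # e_i: number of edges among the neighbors of i
    e = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2 * e / (k * (k - 1))


def average_clustering(adj):
    """C = (1/N) * sum of C_i over all nodes."""
    return sum(clustering(adj, i) for i in adj) / len(adj)


# Hypothetical graph with edges 0-1, 0-2, 0-3, 1-2.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
C0 = clustering(adj, 0)        # one of three neighbor pairs is linked
C1 = clustering(adj, 1)        # the single neighbor pair is linked
C_avg = average_clustering(adj)
```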
23 Key network properties: Degree distribution: P(k). Path length: h. Clustering coefficient: C.
25 MSN Messenger activity in June 2006: 150Gb/day (compressed), 4.5Tb/month; 245 million users logged in; 180 million users engaged in conversations; more than 30 billion conversations; more than 255 billion exchanged messages.
27 Network: 180M people, 1.3B edges
28 [Figure labels: Buddy, Conversation.] Communication graph: edge (u,v) if users u and v exchanged at least 1 msg. N = 180 million people, E = 1.3 billion edges.
31 We plot the same data as on the previous slide, just the axes are now logarithmic.
32 Avg. clustering of the MSN: C = 0.1140. $C_k$: average $C_i$ of nodes i of degree k: $C_k = \frac{1}{N_k} \sum_{i:\, k_i = k} C_i$.
33 [Plot: number of links between pairs of nodes (shortest-path lengths in the MSN network), annotated with the avg. path length and the percentage of people who can be reached in < 8 hops. Table: steps vs. # nodes reached as we do BFS out of a random node.]
34 MSN network summary: Degree distribution: heavily skewed, avg. degree = 14.4. Path length: 6.6. Clustering coefficient: 0.11. Are these metrics "expected"? Are they "surprising"? To answer this we need a null model!
35 $P(k) = \delta(k-4)$: $k_i = 4$ for all nodes. $C = 1/2$. Path length: $h_{max} \sim N/2$; the average shortest path length is also linear, $\bar{h} = O(N)$ as $N \to \infty$. So, we have: constant degree, constant avg. clustering coeff., linear avg. path length. Note about calculations: we are interested in quantities as graphs get large ($N \to \infty$). We will use big-O: $f(x) = O(g(x))$ as $x \to \infty$ if $f(x) < c \cdot g(x)$ for all $x > x_0$ and some constant c.
36 $P(k) = \delta(k-6)$: $k = 6$ for each inside node. $C = 6/15$ for inside nodes. Path length: $h_{max} = O(\sqrt{N})$. In general, for lattices: average path length is $\bar{h} \sim N^{1/D}$ (D = lattice dimensionality). Constant degree, constant clustering coefficient.
38 Erdös-Renyi Random Graphs [Erdös-Renyi, '60]. Two variants: $G_{n,p}$: undirected graph on n nodes where each edge (u,v) appears i.i.d. with probability p. $G_{n,m}$: undirected graph with n nodes and m edges picked uniformly at random. What kinds of networks does such a model produce?
39 n and p do not uniquely determine the graph! The graph is the result of a random process. We can have many different realizations given the same n and p. [Figure: several realizations for n = 10, p = 1/6.]
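A $G_{n,p}$ realization can be drawn by flipping one biased coin per node pair. This is a minimal sketch, assuming nodes are labeled 0..n-1; `sample_gnp` and its `seed` parameter are hypothetical names used only for illustration.

```python
import random
from itertools import combinations


def sample_gnp(n, p, seed=None):
    """Draw one realization of G(n, p) as a list of undirected edges (u, v), u < v."""
    rng = random.Random(seed)
    # Each of the n(n-1)/2 possible edges appears independently with probability p.
    return [(u, v) for u, v in combinations(range(n), 2) if rng.random() < p]


# Two draws with the same n and p but different seeds, plus a repeat of the first seed.
g1 = sample_gnp(10, 1 / 6, seed=1)
g2 = sample_gnp(10, 1 / 6, seed=2)
g1_again = sample_gnp(10, 1 / 6, seed=1)
```

Fixing the seed makes a particular realization reproducible; different seeds generally give different edge sets for the same n and p. The quadratic loop over all pairs is fine for small n but would need a smarter sampler for large sparse graphs.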
40 How likely is a graph on E edges? P(E): the probability that a given $G_{np}$ generates a graph on exactly E edges: $P(E) = \binom{E_{max}}{E} p^{E} (1-p)^{E_{max}-E}$, where $E_{max} = n(n-1)/2$ is the maximum possible number of edges in an undirected graph of n nodes. P(E) is exactly the Binomial distribution: the number of successes in a sequence of $E_{max}$ independent yes/no experiments.
41 What is the expected degree of a node? Let $X_v$ be a random variable measuring the degree of node v. We want to know: $E[X_v] = \sum_{j=0}^{n-1} j \, P(X_v = j)$. For the calculation we will need linearity of expectation: for any random variables $Y_1, Y_2, \ldots, Y_k$, if $Y = Y_1 + Y_2 + \cdots + Y_k$, then $E[Y] = \sum_i E[Y_i]$. An easier way: decompose $X_v$ into $X_v = X_{v,1} + X_{v,2} + \cdots + X_{v,n-1}$, where $X_{v,u}$ is a {0,1} random variable which tells whether edge (v,u) exists or not. Then $E[X_v] = \sum_{u=1}^{n-1} E[X_{v,u}] = (n-1)p$. How to think about this? The probability of node u linking to node v is p, and u can link (flips a coin) to all other (n-1) nodes. Thus, the expected degree of node u is $p(n-1)$.
42 Degree distribution: P(k). Path length: h. Clustering coefficient: C. What are the values of these properties for $G_{np}$?
43 Fact: the degree distribution of $G_{np}$ is Binomial. Let P(k) denote the fraction of nodes with degree k: $P(k) = \binom{n-1}{k} p^{k} (1-p)^{n-1-k}$, where $\binom{n-1}{k}$ selects k nodes out of n-1, $p^k$ is the probability of having k edges, and $(1-p)^{n-1-k}$ is the probability of missing the rest of the n-1-k edges. Mean and variance of a binomial distribution: $\bar{k} = p(n-1)$, $\sigma^2 = p(1-p)(n-1)$, so $\frac{\sigma}{\bar{k}} = \left[ \frac{1-p}{p} \cdot \frac{1}{n-1} \right]^{1/2} \approx \frac{1}{(n-1)^{1/2}}$. As the network size increases, the distribution becomes increasingly narrow: we are increasingly confident that the degree of a node is in the vicinity of $\bar{k}$.
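The binomial degree distribution and the narrowing of $\sigma / \bar{k}$ can be evaluated numerically. The sketch below keeps p fixed while n grows, which is the regime in which the ratio shrinks like $(n-1)^{-1/2}$; the function names and the chosen values of n and p are hypothetical.

```python
from math import comb, sqrt


def degree_pmf(n, p, k):
    """P(k) = C(n-1, k) * p^k * (1-p)^(n-1-k) for a node in G(n, p)."""
    return comb(n - 1, k) * p**k * (1 - p) ** (n - 1 - k)


def relative_width(n, p):
    """sigma / mean of the degree distribution: sqrt((1-p) / (p (n-1)))."""
    mean = p * (n - 1)
    sigma = sqrt(p * (1 - p) * (n - 1))
    return sigma / mean


total = sum(degree_pmf(20, 0.3, k) for k in range(20))  # pmf sums to 1 over k = 0..n-1
width_small = relative_width(101, 0.1)      # n - 1 = 100
width_large = relative_width(10001, 0.1)    # n - 1 = 10000
```

Growing n - 1 by a factor of 100 at fixed p shrinks the relative width by a factor of 10, from about 0.3 to about 0.03.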
44 Remember: $C_i = \frac{2 e_i}{k_i (k_i - 1)}$, where $e_i$ is the number of edges between i's neighbors. Edges in $G_{np}$ appear i.i.d. with prob. p. So the expected value of $e_i$ is $e_i = p \, \frac{k_i (k_i - 1)}{2}$: each pair is connected with prob. p, and $\frac{k_i (k_i - 1)}{2}$ is the number of distinct pairs of neighbors of node i of degree $k_i$. Then: $C = \frac{p \, k_i (k_i - 1)}{k_i (k_i - 1)} = p = \frac{\bar{k}}{n-1} \approx \frac{\bar{k}}{n}$. The clustering coefficient of a random graph is small. For a fixed avg. degree, C decreases with the graph size n.
45 Degree distribution: $P(k) = \binom{n-1}{k} p^{k} (1-p)^{n-1-k}$. Clustering coefficient: $C = p \approx \bar{k}/n$. Path length: next!
46 To prove a bound on the diameter of $G_{np}$ we define a few concepts. Random k-regular graph: assume each node has k spokes (half-edges), and randomly pair them up! k = 1: the graph is a set of pairs. k = 2: the graph is a set of cycles. k = 3: arbitrarily complicated graphs.
47 Graph G(V, E) has expansion α if for all $S \subseteq V$: # of edges leaving S $\ge \alpha \cdot \min(|S|, |V \setminus S|)$. Or equivalently: $\alpha = \min_{S \subseteq V} \frac{\#\,\text{edges leaving } S}{\min(|S|, |V \setminus S|)}$.
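For very small graphs the expansion can be found by brute force over all node subsets. This is an illustrative sketch only, since the number of subsets is exponential in n; the function name `expansion` and the two example graphs are hypothetical. Subsets larger than n/2 are skipped because S and its complement have the same leaving edges and the same denominator.

```python
from itertools import combinations


def expansion(n, edges):
    """min over nonempty proper S of (# edges leaving S) / min(|S|, |V \\ S|)."""
    best = float("inf")
    for size in range(1, n // 2 + 1):
        for subset in combinations(range(n), size):
            S = set(subset)
            # An edge leaves S when exactly one of its endpoints is in S.
            leaving = sum(1 for u, v in edges if (u in S) != (v in S))
            best = min(best, leaving / min(len(S), n - len(S)))
    return best


# Hypothetical examples: a 6-cycle and the complete graph on 4 nodes.
cycle6 = [(i, (i + 1) % 6) for i in range(6)]
k4 = list(combinations(range(4), 2))
alpha_cycle = expansion(6, cycle6)   # worst cut: 3 consecutive nodes, 2 edges leave
alpha_k4 = expansion(4, k4)          # worst cut: 2 nodes, 4 edges leave
```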
48 [Figure: a set of |S| nodes with at least α·|S| edges leaving it.] (A big) graph with "good" expansion: $\alpha = \min_{S \subseteq V} \frac{\#\,\text{edges leaving } S}{\min(|S|, |V \setminus S|)}$.
49 Expansion is a measure of robustness: to disconnect l nodes, we need to cut $\ge \alpha \cdot l$ edges. [Figures: a graph with low expansion and a graph with high expansion.] Social networks: "communities". $\alpha = \min_{S \subseteq V} \frac{\#\,\text{edges leaving } S}{\min(|S|, |V \setminus S|)}$.
50 k-regular graph (every node has degree k): expansion is at most k (when S is a single node). Is there a graph on n nodes ($n \to \infty$), of fixed max deg. k, so that expansion α remains constant? Examples: n×n grid, k = 4: $\alpha = 2n / (n^2/4) \to 0$ (S = n/2 × n/2 square in the center). Complete binary tree: $\alpha \to 0$ for $|S| = (n/2) - 1$. Fact: for a random 3-regular graph on n nodes, there is some constant α (α > 0, independent of n) such that w.h.p. the expansion of the graph is $\ge \alpha$.
51 Fact: in a graph on n nodes with expansion α, for all pairs of nodes s and t there is a path of $O((\log n) / \alpha)$ edges connecting them. Proof strategy: we want to show that from any node s there is a path of length $O((\log n)/\alpha)$ to any other node t. Let $S_j$ be the set of all nodes found within j steps of BFS from s. How does $|S_j|$ increase as a function of j? [Figure: BFS layers $S_0, S_1, S_2$ around s.]
52 Proof (continued): Let $S_j$ be the set of all nodes found within j steps of BFS from s. We want to relate $|S_j|$ and $|S_{j+1}|$: $|S_{j+1}| \ge |S_j| + \frac{\alpha |S_j|}{k}$. By expansion, at least $\alpha |S_j|$ edges leave $S_j$ (as long as $|S_j| \le n/2$); each node has degree k, so at most k of these edges collide at a node. Hence $|S_{j+1}| \ge |S_j| \left(1 + \frac{\alpha}{k}\right) \ge \left(1 + \frac{\alpha}{k}\right)^{j+1}$.
53 Proof (continued): In how many steps of BFS do we reach > n/2 nodes? We need j so that $|S_j| \ge \left(1 + \frac{\alpha}{k}\right)^{j} \ge \frac{n}{2}$. Let's set $j = \frac{k \log_2 n}{\alpha}$. Then $\left(1 + \frac{\alpha}{k}\right)^{\frac{k \log_2 n}{\alpha}} \ge 2^{\log_2 n} = n > \frac{n}{2}$. Claim: $\left(1 + \frac{\alpha}{k}\right)^{\frac{k \log_2 n}{\alpha}} \ge 2^{\log_2 n}$. Remember $n > 0$ and $\alpha \le k$; then: if $\alpha = k$: $(1+1)^{\log_2 n} = 2^{\log_2 n}$; if $\alpha \to 0$, then with $x = k/\alpha \to \infty$: $\left(1 + \frac{1}{x}\right)^{x \log_2 n} \to e^{\log_2 n} > 2^{\log_2 n}$. [Figure: in j steps from s we reach > n/2 nodes, and in j steps from t we reach > n/2 nodes; the two sets must overlap, so Diameter $\le 2j = \frac{2k}{\alpha} \log_2 n$.] In $\frac{2k}{\alpha} \log n$ steps $|S_j|$ grows to $\Theta(n)$. So, the diameter of G is $O(\log(n) / \alpha)$.
54 Degree distribution: $P(k) = \binom{n-1}{k} p^{k} (1-p)^{n-1-k}$. Path length: $O(\log n)$. Clustering coefficient: $C = p \approx \bar{k}/n$.
55 MSN vs. $G_{np}$: Degree distribution: [plots]. Path length: MSN 6.6; $G_{np}$: $O(\log n)$, $\bar{h} \approx 8.2$. Clustering coefficient: MSN 0.11; $G_{np}$: $\bar{k}/n$.
56 Are real networks like random graphs? Compare on: giant connected component, average path length, clustering coefficient, degree distribution. Problems with the random network model: the degree distribution differs from that of real networks; the giant component in most real networks does NOT emerge through a phase transition; there is no local structure, so the clustering coefficient is too low. Most important: are real networks random? The answer is simply: NO!
57 If $G_{np}$ is wrong, why did we spend time on it? It is the reference model for the rest of the class. It will help us calculate many quantities that can then be compared to the real data. It will help us understand to what degree a particular property is the result of some random process. So, while $G_{np}$ is WRONG, it will turn out to be extremely USEFUL!
58 What happens to $G_{np}$ when we vary p?
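59 Remember, the expected degree is $E[X_v] = (n-1)p$. We want $E[X_v]$ to be independent of n, so let $p = c/(n-1)$. Observation: if we build a random graph $G_{np}$ with $p = c/(n-1)$ we have many isolated nodes. Why? $P[v \text{ has degree } 0] = (1-p)^{n-1} = \left(1 - \frac{c}{n-1}\right)^{n-1} \to e^{-c}$ as $n \to \infty$. To see this, use the substitution $\frac{1}{x} = \frac{c}{n-1}$: $\lim_{n \to \infty} \left(1 - \frac{c}{n-1}\right)^{n-1} = \lim_{x \to \infty} \left(1 - \frac{1}{x}\right)^{x c} = \left[ \lim_{x \to \infty} \left(1 - \frac{1}{x}\right)^{x} \right]^{c} = e^{-c}$. By definition $e = \lim_{x \to \infty} \left(1 + \frac{1}{x}\right)^{x}$, and correspondingly $\lim_{x \to \infty} \left(1 - \frac{1}{x}\right)^{x} = e^{-1}$.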
60 How big do we have to make p before we are likely to have no isolated nodes? We know: $P[v \text{ has degree } 0] = e^{-c}$. The event we are asking about is I = "some node is isolated": $I = \bigcup_{v \in N} I_v$, where $I_v$ is the event that v is isolated. We have: $P(I) = P\left( \bigcup_{v \in N} I_v \right) \le \sum_{v \in N} P(I_v) = n e^{-c}$. Union bound: $P\left( \bigcup_i A_i \right) \le \sum_i P(A_i)$.
61 We just learned: $P(I) \le n e^{-c}$. Let's try: $c = \ln n$, then $n e^{-c} = n e^{-\ln n} = n/n = 1$; $c = 2 \ln n$, then $n e^{-2 \ln n} = n/n^2 = 1/n$. So if $c = \ln n$ (i.e., $p = \ln n/(n-1)$), then the bound is only $P(I) \le 1$; if $c = 2 \ln n$ (i.e., $p = 2 \ln n/(n-1)$), then $P(I) \le 1/n \to 0$ as $n \to \infty$.
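The two choices of c can be plugged in numerically. This is a minimal sketch with hypothetical function names: `isolated_bound` evaluates the union-bound expression $n e^{-c}$, and `expected_isolated` evaluates the exact expected number of isolated nodes, $n (1-p)^{n-1}$ with $p = c/(n-1)$, which never exceeds the bound because $1 - x \le e^{-x}$.

```python
from math import exp, log


def isolated_bound(n, c):
    """Union-bound expression n * e^(-c) for P(some node is isolated)."""
    return n * exp(-c)


def expected_isolated(n, c):
    """Exact expected number of isolated nodes when p = c / (n - 1)."""
    p = c / (n - 1)
    return n * (1 - p) ** (n - 1)


n = 10**6
bound_ln = isolated_bound(n, log(n))        # c = ln n   -> about 1
bound_2ln = isolated_bound(n, 2 * log(n))   # c = 2 ln n -> about 1/n
exact_ln = expected_isolated(n, log(n))
exact_2ln = expected_isolated(n, 2 * log(n))
```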
62 Graph structure of $G_{np}$ as p changes:
p = 0: empty graph
p = 1/(n-1): giant component appears
p = c/(n-1): avg. deg. constant, lots of isolated nodes
p = log(n)/(n-1): fewer isolated nodes
p = 2·log(n)/(n-1): no isolated nodes
p = 1: complete graph
Emergence of a giant component: avg. degree k = 2E/n, or p = k/(n-1). k = 1-ε: all components are of size Ω(log n). k = 1+ε: a component of size Ω(n), others have size Ω(log n).
63 [Plot: fraction of nodes in the largest component of $G_{np}$, n = 100k, for varying p(n-1).]
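The emergence of the giant component can be reproduced at a much smaller scale. The following is a rough sketch, assuming n = 1000 rather than the n = 100k of the slide so that the quadratic pair loop stays fast; `largest_component_fraction`, the union-find bookkeeping, and the chosen values of c = p(n-1) are all hypothetical illustration choices.

```python
import random
from itertools import combinations


def largest_component_fraction(n, c, seed=0):
    """Sample G(n, p) with p = c / (n - 1); return |largest component| / n."""
    rng = random.Random(seed)
    p = c / (n - 1)
    parent = list(range(n))   # union-find forest
    size = [1] * n            # component size, valid at roots

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in combinations(range(n), 2):
        if rng.random() < p:               # edge (u, v) is present
            ru, rv = find(u), find(v)
            if ru != rv:
                if size[ru] < size[rv]:
                    ru, rv = rv, ru
                parent[rv] = ru            # attach smaller tree to larger
                size[ru] += size[rv]
    return max(size[find(x)] for x in range(n)) / n


below = largest_component_fraction(1000, 0.5)  # avg. degree below 1
above = largest_component_fraction(1000, 3.0)  # avg. degree above 1
```

With average degree below 1 the largest component holds only a small share of the nodes, while with average degree 3 it holds most of them, matching the transition around p = 1/(n-1) described above.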
More informationCS341 info session is on Thu 3/1 5pm in Gates415. CS246: Mining Massive Datasets Jure Leskovec, Stanford University
CS341 info session is on Thu 3/1 5pm in Gates415 CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu 2/28/18 Jure Leskovec, Stanford CS246: Mining Massive Datasets,
More informationSlide source: Mining of Massive Datasets Jure Leskovec, Anand Rajaraman, Jeff Ullman Stanford University.
Slide source: Mining of Massive Datasets Jure Leskovec, Anand Rajaraman, Jeff Ullman Stanford University http://www.mmds.org #1: C4.5 Decision Tree - Classification (61 votes) #2: K-Means - Clustering
More informationComplex (Biological) Networks
Complex (Biological) Networks Today: Measuring Network Topology Thursday: Analyzing Metabolic Networks Elhanan Borenstein Some slides are based on slides from courses given by Roded Sharan and Tomer Shlomi
More informationCS 277: Data Mining. Mining Web Link Structure. CS 277: Data Mining Lectures Analyzing Web Link Structure Padhraic Smyth, UC Irvine
CS 277: Data Mining Mining Web Link Structure Class Presentations In-class, Tuesday and Thursday next week 2-person teams: 6 minutes, up to 6 slides, 3 minutes/slides each person 1-person teams 4 minutes,
More informationCS 350 Algorithms and Complexity
CS 350 Algorithms and Complexity Winter 2019 Lecture 15: Limitations of Algorithmic Power Introduction to complexity theory Andrew P. Black Department of Computer Science Portland State University Lower
More informationA New Random Graph Model with Self-Optimizing Nodes: Connectivity and Diameter
A New Random Graph Model with Self-Optimizing Nodes: Connectivity and Diameter Richard J. La and Maya Kabkab Abstract We introduce a new random graph model. In our model, n, n 2, vertices choose a subset
More informationCS 350 Algorithms and Complexity
1 CS 350 Algorithms and Complexity Fall 2015 Lecture 15: Limitations of Algorithmic Power Introduction to complexity theory Andrew P. Black Department of Computer Science Portland State University Lower
More informationDATA MINING LECTURE 13. Link Analysis Ranking PageRank -- Random walks HITS
DATA MINING LECTURE 3 Link Analysis Ranking PageRank -- Random walks HITS How to organize the web First try: Manually curated Web Directories How to organize the web Second try: Web Search Information
More informationLecture 13: Spectral Graph Theory
CSE 521: Design and Analysis of Algorithms I Winter 2017 Lecture 13: Spectral Graph Theory Lecturer: Shayan Oveis Gharan 11/14/18 Disclaimer: These notes have not been subjected to the usual scrutiny reserved
More informationChaos, Complexity, and Inference (36-462)
Chaos, Complexity, and Inference (36-462) Lecture 21: More Networks: Models and Origin Myths Cosma Shalizi 31 March 2009 New Assignment: Implement Butterfly Mode in R Real Agenda: Models of Networks, with
More informationRandom Graphs. 7.1 Introduction
7 Random Graphs 7.1 Introduction The theory of random graphs began in the late 1950s with the seminal paper by Erdös and Rényi [?]. In contrast to percolation theory, which emerged from efforts to model
More information1 Random graph models
1 Random graph models A large part of understanding what structural patterns in a network are interesting depends on having an appropriate reference point by which to distinguish interesting from non-interesting.
More informationU.C. Berkeley CS294: Beyond Worst-Case Analysis Handout 3 Luca Trevisan August 31, 2017
U.C. Berkeley CS294: Beyond Worst-Case Analysis Handout 3 Luca Trevisan August 3, 207 Scribed by Keyhan Vakil Lecture 3 In which we complete the study of Independent Set and Max Cut in G n,p random graphs.
More informationSelf Similar (Scale Free, Power Law) Networks (I)
Self Similar (Scale Free, Power Law) Networks (I) E6083: lecture 4 Prof. Predrag R. Jelenković Dept. of Electrical Engineering Columbia University, NY 10027, USA {predrag}@ee.columbia.edu February 7, 2007
More informationRandom Lifts of Graphs
27th Brazilian Math Colloquium, July 09 Plan of this talk A brief introduction to the probabilistic method. A quick review of expander graphs and their spectrum. Lifts, random lifts and their properties.
More informationChaos, Complexity, and Inference (36-462)
Chaos, Complexity, and Inference (36-462) Lecture 21 Cosma Shalizi 3 April 2008 Models of Networks, with Origin Myths Erdős-Rényi Encore Erdős-Rényi with Node Types Watts-Strogatz Small World Graphs Exponential-Family
More informationCS224W: Social and Information Network Analysis Jure Leskovec, Stanford University
CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University http://cs224w.stanford.edu 12/3/2013 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu
More informationMore Asymptotic Analysis Spring 2018 Discussion 8: March 6, 2018
CS 61B More Asymptotic Analysis Spring 2018 Discussion 8: March 6, 2018 Here is a review of some formulas that you will find useful when doing asymptotic analysis. ˆ N i=1 i = 1 + 2 + 3 + 4 + + N = N(N+1)
More informationECEN 689 Special Topics in Data Science for Communications Networks
ECEN 689 Special Topics in Data Science for Communications Networks Nick Duffield Department of Electrical & Computer Engineering Texas A&M University Lecture 8 Random Walks, Matrices and PageRank Graphs
More information6.207/14.15: Networks Lectures 4, 5 & 6: Linear Dynamics, Markov Chains, Centralities
6.207/14.15: Networks Lectures 4, 5 & 6: Linear Dynamics, Markov Chains, Centralities 1 Outline Outline Dynamical systems. Linear and Non-linear. Convergence. Linear algebra and Lyapunov functions. Markov
More informationComplex (Biological) Networks
Complex (Biological) Networks Today: Measuring Network Topology Thursday: Analyzing Metabolic Networks Elhanan Borenstein Some slides are based on slides from courses given by Roded Sharan and Tomer Shlomi
More informationGraphRNN: A Deep Generative Model for Graphs (24 Feb 2018)
GraphRNN: A Deep Generative Model for Graphs (24 Feb 2018) Jiaxuan You, Rex Ying, Xiang Ren, William L. Hamilton, Jure Leskovec Presented by: Jesse Bettencourt and Harris Chan March 9, 2018 University
More informationAnalysis & Generative Model for Trust Networks
Analysis & Generative Model for Trust Networks Pranav Dandekar Management Science & Engineering Stanford University Stanford, CA 94305 ppd@stanford.edu ABSTRACT Trust Networks are a specific kind of social
More informationComputational Boolean Algebra. Pingqiang Zhou ShanghaiTech University
Computational Boolean Algebra Pingqiang Zhou ShanghaiTech University Announcements Written assignment #1 is out. Due: March 24 th, in class. Programming assignment #1 is out. Due: March 24 th, 11:59PM.
More informationCS 6820 Fall 2014 Lectures, October 3-20, 2014
Analysis of Algorithms Linear Programming Notes CS 6820 Fall 2014 Lectures, October 3-20, 2014 1 Linear programming The linear programming (LP) problem is the following optimization problem. We are given
More informationNP-Completeness I. Lecture Overview Introduction: Reduction and Expressiveness
Lecture 19 NP-Completeness I 19.1 Overview In the past few lectures we have looked at increasingly more expressive problems that we were able to solve using efficient algorithms. In this lecture we introduce
More informationMidterm Exam 1 (Solutions)
EECS 6 Probability and Random Processes University of California, Berkeley: Spring 07 Kannan Ramchandran February 3, 07 Midterm Exam (Solutions) Last name First name SID Name of student on your left: Name
More informationA simple branching process approach to the phase transition in G n,p
A simple branching process approach to the phase transition in G n,p Béla Bollobás Department of Pure Mathematics and Mathematical Statistics Wilberforce Road, Cambridge CB3 0WB, UK b.bollobas@dpmms.cam.ac.uk
More informationCS246: Mining Massive Datasets Jure Leskovec, Stanford University
CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu 2/7/2012 Jure Leskovec, Stanford C246: Mining Massive Datasets 2 Web pages are not equally important www.joe-schmoe.com
More informationTime Complexity (1) CSCI Spring Original Slides were written by Dr. Frederick W Maier. CSCI 2670 Time Complexity (1)
Time Complexity (1) CSCI 2670 Original Slides were written by Dr. Frederick W Maier Spring 2014 Time Complexity So far we ve dealt with determining whether or not a problem is decidable. But even if it
More informationAnd now for something completely different
And now for something completely different David Aldous July 27, 2012 I don t do constants. Rick Durrett. I will describe a topic with the paradoxical features Constants matter it s all about the constants.
More information. p.1. Mathematical Models of the WWW and related networks. Alan Frieze
. p.1 Mathematical Models of the WWW and related networks Alan Frieze The WWW is an example of a large real-world network.. p.2 The WWW is an example of a large real-world network. It grows unpredictably
More informationCS246: Mining Massive Datasets Jure Leskovec, Stanford University
CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu 2/26/2013 Jure Leskovec, Stanford CS246: Mining Massive Datasets, http://cs246.stanford.edu 2 More algorithms
More informationNetworks: Lectures 9 & 10 Random graphs
Networks: Lectures 9 & 10 Random graphs Heather A Harrington Mathematical Institute University of Oxford HT 2017 What you re in for Week 1: Introduction and basic concepts Week 2: Small worlds Week 3:
More information1 More finite deterministic automata
CS 125 Section #6 Finite automata October 18, 2016 1 More finite deterministic automata Exercise. Consider the following game with two players: Repeatedly flip a coin. On heads, player 1 gets a point.
More informationCommunities Via Laplacian Matrices. Degree, Adjacency, and Laplacian Matrices Eigenvectors of Laplacian Matrices
Communities Via Laplacian Matrices Degree, Adjacency, and Laplacian Matrices Eigenvectors of Laplacian Matrices The Laplacian Approach As with betweenness approach, we want to divide a social graph into
More informationGraph Detection and Estimation Theory
Introduction Detection Estimation Graph Detection and Estimation Theory (and algorithms, and applications) Patrick J. Wolfe Statistics and Information Sciences Laboratory (SISL) School of Engineering and
More informationThanks to Jure Leskovec, Stanford and Panayiotis Tsaparas, Univ. of Ioannina for slides
Thanks to Jure Leskovec, Stanford and Panayiotis Tsaparas, Univ. of Ioannina for slides Web Search: How to Organize the Web? Ranking Nodes on Graphs Hubs and Authorities PageRank How to Solve PageRank
More informationCentral Limit Theorem and the Law of Large Numbers Class 6, Jeremy Orloff and Jonathan Bloom
Central Limit Theorem and the Law of Large Numbers Class 6, 8.5 Jeremy Orloff and Jonathan Bloom Learning Goals. Understand the statement of the law of large numbers. 2. Understand the statement of the
More informationDistributed Systems Gossip Algorithms
Distributed Systems Gossip Algorithms He Sun School of Informatics University of Edinburgh What is Gossip? Gossip algorithms In a gossip algorithm, each node in the network periodically exchanges information
More informationNew Directions in Computer Science
New Directions in Computer Science John Hopcroft Cornell University Time of change The information age is a revolution that is changing all aspects of our lives. Those individuals, institutions, and nations
More informationAditya Bhaskara CS 5968/6968, Lecture 1: Introduction and Review 12 January 2016
Lecture 1: Introduction and Review We begin with a short introduction to the course, and logistics. We then survey some basics about approximation algorithms and probability. We also introduce some of
More informationOn Defining and Computing Communities
AU Herning Aarhus University martino@hih.au.dk UPF, Barcelona, November 29th 2012 Zacharys Karate Club 25 26 12 5 17 7 11 28 32 6 24 27 29 20 1 22 9 30 34 3 2 13 15 33 4 14 18 16 31 8 21 10 23 19 A community
More informationLecture 14: Random Walks, Local Graph Clustering, Linear Programming
CSE 521: Design and Analysis of Algorithms I Winter 2017 Lecture 14: Random Walks, Local Graph Clustering, Linear Programming Lecturer: Shayan Oveis Gharan 3/01/17 Scribe: Laura Vonessen Disclaimer: These
More informationEntropy for Sparse Random Graphs With Vertex-Names
Entropy for Sparse Random Graphs With Vertex-Names David Aldous 11 February 2013 if a problem seems... Research strategy (for old guys like me): do-able = give to Ph.D. student maybe do-able = give to
More informationMidterm Exam 1 Solution
EECS 126 Probability and Random Processes University of California, Berkeley: Fall 2015 Kannan Ramchandran September 22, 2015 Midterm Exam 1 Solution Last name First name SID Name of student on your left:
More informationNetworks as vectors of their motif frequencies and 2-norm distance as a measure of similarity
Networks as vectors of their motif frequencies and 2-norm distance as a measure of similarity CS322 Project Writeup Semih Salihoglu Stanford University 353 Serra Street Stanford, CA semih@stanford.edu
More informationCS224W: Social and Information Network Analysis
CS224W: Social and Information Network Analysis Reaction Paper Adithya Rao, Gautam Kumar Parai, Sandeep Sripada Keywords: Self-similar networks, fractality, scale invariance, modularity, Kronecker graphs.
More informationECS 253 / MAE 253, Lecture 15 May 17, I. Probability generating function recap
ECS 253 / MAE 253, Lecture 15 May 17, 2016 I. Probability generating function recap Part I. Ensemble approaches A. Master equations (Random graph evolution, cluster aggregation) B. Network configuration
More informationCS 246 Review of Proof Techniques and Probability 01/14/19
Note: This document has been adapted from a similar review session for CS224W (Autumn 2018). It was originally compiled by Jessica Su, with minor edits by Jayadev Bhaskaran. 1 Proof techniques Here we
More information