Motivation Data mining: ~ find patterns (rules, outliers) Problem#1: How do real graphs look like? Problem#2: How do they evolve? Problem#3: How to ge
|
|
- Britton Robbins
- 6 years ago
- Views:
Transcription
1 Graph Mining: Laws, Generators and Tools Christos Faloutsos CMU Thank you! Sharad Mehrotra UCI 2007 C. Faloutsos 2 Outline Problem definition / Motivation Static & dynamic laws; generators Tools: CenterPiece graphs; Tensors Other projects (Virus propagation, e-bay fraud detection) Conclusions UCI 2007 C. Faloutsos 3 1
2 Motivation Data mining: ~ find patterns (rules, outliers) Problem#1: How do real graphs look like? Problem#2: How do they evolve? Problem#3: How to generate realistic graphs TOOLS Problem#4: Who is the master-mind? Problem#5: Track communities over time UCI 2007 C. Faloutsos 4 Problem#1: Joint work with Dr. Deepayan Chakrabarti (CMU/Yahoo R.L.) UCI 2007 C. Faloutsos 5 Graphs - why should we care? Internet Map [lumeta.com] Food Web [Martinez 91] Friendship Network [Moody 01] Protein Interactions [genomebiology.com] UCI 2007 C. Faloutsos 6 2
3 Graphs - why should we care? IR: bi-partite graphs (doc-terms) D T 1 web: hyper-text graph D N T M... and more: UCI 2007 C. Faloutsos 7 Graphs - why should we care? network of companies & board-of-directors members viral marketing web-log ( blog ) news propagation computer network security: /ip traffic and anomaly detection... UCI 2007 C. Faloutsos 8 Problem #1 - network and graph mining How does the Internet look like? How does the web look like? What is normal / abnormal? which patterns/laws hold? UCI 2007 C. Faloutsos 9 3
4 Graph mining Are real graphs random? UCI 2007 C. Faloutsos 10 Laws and patterns Are real graphs random? A: NO!! Diameter in- and out- degree distributions other (surprising) patterns UCI 2007 C. Faloutsos 11 Solution#1 Power law in the degree distribution [SIGCOMM99] internet domains att.com log(degree) ibm.com log(rank) UCI 2007 C. Faloutsos 12 4
5 Solution#1 : Eigen Exponent E Eigenvalue Exponent = slope E = May 2001 Rank of decreasing eigenvalue A2: power law in the eigenvalues of the adjacency matrix UCI 2007 C. Faloutsos 13 But: How about graphs from other domains? UCI 2007 C. Faloutsos 14 The Peer-to-Peer Topology [Jovanovic+] Frequency versus degree Number of adjacent peers follows a power-law UCI 2007 C. Faloutsos 15 5
6 More power laws: citation counts: (citeseer.nj.nec.com 6/2001) 100 cited.pdf log(count) log count 10 Ullman log # citations log(#citations) UCI 2007 C. Faloutsos 16 More power laws: web hit counts [w/ A. Montgomery] log(count) Web Site Traffic Zipf ``ebay users sites log(in-degree) UCI 2007 C. Faloutsos 17 count epinions.com who-trusts-whom [Richardson + Domingos, KDD 2001] trusts-2000-people user (out) degree UCI 2007 C. Faloutsos 18 6
7 Outline Problem definition / Motivation Static & dynamic laws; generators Tools: CenterPiece graphs; Tensors Other projects (Virus propagation, e-bay fraud detection) Conclusions UCI 2007 C. Faloutsos 19 Motivation Data mining: ~ find patterns (rules, outliers) Problem#1: How do real graphs look like? Problem#2: How do they evolve? Problem#3: How to generate realistic graphs TOOLS Problem#4: Who is the master-mind? Problem#5: Track communities over time UCI 2007 C. Faloutsos 20 Problem#2: Time evolution with Jure Leskovec (CMU/MLD) and Jon Kleinberg (Cornell CMU) UCI 2007 C. Faloutsos 21 7
8 Evolution of the Diameter Prior work on Power Law graphs hints at slowly growing diameter: diameter ~ O(log N) diameter ~ O(log log N) What is happening in real data? UCI 2007 C. Faloutsos 22 Evolution of the Diameter Prior work on Power Law graphs hints at slowly growing diameter: diameter ~ O(log N) diameter ~ O(log log N) What is happening in real data? Diameter shrinks over time UCI 2007 C. Faloutsos 23 Diameter ArXiv citation graph Citations among physics papers One graph per year diameter time [years] UCI 2007 C. Faloutsos 24 8
9 Diameter Autonomous Systems Graph of Internet One graph per day diameter number of nodes UCI 2007 C. Faloutsos 25 Diameter Affiliation Network Graph of collaborations in physics authors linked to papers 10 years of data diameter time [years] UCI 2007 C. Faloutsos 26 Diameter Patents Patent citation network 25 years of data diameter time [years] UCI 2007 C. Faloutsos 27 9
10 Temporal Evolution of the Graphs N(t) nodes at time t E(t) edges at time t Suppose that N(t+1) = 2 * N(t) Q: what is your guess for E(t+1) =? 2 * E(t) UCI 2007 C. Faloutsos 28 Temporal Evolution of the Graphs N(t) nodes at time t E(t) edges at time t Suppose that N(t+1) = 2 * N(t) Q: what is your guess for E(t+1) =? 2 * E(t) A: over-doubled! But obeying the ``Densification Power Law UCI 2007 C. Faloutsos 29 Densification Physics Citations Citations among physics papers 2003: 29,555 papers, 352,807 citations E(t)?? UCI 2007 C. Faloutsos 30 N(t) 10
11 Densification Physics Citations Citations among physics papers 2003: 29,555 papers, 352,807 citations E(t) 1.69 UCI 2007 C. Faloutsos 31 N(t) Densification Physics Citations Citations among physics papers 2003: 29,555 papers, 352,807 citations E(t) : tree UCI 2007 C. Faloutsos 32 N(t) Densification Physics Citations Citations among physics papers 2003: 29,555 papers, 352,807 citations E(t) clique: UCI 2007 C. Faloutsos 33 N(t) 11
12 Densification Patent Citations Citations among patents granted million nodes 16.5 million edges Each year is a datapoint E(t) 1.66 N(t) UCI 2007 C. Faloutsos 34 Densification Autonomous Systems Graph of Internet ,000 nodes 26,000 edges One graph per day E(t) 1.18 N(t) UCI 2007 C. Faloutsos 35 Densification Affiliation Network Authors linked to their publications ,000 nodes 20,000 authors 38,000 papers 133,000 edges E(t) 1.15 N(t) UCI 2007 C. Faloutsos 36 12
13 Outline Problem definition / Motivation Static & dynamic laws; generators Tools: CenterPiece graphs; Tensors Other projects (Virus propagation, e-bay fraud detection) Conclusions UCI 2007 C. Faloutsos 37 Motivation Data mining: ~ find patterns (rules, outliers) Problem#1: How do real graphs look like? Problem#2: How do they evolve? Problem#3: How to generate realistic graphs TOOLS Problem#4: Who is the master-mind? Problem#5: Track communities over time UCI 2007 C. Faloutsos 38 Problem#3: Generation Given a growing graph with count of nodes N 1, N 2, Generate a realistic sequence of graphs that will obey all the patterns UCI 2007 C. Faloutsos 39 13
14 Problem Definition Given a growing graph with count of nodes N 1, N 2, Generate a realistic sequence of graphs that will obey all the patterns Static Patterns Power Law Degree Distribution Power Law eigenvalue and eigenvector distribution Small Diameter Dynamic Patterns Growth Power Law Shrinking/Stabilizing Diameters UCI 2007 C. Faloutsos 40 Problem Definition Given a growing graph with count of nodes N 1, N 2, Generate a realistic sequence of graphs that will obey all the patterns Idea: Self-similarity Leads to power laws Communities within communities UCI 2007 C. Faloutsos 41 Kronecker Product a Graph Intermediate stage Adjacency matrix Adjacency matrix UCI 2007 C. Faloutsos 42 14
15 Kronecker Product a Graph Continuing multiplying with G 1 we obtain G 4 and so on G 4 adjacency matrix UCI 2007 C. Faloutsos 43 Kronecker Product a Graph Continuing multiplying with G 1 we obtain G 4 and so on G 4 adjacency matrix UCI 2007 C. Faloutsos 44 Kronecker Product a Graph Continuing multiplying with G 1 we obtain G 4 and so on G 4 adjacency matrix UCI 2007 C. Faloutsos 45 15
16 Properties: We can PROVE that Degree distribution is multinomial ~ power law Diameter: constant Eigenvalue distribution: multinomial First eigenvector: multinomial See [Leskovec+, PKDD 05] for proofs UCI 2007 C. Faloutsos 46 Problem Definition Given a growing graph with nodes N 1, N 2, Generate a realistic sequence of graphs that will obey all the patterns Static Patterns Power Law Degree Distribution Power Law eigenvalue and eigenvector distribution Small Diameter Dynamic Patterns Growth Power Law Shrinking/Stabilizing Diameters First and only generator for which we can prove all these properties UCI 2007 C. Faloutsos 47 Stochastic Kronecker Graphs skip Create N 1 N 1 probability matrix P 1 Compute the k th Kronecker power P k For each entry p uv of P k include an edge (u,v) with probability p uv P Kronecker multiplication P k Instance Matrix G 2 flip biased coins UCI 2007 C. Faloutsos 48 16
17 Experiments How well can we match real graphs? Arxiv: physics citations: 30,000 papers, 350,000 citations 10 years of data U.S. Patent citation network 4 million patents, 16 million citations 37 years of data Autonomous systems graph of internet Single snapshot from January ,400 nodes, 26,000 edges We show both static and temporal patterns UCI 2007 C. Faloutsos 49 Arxiv Degree Distribution Real graph Deterministic Kronecker Stochastic Kronecker count degree degree degree UCI 2007 C. Faloutsos 50 Arxiv Scree Plot Real graph Deterministic Kronecker Stochastic Kronecker Eigenvalue Rank Rank Rank UCI 2007 C. Faloutsos 51 17
18 Arxiv Densification Real graph Deterministic Kronecker Stochastic Kronecker Edges Nodes(t) Nodes(t) Nodes(t) UCI 2007 C. Faloutsos 52 Arxiv Effective Diameter Real graph Deterministic Kronecker Stochastic Kronecker Diameter Nodes(t) Nodes(t) Nodes(t) UCI 2007 C. Faloutsos 53 (Q: how to fit the parm s?) A: Stochastic version of Kronecker graphs + Max likelihood + Metropolis sampling [Leskovec+, ICML 07] UCI 2007 C. Faloutsos 54 18
19 Experiments on real AS graph Degree distribution Hop plot Adjacency matrix eigen values Network value UCI 2007 C. Faloutsos 55 Conclusions Kronecker graphs have: All the static properties Heavy tailed degree distributions Small diameter All the temporal properties Densification Power Law Shrinking/Stabilizing Diameters We can formally prove these results Multinomial eigenvalues and eigenvectors UCI 2007 C. Faloutsos 56 Outline Problem definition / Motivation Static & dynamic laws; generators Tools: CenterPiece graphs; Tensors Other projects (Virus propagation, e-bay fraud detection) Conclusions UCI 2007 C. Faloutsos 57 19
20 Motivation Data mining: ~ find patterns (rules, outliers) Problem#1: How do real graphs look like? Problem#2: How do they evolve? Problem#3: How to generate realistic graphs TOOLS Problem#4: Who is the master-mind? Problem#5: Track communities over time UCI 2007 C. Faloutsos 58 Problem#4: MasterMind CePS w/ Hanghang Tong, KDD 2006 htong <at> cs.cmu.edu UCI 2007 C. Faloutsos 59 Center-Piece Subgraph(Ceps) Given Q query nodes Find Center-piece ( b ) B App. Social Networks Law Inforcement, A C Idea: Proximity -> random walk with restarts UCI 2007 C. Faloutsos 60 20
21 Case Study: AND query R. Agrawal Jiawei Han V. Vapnik M. Jordan UCI 2007 C. Faloutsos 61 Case Study: AND query 15 H.V. Jagadish 10 Laks V.S. Lakshmanan R. Agrawal Jiawei Han 10 Heikki 1 1 Mannila Christos 1 Padhraic 1 Faloutsos Smyth 1 V. Vapnik 3 M. Jordan 1 4 Corinna 6 Daryl Cortes Pregibon UCI 2007 C. Faloutsos Case Study: AND query 15 H.V. Jagadish 10 Laks V.S. Lakshmanan R. Agrawal Jiawei Han 10 Heikki 1 1 Mannila Christos Padhraic 1 1 Faloutsos Smyth 1 V. Vapnik 3 M. Jordan 1 4 Corinna 6 Daryl Cortes Pregibon UCI 2007 C. Faloutsos
22 A B C 15 H.V. Jagadish 10 Laks V.S. Lakshmanan databases 13 R. Agrawal Jiawei Han Umeshwar 3 3 Dayal 5 Bernhard Scholkopf 2 Peter L. Bartlett ML/Statistics 2 V. Vapnik M. Jordan _SoftAnd query Alex J. Smola UCI 2007 C. Faloutsos 64 Conclusions Q1:How to measure the importance? A1: RWR+K_SoftAnd Q2: How to find connection subgraph? A2: Extract Alg. Q3:How to do it efficiently? A3:Graph Partition (Fast CePS) ~90% quality 6:1 speedup; 150x speedup (ICDM 06, b.p. award) UCI 2007 C. Faloutsos 65 Outline Problem definition / Motivation Static & dynamic laws; generators Tools: CenterPiece graphs; Tensors Other projects (Virus propagation, e-bay fraud detection) Conclusions UCI 2007 C. Faloutsos 66 22
23 Motivation Data mining: ~ find patterns (rules, outliers) Problem#1: How do real graphs look like? Problem#2: How do they evolve? Problem#3: How to generate realistic graphs TOOLS Problem#4: Who is the master-mind? Problem#5: Track communities over time UCI 2007 C. Faloutsos 67 Tensors for time evolving graphs [Jimeng Sun+ KDD 06] [, SDM 07] [ CF, Kolda, Sun, SDM 07 tutorial] UCI 2007 C. Faloutsos 68 Social network analysis Static: find community structures 1990 Keywords Authors DB UCI 2007 C. Faloutsos 69 23
24 Social network analysis Static: find community structures Dynamic: monitor community structure evolution; spot abnormal individuals; abnormal time-stamps 2004 DM Keywords 1990 DB Authors DB UCI 2007 C. Faloutsos authors Application 1: Multiway latent semantic indexing (LSI) 2004 DB DM keyword DB Pattern U keyword Query U authors Philip Yu Projection matrices specify the clusters Core tensors give cluster activation level Michael Stonebraker UCI 2007 C. Faloutsos 71 Bibliographic data (DBLP) Papers from VLDB and KDD conferences Construct 2nd order tensors with yearly windows with <author, keywords> Each tensor: timestamps (years) UCI 2007 C. Faloutsos 72 24
25 Authors michael carey, michael stonebraker, h. jagadish, hector garcia-molina surajit chaudhuri,mitch cherniack,michael stonebraker,ugur etintemel Multiway LSI DB jiawei han,jian pei,philip s. yu, jianyong wang,charu c. aggarwal DM Keywords queri,parallel,optimization,concurr, objectorient distribut,systems,view,storage,servic,pr ocess,cache streams,pattern,support, cluster, index,gener,queri Year Two groups are correctly identified: Databases and Data mining People and concepts are drifting over time UCI 2007 C. Faloutsos 73 Conclusions Tensor-based methods (WTA/DTA/STA): spot patterns and anomalies on time evolving graphs, and on streams (monitoring) UCI 2007 C. Faloutsos 74 Outline Problem definition / Motivation Static & dynamic laws; generators Tools: CenterPiece graphs; Tensors Other projects (Virus propagation, e-bay fraud detection) Conclusions UCI 2007 C. Faloutsos 75 25
26 Virus propagation How do viruses/rumors propagate? Will a flu-like virus linger, or will it become extinct soon? UCI 2007 C. Faloutsos 76 The model: SIS Flu like: Susceptible-Infected-Susceptible Virus strength s= β/δ N1 Infected Prob. β Prob. δ N Prob. β N2 N3 Healthy UCI 2007 C. Faloutsos 77 Epidemic threshold τ of a graph: the value of τ, such that if strength s = β / δ < τ an epidemic can not happen Thus, given a graph compute its epidemic threshold UCI 2007 C. Faloutsos 78 26
27 Epidemic threshold τ What should τ depend on? avg. degree? and/or highest degree? and/or variance of degree? and/or third moment of degree? and/or diameter? UCI 2007 C. Faloutsos 79 Epidemic threshold [Theorem] We have no epidemic, if β/δ <τ = 1/ λ 1,A UCI 2007 C. Faloutsos 80 Epidemic threshold [Theorem] We have no epidemic, if epidemic threshold recovery prob. β/δ <τ = 1/ λ 1,A attack prob. Proof: [Wang+03] largest eigenvalue of adj. matrix A UCI 2007 C. Faloutsos 81 27
28 Experiments (Oregon) Number of Infected Nodes Oregon β = β/δ > τ (above threshold) β/δ = τ (at the threshold) Time δ: β/δ < τ (below threshold) UCI 2007 C. Faloutsos 82 Outline Problem definition / Motivation Static & dynamic laws; generators Tools: CenterPiece graphs; Tensors Other projects (Virus propagation, e-bay fraud detection) Conclusions UCI 2007 C. Faloutsos 83 E-bay Fraud detection w/ Polo Chau & Shashank Pandit, CMU UCI 2007 C. Faloutsos 84 28
29 E-bay Fraud detection - NetProbe UCI 2007 C. Faloutsos 85 OVERALL CONCLUSIONS Graphs pose a wealth of fascinating problems self-similarity and power laws work, when textbook methods fail! New patterns (shrinking diameter!) New generator: Kronecker UCI 2007 C. Faloutsos 86 Philosophical observation Graph mining brings together: ML/AI / IR Stat, Num. analysis, Systems (DB (Gb/Tb), Networks ) sociology, epidemiology physics, ++ UCI 2007 C. Faloutsos 87 29
30 References Hanghang Tong, Christos Faloutsos, and Jia-Yu Pan Fast Random Walk with Restart and Its Applications ICDM 2006, Hong Kong. Hanghang Tong, Christos Faloutsos Center-Piece Subgraphs: Problem Definition and Fast Solutions, KDD 2006, Philadelphia, PA UCI 2007 C. Faloutsos 88 References Jure Leskovec, Jon Kleinberg and Christos Faloutsos Graphs over Time: Densification Laws, Shrinking Diameters and Possible Explanations KDD 2005, Chicago, IL. ("Best Research Paper" award). Jure Leskovec, Deepayan Chakrabarti, Jon Kleinberg, Christos Faloutsos Realistic, Mathematically Tractable Graph Generation and Evolution, Using Kronecker Multiplication (ECML/PKDD 2005), Porto, Portugal, UCI 2007 C. Faloutsos 89 References Jure Leskovec and Christos Faloutsos, Scalable Modeling of Real Graphs using Kronecker Multiplication, ICML 2007, Corvallis, OR, USA Jimeng Sun, Dacheng Tao, Christos Faloutsos Beyond Streams and Graphs: Dynamic Tensor Analysis, KDD 2006, Philadelphia, PA UCI 2007 C. Faloutsos 90 30
31 References Jimeng Sun, Yinglian Xie, Hui Zhang, Christos Faloutsos. Less is More: Compact Matrix Decomposition for Large Sparse Graphs, SDM, Minneapolis, Minnesota, Apr [pdf] UCI 2007 C. Faloutsos 91 Contact info: www. cs.cmu.edu /~christos (w/ papers, datasets, code, etc) UCI 2007 C. Faloutsos 92 31
CMU SCS. Large Graph Mining. Christos Faloutsos CMU
Large Graph Mining Christos Faloutsos CMU Thank you! Hillol Kargupta NGDM 2007 C. Faloutsos 2 Outline Problem definition / Motivation Static & dynamic laws; generators Tools: CenterPiece graphs; fraud
More informationData mining in large graphs
Data mining in large graphs Christos Faloutsos University www.cs.cmu.edu/~christos ALLADIN 2003 C. Faloutsos 1 Outline Introduction - motivation Patterns & Power laws Scalability & Fast algorithms Fractals,
More informationFinding patterns in large, real networks
Thanks to Finding patterns in large, real networks Christos Faloutsos CMU www.cs.cmu.edu/~christos/talks/ucla-05 Deepayan Chakrabarti (CMU) Michalis Faloutsos (UCR) George Siganos (UCR) UCLA-IPAM 05 (c)
More informationCMU SCS Mining Billion-node Graphs: Patterns, Generators and Tools
Mining Billion-node Graphs: Patterns, Generators and Tools Christos Faloutsos CMU (on sabbatical at google) Tamara Kolda Thank you! C. Faloutsos (CMU + Google) 2 Our goal: Open source system for mining
More informationThanks to Jure Leskovec, Stanford and Panayiotis Tsaparas, Univ. of Ioannina for slides
Thanks to Jure Leskovec, Stanford and Panayiotis Tsaparas, Univ. of Ioannina for slides Web Search: How to Organize the Web? Ranking Nodes on Graphs Hubs and Authorities PageRank How to Solve PageRank
More informationFaloutsos, Tong ICDE, 2009
Large Graph Mining: Patterns, Tools and Case Studies Christos Faloutsos Hanghang Tong CMU Copyright: Faloutsos, Tong (29) 2-1 Outline Part 1: Patterns Part 2: Matrix and Tensor Tools Part 3: Proximity
More informationNetworks as vectors of their motif frequencies and 2-norm distance as a measure of similarity
Networks as vectors of their motif frequencies and 2-norm distance as a measure of similarity CS322 Project Writeup Semih Salihoglu Stanford University 353 Serra Street Stanford, CA semih@stanford.edu
More informationIncremental Pattern Discovery on Streams, Graphs and Tensors
Incremental Pattern Discovery on Streams, Graphs and Tensors Jimeng Sun Thesis Committee: Christos Faloutsos Tom Mitchell David Steier, External member Philip S. Yu, External member Hui Zhang Abstract
More informationRealistic, Mathematically Tractable Graph Generation and Evolution, Using Kronecker Multiplication
Realistic, Mathematically Tractable Graph Generation and Evolution, Using Kronecker Multiplication Jurij Leskovec 1, Deepayan Chakrabarti 1, Jon Kleinberg 2, and Christos Faloutsos 1 1 School of Computer
More informationCS224W: Social and Information Network Analysis Jure Leskovec, Stanford University
CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University http://cs224w.stanford.edu How to organize/navigate it? First try: Human curated Web directories Yahoo, DMOZ, LookSmart
More informationMust-read Material : Multimedia Databases and Data Mining. Indexing - Detailed outline. Outline. Faloutsos
Must-read Material 15-826: Multimedia Databases and Data Mining Tamara G. Kolda and Brett W. Bader. Tensor decompositions and applications. Technical Report SAND2007-6702, Sandia National Laboratories,
More informationThanks to Jure Leskovec, Stanford and Panayiotis Tsaparas, Univ. of Ioannina for slides
Thanks to Jure Leskovec, Stanford and Panayiotis Tsaparas, Univ. of Ioannina for slides Web Search: How to Organize the Web? Ranking Nodes on Graphs Hubs and Authorities PageRank How to Solve PageRank
More informationCMU SCS : Multimedia Databases and Data Mining CMU SCS Must-read Material CMU SCS Must-read Material (cont d)
15-826: Multimedia Databases and Data Mining Lecture #26: Graph mining - patterns Christos Faloutsos Must-read Material Michalis Faloutsos, Petros Faloutsos and Christos Faloutsos, On Power-Law Relationships
More informationLarge Graph Mining: Power Tools and a Practitioner s guide
Large Graph Mining: Power Tools and a Practitioner s guide Task 3: Recommendations & proximity Faloutsos, Miller & Tsourakakis CMU KDD'09 Faloutsos, Miller, Tsourakakis P3-1 Outline Introduction Motivation
More informationDynamics of Real-world Networks
Dynamics of Real-world Networks Thesis proposal Jurij Leskovec Machine Learning Department Carnegie Mellon University May 2, 2007 Thesis committee: Christos Faloutsos, CMU Avrim Blum, CMU John Lafferty,
More informationarxiv:physics/ v3 [physics.soc-ph] 28 Jan 2007
arxiv:physics/0603229v3 [physics.soc-ph] 28 Jan 2007 Graph Evolution: Densification and Shrinking Diameters Jure Leskovec School of Computer Science, Carnegie Mellon University, Pittsburgh, PA Jon Kleinberg
More informationCS224W: Analysis of Networks Jure Leskovec, Stanford University
CS224W: Analysis of Networks Jure Leskovec, Stanford University http://cs224w.stanford.edu 10/30/17 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 2
More informationRTM: Laws and a Recursive Generator for Weighted Time-Evolving Graphs
RTM: Laws and a Recursive Generator for Weighted Time-Evolving Graphs Leman Akoglu Mary McGlohon Christos Faloutsos Carnegie Mellon University, School of Computer Science {lakoglu, mmcgloho, christos}@cs.cmu.edu
More informationTalk 2: Graph Mining Tools - SVD, ranking, proximity. Christos Faloutsos CMU
Talk 2: Graph Mining Tools - SVD, ranking, proximity Christos Faloutsos CMU Outline Introduction Motivation Task 1: Node importance Task 2: Recommendations Task 3: Connection sub-graphs Conclusions Lipari
More informationMust-read material (1 of 2) : Multimedia Databases and Data Mining. Must-read material (2 of 2) Main outline
Must-read material (1 of 2) 15-826: Multimedia Databases and Data Mining Lecture #29: Graph mining - Generators & tools Fully Automatic Cross-Associations, by D. Chakrabarti, S. Papadimitriou, D. Modha
More informationople are co-authors if there edge to each of them. derived from the five largest PH, HEP TH, HEP PH, place a time-stamp on each h paper, and for each perission to the arxiv. The the period from April 1992
More informationCS224W: Analysis of Networks Jure Leskovec, Stanford University
Announcements: Please fill HW Survey Weekend Office Hours starting this weekend (Hangout only) Proposal: Can use 1 late period CS224W: Analysis of Networks Jure Leskovec, Stanford University http://cs224w.stanford.edu
More informationRTM: Laws and a Recursive Generator for Weighted Time-Evolving Graphs
RTM: Laws and a Recursive Generator for Weighted Time-Evolving Graphs Submitted for Blind Review Abstract How do real, weighted graphs change over time? What patterns, if any, do they obey? Earlier studies
More informationCommunities Via Laplacian Matrices. Degree, Adjacency, and Laplacian Matrices Eigenvectors of Laplacian Matrices
Communities Via Laplacian Matrices Degree, Adjacency, and Laplacian Matrices Eigenvectors of Laplacian Matrices The Laplacian Approach As with betweenness approach, we want to divide a social graph into
More informationCMU SCS Mining Large Graphs: Patterns, Anomalies, and Fraud Detection
Mining Large Graphs: Patterns, Anomalies, and Fraud Detection Christos Faloutsos CMU Thank you! Prof. Julian McAuley Nicholas Urioste UCSD'16 (c) 2016, C. Faloutsos 2 Roadmap Introduction Motivation Why
More informationSubgraph Detection Using Eigenvector L1 Norms
Subgraph Detection Using Eigenvector L1 Norms Benjamin A. Miller Lincoln Laboratory Massachusetts Institute of Technology Lexington, MA 02420 bamiller@ll.mit.edu Nadya T. Bliss Lincoln Laboratory Massachusetts
More informationLink Prediction. Eman Badr Mohammed Saquib Akmal Khan
Link Prediction Eman Badr Mohammed Saquib Akmal Khan 11-06-2013 Link Prediction Which pair of nodes should be connected? Applications Facebook friend suggestion Recommendation systems Monitoring and controlling
More informationAnalysis & Generative Model for Trust Networks
Analysis & Generative Model for Trust Networks Pranav Dandekar Management Science & Engineering Stanford University Stanford, CA 94305 ppd@stanford.edu ABSTRACT Trust Networks are a specific kind of social
More informationarxiv: v2 [stat.ml] 21 Aug 2009
KRONECKER GRAPHS: AN APPROACH TO MODELING NETWORKS Kronecker graphs: An Approach to Modeling Networks arxiv:0812.4905v2 [stat.ml] 21 Aug 2009 Jure Leskovec Computer Science Department, Stanford University
More informationCMU SCS Talk 3: Graph Mining Tools Tensors, communities, parallelism
Talk 3: Graph Mining Tools Tensors, communities, parallelism Christos Faloutsos CMU Overall Outline Introduction Motivation Talk#1: Patterns in graphs; generators Talk#2: Tools (Ranking, proximity) Talk#3:
More informationSupporting Statistical Hypothesis Testing Over Graphs
Supporting Statistical Hypothesis Testing Over Graphs Jennifer Neville Departments of Computer Science and Statistics Purdue University (joint work with Tina Eliassi-Rad, Brian Gallagher, Sergey Kirshner,
More informationCS224W: Methods of Parallelized Kronecker Graph Generation
CS224W: Methods of Parallelized Kronecker Graph Generation Sean Choi, Group 35 December 10th, 2012 1 Introduction The question of generating realistic graphs has always been a topic of huge interests.
More informationSlide source: Mining of Massive Datasets Jure Leskovec, Anand Rajaraman, Jeff Ullman Stanford University.
Slide source: Mining of Massive Datasets Jure Leskovec, Anand Rajaraman, Jeff Ullman Stanford University http://www.mmds.org #1: C4.5 Decision Tree - Classification (61 votes) #2: K-Means - Clustering
More informationJure Leskovec Stanford University
Jure Leskovec Stanford University 2 Part 1: Models for networks Part 2: Information flows in networks Information spread in social media 3 Information flows through networks Analyzing underlying mechanisms
More informationOnline Sampling of High Centrality Individuals in Social Networks
Online Sampling of High Centrality Individuals in Social Networks Arun S. Maiya and Tanya Y. Berger-Wolf Department of Computer Science University of Illinois at Chicago 85 S. Morgan, Chicago, IL 667,
More informationLinear Algebra Methods for Data Mining
Linear Algebra Methods for Data Mining Saara Hyvönen, Saara.Hyvonen@cs.helsinki.fi Spring 2007 2. Basic Linear Algebra continued Linear Algebra Methods for Data Mining, Spring 2007, University of Helsinki
More informationDATA MINING LECTURE 13. Link Analysis Ranking PageRank -- Random walks HITS
DATA MINING LECTURE 3 Link Analysis Ranking PageRank -- Random walks HITS How to organize the web First try: Manually curated Web Directories How to organize the web Second try: Web Search Information
More informationApplying Latent Dirichlet Allocation to Group Discovery in Large Graphs
Lawrence Livermore National Laboratory Applying Latent Dirichlet Allocation to Group Discovery in Large Graphs Keith Henderson and Tina Eliassi-Rad keith@llnl.gov and eliassi@llnl.gov This work was performed
More informationHub, Authority and Relevance Scores in Multi-Relational Data for Query Search
Hub, Authority and Relevance Scores in Multi-Relational Data for Query Search Xutao Li 1 Michael Ng 2 Yunming Ye 1 1 Department of Computer Science, Shenzhen Graduate School, Harbin Institute of Technology,
More informationInfluence Maximization in Dynamic Social Networks
Influence Maximization in Dynamic Social Networks Honglei Zhuang, Yihan Sun, Jie Tang, Jialin Zhang and Xiaoming Sun Department of Computer Science and Technology, Tsinghua University Department of Computer
More informationOnline Social Networks and Media. Link Analysis and Web Search
Online Social Networks and Media Link Analysis and Web Search How to Organize the Web First try: Human curated Web directories Yahoo, DMOZ, LookSmart How to organize the web Second try: Web Search Information
More informationEpidemics in Complex Networks and Phase Transitions
Master M2 Sciences de la Matière ENS de Lyon 2015-2016 Phase Transitions and Critical Phenomena Epidemics in Complex Networks and Phase Transitions Jordan Cambe January 13, 2016 Abstract Spreading phenomena
More informationCS224W: Social and Information Network Analysis
CS224W: Social and Information Network Analysis Reaction Paper Adithya Rao, Gautam Kumar Parai, Sandeep Sripada Keywords: Self-similar networks, fractality, scale invariance, modularity, Kronecker graphs.
More informationOnline Social Networks and Media. Link Analysis and Web Search
Online Social Networks and Media Link Analysis and Web Search How to Organize the Web First try: Human curated Web directories Yahoo, DMOZ, LookSmart How to organize the web Second try: Web Search Information
More informationAnomaly Detection in Temporal Graph Data: An Iterative Tensor Decomposition and Masking Approach
Proceedings 1st International Workshop on Advanced Analytics and Learning on Temporal Data AALTD 2015 Anomaly Detection in Temporal Graph Data: An Iterative Tensor Decomposition and Masking Approach Anna
More informationOverlapping Communities
Overlapping Communities Davide Mottin HassoPlattner Institute Graph Mining course Winter Semester 2017 Acknowledgements Most of this lecture is taken from: http://web.stanford.edu/class/cs224w/slides GRAPH
More informationCS60021: Scalable Data Mining. Dimensionality Reduction
J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org 1 CS60021: Scalable Data Mining Dimensionality Reduction Sourangshu Bhattacharya Assumption: Data lies on or near a
More informationData Mining and Matrices
Data Mining and Matrices 10 Graphs II Rainer Gemulla, Pauli Miettinen Jul 4, 2013 Link analysis The web as a directed graph Set of web pages with associated textual content Hyperlinks between webpages
More informationSlides based on those in:
Spyros Kontogiannis & Christos Zaroliagis Slides based on those in: http://www.mmds.org High dim. data Graph data Infinite data Machine learning Apps Locality sensitive hashing PageRank, SimRank Filtering
More informationWiki Definition. Reputation Systems I. Outline. Introduction to Reputations. Yury Lifshits. HITS, PageRank, SALSA, ebay, EigenTrust, VKontakte
Reputation Systems I HITS, PageRank, SALSA, ebay, EigenTrust, VKontakte Yury Lifshits Wiki Definition Reputation is the opinion (more technically, a social evaluation) of the public toward a person, a
More informationTHE POWER OF CERTAINTY
A DIRICHLET-MULTINOMIAL APPROACH TO BELIEF PROPAGATION Dhivya Eswaran* CMU deswaran@cs.cmu.edu Stephan Guennemann TUM guennemann@in.tum.de Christos Faloutsos CMU christos@cs.cmu.edu PROBLEM MOTIVATION
More informationSocial Influence in Online Social Networks. Epidemiological Models. Epidemic Process
Social Influence in Online Social Networks Toward Understanding Spatial Dependence on Epidemic Thresholds in Networks Dr. Zesheng Chen Viral marketing ( word-of-mouth ) Blog information cascading Rumor
More informationQuilting Stochastic Kronecker Graphs to Generate Multiplicative Attribute Graphs
Quilting Stochastic Kronecker Graphs to Generate Multiplicative Attribute Graphs Hyokun Yun (work with S.V.N. Vishwanathan) Department of Statistics Purdue Machine Learning Seminar November 9, 2011 Overview
More informationIntroduction to Data Mining
Introduction to Data Mining Lecture #9: Link Analysis Seoul National University 1 In This Lecture Motivation for link analysis Pagerank: an important graph ranking algorithm Flow and random walk formulation
More information6.207/14.15: Networks Lecture 12: Generalized Random Graphs
6.207/14.15: Networks Lecture 12: Generalized Random Graphs 1 Outline Small-world model Growing random networks Power-law degree distributions: Rich-Get-Richer effects Models: Uniform attachment model
More informationAnalytically tractable processes on networks
University of California San Diego CERTH, 25 May 2011 Outline Motivation 1 Motivation Networks Random walk and Consensus Epidemic models Spreading processes on networks 2 Networks Motivation Networks Random
More informationLink Analysis Ranking
Link Analysis Ranking How do search engines decide how to rank your query results? Guess why Google ranks the query results the way it does How would you do it? Naïve ranking of query results Given query
More informationRaRE: Social Rank Regulated Large-scale Network Embedding
RaRE: Social Rank Regulated Large-scale Network Embedding Authors: Yupeng Gu 1, Yizhou Sun 1, Yanen Li 2, Yang Yang 3 04/26/2018 The Web Conference, 2018 1 University of California, Los Angeles 2 Snapchat
More informationMAE 298, Lecture 8 Feb 4, Web search and decentralized search on small-worlds
MAE 298, Lecture 8 Feb 4, 2008 Web search and decentralized search on small-worlds Search for information Assume some resource of interest is stored at the vertices of a network: Web pages Files in a file-sharing
More informationCS224W: Social and Information Network Analysis Jure Leskovec, Stanford University
CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University http://cs224w.stanford.edu Intro sessions to SNAP C++ and SNAP.PY: SNAP.PY: Friday 9/27, 4:5 5:30pm in Gates B03 SNAP
More informationWindow-based Tensor Analysis on High-dimensional and Multi-aspect Streams
Window-based Tensor Analysis on High-dimensional and Multi-aspect Streams Jimeng Sun Spiros Papadimitriou Philip S. Yu Carnegie Mellon University Pittsburgh, PA, USA IBM T.J. Watson Research Center Hawthorne,
More informationData Mining Techniques
Data Mining Techniques CS 622 - Section 2 - Spring 27 Pre-final Review Jan-Willem van de Meent Feedback Feedback https://goo.gl/er7eo8 (also posted on Piazza) Also, please fill out your TRACE evaluations!
More informationAn Efficient reconciliation algorithm for social networks
An Efficient reconciliation algorithm for social networks Silvio Lattanzi (Google Research NY) Joint work with: Nitish Korula (Google Research NY) ICERM Stochastic Graph Models Outline Graph reconciliation
More informationTemporal Multi-View Inconsistency Detection for Network Traffic Analysis
WWW 15 Florence, Italy Temporal Multi-View Inconsistency Detection for Network Traffic Analysis Houping Xiao 1, Jing Gao 1, Deepak Turaga 2, Long Vu 2, and Alain Biem 2 1 Department of Computer Science
More informationA Tunable Mechanism for Identifying Trusted Nodes in Large Scale Distributed Networks
A Tunable Mechanism for Identifying Trusted Nodes in Large Scale Distributed Networks Joydeep Chandra 1, Ingo Scholtes 2, Niloy Ganguly 1, Frank Schweitzer 2 1 - Dept. of Computer Science and Engineering,
More informationScaling Neighbourhood Methods
Quick Recap Scaling Neighbourhood Methods Collaborative Filtering m = #items n = #users Complexity : m * m * n Comparative Scale of Signals ~50 M users ~25 M items Explicit Ratings ~ O(1M) (1 per billion)
More informationIdea: Select and rank nodes w.r.t. their relevance or interestingness in large networks.
Graph Databases and Linked Data So far: Obects are considered as iid (independent and identical diributed) the meaning of obects depends exclusively on the description obects do not influence each other
More informationCS 322: (Social and Information) Network Analysis Jure Leskovec Stanford University
CS 322: (Social and Inormation) Network Analysis Jure Leskovec Stanord University Initially some nodes S are active Each edge (a,b) has probability (weight) p ab b 0.4 0.4 0.2 a 0.4 0.2 0.4 g 0.2 Node
More informationAnomaly Detection. Davide Mottin, Konstantina Lazaridou. HassoPlattner Institute. Graph Mining course Winter Semester 2016
Anomaly Detection Davide Mottin, Konstantina Lazaridou HassoPlattner Institute Graph Mining course Winter Semester 2016 and Next week February 7, third round presentations Slides are due by February 6,
More informationLarge Graph Mining: Power Tools and a Practitioner s guide
Large Graph Mining: Power Tools and a Practitioner s guide Task 5: Graphs over time & tensors Faloutsos, Miller, Tsourakakis CMU KDD '9 Faloutsos, Miller, Tsourakakis P5-1 Outline Introduction Motivation
More informationSimilarity Measures for Link Prediction Using Power Law Degree Distribution
Similarity Measures for Link Prediction Using Power Law Degree Distribution Srinivas Virinchi and Pabitra Mitra Dept of Computer Science and Engineering, Indian Institute of Technology Kharagpur-72302,
More informationCS6220: DATA MINING TECHNIQUES
CS6220: DATA MINING TECHNIQUES Mining Graph/Network Data Instructor: Yizhou Sun yzsun@ccs.neu.edu March 16, 2016 Methods to Learn Classification Clustering Frequent Pattern Mining Matrix Data Decision
More informationSpectral Analysis of k-balanced Signed Graphs
Spectral Analysis of k-balanced Signed Graphs Leting Wu 1, Xiaowei Ying 1, Xintao Wu 1, Aidong Lu 1 and Zhi-Hua Zhou 2 1 University of North Carolina at Charlotte, USA, {lwu8,xying, xwu,alu1}@uncc.edu
More informationCS224W: Social and Information Network Analysis Jure Leskovec, Stanford University
CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University http://cs224w.stanford.edu 10/24/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu
More informationCom2: Fast Automatic Discovery of Temporal ( Comet ) Communities
Com2: Fast Automatic Discovery of Temporal ( Comet ) Communities Miguel Araujo CMU/University of Porto maraujo@cs.cmu.edu Christos Faloutsos* Carnegie Mellon University christos@cs.cmu.edu Spiros Papadimitriou
More informationNetwork Science & Telecommunications
Network Science & Telecommunications Piet Van Mieghem 1 ITC30 Networking Science Vision Day 5 September 2018, Vienna Outline Networks Birth of Network Science Function and graph Outlook 1 Network: service(s)
More informationEstimating network degree distributions from sampled networks: An inverse problem
Estimating network degree distributions from sampled networks: An inverse problem Eric D. Kolaczyk Dept of Mathematics and Statistics, Boston University kolaczyk@bu.edu Introduction: Networks and Degree
More informationCS246: Mining Massive Datasets Jure Leskovec, Stanford University
CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu 2/7/2012 Jure Leskovec, Stanford C246: Mining Massive Datasets 2 Web pages are not equally important www.joe-schmoe.com
More informationMultimedia Databases - 68A6 Final Term - exercises
Multimedia Databases - 8A Final Term - exercises Exercises for the preparation to the final term June, the 1th 00 quiz 1. approximation of cosine similarity An approximate computation of the cosine similarity
More informationLong Term Evolution of Networks CS224W Final Report
Long Term Evolution of Networks CS224W Final Report Jave Kane (SUNET ID javekane), Conrad Roche (SUNET ID conradr) Nov. 13, 2014 Introduction and Motivation Many studies in social networking and computer
More informationAnalytic Theory of Power Law Graphs
Analytic Theory of Power Law Graphs Jeremy Kepner This work is sponsored by the Department of Defense under Air Force Contract FA8721-05-C-0002. Opinions, interpretations, conclusions, and recommendations
More informationLearning with Temporal Point Processes
Learning with Temporal Point Processes t Manuel Gomez Rodriguez MPI for Software Systems Isabel Valera MPI for Intelligent Systems Slides/references: http://learning.mpi-sws.org/tpp-icml18 ICML TUTORIAL,
More informationCS 277: Data Mining. Mining Web Link Structure. CS 277: Data Mining Lectures Analyzing Web Link Structure Padhraic Smyth, UC Irvine
CS 277: Data Mining Mining Web Link Structure Class Presentations In-class, Tuesday and Thursday next week 2-person teams: 6 minutes, up to 6 slides, 3 minutes/slides each person 1-person teams 4 minutes,
More informationCS341 info session is on Thu 3/1 5pm in Gates415. CS246: Mining Massive Datasets Jure Leskovec, Stanford University
CS341 info session is on Thu 3/1 5pm in Gates415 CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu 2/28/18 Jure Leskovec, Stanford CS246: Mining Massive Datasets,
More informationInfluence Maximization in Continuous Time Diffusion Networks
Manuel Gomez-Rodriguez 1,2 Bernhard Schölkopf 1 1 MPI for Intelligent Systems and 2 Stanford University MANUELGR@STANFORD.EDU BS@TUEBINGEN.MPG.DE Abstract The problem of finding the optimal set of source
More informationOutline : Multimedia Databases and Data Mining. Indexing - similarity search. Indexing - similarity search. C.
Outline 15-826: Multimedia Databases and Data Mining Lecture #30: Conclusions C. Faloutsos Goal: Find similar / interesting things Intro to DB Indexing - similarity search Points Text Time sequences; images
More informationSTA141C: Big Data & High Performance Statistical Computing
STA141C: Big Data & High Performance Statistical Computing Lecture 6: Numerical Linear Algebra: Applications in Machine Learning Cho-Jui Hsieh UC Davis April 27, 2017 Principal Component Analysis Principal
More informationTime Series. Duen Horng (Polo) Chau Assistant Professor Associate Director, MS Analytics Georgia Tech. Mining and Forecasting
http://poloclub.gatech.edu/cse6242 CSE6242 / CX4242: Data & Visual Analytics Time Series Mining and Forecasting Duen Horng (Polo) Chau Assistant Professor Associate Director, MS Analytics Georgia Tech
More informationIntroduction to Search Engine Technology Introduction to Link Structure Analysis. Ronny Lempel Yahoo Labs, Haifa
Introduction to Search Engine Technology Introduction to Link Structure Analysis Ronny Lempel Yahoo Labs, Haifa Outline Anchor-text indexing Mathematical Background Motivation for link structure analysis
More informationLINK ANALYSIS. Dr. Gjergji Kasneci Introduction to Information Retrieval WS
LINK ANALYSIS Dr. Gjergji Kasneci Introduction to Information Retrieval WS 2012-13 1 Outline Intro Basics of probability and information theory Retrieval models Retrieval evaluation Link analysis Models
More informationMining of Massive Datasets Jure Leskovec, AnandRajaraman, Jeff Ullman Stanford University
Note to other teachers and users of these slides: We would be delighted if you found this our material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit
More information1 Searching the World Wide Web
Hubs and Authorities in a Hyperlinked Environment 1 Searching the World Wide Web Because diverse users each modify the link structure of the WWW within a relatively small scope by creating web-pages on
More informationOn Influential Node Discovery in Dynamic Social Networks
On Influential Node Discovery in Dynamic Social Networks Charu Aggarwal Shuyang Lin Philip S. Yu Abstract The problem of maximizing influence spread has been widely studied in social networks, because
More informationSupplementary Information Activity driven modeling of time varying networks
Supplementary Information Activity driven modeling of time varying networks. Perra, B. Gonçalves, R. Pastor-Satorras, A. Vespignani May 11, 2012 Contents 1 The Model 1 1.1 Integrated network......................................
More informationDATA MINING LECTURE 8. Dimensionality Reduction PCA -- SVD
DATA MINING LECTURE 8 Dimensionality Reduction PCA -- SVD The curse of dimensionality Real data usually have thousands, or millions of dimensions E.g., web documents, where the dimensionality is the vocabulary
More information6.207/14.15: Networks Lecture 7: Search on Networks: Navigation and Web Search
6.207/14.15: Networks Lecture 7: Search on Networks: Navigation and Web Search Daron Acemoglu and Asu Ozdaglar MIT September 30, 2009 1 Networks: Lecture 7 Outline Navigation (or decentralized search)
More informationChapter 6. Frequent Pattern Mining: Concepts and Apriori. Meng Jiang CSE 40647/60647 Data Science Fall 2017 Introduction to Data Mining
Chapter 6. Frequent Pattern Mining: Concepts and Apriori Meng Jiang CSE 40647/60647 Data Science Fall 2017 Introduction to Data Mining Pattern Discovery: Definition What are patterns? Patterns: A set of
More informationVirus Propagation on Time-Varying Networks: Theory and Immunization Algorithms
Virus Propagation on Time-Varying Networks: Theory and Immunization Algorithms B. Aditya Prakash *, Hanghang Tong *, Nicholas Valler +, Michalis Faloutsos +, and Christos Faloutsos * * Computer Science
More informationCS246: Mining Massive Datasets Jure Leskovec, Stanford University
CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu 2/26/2013 Jure Leskovec, Stanford CS246: Mining Massive Datasets, http://cs246.stanford.edu 2 More algorithms
More information1 Matrix notation and preliminaries from spectral graph theory
Graph clustering (or community detection or graph partitioning) is one of the most studied problems in network analysis. One reason for this is that there are a variety of ways to define a cluster or community.
More information