
Graph Mining: Laws, Generators and Tools
  Christos Faloutsos
  CMU
  UCI 2007 C. Faloutsos

Thank you!
  Sharad Mehrotra

Outline
  Problem definition / Motivation
  Static & dynamic laws; generators
  Tools: CenterPiece graphs; Tensors
  Other projects (Virus propagation, e-bay fraud detection)
  Conclusions

Motivation
  Data mining: ~ find patterns (rules, outliers)
  Problem#1: What do real graphs look like?
  Problem#2: How do they evolve?
  Problem#3: How to generate realistic graphs
  TOOLS
  Problem#4: Who is the master-mind?
  Problem#5: Track communities over time

Problem#1
  Joint work with Dr. Deepayan Chakrabarti (CMU/Yahoo R.L.)

Graphs - why should we care?
  Internet Map [lumeta.com]
  Food Web [Martinez 91]
  Friendship Network [Moody 01]
  Protein Interactions [genomebiology.com]

Graphs - why should we care?
  IR: bi-partite graphs (doc-terms)
  web: hyper-text graph
  ... and more:

Graphs - why should we care?
  network of companies & board-of-directors members
  viral marketing
  web-log ("blog") news propagation
  computer network security: IP traffic and anomaly detection
  ...

Problem #1 - network and graph mining
  What does the Internet look like?
  What does the web look like?
  What is normal / abnormal?
  Which patterns/laws hold?

Graph mining
  Are real graphs random?

Laws and patterns
  Are real graphs random?
  A: NO!!
    Diameter
    in- and out- degree distributions
    other (surprising) patterns

Solution#1
  Power law in the degree distribution [SIGCOMM99]
  [Figure: internet domains; log(degree) vs. log(rank); labeled points: att.com, ibm.com]
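The rank plot on the Solution#1 slide can be checked numerically. The following is a minimal sketch, assuming a plain least-squares line fit on log-log axes (the slides do not say how the slope was fitted). The names `loglog_slope` and `rank_exponent` and the synthetic degree list are illustrative assumptions, not material from the talk.

```python
import math

def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x); all values must be > 0."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx = sum(lx) / n
    my = sum(ly) / n
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den

def rank_exponent(degrees):
    """Slope of log(degree) vs. log(rank), nodes ranked by decreasing degree."""
    ranked = sorted(degrees, reverse=True)
    ranks = range(1, len(ranked) + 1)
    return loglog_slope(ranks, ranked)

# Synthetic degrees (assumption, not real data): degree = 1000 * rank^(-0.8)
synthetic = [1000.0 * r ** -0.8 for r in range(1, 201)]
slope = rank_exponent(synthetic)
print(round(slope, 3))
```

A straight line on the log-log plot corresponds to a constant slope here; on real data the fit is only approximate, and nodes of degree zero have to be dropped before taking logarithms.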

Solution#1: Eigen Exponent E
  [Figure: eigenvalue vs. rank of decreasing eigenvalue; Exponent = slope E; May 2001]
  A2: power law in the eigenvalues of the adjacency matrix

But:
  How about graphs from other domains?

The Peer-to-Peer Topology [Jovanovic+]
  [Figure: frequency versus degree]
  Number of adjacent peers follows a power-law

More power laws:
  citation counts (citeseer.nj.nec.com 6/2001)
  [Figure: log(count) vs. log(#citations); labeled point: Ullman]

More power laws:
  web hit counts [w/ A. Montgomery]
  [Figure: Web Site Traffic; log(count) vs. log(in-degree); Zipf; labels: "ebay", users, sites]

epinions.com
  who-trusts-whom [Richardson + Domingos, KDD 2001]
  [Figure: count vs. (out) degree; annotation: trusts-2000-people user]

Problem#2: Time evolution
  with Jure Leskovec (CMU/MLD) and Jon Kleinberg (Cornell CMU)

Evolution of the Diameter
  Prior work on Power Law graphs hints at slowly growing diameter:
    diameter ~ O(log N)
    diameter ~ O(log log N)
  What is happening in real data?
  Diameter shrinks over time

Diameter - ArXiv citation graph
  Citations among physics papers
  One graph per year
  [Figure: diameter vs. time [years]]

Diameter - Autonomous Systems
  Graph of Internet
  One graph per day
  [Figure: diameter vs. number of nodes]

Diameter - Affiliation Network
  Graph of collaborations in physics: authors linked to papers
  10 years of data
  [Figure: diameter vs. time [years]]

Diameter - Patents
  Patent citation network
  25 years of data
  [Figure: diameter vs. time [years]]

Temporal Evolution of the Graphs
  N(t): nodes at time t
  E(t): edges at time t
  Suppose that N(t+1) = 2 * N(t)
  Q: what is your guess for E(t+1) =? 2 * E(t)
  A: over-doubled! But obeying the "Densification Power Law"

Densification - Physics Citations
  Citations among physics papers
  2003: 29,555 papers, 352,807 citations
  [Figure: E(t) vs. N(t)]

Densification - Physics Citations
  Citations among physics papers
  2003: 29,555 papers, 352,807 citations
  [Figure: E(t) vs. N(t); fitted slope 1.69; reference lines for a tree (slope 1) and a clique (slope 2)]

Densification - Patent Citations
  Citations among patents granted
  [...] million nodes, 16.5 million edges
  Each year is a datapoint
  [Figure: E(t) vs. N(t); slope 1.66]

Densification - Autonomous Systems
  Graph of Internet
  [...],000 nodes, 26,000 edges
  One graph per day
  [Figure: E(t) vs. N(t); slope 1.18]

Densification - Affiliation Network
  Authors linked to their publications
  [...],000 nodes (20,000 authors, 38,000 papers), 133,000 edges
  [Figure: E(t) vs. N(t); slope 1.15]
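The slopes reported on the densification slides are exponents a in a relation of the form E(t) ~ N(t)^a. Below is a minimal sketch of how such an exponent could be estimated from a sequence of snapshots, assuming an ordinary least-squares fit on log-log axes. The function name `densification_exponent` and the snapshot numbers are hypothetical and are not data from the talk.

```python
import math

def densification_exponent(nodes, edges):
    """Fit a in E(t) ~ c * N(t)^a by least squares on log-log axes."""
    lx = [math.log(n) for n in nodes]
    ly = [math.log(e) for e in edges]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    num = sum((x - mx) * (y - my) for x, y in zip(lx, ly))
    den = sum((x - mx) ** 2 for x in lx)
    return num / den

# Hypothetical snapshots (assumption): edges grow as 3 * N^1.5
nodes = [1000, 2000, 4000, 8000, 16000]
edges = [3.0 * n ** 1.5 for n in nodes]

a = densification_exponent(nodes, edges)
# Factor by which E grows when N doubles: 2^a, which exceeds 2 whenever a > 1
growth_when_nodes_double = 2 ** a
print(round(a, 3), round(growth_when_nodes_double, 3))
```

With a between 1 (tree-like growth) and 2 (clique-like growth), doubling the node count more than doubles the edge count, which is the "over-doubled" answer on the Temporal Evolution slide.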

Problem#3: Generation
  Given a growing graph with count of nodes N_1, N_2, ...
  Generate a realistic sequence of graphs that will obey all the patterns

Problem Definition
  Given a growing graph with count of nodes N_1, N_2, ...
  Generate a realistic sequence of graphs that will obey all the patterns
  Static Patterns
    Power Law Degree Distribution
    Power Law eigenvalue and eigenvector distribution
    Small Diameter
  Dynamic Patterns
    Growth Power Law
    Shrinking/Stabilizing Diameters

Problem Definition
  Given a growing graph with count of nodes N_1, N_2, ...
  Generate a realistic sequence of graphs that will obey all the patterns
  Idea: Self-similarity
    Leads to power laws
    Communities within communities

Kronecker Product - a Graph
  [Figure: intermediate stage; adjacency matrices]

Kronecker Product - a Graph
  Continuing multiplying with G_1 we obtain G_4 and so on
  [Figure: G_4 adjacency matrix]
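The repeated multiplication with G_1 can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' code: the 3-node initiator `G1` (a chain with self-loops) and the helper names `kron`, `kron_power`, and `edge_count` are chosen here for the example.

```python
def kron(a, b):
    """Kronecker product of two square matrices given as lists of lists."""
    n, m = len(a), len(b)
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(n * m)]
            for i in range(n * m)]

def kron_power(g1, k):
    """k-th Kronecker power: G_k = G_(k-1) (x) G_1."""
    g = g1
    for _ in range(k - 1):
        g = kron(g, g1)
    return g

def edge_count(g):
    """Number of nonzero entries of a 0/1 adjacency matrix."""
    return sum(sum(row) for row in g)

# Assumed initiator (not taken from the slides): 3-node chain with self-loops
G1 = [[1, 1, 0],
      [1, 1, 1],
      [0, 1, 1]]

G2 = kron_power(G1, 2)   # 9 x 9
G4 = kron_power(G1, 4)   # 81 x 81
print(len(G4), edge_count(G4))
```

Because node and edge counts both multiply at every step (3^k nodes and 7^k nonzero entries for this initiator), the edge count is an exact power of the node count, which is one way to see why the construction follows a densification power law.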

Properties:
  We can PROVE that
    Degree distribution is multinomial ~ power law
    Diameter: constant
    Eigenvalue distribution: multinomial
    First eigenvector: multinomial
  See [Leskovec+, PKDD 05] for proofs

Problem Definition
  Given a growing graph with nodes N_1, N_2, ...
  Generate a realistic sequence of graphs that will obey all the patterns
  Static Patterns
    Power Law Degree Distribution
    Power Law eigenvalue and eigenvector distribution
    Small Diameter
  Dynamic Patterns
    Growth Power Law
    Shrinking/Stabilizing Diameters
  First and only generator for which we can prove all these properties

Stochastic Kronecker Graphs
  Create an N_1 x N_1 probability matrix P_1
  Compute the k-th Kronecker power P_k
  For each entry p_uv of P_k include an edge (u,v) with probability p_uv
  [Figure: P_1 -> Kronecker multiplication -> P_k -> flip biased coins -> instance matrix G_2]
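The three steps on the Stochastic Kronecker Graphs slide can be written out directly. The sketch below is a naive version that touches every entry of P_k, so it only suits small k. The 2 x 2 matrix `P1`, the seed, and the function name `stochastic_kronecker` are assumptions made for illustration.

```python
import random

def kron(a, b):
    """Kronecker product of two square matrices given as lists of lists."""
    n, m = len(a), len(b)
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(n * m)]
            for i in range(n * m)]

def stochastic_kronecker(p1, k, seed=0):
    """Return (P_k, edges): the k-th Kronecker power of p1 and one sampled graph."""
    rng = random.Random(seed)
    pk = p1
    for _ in range(k - 1):
        pk = kron(pk, p1)
    # Flip one biased coin per entry: keep edge (u, v) with probability p_uv
    edges = [(u, v)
             for u, row in enumerate(pk)
             for v, p_uv in enumerate(row)
             if rng.random() < p_uv]
    return pk, edges

# Hypothetical initiator probabilities (assumption, not fitted to any dataset)
P1 = [[0.9, 0.5],
      [0.5, 0.1]]

Pk, edges = stochastic_kronecker(P1, k=3)
# Expected number of edges is the sum of all entries of P_k
expected_edges = sum(sum(row) for row in Pk)
print(len(Pk), len(edges), round(expected_edges, 3))
```

The expected edge count equals the sum of the entries of P_1 raised to the k-th power, so the initiator entries control how dense the sampled graphs become.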

Experiments
  How well can we match real graphs?
  Arxiv: physics citations
    30,000 papers, 350,000 citations
    10 years of data
  U.S. Patent citation network
    4 million patents, 16 million citations
    37 years of data
  Autonomous systems graph of internet
    Single snapshot from January [...]
    [...],400 nodes, 26,000 edges
  We show both static and temporal patterns

Arxiv - Degree Distribution
  [Figure: three panels (Real graph, Deterministic Kronecker, Stochastic Kronecker); count vs. degree]

Arxiv - Scree Plot
  [Figure: three panels (Real graph, Deterministic Kronecker, Stochastic Kronecker); eigenvalue vs. rank]

Arxiv - Densification
  [Figure: three panels (Real graph, Deterministic Kronecker, Stochastic Kronecker); edges vs. nodes(t)]

Arxiv - Effective Diameter
  [Figure: three panels (Real graph, Deterministic Kronecker, Stochastic Kronecker); diameter vs. nodes(t)]

(Q: how to fit the parameters?)
  A: Stochastic version of Kronecker graphs + Max likelihood + Metropolis sampling
  [Leskovec+, ICML 07]

Experiments on real AS graph
  [Figure: four panels: degree distribution; hop plot; adjacency matrix eigenvalues; network value]

Conclusions
  Kronecker graphs have:
    All the static properties
      Heavy tailed degree distributions
      Small diameter
    All the temporal properties
      Densification Power Law
      Shrinking/Stabilizing Diameters
    We can formally prove these results
      Multinomial eigenvalues and eigenvectors

Problem#4: MasterMind - CePS
  w/ Hanghang Tong, KDD 2006
  htong <at> cs.cmu.edu

Center-Piece Subgraph (CePS)
  Given: Q query nodes
  Find: Center-piece (b)
  App.: Social Networks, Law Enforcement, ...
  Idea: Proximity -> random walk with restarts
  [Figure: query nodes A, B, C]
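The proximity idea on the CePS slide, random walk with restarts (RWR), can be illustrated on a toy graph. This is a rough sketch under stated assumptions rather than the CePS algorithm itself: the restart probability 0.15, the fixed iteration count, the product rule used to combine per-query scores for an AND-style query, and the 5-node path graph are all choices made here for the example.

```python
def rwr(adj, start, restart=0.15, iters=200):
    """Random-walk-with-restart scores from `start` on an undirected graph.

    adj maps each node to its list of neighbours (every node needs at least one).
    """
    scores = {v: 0.0 for v in adj}
    scores[start] = 1.0
    for _ in range(iters):
        nxt = {v: 0.0 for v in adj}
        for u, nbrs in adj.items():
            # Spread the non-restarting mass of u evenly over its neighbours
            share = (1.0 - restart) * scores[u] / len(nbrs)
            for v in nbrs:
                nxt[v] += share
        nxt[start] += restart   # the walker jumps back to the query node
        scores = nxt
    return scores

def and_query(adj, queries):
    """Score each non-query node by the product of its RWR scores (assumed rule)."""
    per_query = [rwr(adj, q) for q in queries]
    combined = {}
    for v in adj:
        if v in queries:
            continue
        prod = 1.0
        for s in per_query:
            prod *= s[v]
        combined[v] = prod
    return combined

# Toy path graph 4 - 0 - 1 - 2 - 3 with query nodes 0 and 2
adj = {0: [4, 1], 1: [0, 2], 2: [1, 3], 3: [2], 4: [0]}
combined = and_query(adj, [0, 2])
center = max(combined, key=combined.get)
print(center)
```

Node 1 sits between the two query nodes, so it is close to both and gets the highest combined score, while the end nodes 3 and 4 are each close to only one query node.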

Case Study: AND query
  Query nodes: R. Agrawal, Jiawei Han, V. Vapnik, M. Jordan
  [Figure: resulting center-piece subgraph with edge weights; nodes: H.V. Jagadish, Laks V.S. Lakshmanan, R. Agrawal, Jiawei Han, Heikki Mannila, Christos Faloutsos, Padhraic Smyth, V. Vapnik, M. Jordan, Corinna Cortes, Daryl Pregibon]

[...]_SoftAnd query
  [Figure: center-piece subgraph with edge weights; group labels: databases, ML/Statistics; nodes: H.V. Jagadish, Laks V.S. Lakshmanan, R. Agrawal, Jiawei Han, Umeshwar Dayal, Bernhard Scholkopf, Peter L. Bartlett, V. Vapnik, M. Jordan, Alex J. Smola]

Conclusions
  Q1: How to measure the importance?
  A1: RWR + K_SoftAnd
  Q2: How to find connection subgraph?
  A2: Extract Alg.
  Q3: How to do it efficiently?
  A3: Graph Partition (Fast CePS)
    ~90% quality
    6:1 speedup; 150x speedup (ICDM 06, b.p. award)

Tensors for time evolving graphs
  [Jimeng Sun+ KDD 06]
  [, SDM 07]
  [CF, Kolda, Sun, SDM 07 tutorial]

Social network analysis
  Static: find community structures
  [Figure: Authors x Keywords matrix, 1990; DB]

Social network analysis
  Static: find community structures
  Dynamic: monitor community structure evolution; spot abnormal individuals; abnormal time-stamps
  [Figure: Authors x Keywords matrices over time, 1990 to 2004; DB, DM]

Application 1: Multiway latent semantic indexing (LSI)
  Projection matrices specify the clusters
  Core tensors give cluster activation level
  [Figure: authors x keyword data over time up to 2004 (DB, DM); projection matrices U_keyword and U_authors; labels: Pattern, Query, Philip Yu, Michael Stonebraker]

Bibliographic data (DBLP)
  Papers from VLDB and KDD conferences
  Construct 2nd order tensors with yearly windows with <author, keywords>
  Each tensor: [...] timestamps (years)

Multiway LSI
  DB
    Authors: michael carey, michael stonebraker, h. jagadish, hector garcia-molina
    Keywords: queri, parallel, optimization, concurr, objectorient
    Year: [...]
  DB
    Authors: surajit chaudhuri, mitch cherniack, michael stonebraker, ugur cetintemel
    Keywords: distribut, systems, view, storage, servic, process, cache
    Year: [...]
  DM
    Authors: jiawei han, jian pei, philip s. yu, jianyong wang, charu c. aggarwal
    Keywords: streams, pattern, support, cluster, index, gener, queri
    Year: [...]
  Two groups are correctly identified: Databases and Data mining
  People and concepts are drifting over time

Conclusions
  Tensor-based methods (WTA/DTA/STA): spot patterns and anomalies on time evolving graphs, and on streams (monitoring)

Virus propagation
  How do viruses/rumors propagate?
  Will a flu-like virus linger, or will it become extinct soon?

The model: SIS
  "Flu"-like: Susceptible-Infected-Susceptible
  Virus "strength" s = β/δ
  [Figure: node N with neighbors N1, N2, N3; states Healthy and Infected; transition labels Prob. β and Prob. δ]

Epidemic threshold τ
  of a graph: the value of τ, such that if strength s = β/δ < τ, an epidemic cannot happen
  Thus, given a graph, compute its epidemic threshold

Epidemic threshold τ
  What should τ depend on?
    avg. degree? and/or highest degree?
    and/or variance of degree?
    and/or third moment of degree?
    and/or diameter?

Epidemic threshold
  [Theorem] We have no epidemic, if
    β/δ < τ = 1/λ_{1,A}
  β: attack prob.
  δ: recovery prob.
  τ: epidemic threshold
  λ_{1,A}: largest eigenvalue of adj. matrix A
  Proof: [Wang+03]
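The theorem reduces the threshold question to one eigenvalue computation. Below is a minimal sketch for a small undirected graph stored as a dense 0/1 matrix, assuming power iteration is accurate enough. The helper names, the shift by the identity (added so the iteration also converges on bipartite graphs), and the star-graph example are assumptions made for illustration.

```python
import math

def largest_eigenvalue(adj, iters=200):
    """Largest eigenvalue of a symmetric 0/1 adjacency matrix via power iteration.

    Iterates with A + I rather than A so that bipartite graphs (whose spectrum
    is symmetric around zero) do not make the iteration oscillate.
    """
    n = len(adj)
    v = [1.0] * n
    for _ in range(iters):
        w = [v[i] + sum(adj[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T A v for the unit vector v
    av = [sum(adj[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(a * b for a, b in zip(av, v))

def epidemic_threshold(adj):
    """tau = 1 / lambda_1 of the adjacency matrix."""
    return 1.0 / largest_eigenvalue(adj)

def below_threshold(beta, delta, adj):
    """True if strength beta/delta is below tau, i.e. no epidemic is predicted."""
    return beta / delta < epidemic_threshold(adj)

# Star graph with one hub and four leaves: lambda_1 = sqrt(4) = 2, so tau = 0.5
star = [[0, 1, 1, 1, 1],
        [1, 0, 0, 0, 0],
        [1, 0, 0, 0, 0],
        [1, 0, 0, 0, 0],
        [1, 0, 0, 0, 0]]
tau = epidemic_threshold(star)
print(round(tau, 6), below_threshold(0.1, 0.5, star), below_threshold(0.3, 0.5, star))
```

For large sparse graphs a sparse eigensolver would be the usual choice; the dense loops here are only meant to make the formula concrete.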

Experiments (Oregon)
  [Figure: number of infected nodes vs. time on the Oregon graph; curves for β/δ > τ (above threshold), β/δ = τ (at the threshold), β/δ < τ (below threshold)]

E-bay Fraud detection
  w/ Polo Chau & Shashank Pandit, CMU

E-bay Fraud detection - NetProbe

OVERALL CONCLUSIONS
  Graphs pose a wealth of fascinating problems
  self-similarity and power laws work, when textbook methods fail!
  New patterns (shrinking diameter!)
  New generator: Kronecker

Philosophical observation
  Graph mining brings together:
    ML/AI / IR
    Stat, Num. analysis
    Systems (DB (Gb/Tb), Networks ...)
    sociology, epidemiology
    physics, ++

References
  Hanghang Tong, Christos Faloutsos, and Jia-Yu Pan. Fast Random Walk with Restart and Its Applications. ICDM 2006, Hong Kong.
  Hanghang Tong, Christos Faloutsos. Center-Piece Subgraphs: Problem Definition and Fast Solutions. KDD 2006, Philadelphia, PA.
  Jure Leskovec, Jon Kleinberg and Christos Faloutsos. Graphs over Time: Densification Laws, Shrinking Diameters and Possible Explanations. KDD 2005, Chicago, IL. ("Best Research Paper" award).
  Jure Leskovec, Deepayan Chakrabarti, Jon Kleinberg, Christos Faloutsos. Realistic, Mathematically Tractable Graph Generation and Evolution, Using Kronecker Multiplication. ECML/PKDD 2005, Porto, Portugal.
  Jure Leskovec and Christos Faloutsos. Scalable Modeling of Real Graphs using Kronecker Multiplication. ICML 2007, Corvallis, OR, USA.
  Jimeng Sun, Dacheng Tao, Christos Faloutsos. Beyond Streams and Graphs: Dynamic Tensor Analysis. KDD 2006, Philadelphia, PA.

References
  Jimeng Sun, Yinglian Xie, Hui Zhang, Christos Faloutsos. Less is More: Compact Matrix Decomposition for Large Sparse Graphs. SDM, Minneapolis, Minnesota, Apr [...].

Contact info:
  www.cs.cmu.edu/~christos
  (w/ papers, datasets, code, etc)

CMU SCS. Large Graph Mining. Christos Faloutsos CMU

CMU SCS. Large Graph Mining. Christos Faloutsos CMU Large Graph Mining Christos Faloutsos CMU Thank you! Hillol Kargupta NGDM 2007 C. Faloutsos 2 Outline Problem definition / Motivation Static & dynamic laws; generators Tools: CenterPiece graphs; fraud

More information

Data mining in large graphs

Data mining in large graphs Data mining in large graphs Christos Faloutsos University www.cs.cmu.edu/~christos ALLADIN 2003 C. Faloutsos 1 Outline Introduction - motivation Patterns & Power laws Scalability & Fast algorithms Fractals,

More information

Finding patterns in large, real networks

Finding patterns in large, real networks Thanks to Finding patterns in large, real networks Christos Faloutsos CMU www.cs.cmu.edu/~christos/talks/ucla-05 Deepayan Chakrabarti (CMU) Michalis Faloutsos (UCR) George Siganos (UCR) UCLA-IPAM 05 (c)

More information

CMU SCS Mining Billion-node Graphs: Patterns, Generators and Tools

CMU SCS Mining Billion-node Graphs: Patterns, Generators and Tools Mining Billion-node Graphs: Patterns, Generators and Tools Christos Faloutsos CMU (on sabbatical at google) Tamara Kolda Thank you! C. Faloutsos (CMU + Google) 2 Our goal: Open source system for mining

More information

Thanks to Jure Leskovec, Stanford and Panayiotis Tsaparas, Univ. of Ioannina for slides

Thanks to Jure Leskovec, Stanford and Panayiotis Tsaparas, Univ. of Ioannina for slides Thanks to Jure Leskovec, Stanford and Panayiotis Tsaparas, Univ. of Ioannina for slides Web Search: How to Organize the Web? Ranking Nodes on Graphs Hubs and Authorities PageRank How to Solve PageRank

More information

Faloutsos, Tong ICDE, 2009

Faloutsos, Tong ICDE, 2009 Large Graph Mining: Patterns, Tools and Case Studies Christos Faloutsos Hanghang Tong CMU Copyright: Faloutsos, Tong (29) 2-1 Outline Part 1: Patterns Part 2: Matrix and Tensor Tools Part 3: Proximity

More information

Networks as vectors of their motif frequencies and 2-norm distance as a measure of similarity

Networks as vectors of their motif frequencies and 2-norm distance as a measure of similarity Networks as vectors of their motif frequencies and 2-norm distance as a measure of similarity CS322 Project Writeup Semih Salihoglu Stanford University 353 Serra Street Stanford, CA semih@stanford.edu

More information

Incremental Pattern Discovery on Streams, Graphs and Tensors

Incremental Pattern Discovery on Streams, Graphs and Tensors Incremental Pattern Discovery on Streams, Graphs and Tensors Jimeng Sun Thesis Committee: Christos Faloutsos Tom Mitchell David Steier, External member Philip S. Yu, External member Hui Zhang Abstract

More information

Realistic, Mathematically Tractable Graph Generation and Evolution, Using Kronecker Multiplication

Realistic, Mathematically Tractable Graph Generation and Evolution, Using Kronecker Multiplication Realistic, Mathematically Tractable Graph Generation and Evolution, Using Kronecker Multiplication Jurij Leskovec 1, Deepayan Chakrabarti 1, Jon Kleinberg 2, and Christos Faloutsos 1 1 School of Computer

More information

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University http://cs224w.stanford.edu How to organize/navigate it? First try: Human curated Web directories Yahoo, DMOZ, LookSmart

More information

Must-read Material : Multimedia Databases and Data Mining. Indexing - Detailed outline. Outline. Faloutsos

Must-read Material : Multimedia Databases and Data Mining. Indexing - Detailed outline. Outline. Faloutsos Must-read Material 15-826: Multimedia Databases and Data Mining Tamara G. Kolda and Brett W. Bader. Tensor decompositions and applications. Technical Report SAND2007-6702, Sandia National Laboratories,

More information

Thanks to Jure Leskovec, Stanford and Panayiotis Tsaparas, Univ. of Ioannina for slides

Thanks to Jure Leskovec, Stanford and Panayiotis Tsaparas, Univ. of Ioannina for slides Thanks to Jure Leskovec, Stanford and Panayiotis Tsaparas, Univ. of Ioannina for slides Web Search: How to Organize the Web? Ranking Nodes on Graphs Hubs and Authorities PageRank How to Solve PageRank

More information

CMU SCS : Multimedia Databases and Data Mining CMU SCS Must-read Material CMU SCS Must-read Material (cont d)

CMU SCS : Multimedia Databases and Data Mining CMU SCS Must-read Material CMU SCS Must-read Material (cont d) 15-826: Multimedia Databases and Data Mining Lecture #26: Graph mining - patterns Christos Faloutsos Must-read Material Michalis Faloutsos, Petros Faloutsos and Christos Faloutsos, On Power-Law Relationships

More information

Large Graph Mining: Power Tools and a Practitioner s guide

Large Graph Mining: Power Tools and a Practitioner s guide Large Graph Mining: Power Tools and a Practitioner s guide Task 3: Recommendations & proximity Faloutsos, Miller & Tsourakakis CMU KDD'09 Faloutsos, Miller, Tsourakakis P3-1 Outline Introduction Motivation

More information

Dynamics of Real-world Networks

Dynamics of Real-world Networks Dynamics of Real-world Networks Thesis proposal Jurij Leskovec Machine Learning Department Carnegie Mellon University May 2, 2007 Thesis committee: Christos Faloutsos, CMU Avrim Blum, CMU John Lafferty,

More information

arxiv:physics/ v3 [physics.soc-ph] 28 Jan 2007

arxiv:physics/ v3 [physics.soc-ph] 28 Jan 2007 arxiv:physics/0603229v3 [physics.soc-ph] 28 Jan 2007 Graph Evolution: Densification and Shrinking Diameters Jure Leskovec School of Computer Science, Carnegie Mellon University, Pittsburgh, PA Jon Kleinberg

More information

CS224W: Analysis of Networks Jure Leskovec, Stanford University

CS224W: Analysis of Networks Jure Leskovec, Stanford University CS224W: Analysis of Networks Jure Leskovec, Stanford University http://cs224w.stanford.edu 10/30/17 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 2

More information

RTM: Laws and a Recursive Generator for Weighted Time-Evolving Graphs

RTM: Laws and a Recursive Generator for Weighted Time-Evolving Graphs RTM: Laws and a Recursive Generator for Weighted Time-Evolving Graphs Leman Akoglu Mary McGlohon Christos Faloutsos Carnegie Mellon University, School of Computer Science {lakoglu, mmcgloho, christos}@cs.cmu.edu

More information

Talk 2: Graph Mining Tools - SVD, ranking, proximity. Christos Faloutsos CMU

Talk 2: Graph Mining Tools - SVD, ranking, proximity. Christos Faloutsos CMU Talk 2: Graph Mining Tools - SVD, ranking, proximity Christos Faloutsos CMU Outline Introduction Motivation Task 1: Node importance Task 2: Recommendations Task 3: Connection sub-graphs Conclusions Lipari

More information

Must-read material (1 of 2) : Multimedia Databases and Data Mining. Must-read material (2 of 2) Main outline

Must-read material (1 of 2) : Multimedia Databases and Data Mining. Must-read material (2 of 2) Main outline Must-read material (1 of 2) 15-826: Multimedia Databases and Data Mining Lecture #29: Graph mining - Generators & tools Fully Automatic Cross-Associations, by D. Chakrabarti, S. Papadimitriou, D. Modha

More information

ople are co-authors if there edge to each of them. derived from the five largest PH, HEP TH, HEP PH, place a time-stamp on each h paper, and for each perission to the arxiv. The the period from April 1992

More information

CS224W: Analysis of Networks Jure Leskovec, Stanford University

CS224W: Analysis of Networks Jure Leskovec, Stanford University Announcements: Please fill HW Survey Weekend Office Hours starting this weekend (Hangout only) Proposal: Can use 1 late period CS224W: Analysis of Networks Jure Leskovec, Stanford University http://cs224w.stanford.edu

More information

RTM: Laws and a Recursive Generator for Weighted Time-Evolving Graphs

RTM: Laws and a Recursive Generator for Weighted Time-Evolving Graphs RTM: Laws and a Recursive Generator for Weighted Time-Evolving Graphs Submitted for Blind Review Abstract How do real, weighted graphs change over time? What patterns, if any, do they obey? Earlier studies

More information

Communities Via Laplacian Matrices. Degree, Adjacency, and Laplacian Matrices Eigenvectors of Laplacian Matrices

Communities Via Laplacian Matrices. Degree, Adjacency, and Laplacian Matrices Eigenvectors of Laplacian Matrices Communities Via Laplacian Matrices Degree, Adjacency, and Laplacian Matrices Eigenvectors of Laplacian Matrices The Laplacian Approach As with betweenness approach, we want to divide a social graph into

More information

CMU SCS Mining Large Graphs: Patterns, Anomalies, and Fraud Detection

CMU SCS Mining Large Graphs: Patterns, Anomalies, and Fraud Detection Mining Large Graphs: Patterns, Anomalies, and Fraud Detection Christos Faloutsos CMU Thank you! Prof. Julian McAuley Nicholas Urioste UCSD'16 (c) 2016, C. Faloutsos 2 Roadmap Introduction Motivation Why

More information

Subgraph Detection Using Eigenvector L1 Norms

Subgraph Detection Using Eigenvector L1 Norms Subgraph Detection Using Eigenvector L1 Norms Benjamin A. Miller Lincoln Laboratory Massachusetts Institute of Technology Lexington, MA 02420 bamiller@ll.mit.edu Nadya T. Bliss Lincoln Laboratory Massachusetts

More information

Link Prediction. Eman Badr Mohammed Saquib Akmal Khan

Link Prediction. Eman Badr Mohammed Saquib Akmal Khan Link Prediction Eman Badr Mohammed Saquib Akmal Khan 11-06-2013 Link Prediction Which pair of nodes should be connected? Applications Facebook friend suggestion Recommendation systems Monitoring and controlling

More information

Analysis & Generative Model for Trust Networks

Analysis & Generative Model for Trust Networks Analysis & Generative Model for Trust Networks Pranav Dandekar Management Science & Engineering Stanford University Stanford, CA 94305 ppd@stanford.edu ABSTRACT Trust Networks are a specific kind of social

More information

arxiv: v2 [stat.ml] 21 Aug 2009

arxiv: v2 [stat.ml] 21 Aug 2009 KRONECKER GRAPHS: AN APPROACH TO MODELING NETWORKS Kronecker graphs: An Approach to Modeling Networks arxiv:0812.4905v2 [stat.ml] 21 Aug 2009 Jure Leskovec Computer Science Department, Stanford University

More information

CMU SCS Talk 3: Graph Mining Tools Tensors, communities, parallelism

CMU SCS Talk 3: Graph Mining Tools Tensors, communities, parallelism Talk 3: Graph Mining Tools Tensors, communities, parallelism Christos Faloutsos CMU Overall Outline Introduction Motivation Talk#1: Patterns in graphs; generators Talk#2: Tools (Ranking, proximity) Talk#3:

More information

Supporting Statistical Hypothesis Testing Over Graphs

Supporting Statistical Hypothesis Testing Over Graphs Supporting Statistical Hypothesis Testing Over Graphs Jennifer Neville Departments of Computer Science and Statistics Purdue University (joint work with Tina Eliassi-Rad, Brian Gallagher, Sergey Kirshner,

More information

CS224W: Methods of Parallelized Kronecker Graph Generation

CS224W: Methods of Parallelized Kronecker Graph Generation CS224W: Methods of Parallelized Kronecker Graph Generation Sean Choi, Group 35 December 10th, 2012 1 Introduction The question of generating realistic graphs has always been a topic of huge interests.

More information

Slide source: Mining of Massive Datasets Jure Leskovec, Anand Rajaraman, Jeff Ullman Stanford University.

Slide source: Mining of Massive Datasets Jure Leskovec, Anand Rajaraman, Jeff Ullman Stanford University. Slide source: Mining of Massive Datasets Jure Leskovec, Anand Rajaraman, Jeff Ullman Stanford University http://www.mmds.org #1: C4.5 Decision Tree - Classification (61 votes) #2: K-Means - Clustering

More information

Jure Leskovec Stanford University

Jure Leskovec Stanford University Jure Leskovec Stanford University 2 Part 1: Models for networks Part 2: Information flows in networks Information spread in social media 3 Information flows through networks Analyzing underlying mechanisms

More information

Online Sampling of High Centrality Individuals in Social Networks

Online Sampling of High Centrality Individuals in Social Networks Online Sampling of High Centrality Individuals in Social Networks Arun S. Maiya and Tanya Y. Berger-Wolf Department of Computer Science University of Illinois at Chicago 85 S. Morgan, Chicago, IL 667,

More information

Linear Algebra Methods for Data Mining

Linear Algebra Methods for Data Mining Linear Algebra Methods for Data Mining Saara Hyvönen, Saara.Hyvonen@cs.helsinki.fi Spring 2007 2. Basic Linear Algebra continued Linear Algebra Methods for Data Mining, Spring 2007, University of Helsinki

More information
