Heat Kernel Based Community Detection

Size: px
Start display at page:

Download "Heat Kernel Based Community Detection"

Transcription

1 Heat Kernel Based Community Detection Joint with David F. Gleich, (Purdue), supported by" NSF CAREER CCF Kyle Kloster! Purdue University!

2 Local Community Detection Given seed(s) S in G, find a community that contains S. seed Community?

3 Local Community Detection Given seed(s) S in G, find a community that contains S. seed Community? high internal, low external connectivity!

4 Low-conductance sets are communities # edges leaving T conductance( T ) = # edge endpoints in T = chance a random step exits T

5 Low-conductance sets are communities # edges leaving T conductance( T ) = # edge endpoints in T = chance a random step exits T conductance(comm) = 39/381 =.102 How to find these?!

6 Graph diffusions find low-conductance sets A diffusion propagates rank from a seed across a graph. seed = high! = low! diffusion value!

7 Graph diffusions find low-conductance sets A diffusion propagates rank from a seed across a graph. seed = high! = low! diffusion value! = local community /! low-conductance set! Okay " how does this work?

8 Graph Diffusion A diffusion models how a mass (green dye, money, popularity) spreads from a seed across a network. seed p 0 p 1 p 2 p 3

9 Graph Diffusion diffuse A diffusion models how a mass (green dye, money, popularity) spreads from a seed across a network. seed p 0 p 1 p 2 p 3

10 Graph Diffusion diffuse A diffusion models how a mass seed (green dye, money, popularity) spreads from a seed across a p 0 p 1 p 2 p 3 network." Once mass reaches a node, it propagates to the neighbors, with some decay. decay : dye dilutes, money is taxed, popularity fades

11 Graph Diffusion A diffusion models how a mass seed (green dye, money, popularity) spreads from a seed across a network. " Once mass reaches a node, it propagates to the neighbors, diffuse with some decay. decay : dye dilutes, money is taxed, popularity fades p 0 p 1 p 2 p 3

12 Graph Diffusion A diffusion models how a mass seed (green dye, money, popularity) spreads from a seed across a network." Once mass reaches a node, it propagates to the neighbors, with some decay. decay : dye dilutes, money is taxed, popularity fades p 0 p 1 p 2 p 3

13 Diffusion score diffusion score of a node = " weighted sum of the mass at that node during different stages. c 0 p 0 + c 1 p 1 + c 2 p 2 + c p 3 + 3

14 Diffusion score diffusion score of a node = " weighted sum of the mass at that node during different stages. c 0 p 0 + c 1 p 1 + c 2 p 2 + c p diffusion score vector = f! 1X f = c k P k s k=0 P = random-walk s = c k = transition matrix normalized seed vector weight on stage k

15 Heat Kernel vs. PageRank Diffusions Heat Kernel uses t k /k! " Our work is new analysis for this diffusion. t 0 p 0 0! t ! p 1 t 2 p 2 p 3 2! t 3 3! + PageRank uses α k at stage k." Standard, widely-used diffusion we use for comparison. α 0 p 0 + α 1 p 1 + α 2 p 2 + α3 p 3 +

16 Heat Kernel vs. PageRank Behavior HK emphasizes earlier stages of diffusion =0.99 PR, α k Weight 10 5 t=1 t=5 t=15 =0.85 HK, t k /k! Length à involve shorter walks from seed, à so HK looks at smaller sets than PR

17 Heat Kernel vs. PageRank Theory PR good conductance Local Cheeger Inequality:" PR finds set of nearoptimal conductance fast algorithm PPR-push is O(1/(ε(1-α))) in theory, fast in practice [Andersen Chung Lang 06] HK

18 Heat Kernel vs. PageRank Theory PR good conductance Local Cheeger Inequality:" PR finds set of nearoptimal conductance fast algorithm PPR-push is O(1/(ε(1-α))) in theory, fast in practice [Andersen Chung Lang 06] HK Local Cheeger Inequality [Chung 07]

19 Heat Kernel vs. PageRank Theory PR good conductance Local Cheeger Inequality:" PR finds set of nearoptimal conductance fast algorithm PPR-push is O(1/(ε(1-α))) in theory, fast in practice [Andersen Chung Lang 06] HK Local Cheeger Inequality [Chung 07] Our work!

20 Our work on Heat Kernel: theory THEOREM Our algorithm for a relative" ε-accuracy in a degree-weighted norm has runtime <= O( e t (log(1/ε) + log(t)) / ε) (which is constant, regardless of graph size)

21 Our work on Heat Kernel: theory THEOREM Our algorithm for a relative" ε-accuracy in a degree-weighted norm has runtime <= O( e t (log(1/ε) + log(t)) / ε) (which is constant, regardless of graph size) COROLLARY HK is local! (O(1) runtime à diffusion vector has O(1) entries)!

22 Our work on Heat Kernel: results First efficient, deterministic HK algorithm. Deterministic is important to be able to compare the behaviors of HK and PR experimentally: Our key findings! HK more accurately describes ground-truth communities in real-world networks! identifies smaller sets à better precision speed & conductance comparable with PR

23 Python" demo Twitter graph 41.6 M nodes 2.4 B edges un-optimized Python code on a laptop Available for download:!

24 Algorithm Outline Computing HK! 1. Pre-compute push thresholds 2. Do push on all entries above threshold

25 Algorithm Intuition Computing HK given parameters t, ε, seed s! Starting from here How to end up here? seed p 0 p 1 p 2 p 3 t 0 p 0 0! t ! p 1 t 2 p 2 p 3 2! t 3 3! +

26 Algorithm Intuition Begin with mass at seed(s) in a residual staging area, r 0 The residuals r k hold mass that is unprocessed it s like error seed r 0 r 1 r 2 r 3 t 0 p 0 0! t ! p 1 t 2 p 2 p 3 2! t 3 3! +

27 Push Operation push (1) remove entry in r k," (2) put in p, r 0 r 1 r 2 r 3 t 0 p 0 0! t ! p 1 t 2 p 2 p 3 2! t 3 3! +

28 Push Operation push (1) remove entry in r k," (2) put in p, (3) then scale and spread to neighbors in next r! t 1 1! r 0 r 1 r 2 r 3 t 0 p 0 0! t ! p 1 t 2 p 2 p 3 2! t 3 3! +

29 Push Operation push (1) remove entry in r k," (2) put in p, (3) then scale and spread to neighbors in next r (repeat) t 2 2! r 0 r 1 r 2 r 3 t 0 p 0 0! t ! p 1 t 2 p 2 p 3 2! t 3 3! +

30 Push Operation push (1) remove entry in r k," (2) put in p, (3) then scale and spread to neighbors in next r (repeat) r 0 r 1 r 2 r 3 t 2 2! t 0 p 0 0! t ! p 1 t 2 p 2 p 3 2! t 3 3! +

31 Push Operation push (1) remove entry in r k," (2) put in p, (3) then scale and spread to neighbors in next r (repeat) r 0 r 1 r 2 r 3 t 2 2! t 3 3! t 0 p 0 0! t ! p 1 t 2 p 2 p 3 2! t 3 3! +

32 Thresholds entries < threshold ERROR equals weighted sum of entries left in r k à Set threshold so leftovers sum to < ε r 0 r 1 r 2 r 3 t 0 p 0 0! t ! p 1 t 2 p 2 p 3 2! t 3 3! +

33 Thresholds entries < threshold ERROR equals weighted sum of entries left in r k à Set threshold so leftovers sum to < ε Threshold for stage r k is" sum of remaining scale factors" prior scaling factor t 0 p 0 0! r 0 r 1 r 2 r 3 t ! p 1 t 2 p 2 p 3 2! t 3 3! +

34 Algorithm Outline Computing HK! 1. Pre-compute push thresholds 2. Do push on all entries above threshold

35 Communities in Real-world Networks Given a seed in an unidentified real-world community, how well can HK and PR describe that community? Measure quality using F 1 -measure. Graph V E amazon 330 K 930 K dblp 320 K 1 M youtube 1.1 M 3 M lj 4 M 35 M orkut 3.1 M 120 M friendster 66 M 1.8 B Datasets from SNAP collection [Leskovec] F 1 -measure is the harmonic mean of " # correct guesses precision = # total guesses and recall = # answers you get # answers there are

36

37 data F 1 precision set size comm HK PR HK PR HK PR size amazon dblp youtube lj orkut friendster

38 data F 1 precision set size comm HK PR HK PR HK PR size amazon dblp youtube lj orkut friendster PR achieves high recall by guessing a huge set HK identifies a tighter cluster, so attains better precision

39 Runtime &" Conductance" " HK is comparable in runtime and conductance." " " As graphs scale," the diffusions performance becomes even more similar. Runtime (s) log10(conductances) Runtime: hk vs. ppr hkgrow 50% 25% 75% pprgrow 50% 25% 75% log10( V + E ) 10 0 Conductances: hk vs. ppr hkgrow 50% 25% 75% pprgrow 50% 25% 75% log10( V + E )

40 Code, references, future work Code available at!! Ongoing work! - generalizing to other diffusions - simultaneously compute multiple diffusions Questions or suggestions? Kyle Kloster at kkloste-at-purdue-dot-edu!

Facebook Friends! and Matrix Functions

Facebook Friends! and Matrix Functions Facebook Friends! and Matrix Functions! Graduate Research Day Joint with David F. Gleich, (Purdue), supported by" NSF CAREER 1149756-CCF Kyle Kloster! Purdue University! Network Analysis Use linear algebra

More information

A Nearly Sublinear Approximation to exp{p}e i for Large Sparse Matrices from Social Networks

A Nearly Sublinear Approximation to exp{p}e i for Large Sparse Matrices from Social Networks A Nearly Sublinear Approximation to exp{p}e i for Large Sparse Matrices from Social Networks Kyle Kloster and David F. Gleich Purdue University December 14, 2013 Supported by NSF CAREER 1149756-CCF Kyle

More information

Strong localization in seeded PageRank vectors

Strong localization in seeded PageRank vectors Strong localization in seeded PageRank vectors https://github.com/nassarhuda/pprlocal David F. Gleich" Huda Nassar Computer Science" Computer Science " supported by NSF CAREER CCF-1149756, " IIS-1422918,

More information

Overlapping Community Detection at Scale: A Nonnegative Matrix Factorization Approach

Overlapping Community Detection at Scale: A Nonnegative Matrix Factorization Approach Overlapping Community Detection at Scale: A Nonnegative Matrix Factorization Approach Author: Jaewon Yang, Jure Leskovec 1 1 Venue: WSDM 2013 Presenter: Yupeng Gu 1 Stanford University 1 Background Community

More information

Seeded PageRank solution paths

Seeded PageRank solution paths Euro. Jnl of Applied Mathematics (2016), vol. 27, pp. 812 845. c Cambridge University Press 2016 doi:10.1017/s0956792516000280 812 Seeded PageRank solution paths D. F. GLEICH 1 and K. KLOSTER 2 1 Department

More information

Parallel Local Graph Clustering

Parallel Local Graph Clustering Parallel Local Graph Clustering Kimon Fountoulakis, joint work with J. Shun, X. Cheng, F. Roosta-Khorasani, M. Mahoney, D. Gleich University of California Berkeley and Purdue University Based on J. Shun,

More information

Strong Localization in Personalized PageRank Vectors

Strong Localization in Personalized PageRank Vectors Strong Localization in Personalized PageRank Vectors Huda Nassar 1, Kyle Kloster 2, and David F. Gleich 1 1 Purdue University, Computer Science Department 2 Purdue University, Mathematics Department {hnassar,kkloste,dgleich}@purdue.edu

More information

Lecture: Local Spectral Methods (2 of 4) 19 Computing spectral ranking with the push procedure

Lecture: Local Spectral Methods (2 of 4) 19 Computing spectral ranking with the push procedure Stat260/CS294: Spectral Graph Methods Lecture 19-04/02/2015 Lecture: Local Spectral Methods (2 of 4) Lecturer: Michael Mahoney Scribe: Michael Mahoney Warning: these notes are still very rough. They provide

More information

Heat Kernel Based Community Detection

Heat Kernel Based Community Detection Heat Kernel Based Community Detection Kyle Kloster Purdue University West Lafayette, IN kkloste@purdueedu David F Gleich Purdue University West Lafayette, IN dgleich@purdueedu ABSTRACT The heat kernel

More information

Locally-biased analytics

Locally-biased analytics Locally-biased analytics You have BIG data and want to analyze a small part of it: Solution 1: Cut out small part and use traditional methods Challenge: cutting out may be difficult a priori Solution 2:

More information

Local Lanczos Spectral Approximation for Community Detection

Local Lanczos Spectral Approximation for Community Detection Local Lanczos Spectral Approximation for Community Detection Pan Shi 1, Kun He 1( ), David Bindel 2, John E. Hopcroft 2 1 Huazhong University of Science and Technology, Wuhan, China {panshi,brooklet60}@hust.edu.cn

More information

Four graph partitioning algorithms. Fan Chung University of California, San Diego

Four graph partitioning algorithms. Fan Chung University of California, San Diego Four graph partitioning algorithms Fan Chung University of California, San Diego History of graph partitioning NP-hard approximation algorithms Spectral method, Fiedler 73, Folklore Multicommunity flow,

More information

Jure Leskovec Joint work with Jaewon Yang, Julian McAuley

Jure Leskovec Joint work with Jaewon Yang, Julian McAuley Jure Leskovec (@jure) Joint work with Jaewon Yang, Julian McAuley Given a network, find communities! Sets of nodes with common function, role or property 2 3 Q: How and why do communities form? A: Strength

More information

Lecture: Local Spectral Methods (3 of 4) 20 An optimization perspective on local spectral methods

Lecture: Local Spectral Methods (3 of 4) 20 An optimization perspective on local spectral methods Stat260/CS294: Spectral Graph Methods Lecture 20-04/07/205 Lecture: Local Spectral Methods (3 of 4) Lecturer: Michael Mahoney Scribe: Michael Mahoney Warning: these notes are still very rough. They provide

More information

Cross-lingual and temporal Wikipedia analysis

Cross-lingual and temporal Wikipedia analysis MTA SZTAKI Data Mining and Search Group June 14, 2013 Supported by the EC FET Open project New tools and algorithms for directed network analysis (NADINE No 288956) Table of Contents 1 Link prediction

More information

Lecture: Local Spectral Methods (1 of 4)

Lecture: Local Spectral Methods (1 of 4) Stat260/CS294: Spectral Graph Methods Lecture 18-03/31/2015 Lecture: Local Spectral Methods (1 of 4) Lecturer: Michael Mahoney Scribe: Michael Mahoney Warning: these notes are still very rough. They provide

More information

Kristina Lerman USC Information Sciences Institute

Kristina Lerman USC Information Sciences Institute Rethinking Network Structure Kristina Lerman USC Information Sciences Institute Università della Svizzera Italiana, December 16, 2011 Measuring network structure Central nodes Community structure Strength

More information

Using Local Spectral Methods in Theory and in Practice

Using Local Spectral Methods in Theory and in Practice Using Local Spectral Methods in Theory and in Practice Michael W. Mahoney ICSI and Dept of Statistics, UC Berkeley ( For more info, see: http: // www. stat. berkeley. edu/ ~ mmahoney or Google on Michael

More information

Attentive Betweenness Centrality (ABC): Considering Options and Bandwidth when Measuring Criticality

Attentive Betweenness Centrality (ABC): Considering Options and Bandwidth when Measuring Criticality Attentive Betweenness Centrality (ABC): Considering Options and Bandwidth when Measuring Criticality Sibel Adalı, Xiaohui Lu, M. Magdon-Ismail September 5, 0 Who is the Most Critical? Would you use the

More information

Slide source: Mining of Massive Datasets Jure Leskovec, Anand Rajaraman, Jeff Ullman Stanford University.

Slide source: Mining of Massive Datasets Jure Leskovec, Anand Rajaraman, Jeff Ullman Stanford University. Slide source: Mining of Massive Datasets Jure Leskovec, Anand Rajaraman, Jeff Ullman Stanford University http://www.mmds.org #1: C4.5 Decision Tree - Classification (61 votes) #2: K-Means - Clustering

More information

Lab 8: Measuring Graph Centrality - PageRank. Monday, November 5 CompSci 531, Fall 2018

Lab 8: Measuring Graph Centrality - PageRank. Monday, November 5 CompSci 531, Fall 2018 Lab 8: Measuring Graph Centrality - PageRank Monday, November 5 CompSci 531, Fall 2018 Outline Measuring Graph Centrality: Motivation Random Walks, Markov Chains, and Stationarity Distributions Google

More information

Graph Partitioning Using Random Walks

Graph Partitioning Using Random Walks Graph Partitioning Using Random Walks A Convex Optimization Perspective Lorenzo Orecchia Computer Science Why Spectral Algorithms for Graph Problems in practice? Simple to implement Can exploit very efficient

More information

TopPPR: Top-k Personalized PageRank Queries with Precision Guarantees on Large Graphs

TopPPR: Top-k Personalized PageRank Queries with Precision Guarantees on Large Graphs TopPPR: Top-k Personalized PageRank Queries with Precision Guarantees on Large Graphs Zhewei Wei School of Infromation, Renmin University of China zhewei@ruc.edu.cn Sibo Wang University of Queensland sibo.wang@uq.edu.au

More information

Lecture 14: Random Walks, Local Graph Clustering, Linear Programming

Lecture 14: Random Walks, Local Graph Clustering, Linear Programming CSE 521: Design and Analysis of Algorithms I Winter 2017 Lecture 14: Random Walks, Local Graph Clustering, Linear Programming Lecturer: Shayan Oveis Gharan 3/01/17 Scribe: Laura Vonessen Disclaimer: These

More information

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

CS246: Mining Massive Datasets Jure Leskovec, Stanford University CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu 2/7/2012 Jure Leskovec, Stanford C246: Mining Massive Datasets 2 Web pages are not equally important www.joe-schmoe.com

More information

DATA MINING LECTURE 13. Link Analysis Ranking PageRank -- Random walks HITS

DATA MINING LECTURE 13. Link Analysis Ranking PageRank -- Random walks HITS DATA MINING LECTURE 3 Link Analysis Ranking PageRank -- Random walks HITS How to organize the web First try: Manually curated Web Directories How to organize the web Second try: Web Search Information

More information

DS504/CS586: Big Data Analytics Graph Mining II

DS504/CS586: Big Data Analytics Graph Mining II Welcome to DS504/CS586: Big Data Analytics Graph Mining II Prof. Yanhua Li Time: 6:00pm 8:50pm Mon. and Wed. Location: SL105 Spring 2016 Reading assignments We will increase the bar a little bit Please

More information

EFFICIENT ALGORITHMS FOR PERSONALIZED PAGERANK

EFFICIENT ALGORITHMS FOR PERSONALIZED PAGERANK EFFICIENT ALGORITHMS FOR PERSONALIZED PAGERANK A DISSERTATION SUBMITTED TO THE DEPARTMENT OF COMPUTER SCIENCE AND THE COMMITTEE ON GRADUATE STUDIES OF STANFORD UNIVERSITY IN PARTIAL FULFILLMENT OF THE

More information

A Nearly-Sublinear Method for Approximating a Column of the Matrix Exponential for Matrices from Large, Sparse Networks

A Nearly-Sublinear Method for Approximating a Column of the Matrix Exponential for Matrices from Large, Sparse Networks A Nearly-Sublinear Method for Approximating a Column of the Matrix Exponential for Matrices from Large, Sparse Networks Kyle Kloster, and David F. Gleich 2, Purdue University, Mathematics Department 2

More information

PU Learning for Matrix Completion

PU Learning for Matrix Completion Cho-Jui Hsieh Dept of Computer Science UT Austin ICML 2015 Joint work with N. Natarajan and I. S. Dhillon Matrix Completion Example: movie recommendation Given a set Ω and the values M Ω, how to predict

More information

Greedy Maximization Framework for Graph-based Influence Functions

Greedy Maximization Framework for Graph-based Influence Functions Greedy Maximization Framework for Graph-based Influence Functions Edith Cohen Google Research Tel Aviv University HotWeb '16 1 Large Graphs Model relations/interactions (edges) between entities (nodes)

More information

CS6220: DATA MINING TECHNIQUES

CS6220: DATA MINING TECHNIQUES CS6220: DATA MINING TECHNIQUES Mining Graph/Network Data Instructor: Yizhou Sun yzsun@ccs.neu.edu November 16, 2015 Methods to Learn Classification Clustering Frequent Pattern Mining Matrix Data Decision

More information

CS246: Mining Massive Datasets Jure Leskovec, Stanford University.

CS246: Mining Massive Datasets Jure Leskovec, Stanford University. CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu What is the structure of the Web? How is it organized? 2/7/2011 Jure Leskovec, Stanford C246: Mining Massive

More information

LOCAL CHEEGER INEQUALITIES AND SPARSE CUTS

LOCAL CHEEGER INEQUALITIES AND SPARSE CUTS LOCAL CHEEGER INEQUALITIES AND SPARSE CUTS CAELAN GARRETT 18.434 FINAL PROJECT CAELAN@MIT.EDU Abstract. When computing on massive graphs, it is not tractable to iterate over the full graph. Local graph

More information

NETWORKS (a.k.a. graphs) are important data structures

NETWORKS (a.k.a. graphs) are important data structures PREPRINT Mixed-Order Spectral Clustering for Networks Yan Ge, Haiping Lu, and Pan Peng arxiv:82.040v [cs.lg] 25 Dec 208 Abstract Clustering is fundamental for gaining insights from complex networks, and

More information

Cutting Graphs, Personal PageRank and Spilling Paint

Cutting Graphs, Personal PageRank and Spilling Paint Graphs and Networks Lecture 11 Cutting Graphs, Personal PageRank and Spilling Paint Daniel A. Spielman October 3, 2013 11.1 Disclaimer These notes are not necessarily an accurate representation of what

More information

Web Structure Mining Nodes, Links and Influence

Web Structure Mining Nodes, Links and Influence Web Structure Mining Nodes, Links and Influence 1 Outline 1. Importance of nodes 1. Centrality 2. Prestige 3. Page Rank 4. Hubs and Authority 5. Metrics comparison 2. Link analysis 3. Influence model 1.

More information

0.1 Naive formulation of PageRank

0.1 Naive formulation of PageRank PageRank is a ranking system designed to find the best pages on the web. A webpage is considered good if it is endorsed (i.e. linked to) by other good webpages. The more webpages link to it, and the more

More information

Link Analysis. Stony Brook University CSE545, Fall 2016

Link Analysis. Stony Brook University CSE545, Fall 2016 Link Analysis Stony Brook University CSE545, Fall 2016 The Web, circa 1998 The Web, circa 1998 The Web, circa 1998 Match keywords, language (information retrieval) Explore directory The Web, circa 1998

More information

Regularized Laplacian Estimation and Fast Eigenvector Approximation

Regularized Laplacian Estimation and Fast Eigenvector Approximation Regularized Laplacian Estimation and Fast Eigenvector Approximation Patrick O. Perry Information, Operations, and Management Sciences NYU Stern School of Business New York, NY 1001 pperry@stern.nyu.edu

More information

Link Prediction. Eman Badr Mohammed Saquib Akmal Khan

Link Prediction. Eman Badr Mohammed Saquib Akmal Khan Link Prediction Eman Badr Mohammed Saquib Akmal Khan 11-06-2013 Link Prediction Which pair of nodes should be connected? Applications Facebook friend suggestion Recommendation systems Monitoring and controlling

More information

CS6220: DATA MINING TECHNIQUES

CS6220: DATA MINING TECHNIQUES CS6220: DATA MINING TECHNIQUES Mining Graph/Network Data Instructor: Yizhou Sun yzsun@ccs.neu.edu March 16, 2016 Methods to Learn Classification Clustering Frequent Pattern Mining Matrix Data Decision

More information

6.207/14.15: Networks Lecture 7: Search on Networks: Navigation and Web Search

6.207/14.15: Networks Lecture 7: Search on Networks: Navigation and Web Search 6.207/14.15: Networks Lecture 7: Search on Networks: Navigation and Web Search Daron Acemoglu and Asu Ozdaglar MIT September 30, 2009 1 Networks: Lecture 7 Outline Navigation (or decentralized search)

More information

Online Social Networks and Media. Link Analysis and Web Search

Online Social Networks and Media. Link Analysis and Web Search Online Social Networks and Media Link Analysis and Web Search How to Organize the Web First try: Human curated Web directories Yahoo, DMOZ, LookSmart How to organize the web Second try: Web Search Information

More information

Algorithmic Primitives for Network Analysis: Through the Lens of the Laplacian Paradigm

Algorithmic Primitives for Network Analysis: Through the Lens of the Laplacian Paradigm Algorithmic Primitives for Network Analysis: Through the Lens of the Laplacian Paradigm Shang-Hua Teng Computer Science, Viterbi School of Engineering USC Massive Data and Massive Graphs 500 billions web

More information

The Ties that Bind Characterizing Classes by Attributes and Social Ties

The Ties that Bind Characterizing Classes by Attributes and Social Ties The Ties that Bind WWW April, 2017, Bryan Perozzi*, Leman Akoglu Stony Brook University *Now at Google. Introduction Outline Our problem: Characterizing Community Differences Proposed Method Experimental

More information

Online Social Networks and Media. Link Analysis and Web Search

Online Social Networks and Media. Link Analysis and Web Search Online Social Networks and Media Link Analysis and Web Search How to Organize the Web First try: Human curated Web directories Yahoo, DMOZ, LookSmart How to organize the web Second try: Web Search Information

More information

Edo Liberty Principal Scientist Amazon Web Services. Streaming Quantiles

Edo Liberty Principal Scientist Amazon Web Services. Streaming Quantiles Edo Liberty Principal Scientist Amazon Web Services Streaming Quantiles Streaming Quantiles Manku, Rajagopalan, Lindsay. Random sampling techniques for space efficient online computation of order statistics

More information

Estimating Clustering Coefficients and Size of Social Networks via Random Walk

Estimating Clustering Coefficients and Size of Social Networks via Random Walk Estimating Clustering Coefficients and Size of Social Networks via Random Walk Stephen J. Hardiman* Capital Fund Management France Liran Katzir Advanced Technology Labs Microsoft Research, Israel *Research

More information

Model Reduction for Edge-Weighted Personalized PageRank

Model Reduction for Edge-Weighted Personalized PageRank Model Reduction for Edge-Weighted Personalized PageRank D. Bindel 11 Apr 2016 D. Bindel Purdue 11 Apr 2016 1 / 33 The Computational Science & Engineering Picture Application Analysis Computation MEMS Smart

More information

Continuous Influence Maximization: What Discounts Should We Offer to Social Network Users?

Continuous Influence Maximization: What Discounts Should We Offer to Social Network Users? Continuous Influence Maximization: What Discounts Should We Offer to Social Network Users? ABSTRACT Yu Yang Simon Fraser University Burnaby, Canada yya119@sfu.ca Jian Pei Simon Fraser University Burnaby,

More information

Statistical Problem. . We may have an underlying evolving system. (new state) = f(old state, noise) Input data: series of observations X 1, X 2 X t

Statistical Problem. . We may have an underlying evolving system. (new state) = f(old state, noise) Input data: series of observations X 1, X 2 X t Markov Chains. Statistical Problem. We may have an underlying evolving system (new state) = f(old state, noise) Input data: series of observations X 1, X 2 X t Consecutive speech feature vectors are related

More information

Outline for today. Information Retrieval. Cosine similarity between query and document. tf-idf weighting

Outline for today. Information Retrieval. Cosine similarity between query and document. tf-idf weighting Outline for today Information Retrieval Efficient Scoring and Ranking Recap on ranked retrieval Jörg Tiedemann jorg.tiedemann@lingfil.uu.se Department of Linguistics and Philology Uppsala University Efficient

More information

Dissertation Defense

Dissertation Defense Clustering Algorithms for Random and Pseudo-random Structures Dissertation Defense Pradipta Mitra 1 1 Department of Computer Science Yale University April 23, 2008 Mitra (Yale University) Dissertation

More information

DS504/CS586: Big Data Analytics Graph Mining II

DS504/CS586: Big Data Analytics Graph Mining II Welcome to DS504/CS586: Big Data Analytics Graph Mining II Prof. Yanhua Li Time: 6-8:50PM Thursday Location: AK233 Spring 2018 v Course Project I has been graded. Grading was based on v 1. Project report

More information

Fast Algorithms for Pseudoarboricity

Fast Algorithms for Pseudoarboricity Fast Algorithms for Pseudoarboricity Meeting on Algorithm Engineering and Experiments ALENEX 2016 Markus Blumenstock Institute of Computer Science, University of Mainz January 10th, 2016 Pseudotrees Orientations

More information

On Top-k Structural. Similarity Search. Pei Lee, Laks V.S. Lakshmanan University of British Columbia Vancouver, BC, Canada

On Top-k Structural. Similarity Search. Pei Lee, Laks V.S. Lakshmanan University of British Columbia Vancouver, BC, Canada On Top-k Structural 1 Similarity Search Pei Lee, Laks V.S. Lakshmanan University of British Columbia Vancouver, BC, Canada Jeffrey Xu Yu Chinese University of Hong Kong Hong Kong, China 2014/10/14 Pei

More information

Lecture 1: From Data to Graphs, Weighted Graphs and Graph Laplacian

Lecture 1: From Data to Graphs, Weighted Graphs and Graph Laplacian Lecture 1: From Data to Graphs, Weighted Graphs and Graph Laplacian Radu Balan February 5, 2018 Datasets diversity: Social Networks: Set of individuals ( agents, actors ) interacting with each other (e.g.,

More information

An Efficient reconciliation algorithm for social networks

An Efficient reconciliation algorithm for social networks An Efficient reconciliation algorithm for social networks Silvio Lattanzi (Google Research NY) Joint work with: Nitish Korula (Google Research NY) ICERM Stochastic Graph Models Outline Graph reconciliation

More information

A Local Spectral Method for Graphs: With Applications to Improving Graph Partitions and Exploring Data Graphs Locally

A Local Spectral Method for Graphs: With Applications to Improving Graph Partitions and Exploring Data Graphs Locally Journal of Machine Learning Research 13 (2012) 2339-2365 Submitted 11/12; Published 9/12 A Local Spectral Method for Graphs: With Applications to Improving Graph Partitions and Exploring Data Graphs Locally

More information

Link Mining PageRank. From Stanford C246

Link Mining PageRank. From Stanford C246 Link Mining PageRank From Stanford C246 Broad Question: How to organize the Web? First try: Human curated Web dictionaries Yahoo, DMOZ LookSmart Second try: Web Search Information Retrieval investigates

More information

Edge-Weighted Personalized PageRank: Breaking a Decade-Old Performance Barrier

Edge-Weighted Personalized PageRank: Breaking a Decade-Old Performance Barrier Edge-Weighted Personalized PageRank: Breaking a Decade-Old Performance Barrier W. Xie D. Bindel A. Demers J. Gehrke 12 Aug 2015 W. Xie, D. Bindel, A. Demers, J. Gehrke KDD2015 12 Aug 2015 1 / 1 PageRank

More information

Mining of Massive Datasets Jure Leskovec, AnandRajaraman, Jeff Ullman Stanford University

Mining of Massive Datasets Jure Leskovec, AnandRajaraman, Jeff Ullman Stanford University Note to other teachers and users of these slides: We would be delighted if you found this our material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit

More information

Spectral Analysis of Random Walks

Spectral Analysis of Random Walks Graphs and Networks Lecture 9 Spectral Analysis of Random Walks Daniel A. Spielman September 26, 2013 9.1 Disclaimer These notes are not necessarily an accurate representation of what happened in class.

More information

Analysis of Spectral Kernel Design based Semi-supervised Learning

Analysis of Spectral Kernel Design based Semi-supervised Learning Analysis of Spectral Kernel Design based Semi-supervised Learning Tong Zhang IBM T. J. Watson Research Center Yorktown Heights, NY 10598 Rie Kubota Ando IBM T. J. Watson Research Center Yorktown Heights,

More information

Cost and Preference in Recommender Systems Junhua Chen LESS IS MORE

Cost and Preference in Recommender Systems Junhua Chen LESS IS MORE Cost and Preference in Recommender Systems Junhua Chen, Big Data Research Center, UESTC Email:junmshao@uestc.edu.cn http://staff.uestc.edu.cn/shaojunming Abstract In many recommender systems (RS), user

More information

Algorithms for Picture Analysis. Lecture 07: Metrics. Axioms of a Metric

Algorithms for Picture Analysis. Lecture 07: Metrics. Axioms of a Metric Axioms of a Metric Picture analysis always assumes that pictures are defined in coordinates, and we apply the Euclidean metric as the golden standard for distance (or derived, such as area) measurements.

More information

8.1 Concentration inequality for Gaussian random matrix (cont d)

8.1 Concentration inequality for Gaussian random matrix (cont d) MGMT 69: Topics in High-dimensional Data Analysis Falll 26 Lecture 8: Spectral clustering and Laplacian matrices Lecturer: Jiaming Xu Scribe: Hyun-Ju Oh and Taotao He, October 4, 26 Outline Concentration

More information

Machine Learning CPSC 340. Tutorial 12

Machine Learning CPSC 340. Tutorial 12 Machine Learning CPSC 340 Tutorial 12 Random Walk on Graph Page Rank Algorithm Label Propagation on Graph Assume a strongly connected graph G = (V, A) Label Propagation on Graph Assume a strongly connected

More information

A Tale of Two Eigenvalue Problems

A Tale of Two Eigenvalue Problems A Tale of Two Eigenvalue Problems Music of the Microspheres + Hearing the Shape of a Graph David Bindel Department of Computer Science Cornell University CAM Visit Talk, 28 Feb 214 Brown bag 1 / 22 The

More information

Personal PageRank and Spilling Paint

Personal PageRank and Spilling Paint Graphs and Networks Lecture 11 Personal PageRank and Spilling Paint Daniel A. Spielman October 7, 2010 11.1 Overview These lecture notes are not complete. The paint spilling metaphor is due to Berkhin

More information

Data and Algorithms of the Web

Data and Algorithms of the Web Data and Algorithms of the Web Link Analysis Algorithms Page Rank some slides from: Anand Rajaraman, Jeffrey D. Ullman InfoLab (Stanford University) Link Analysis Algorithms Page Rank Hubs and Authorities

More information

Wiki Definition. Reputation Systems I. Outline. Introduction to Reputations. Yury Lifshits. HITS, PageRank, SALSA, ebay, EigenTrust, VKontakte

Wiki Definition. Reputation Systems I. Outline. Introduction to Reputations. Yury Lifshits. HITS, PageRank, SALSA, ebay, EigenTrust, VKontakte Reputation Systems I HITS, PageRank, SALSA, ebay, EigenTrust, VKontakte Yury Lifshits Wiki Definition Reputation is the opinion (more technically, a social evaluation) of the public toward a person, a

More information

LAPLACIAN MATRIX AND APPLICATIONS

LAPLACIAN MATRIX AND APPLICATIONS LAPLACIAN MATRIX AND APPLICATIONS Alice Nanyanzi Supervisors: Dr. Franck Kalala Mutombo & Dr. Simukai Utete alicenanyanzi@aims.ac.za August 24, 2017 1 Complex systems & Complex Networks 2 Networks Overview

More information

) (d o f. For the previous layer in a neural network (just the rightmost layer if a single neuron), the required update equation is: 2.

) (d o f. For the previous layer in a neural network (just the rightmost layer if a single neuron), the required update equation is: 2. 1 Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.034 Artificial Intelligence, Fall 2011 Recitation 8, November 3 Corrected Version & (most) solutions

More information

Efficient Subgraph Matching by Postponing Cartesian Products. Matthias Barkowsky

Efficient Subgraph Matching by Postponing Cartesian Products. Matthias Barkowsky Efficient Subgraph Matching by Postponing Cartesian Products Matthias Barkowsky 5 Subgraph Isomorphism - Approaches search problem: find all embeddings of a query graph in a data graph NP-hard, but can

More information

CS 277: Data Mining. Mining Web Link Structure. CS 277: Data Mining Lectures Analyzing Web Link Structure Padhraic Smyth, UC Irvine

CS 277: Data Mining. Mining Web Link Structure. CS 277: Data Mining Lectures Analyzing Web Link Structure Padhraic Smyth, UC Irvine CS 277: Data Mining Mining Web Link Structure Class Presentations In-class, Tuesday and Thursday next week 2-person teams: 6 minutes, up to 6 slides, 3 minutes/slides each person 1-person teams 4 minutes,

More information

Understanding graphs through spectral densities

Understanding graphs through spectral densities Understanding graphs through spectral densities David Bindel Department of Computer Science Cornell University SCAN seminar, 29 Feb 216 SCAN 1 / 1 Can One Hear the Shape of a Drum? 2 u = λu on Ω, u = on

More information

Algorithms, Graph Theory, and the Solu7on of Laplacian Linear Equa7ons. Daniel A. Spielman Yale University

Algorithms, Graph Theory, and the Solu7on of Laplacian Linear Equa7ons. Daniel A. Spielman Yale University Algorithms, Graph Theory, and the Solu7on of Laplacian Linear Equa7ons Daniel A. Spielman Yale University Rutgers, Dec 6, 2011 Outline Linear Systems in Laplacian Matrices What? Why? Classic ways to solve

More information

The Push Algorithm for Spectral Ranking

The Push Algorithm for Spectral Ranking The Push Algorithm for Spectral Ranking Paolo Boldi Sebastiano Vigna March 8, 204 Abstract The push algorithm was proposed first by Jeh and Widom [6] in the context of personalized PageRank computations

More information

CS249: ADVANCED DATA MINING

CS249: ADVANCED DATA MINING CS249: ADVANCED DATA MINING Graph and Network Instructor: Yizhou Sun yzsun@cs.ucla.edu May 31, 2017 Methods Learnt Classification Clustering Vector Data Text Data Recommender System Decision Tree; Naïve

More information

Spectral Graph Theory and its Applications. Daniel A. Spielman Dept. of Computer Science Program in Applied Mathematics Yale Unviersity

Spectral Graph Theory and its Applications. Daniel A. Spielman Dept. of Computer Science Program in Applied Mathematics Yale Unviersity Spectral Graph Theory and its Applications Daniel A. Spielman Dept. of Computer Science Program in Applied Mathematics Yale Unviersity Outline Adjacency matrix and Laplacian Intuition, spectral graph drawing

More information

6.842 Randomness and Computation March 3, Lecture 8

6.842 Randomness and Computation March 3, Lecture 8 6.84 Randomness and Computation March 3, 04 Lecture 8 Lecturer: Ronitt Rubinfeld Scribe: Daniel Grier Useful Linear Algebra Let v = (v, v,..., v n ) be a non-zero n-dimensional row vector and P an n n

More information

Beyond Scalar Affinities for Network Analysis or Vector Diffusion Maps and the Connection Laplacian

Beyond Scalar Affinities for Network Analysis or Vector Diffusion Maps and the Connection Laplacian Beyond Scalar Affinities for Network Analysis or Vector Diffusion Maps and the Connection Laplacian Amit Singer Princeton University Department of Mathematics and Program in Applied and Computational Mathematics

More information

Extractors and the Leftover Hash Lemma

Extractors and the Leftover Hash Lemma 6.889 New Developments in Cryptography March 8, 2011 Extractors and the Leftover Hash Lemma Instructors: Shafi Goldwasser, Yael Kalai, Leo Reyzin, Boaz Barak, and Salil Vadhan Lecturer: Leo Reyzin Scribe:

More information

Estimating network degree distributions from sampled networks: An inverse problem

Estimating network degree distributions from sampled networks: An inverse problem Estimating network degree distributions from sampled networks: An inverse problem Eric D. Kolaczyk Dept of Mathematics and Statistics, Boston University kolaczyk@bu.edu Introduction: Networks and Degree

More information

Data dependent operators for the spatial-spectral fusion problem

Data dependent operators for the spatial-spectral fusion problem Data dependent operators for the spatial-spectral fusion problem Wien, December 3, 2012 Joint work with: University of Maryland: J. J. Benedetto, J. A. Dobrosotskaya, T. Doster, K. W. Duke, M. Ehler, A.

More information

7.1 Sampling Error The Need for Sampling Distributions

7.1 Sampling Error The Need for Sampling Distributions 7.1 Sampling Error The Need for Sampling Distributions Tom Lewis Fall Term 2009 Tom Lewis () 7.1 Sampling Error The Need for Sampling Distributions Fall Term 2009 1 / 5 Outline 1 Tom Lewis () 7.1 Sampling

More information

The Limiting Shape of Internal DLA with Multiple Sources

The Limiting Shape of Internal DLA with Multiple Sources The Limiting Shape of Internal DLA with Multiple Sources January 30, 2008 (Mostly) Joint work with Yuval Peres Finite set of points x 1,...,x k Z d. Start with m particles at each site x i. Each particle

More information

Network Essence: PageRank Completion and Centrality-Conforming Markov Chains

Network Essence: PageRank Completion and Centrality-Conforming Markov Chains Network Essence: PageRank Completion and Centrality-Conforming Markov Chains Shang-Hua Teng Computer Science and Mathematics USC November 29, 2016 Abstract Jiří Matoušek (1963-2015) had many breakthrough

More information

Protein Complex Identification by Supervised Graph Clustering

Protein Complex Identification by Supervised Graph Clustering Protein Complex Identification by Supervised Graph Clustering Yanjun Qi 1, Fernanda Balem 2, Christos Faloutsos 1, Judith Klein- Seetharaman 1,2, Ziv Bar-Joseph 1 1 School of Computer Science, Carnegie

More information

Capacity Releasing Diffusion for Speed and Locality

Capacity Releasing Diffusion for Speed and Locality 000 00 002 003 004 005 006 007 008 009 00 0 02 03 04 05 06 07 08 09 020 02 022 023 024 025 026 027 028 029 030 03 032 033 034 035 036 037 038 039 040 04 042 043 044 045 046 047 048 049 050 05 052 053 054

More information

Performance evaluation of binary classifiers

Performance evaluation of binary classifiers Performance evaluation of binary classifiers Kevin P. Murphy Last updated October 10, 2007 1 ROC curves We frequently design systems to detect events of interest, such as diseases in patients, faces in

More information

Local Flow Partitioning for Faster Edge Connectivity

Local Flow Partitioning for Faster Edge Connectivity Local Flow Partitioning for Faster Edge Connectivity Monika Henzinger, Satish Rao, Di Wang University of Vienna, UC Berkeley, UC Berkeley Edge Connectivity Edge-connectivity λ: least number of edges whose

More information

Chapter 7 Network Flow Problems, I

Chapter 7 Network Flow Problems, I Chapter 7 Network Flow Problems, I Network flow problems are the most frequently solved linear programming problems. They include as special cases, the assignment, transportation, maximum flow, and shortest

More information

Math 443/543 Graph Theory Notes 5: Graphs as matrices, spectral graph theory, and PageRank

Math 443/543 Graph Theory Notes 5: Graphs as matrices, spectral graph theory, and PageRank Math 443/543 Graph Theory Notes 5: Graphs as matrices, spectral graph theory, and PageRank David Glickenstein November 3, 4 Representing graphs as matrices It will sometimes be useful to represent graphs

More information

CS249: ADVANCED DATA MINING

CS249: ADVANCED DATA MINING CS249: ADVANCED DATA MINING Clustering Evaluation and Practical Issues Instructor: Yizhou Sun yzsun@cs.ucla.edu May 2, 2017 Announcements Homework 2 due later today Due May 3 rd (11:59pm) Course project

More information

Deterministic Decentralized Search in Random Graphs

Deterministic Decentralized Search in Random Graphs Deterministic Decentralized Search in Random Graphs Esteban Arcaute 1,, Ning Chen 2,, Ravi Kumar 3, David Liben-Nowell 4,, Mohammad Mahdian 3, Hamid Nazerzadeh 1,, and Ying Xu 1, 1 Stanford University.

More information

Diffusion and random walks on graphs

Diffusion and random walks on graphs Diffusion and random walks on graphs Leonid E. Zhukov School of Data Analysis and Artificial Intelligence Department of Computer Science National Research University Higher School of Economics Structural

More information

First, parse the title...

First, parse the title... First, parse the title... Eigenvector localization: Eigenvectors are usually global entities But they can be localized in extremely sparse/noisy graphs/matrices Implicit regularization: Usually exactly

More information