Mining of Massive Datasets Jure Leskovec, AnandRajaraman, Jeff Ullman Stanford University

Size: px
Start display at page:

Download "Mining of Massive Datasets Jure Leskovec, AnandRajaraman, Jeff Ullman Stanford University"

Transcription

1 Note to other teachers and users of these slides: We would be delighted if you found this our material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit your own needs. If you make use of a significant portion of these slides in your own lecture, please include this message, or a link to our web site: Mining of Massive Datasets Jure Leskovec, AnandRajaraman, Jeff Ullman Stanford University

2 We often think of networks being organized into modules, cluster, communities: J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 2

3 J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 3

4 Find micro-markets by partitioning the query-to-advertiser graph: query advertiser [Andersen, Lang: Communities from seed sets, 2006] J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 4

5 Clusters in Movies-to-Actors graph: [Andersen, Lang: Communities from seed sets, 2006] J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 5

6 Discovering social circles, circles of trust: [McAuley, Leskovec: Discovering social circles in ego networks, 2012] J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 6

7 How to find communities? We will work with undirected (unweighted) networks J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 7

8 Edge betweenness:number of shortest paths passing over the edge Intuition: b=16 b=7.5 Edge strengths (call volume) in a real network Edge betweenness in a real network J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 8

9 [Girvan-Newman 02] Divisive hierarchical clustering based on the notion of edge betweenness: Number of shortest paths passing through the edge Girvan-Newman Algorithm: Undirected unweighted networks Repeat until no edges are left: Calculate betweenness of edges Remove edges with highest betweenness Connected components are communities Gives a hierarchical decomposition of the network J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 9

10 Need to re-compute betweenness at every step J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 10

11 Step 1: Step 2: Step 3: Hierarchical network decomposition: J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 11

12 Communities in physics collaborations J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 12

13 Zachary s Karate club: Hierarchical decomposition J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 13

14 1. How to compute betweenness? 2. How to select the number of clusters? J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 14

15 Want to compute betweennessof paths starting at node Breath first search starting from : J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 15

16 Count the number of shortest paths from to all other nodes of the network: J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 16

17 Compute betweennessby working up the tree:if there are multiple paths count them fractionally The algorithm: Add edge flows: -- node flow = 1+ child edges -- split the flow up based on the parent value Repeat the BFS procedure for each starting node 1 path to K. Split evenly paths to J Split 1:2 1+1 paths to H Split evenly J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 17

18 Compute betweennessby working up the tree:if there are multiple paths count them fractionally The algorithm: Add edge flows: -- node flow = 1+ child edges -- split the flow up based on the parent value Repeat the BFS procedure for each starting node 1 path to K. Split evenly paths to J Split 1:2 1+1 paths to H Split evenly J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 18

19 1. How to compute betweenness? 2. How to select the number of clusters? J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 19

20 Communities:sets of tightly connected nodes Define:Modularity A measure of how well a network is partitioned into communities Given a partitioning of the network into groups : Q s S [ (# edges within group s) (expected # edges within group s) ] Need a null model! J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 20

21 Given real on nodes and edges, construct rewired network Same degree distribution but random connections Consider as a multigraph The expected number of edges between nodes and of degrees and equals to: The expected number of edges in (multigraph) G : J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 21 i j Note: 2

22 Modularity of partitioning S of graph G: Q s S [ (# edges within group s) (expected # edges within group s) ], Normalizing cost.: -1<Q<1 A ij = 1 if i j, 0 else Modularity values take range [ 1,1] It is positive if the number of edges within groups exceeds the expected number <Q means significant community structure J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 22

23 Modularity is useful for selecting the number of clusters: Q Next time: Why not optimize Modularity directly? J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 23

24

25 Undirected graph, : Bi-partitioning task: Divide vertices into two disjoint groups, A B Questions: How can we define a good partition of? How can we efficiently identify such a partition? J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 25

26 What makes a good partition? Maximize the number of within-group connections Minimize the number of between-group connections A B J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 26

27 Express partitioning objectives as a function of the edge cut of the partition Cut:Set of edges with only one vertex in a group: A B cut(a,b) = 2 J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 27

28 Criterion: Minimum-cut Minimize weight of connections between groups arg min A,B cut(a,b) Degenerate case: Optimal cut Minimum cut Problem: Only considers external cluster connections Does not consider internal cluster connectivity J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 28

29 [Shi-Malik] Criterion: Normalized-cut [Shi-Malik, 97] Connectivity between groups relative to the density of each group :total weight of the edges with at least one endpoint in : Why use this criterion? Produces more balanced partitions How do we efficiently find a good partition? Problem: Computing optimal cut is NP-hard J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 29

30 A: adjacency matrix of undirected G A ij =1 if, is an edge, else 0 xis a vector in R n with components,, Think of it as a label/value of each node of What is the meaning of A x? Entry y i is a sum of labels x j of neighbors of i J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 30

31 j th coordinate of A x : Sum of the x-values of neighborsof j Make this a new value at node j Spectral Graph Theory: Analyze the spectrum of matrix representing Spectrum:Eigenvectors of a graph, ordered by the magnitude (strength) of their corresponding eigenvalues : J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 31

32 Suppose all nodes in have degree and is connected What are some eigenvalues/vectors of? What is λ? What x? Let s try:,,, Then:,,,. So: We found eigenpairof :,,,, Remember the meaning of : J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 32

33 Details! Gisd-regular connected, Ais its adjacency matrix Claim: dis largest eigenvalue ofa, dhas multiplicity of 1 (there is only 1 eigenvector associated with eigenvalue d) Proof: Why no eigenvalue? To obtain dwe needed for every, This means 1,1,,1 for some const. Define: = nodes with maximum possible value of Then consider some vector which is not a multiple of vector,,. So not all nodes (with labels ) are in Consider some node and a neighbor then node gets a value strictly less than So is not eigenvector! And so is the largest eigenvalue! J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 33

34 What if is not connected? has 2components, each -regular What are some eigenvectors? Put all s on and s on or vice versa,,,,, then,,,,,,,,,, then,,,,, And so in both cases the corresponding A bit of intuition: A A B B A J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 34 B A B 2 nd largest eigval. now has value very close to

35 More intuition: A B A B 2 nd largest eigval. now has value very close to If the graph is connected (right example) then we already know that, is an eigenvector Since eigenvectors are orthogonal then the components of sum to 0. Why? Because So we can look at the eigenvector of the 2 nd largest eigenvalue and declare nodes with positive label in A and negative label in B. But there is still lots to sort out. J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 35

36 Adjacency matrix (A): n nmatrix A=[a ij ], a ij =1if edge between node iand j Important properties: Symmetric matrix Eigenvectors are real and orthogonal J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 36

37 Degree matrix (D): n n diagonal matrix D=[d ii ], d ii =degree of node i J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 37

38 Laplacian matrix (L): n n symmetric matrix What is trivial eigenpair? 4 5,, then and so Important properties: Eigenvalues are non-negative real numbers Eigenvectors are real and orthogonal J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 38

39 Details! (a)all eigenvalues are 0 (b) 0 for every (c) That is, is positive semi-definite Proof: (c) (b): 0 As it is just the square of length of (b) (a): Let be an eigenvalue of. Then by (b) 0so (a) (c): is also easy! Do it yourself. J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 39

40 Fact: For symmetric matrix M: What is the meaning of min x T L x on G? x L x,,, 2 λ = min 2 x x T x M x T x, 2, Node has degree. So, value needs to be summed up times. But each edge, has two endpoints so we need J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 40

41 λ = min 2 x T x M x T x x Details! Write in axes of eigenvecotrs,,, of. So, Then we get: So, what is? To minimize this over all unit vectors x orthogonal to: w = min over choices of, so that: 1 (unit length) 0(orthogonal to ) To minimize this, set and so if 1 otherwise J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 41

42 What else do we know about x? is unit vector: is orthogonal to 1 st eigenvector,, thus: Remember: λ 2 = min All labelings of nodes so that 0 ( i, j) E ( x i i 2 i We want to assign values to nodes i such that few edges cross 0. (we want x i and x j to subtract each other) x x j ) 2 0 Balance to minimize x J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 42

43 Back to finding the optimal cut Express partition (A,B) as a vector We can minimize the cut of the partition by finding a non-trivial vector x that minimizes: Can t solve exactly. Let s relax and allow it to take any real value J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 43

44 0 x :The minimum value of is given by the 2 nd smallest eigenvalue λ 2 of the Laplacian matrix L :The optimal solution for y is given by the corresponding eigenvector, referred as the Fiedler vector J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 44

45 Details! Suppose there is a partition of Ginto Aand B # where, s.t. then 2 This is the approximation guarantee of the spectral clustering. It says the cut spectral finds is at most 2 away from the optimal one of score. Proof: Let: a= A, b= B and e=# edges from Ato B Enough to choose some based on AandBsuch that: is only smaller 2 (while also 0 ) J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 45

46 Details! Proof (continued): 1)Let s set: Let s quickly verify that 0: 2) Then:, e number of edges between A and B J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 46 Which proves that the cost achieved by spectral is better than twice the OPT cost

47 Details! Putting it all together: where is the maximum node degree in the graph Note we only provide the 1 st part: We did not prove Overall this always certifies that always gives a useful bound J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 47

48 How to define a good partition of a graph? Minimize a given graph cut criterion How to efficiently identify such a partition? Approximate using information provided by the eigenvalues and eigenvectors of a graph Spectral Clustering J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 48

49 Three basic stages: 1) Pre-processing Construct a matrix representation of the graph 2) Decomposition Compute eigenvalues and eigenvectors of the matrix Map each point to a lower-dimensional representation based on one or more eigenvectors 3) Grouping Assign points to two or more clusters, based on the new representation J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 49

50 1)Pre-processing: Build Laplacian matrix Lof the graph 2) Decomposition: Find eigenvalues λ and eigenvectors x of the matrix L Map vertices to corresponding components of λ λ= X = How do we now find the clusters? J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 50

51 3)Grouping: Sort components of reduced 1-dimensional vector Identify clusters by splitting the sorted vector in two How to choose a splitting point? Naïve approaches: Split at 0or median value More expensive approaches: Attempt to minimize normalized cut in 1-dimension (sweep over ordering of nodes induced by the eigenvector) Split at 0: Cluster A: Positive points Cluster B: Negative points J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 51 A B

52 Value of x 2 Rank in x 2 J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 52

53 Components of x 2 Value of x 2 Rank in x 2 J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 53

54 Components of x 1 Components of x 3 J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 54

55 How do we partition a graph into kclusters? Two basic approaches: Recursive bi-partitioning[hagen et al., 92] Recursively apply bi-partitioning algorithm in a hierarchical divisive manner Disadvantages: Inefficient, unstable Cluster multiple eigenvectors[shi-malik, 00] Build a reduced space from multiple eigenvectors Commonly used in recent papers A preferable approach J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 55

56 Approximates the optimal cut [Shi-Malik, 00] Can be used to approximate optimal k-way normalized cut Emphasizes cohesive clusters Increases the unevenness in the distribution of the data Associations between similar points are amplified, associations between dissimilar points are attenuated The data begins to approximate a clustering Well-separated space Transforms data to a new embedded space, consisting of k orthogonal basis vectors Multiple eigenvectors prevent instability due to information loss J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 56

57

58 [Kumar et al. 99] Searching for small communities in the Web graph What is the signature of a community / discussion in a Web graph? Use this to define topics : What the same people on the left talk about on the right Remember HITS! Dense 2-layer graph Intuition: Many people all talking about the same things J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 58

59 A more well-defined problem: Enumerate complete bipartite subgraphsk s,t Where K s,t : snodes on the left where each links to the same tother nodes on the right X K 3,4 Y X = s = 3 Y = t = 4 Fully connected J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 59

60 [Agrawal-Srikant 99] Market basket analysis. Setting: Market:Universe Uof nitems Baskets:msubsets of U: S 1, S 2,, S m U (S i is a set of items one person bought) Support: Frequency threshold f Goal: Find all subsets T s.t. T S i of at least f sets S i (items in Twere bought together at least f times) What s the connection between the itemsets and complete bipartite graphs? J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 60

61 [Kumar et al. 99] Frequent itemsets = complete bipartite graphs! How? View each node ias a set S i of nodes ipoints to K s,t = a set Yof size t that occurs in ssets S i Looking for K s,t set of frequency threshold to s and look at layer t all frequent sets of size t i X j i k a b c d S i ={a,b,c,d} a b c d Y s minimum support ( X =s) t itemset size ( Y =t) J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 61

62 [Kumar et al. 99] View each node ias a set S i of nodes ipoints to i a b c d S i ={a,b,c,d} Find frequent itemsets: s minimum support t itemset size We found K s,t! K s,t = a set Yof size t that occurs in ssets S i x Say we find a frequent itemset Y={a,b,c} of supp s So, there are snodes that link to all of {a,b,c}: a b c X J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 62 y x y z a b c a b c z Y a b c

63 a c d b e f Itemsets: a = {b,c,d} b = {d} c = {b,d,e,f} d = {e,f} e = {b,d} f = {} Support threshold s=2 {b,d}: support 3 {e,f}: support 2 And we just found 2 bipartite subgraphs: a c e d b J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 63 c e f d

64 Example of a community from a web graph Nodes on the right Nodes on the left [Kumar, Raghavan, Rajagopalan, Tomkins: Trawling the Web for emerging cyber-communities 1999] J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 64

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University CS224W: Social and Information Network Analysis Jure Leskovec Stanford University Jure Leskovec, Stanford University http://cs224w.stanford.edu Task: Find coalitions in signed networks Incentives: European

More information

Communities Via Laplacian Matrices. Degree, Adjacency, and Laplacian Matrices Eigenvectors of Laplacian Matrices

Communities Via Laplacian Matrices. Degree, Adjacency, and Laplacian Matrices Eigenvectors of Laplacian Matrices Communities Via Laplacian Matrices Degree, Adjacency, and Laplacian Matrices Eigenvectors of Laplacian Matrices The Laplacian Approach As with betweenness approach, we want to divide a social graph into

More information

Data Mining Techniques

Data Mining Techniques Data Mining Techniques CS 622 - Section 2 - Spring 27 Pre-final Review Jan-Willem van de Meent Feedback Feedback https://goo.gl/er7eo8 (also posted on Piazza) Also, please fill out your TRACE evaluations!

More information

Slide source: Mining of Massive Datasets Jure Leskovec, Anand Rajaraman, Jeff Ullman Stanford University.

Slide source: Mining of Massive Datasets Jure Leskovec, Anand Rajaraman, Jeff Ullman Stanford University. Slide source: Mining of Massive Datasets Jure Leskovec, Anand Rajaraman, Jeff Ullman Stanford University http://www.mmds.org #1: C4.5 Decision Tree - Classification (61 votes) #2: K-Means - Clustering

More information

1 Matrix notation and preliminaries from spectral graph theory

1 Matrix notation and preliminaries from spectral graph theory Graph clustering (or community detection or graph partitioning) is one of the most studied problems in network analysis. One reason for this is that there are a variety of ways to define a cluster or community.

More information

1 Matrix notation and preliminaries from spectral graph theory

1 Matrix notation and preliminaries from spectral graph theory Graph clustering (or community detection or graph partitioning) is one of the most studied problems in network analysis. One reason for this is that there are a variety of ways to define a cluster or community.

More information

4/26/2017. More algorithms for streams: Each element of data stream is a tuple Given a list of keys S Determine which tuples of stream are in S

4/26/2017. More algorithms for streams: Each element of data stream is a tuple Given a list of keys S Determine which tuples of stream are in S Note to other teachers and users of these slides: We would be delighted if you found this our material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit

More information

Lecture 13: Spectral Graph Theory

Lecture 13: Spectral Graph Theory CSE 521: Design and Analysis of Algorithms I Winter 2017 Lecture 13: Spectral Graph Theory Lecturer: Shayan Oveis Gharan 11/14/18 Disclaimer: These notes have not been subjected to the usual scrutiny reserved

More information

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University http://cs224w.stanford.edu Non-overlapping vs. overlapping communities 11/10/2010 Jure Leskovec, Stanford CS224W: Social

More information

Graph Partitioning Algorithms and Laplacian Eigenvalues

Graph Partitioning Algorithms and Laplacian Eigenvalues Graph Partitioning Algorithms and Laplacian Eigenvalues Luca Trevisan Stanford Based on work with Tsz Chiu Kwok, Lap Chi Lau, James Lee, Yin Tat Lee, and Shayan Oveis Gharan spectral graph theory Use linear

More information

Markov Chains and Spectral Clustering

Markov Chains and Spectral Clustering Markov Chains and Spectral Clustering Ning Liu 1,2 and William J. Stewart 1,3 1 Department of Computer Science North Carolina State University, Raleigh, NC 27695-8206, USA. 2 nliu@ncsu.edu, 3 billy@ncsu.edu

More information

Lecture 12 : Graph Laplacians and Cheeger s Inequality

Lecture 12 : Graph Laplacians and Cheeger s Inequality CPS290: Algorithmic Foundations of Data Science March 7, 2017 Lecture 12 : Graph Laplacians and Cheeger s Inequality Lecturer: Kamesh Munagala Scribe: Kamesh Munagala Graph Laplacian Maybe the most beautiful

More information

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

CS246: Mining Massive Datasets Jure Leskovec, Stanford University CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu 2/26/2013 Jure Leskovec, Stanford CS246: Mining Massive Datasets, http://cs246.stanford.edu 2 More algorithms

More information

CS341 info session is on Thu 3/1 5pm in Gates415. CS246: Mining Massive Datasets Jure Leskovec, Stanford University

CS341 info session is on Thu 3/1 5pm in Gates415. CS246: Mining Massive Datasets Jure Leskovec, Stanford University CS341 info session is on Thu 3/1 5pm in Gates415 CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu 2/28/18 Jure Leskovec, Stanford CS246: Mining Massive Datasets,

More information

Slides credits: Mining of Massive Datasets Jure Leskovec, Anand Rajaraman, Jeff Ullman Stanford University

Slides credits: Mining of Massive Datasets Jure Leskovec, Anand Rajaraman, Jeff Ullman Stanford University Note to other teachers and users of these slides: We would be delighted if you found this our material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit

More information

Spectral Clustering. Guokun Lai 2016/10

Spectral Clustering. Guokun Lai 2016/10 Spectral Clustering Guokun Lai 2016/10 1 / 37 Organization Graph Cut Fundamental Limitations of Spectral Clustering Ng 2002 paper (if we have time) 2 / 37 Notation We define a undirected weighted graph

More information

Mining of Massive Datasets Jure Leskovec, AnandRajaraman, Jeff Ullman Stanford University

Mining of Massive Datasets Jure Leskovec, AnandRajaraman, Jeff Ullman Stanford University Note to other teachers and users of these slides: We would be delighted if you found this our material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit

More information

Data Mining and Matrices

Data Mining and Matrices Data Mining and Matrices 08 Boolean Matrix Factorization Rainer Gemulla, Pauli Miettinen June 13, 2013 Outline 1 Warm-Up 2 What is BMF 3 BMF vs. other three-letter abbreviations 4 Binary matrices, tiles,

More information

CS168: The Modern Algorithmic Toolbox Lectures #11 and #12: Spectral Graph Theory

CS168: The Modern Algorithmic Toolbox Lectures #11 and #12: Spectral Graph Theory CS168: The Modern Algorithmic Toolbox Lectures #11 and #12: Spectral Graph Theory Tim Roughgarden & Gregory Valiant May 2, 2016 Spectral graph theory is the powerful and beautiful theory that arises from

More information

Link Analysis Ranking

Link Analysis Ranking Link Analysis Ranking How do search engines decide how to rank your query results? Guess why Google ranks the query results the way it does How would you do it? Naïve ranking of query results Given query

More information

Spectral Clustering. Spectral Clustering? Two Moons Data. Spectral Clustering Algorithm: Bipartioning. Spectral methods

Spectral Clustering. Spectral Clustering? Two Moons Data. Spectral Clustering Algorithm: Bipartioning. Spectral methods Spectral Clustering Seungjin Choi Department of Computer Science POSTECH, Korea seungjin@postech.ac.kr 1 Spectral methods Spectral Clustering? Methods using eigenvectors of some matrices Involve eigen-decomposition

More information

Networks and Their Spectra

Networks and Their Spectra Networks and Their Spectra Victor Amelkin University of California, Santa Barbara Department of Computer Science victor@cs.ucsb.edu December 4, 2017 1 / 18 Introduction Networks (= graphs) are everywhere.

More information

Data Mining and Analysis: Fundamental Concepts and Algorithms

Data Mining and Analysis: Fundamental Concepts and Algorithms : Fundamental Concepts and Algorithms dataminingbook.info Mohammed J. Zaki 1 Wagner Meira Jr. 2 1 Department of Computer Science Rensselaer Polytechnic Institute, Troy, NY, USA 2 Department of Computer

More information

Algebraic Representation of Networks

Algebraic Representation of Networks Algebraic Representation of Networks 0 1 2 1 1 0 0 1 2 0 0 1 1 1 1 1 Hiroki Sayama sayama@binghamton.edu Describing networks with matrices (1) Adjacency matrix A matrix with rows and columns labeled by

More information

1 T 1 = where 1 is the all-ones vector. For the upper bound, let v 1 be the eigenvector corresponding. u:(u,v) E v 1(u)

1 T 1 = where 1 is the all-ones vector. For the upper bound, let v 1 be the eigenvector corresponding. u:(u,v) E v 1(u) CME 305: Discrete Mathematics and Algorithms Instructor: Reza Zadeh (rezab@stanford.edu) Final Review Session 03/20/17 1. Let G = (V, E) be an unweighted, undirected graph. Let λ 1 be the maximum eigenvalue

More information

Aditya Bhaskara CS 5968/6968, Lecture 1: Introduction and Review 12 January 2016

Aditya Bhaskara CS 5968/6968, Lecture 1: Introduction and Review 12 January 2016 Lecture 1: Introduction and Review We begin with a short introduction to the course, and logistics. We then survey some basics about approximation algorithms and probability. We also introduce some of

More information

Graph Clustering Algorithms

Graph Clustering Algorithms PhD Course on Graph Mining Algorithms, Università di Pisa February, 2018 Clustering: Intuition to Formalization Task Partition a graph into natural groups so that the nodes in the same cluster are more

More information

Spectral Graph Theory and its Applications. Daniel A. Spielman Dept. of Computer Science Program in Applied Mathematics Yale Unviersity

Spectral Graph Theory and its Applications. Daniel A. Spielman Dept. of Computer Science Program in Applied Mathematics Yale Unviersity Spectral Graph Theory and its Applications Daniel A. Spielman Dept. of Computer Science Program in Applied Mathematics Yale Unviersity Outline Adjacency matrix and Laplacian Intuition, spectral graph drawing

More information

Mining of Massive Datasets Jure Leskovec, AnandRajaraman, Jeff Ullman Stanford University

Mining of Massive Datasets Jure Leskovec, AnandRajaraman, Jeff Ullman Stanford University Note to other teachers and users of these slides: We would be delighted if you found this our material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit

More information

CS246 Final Exam. March 16, :30AM - 11:30AM

CS246 Final Exam. March 16, :30AM - 11:30AM CS246 Final Exam March 16, 2016 8:30AM - 11:30AM Name : SUID : I acknowledge and accept the Stanford Honor Code. I have neither given nor received unpermitted help on this examination. (signed) Directions

More information

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University http://cs224w.stanford.edu Intro sessions to SNAP C++ and SNAP.PY: SNAP.PY: Friday 9/27, 4:5 5:30pm in Gates B03 SNAP

More information

Kristina Lerman USC Information Sciences Institute

Kristina Lerman USC Information Sciences Institute Rethinking Network Structure Kristina Lerman USC Information Sciences Institute Università della Svizzera Italiana, December 16, 2011 Measuring network structure Central nodes Community structure Strength

More information

Data Mining Recitation Notes Week 3

Data Mining Recitation Notes Week 3 Data Mining Recitation Notes Week 3 Jack Rae January 28, 2013 1 Information Retrieval Given a set of documents, pull the (k) most similar document(s) to a given query. 1.1 Setup Say we have D documents

More information

A Dimensionality Reduction Framework for Detection of Multiscale Structure in Heterogeneous Networks

A Dimensionality Reduction Framework for Detection of Multiscale Structure in Heterogeneous Networks Shen HW, Cheng XQ, Wang YZ et al. A dimensionality reduction framework for detection of multiscale structure in heterogeneous networks. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY 27(2): 341 357 Mar. 2012.

More information

Finding normalized and modularity cuts by spectral clustering. Ljubjana 2010, October

Finding normalized and modularity cuts by spectral clustering. Ljubjana 2010, October Finding normalized and modularity cuts by spectral clustering Marianna Bolla Institute of Mathematics Budapest University of Technology and Economics marib@math.bme.hu Ljubjana 2010, October Outline Find

More information

DATA MINING LECTURE 13. Link Analysis Ranking PageRank -- Random walks HITS

DATA MINING LECTURE 13. Link Analysis Ranking PageRank -- Random walks HITS DATA MINING LECTURE 3 Link Analysis Ranking PageRank -- Random walks HITS How to organize the web First try: Manually curated Web Directories How to organize the web Second try: Web Search Information

More information

Lecture: Local Spectral Methods (1 of 4)

Lecture: Local Spectral Methods (1 of 4) Stat260/CS294: Spectral Graph Methods Lecture 18-03/31/2015 Lecture: Local Spectral Methods (1 of 4) Lecturer: Michael Mahoney Scribe: Michael Mahoney Warning: these notes are still very rough. They provide

More information

Problem Set 4. General Instructions

Problem Set 4. General Instructions CS224W: Analysis of Networks Fall 2017 Problem Set 4 General Instructions Due 11:59pm PDT November 30, 2017 These questions require thought, but do not require long answers. Please be as concise as possible.

More information

Complex Networks CSYS/MATH 303, Spring, Prof. Peter Dodds

Complex Networks CSYS/MATH 303, Spring, Prof. Peter Dodds Complex Networks CSYS/MATH 303, Spring, 2011 Prof. Peter Dodds Department of Mathematics & Statistics Center for Complex Systems Vermont Advanced Computing Center University of Vermont Licensed under the

More information

Bayesian Networks: Independencies and Inference

Bayesian Networks: Independencies and Inference Bayesian Networks: Independencies and Inference Scott Davies and Andrew Moore Note to other teachers and users of these slides. Andrew and Scott would be delighted if you found this source material useful

More information

CS246 Final Exam, Winter 2011

CS246 Final Exam, Winter 2011 CS246 Final Exam, Winter 2011 1. Your name and student ID. Name:... Student ID:... 2. I agree to comply with Stanford Honor Code. Signature:... 3. There should be 17 numbered pages in this exam (including

More information

COMPSCI 514: Algorithms for Data Science

COMPSCI 514: Algorithms for Data Science COMPSCI 514: Algorithms for Data Science Arya Mazumdar University of Massachusetts at Amherst Fall 2018 Lecture 8 Spectral Clustering Spectral clustering Curse of dimensionality Dimensionality Reduction

More information

Communities, Spectral Clustering, and Random Walks

Communities, Spectral Clustering, and Random Walks Communities, Spectral Clustering, and Random Walks David Bindel Department of Computer Science Cornell University 26 Sep 2011 20 21 19 16 22 28 17 18 29 26 27 30 23 1 25 5 8 24 2 4 14 3 9 13 15 11 10 12

More information

LAPLACIAN MATRIX AND APPLICATIONS

LAPLACIAN MATRIX AND APPLICATIONS LAPLACIAN MATRIX AND APPLICATIONS Alice Nanyanzi Supervisors: Dr. Franck Kalala Mutombo & Dr. Simukai Utete alicenanyanzi@aims.ac.za August 24, 2017 1 Complex systems & Complex Networks 2 Networks Overview

More information

Web Structure Mining Nodes, Links and Influence

Web Structure Mining Nodes, Links and Influence Web Structure Mining Nodes, Links and Influence 1 Outline 1. Importance of nodes 1. Centrality 2. Prestige 3. Page Rank 4. Hubs and Authority 5. Metrics comparison 2. Link analysis 3. Influence model 1.

More information

Spectral Graph Theory Lecture 2. The Laplacian. Daniel A. Spielman September 4, x T M x. ψ i = arg min

Spectral Graph Theory Lecture 2. The Laplacian. Daniel A. Spielman September 4, x T M x. ψ i = arg min Spectral Graph Theory Lecture 2 The Laplacian Daniel A. Spielman September 4, 2015 Disclaimer These notes are not necessarily an accurate representation of what happened in class. The notes written before

More information

An Algorithmist s Toolkit September 10, Lecture 1

An Algorithmist s Toolkit September 10, Lecture 1 18.409 An Algorithmist s Toolkit September 10, 2009 Lecture 1 Lecturer: Jonathan Kelner Scribe: Jesse Geneson (2009) 1 Overview The class s goals, requirements, and policies were introduced, and topics

More information

High Dimensional Search Min- Hashing Locality Sensi6ve Hashing

High Dimensional Search Min- Hashing Locality Sensi6ve Hashing High Dimensional Search Min- Hashing Locality Sensi6ve Hashing Debapriyo Majumdar Data Mining Fall 2014 Indian Statistical Institute Kolkata September 8 and 11, 2014 High Support Rules vs Correla6on of

More information

Mining Newsgroups Using Networks Arising From Social Behavior by Rakesh Agrawal et al. Presented by Will Lee

Mining Newsgroups Using Networks Arising From Social Behavior by Rakesh Agrawal et al. Presented by Will Lee Mining Newsgroups Using Networks Arising From Social Behavior by Rakesh Agrawal et al. Presented by Will Lee wwlee1@uiuc.edu September 28, 2004 Motivation IR on newsgroups is challenging due to lack of

More information

Overlapping Communities

Overlapping Communities Overlapping Communities Davide Mottin HassoPlattner Institute Graph Mining course Winter Semester 2017 Acknowledgements Most of this lecture is taken from: http://web.stanford.edu/class/cs224w/slides GRAPH

More information

Data Analysis and Manifold Learning Lecture 7: Spectral Clustering

Data Analysis and Manifold Learning Lecture 7: Spectral Clustering Data Analysis and Manifold Learning Lecture 7: Spectral Clustering Radu Horaud INRIA Grenoble Rhone-Alpes, France Radu.Horaud@inrialpes.fr http://perception.inrialpes.fr/ Outline of Lecture 7 What is spectral

More information

Lecture 18: More NP-Complete Problems

Lecture 18: More NP-Complete Problems 6.045 Lecture 18: More NP-Complete Problems 1 The Clique Problem a d f c b e g Given a graph G and positive k, does G contain a complete subgraph on k nodes? CLIQUE = { (G,k) G is an undirected graph with

More information

Data Analysis and Manifold Learning Lecture 3: Graphs, Graph Matrices, and Graph Embeddings

Data Analysis and Manifold Learning Lecture 3: Graphs, Graph Matrices, and Graph Embeddings Data Analysis and Manifold Learning Lecture 3: Graphs, Graph Matrices, and Graph Embeddings Radu Horaud INRIA Grenoble Rhone-Alpes, France Radu.Horaud@inrialpes.fr http://perception.inrialpes.fr/ Outline

More information

Introduction to Spectral Graph Theory and Graph Clustering

Introduction to Spectral Graph Theory and Graph Clustering Introduction to Spectral Graph Theory and Graph Clustering Chengming Jiang ECS 231 Spring 2016 University of California, Davis 1 / 40 Motivation Image partitioning in computer vision 2 / 40 Motivation

More information

8.1 Concentration inequality for Gaussian random matrix (cont d)

8.1 Concentration inequality for Gaussian random matrix (cont d) MGMT 69: Topics in High-dimensional Data Analysis Falll 26 Lecture 8: Spectral clustering and Laplacian matrices Lecturer: Jiaming Xu Scribe: Hyun-Ju Oh and Taotao He, October 4, 26 Outline Concentration

More information

CS246: Mining Massive Datasets Jure Leskovec, Stanford University.

CS246: Mining Massive Datasets Jure Leskovec, Stanford University. CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu What is the structure of the Web? How is it organized? 2/7/2011 Jure Leskovec, Stanford C246: Mining Massive

More information

Frequent Itemsets and Association Rule Mining. Vinay Setty Slides credit:

Frequent Itemsets and Association Rule Mining. Vinay Setty Slides credit: Frequent Itemsets and Association Rule Mining Vinay Setty vinay.j.setty@uis.no Slides credit: http://www.mmds.org/ Association Rule Discovery Supermarket shelf management Market-basket model: Goal: Identify

More information

CSCE 750 Final Exam Answer Key Wednesday December 7, 2005

CSCE 750 Final Exam Answer Key Wednesday December 7, 2005 CSCE 750 Final Exam Answer Key Wednesday December 7, 2005 Do all problems. Put your answers on blank paper or in a test booklet. There are 00 points total in the exam. You have 80 minutes. Please note

More information

CS60021: Scalable Data Mining. Dimensionality Reduction

CS60021: Scalable Data Mining. Dimensionality Reduction J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org 1 CS60021: Scalable Data Mining Dimensionality Reduction Sourangshu Bhattacharya Assumption: Data lies on or near a

More information

Spectral Clustering. Zitao Liu

Spectral Clustering. Zitao Liu Spectral Clustering Zitao Liu Agenda Brief Clustering Review Similarity Graph Graph Laplacian Spectral Clustering Algorithm Graph Cut Point of View Random Walk Point of View Perturbation Theory Point of

More information

Machine Learning for Data Science (CS4786) Lecture 11

Machine Learning for Data Science (CS4786) Lecture 11 Machine Learning for Data Science (CS4786) Lecture 11 Spectral clustering Course Webpage : http://www.cs.cornell.edu/courses/cs4786/2016sp/ ANNOUNCEMENT 1 Assignment P1 the Diagnostic assignment 1 will

More information

ORIE 4741: Learning with Big Messy Data. Spectral Graph Theory

ORIE 4741: Learning with Big Messy Data. Spectral Graph Theory ORIE 4741: Learning with Big Messy Data Spectral Graph Theory Mika Sumida Operations Research and Information Engineering Cornell September 15, 2017 1 / 32 Outline Graph Theory Spectral Graph Theory Laplacian

More information

Jeffrey D. Ullman Stanford University

Jeffrey D. Ullman Stanford University Jeffrey D. Ullman Stanford University 2 Often, our data can be represented by an m-by-n matrix. And this matrix can be closely approximated by the product of two matrices that share a small common dimension

More information

Thanks to Jure Leskovec, Stanford and Panayiotis Tsaparas, Univ. of Ioannina for slides

Thanks to Jure Leskovec, Stanford and Panayiotis Tsaparas, Univ. of Ioannina for slides Thanks to Jure Leskovec, Stanford and Panayiotis Tsaparas, Univ. of Ioannina for slides Web Search: How to Organize the Web? Ranking Nodes on Graphs Hubs and Authorities PageRank How to Solve PageRank

More information

Latent Semantic Indexing (LSI) CE-324: Modern Information Retrieval Sharif University of Technology

Latent Semantic Indexing (LSI) CE-324: Modern Information Retrieval Sharif University of Technology Latent Semantic Indexing (LSI) CE-324: Modern Information Retrieval Sharif University of Technology M. Soleymani Fall 2014 Most slides have been adapted from: Profs. Manning, Nayak & Raghavan (CS-276,

More information

Lecture 13 Spectral Graph Algorithms

Lecture 13 Spectral Graph Algorithms COMS 995-3: Advanced Algorithms March 6, 7 Lecture 3 Spectral Graph Algorithms Instructor: Alex Andoni Scribe: Srikar Varadaraj Introduction Today s topics: Finish proof from last lecture Example of random

More information

Jure Leskovec Joint work with Jaewon Yang, Julian McAuley

Jure Leskovec Joint work with Jaewon Yang, Julian McAuley Jure Leskovec (@jure) Joint work with Jaewon Yang, Julian McAuley Given a network, find communities! Sets of nodes with common function, role or property 2 3 Q: How and why do communities form? A: Strength

More information

Blog Community Discovery and Evolution

Blog Community Discovery and Evolution Blog Community Discovery and Evolution Mutual Awareness, Interactions and Community Stories Yu-Ru Lin, Hari Sundaram, Yun Chi, Junichi Tatemura and Belle Tseng What do people feel about Hurricane Katrina?

More information

Relations Between Adjacency And Modularity Graph Partitioning: Principal Component Analysis vs. Modularity Component Analysis

Relations Between Adjacency And Modularity Graph Partitioning: Principal Component Analysis vs. Modularity Component Analysis Relations Between Adjacency And Modularity Graph Partitioning: Principal Component Analysis vs. Modularity Component Analysis Hansi Jiang Carl Meyer North Carolina State University October 27, 2015 1 /

More information

Statistical Machine Learning

Statistical Machine Learning Statistical Machine Learning Christoph Lampert Spring Semester 2015/2016 // Lecture 12 1 / 36 Unsupervised Learning Dimensionality Reduction 2 / 36 Dimensionality Reduction Given: data X = {x 1,..., x

More information

Spectral Graph Theory and You: Matrix Tree Theorem and Centrality Metrics

Spectral Graph Theory and You: Matrix Tree Theorem and Centrality Metrics Spectral Graph Theory and You: and Centrality Metrics Jonathan Gootenberg March 11, 2013 1 / 19 Outline of Topics 1 Motivation Basics of Spectral Graph Theory Understanding the characteristic polynomial

More information

Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany

Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany Syllabus Fri. 21.10. (1) 0. Introduction A. Supervised Learning: Linear Models & Fundamentals Fri. 27.10. (2) A.1 Linear Regression Fri. 3.11. (3) A.2 Linear Classification Fri. 10.11. (4) A.3 Regularization

More information

ECEN 689 Special Topics in Data Science for Communications Networks

ECEN 689 Special Topics in Data Science for Communications Networks ECEN 689 Special Topics in Data Science for Communications Networks Nick Duffield Department of Electrical & Computer Engineering Texas A&M University Lecture 8 Random Walks, Matrices and PageRank Graphs

More information

CS6220: DATA MINING TECHNIQUES

CS6220: DATA MINING TECHNIQUES CS6220: DATA MINING TECHNIQUES Mining Graph/Network Data Instructor: Yizhou Sun yzsun@ccs.neu.edu March 16, 2016 Methods to Learn Classification Clustering Frequent Pattern Mining Matrix Data Decision

More information

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

CS246: Mining Massive Datasets Jure Leskovec, Stanford University CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu 2/7/2012 Jure Leskovec, Stanford C246: Mining Massive Datasets 2 Web pages are not equally important www.joe-schmoe.com

More information

Data Mining and Analysis: Fundamental Concepts and Algorithms

Data Mining and Analysis: Fundamental Concepts and Algorithms Data Mining and Analysis: Fundamental Concepts and Algorithms dataminingbook.info Mohammed J. Zaki 1 Wagner Meira Jr. 2 1 Department of Computer Science Rensselaer Polytechnic Institute, Troy, NY, USA

More information

CS 277: Data Mining. Mining Web Link Structure. CS 277: Data Mining Lectures Analyzing Web Link Structure Padhraic Smyth, UC Irvine

CS 277: Data Mining. Mining Web Link Structure. CS 277: Data Mining Lectures Analyzing Web Link Structure Padhraic Smyth, UC Irvine CS 277: Data Mining Mining Web Link Structure Class Presentations In-class, Tuesday and Thursday next week 2-person teams: 6 minutes, up to 6 slides, 3 minutes/slides each person 1-person teams 4 minutes,

More information

MATH 567: Mathematical Techniques in Data Science Clustering II

MATH 567: Mathematical Techniques in Data Science Clustering II This lecture is based on U. von Luxburg, A Tutorial on Spectral Clustering, Statistics and Computing, 17 (4), 2007. MATH 567: Mathematical Techniques in Data Science Clustering II Dominique Guillot Departments

More information

CSI 445/660 Part 6 (Centrality Measures for Networks) 6 1 / 68

CSI 445/660 Part 6 (Centrality Measures for Networks) 6 1 / 68 CSI 445/660 Part 6 (Centrality Measures for Networks) 6 1 / 68 References 1 L. Freeman, Centrality in Social Networks: Conceptual Clarification, Social Networks, Vol. 1, 1978/1979, pp. 215 239. 2 S. Wasserman

More information

Online Social Networks and Media. Link Analysis and Web Search

Online Social Networks and Media. Link Analysis and Web Search Online Social Networks and Media Link Analysis and Web Search How to Organize the Web First try: Human curated Web directories Yahoo, DMOZ, LookSmart How to organize the web Second try: Web Search Information

More information

Spectral Clustering on Handwritten Digits Database

Spectral Clustering on Handwritten Digits Database University of Maryland-College Park Advance Scientific Computing I,II Spectral Clustering on Handwritten Digits Database Author: Danielle Middlebrooks Dmiddle1@math.umd.edu Second year AMSC Student Advisor:

More information

Topics in Approximation Algorithms Solution for Homework 3

Topics in Approximation Algorithms Solution for Homework 3 Topics in Approximation Algorithms Solution for Homework 3 Problem 1 We show that any solution {U t } can be modified to satisfy U τ L τ as follows. Suppose U τ L τ, so there is a vertex v U τ but v L

More information

Link Mining PageRank. From Stanford C246

Link Mining PageRank. From Stanford C246 Link Mining PageRank From Stanford C246 Broad Question: How to organize the Web? First try: Human curated Web dictionaries Yahoo, DMOZ LookSmart Second try: Web Search Information Retrieval investigates

More information

Networks as vectors of their motif frequencies and 2-norm distance as a measure of similarity

Networks as vectors of their motif frequencies and 2-norm distance as a measure of similarity Networks as vectors of their motif frequencies and 2-norm distance as a measure of similarity CS322 Project Writeup Semih Salihoglu Stanford University 353 Serra Street Stanford, CA semih@stanford.edu

More information

Spectra of Adjacency and Laplacian Matrices

Spectra of Adjacency and Laplacian Matrices Spectra of Adjacency and Laplacian Matrices Definition: University of Alicante (Spain) Matrix Computing (subject 3168 Degree in Maths) 30 hours (theory)) + 15 hours (practical assignment) Contents 1. Spectra

More information

Data Mining and Analysis: Fundamental Concepts and Algorithms

Data Mining and Analysis: Fundamental Concepts and Algorithms Data Mining and Analysis: Fundamental Concepts and Algorithms dataminingbook.info Mohammed J. Zaki 1 Wagner Meira Jr. 2 1 Department of Computer Science Rensselaer Polytechnic Institute, Troy, NY, USA

More information

Graph Theory. Thomas Bloom. February 6, 2015

Graph Theory. Thomas Bloom. February 6, 2015 Graph Theory Thomas Bloom February 6, 2015 1 Lecture 1 Introduction A graph (for the purposes of these lectures) is a finite set of vertices, some of which are connected by a single edge. Most importantly,

More information

Lecture: Local Spectral Methods (2 of 4) 19 Computing spectral ranking with the push procedure

Lecture: Local Spectral Methods (2 of 4) 19 Computing spectral ranking with the push procedure Stat260/CS294: Spectral Graph Methods Lecture 19-04/02/2015 Lecture: Local Spectral Methods (2 of 4) Lecturer: Michael Mahoney Scribe: Michael Mahoney Warning: these notes are still very rough. They provide

More information

Latent Semantic Indexing (LSI) CE-324: Modern Information Retrieval Sharif University of Technology

Latent Semantic Indexing (LSI) CE-324: Modern Information Retrieval Sharif University of Technology Latent Semantic Indexing (LSI) CE-324: Modern Information Retrieval Sharif University of Technology M. Soleymani Fall 2016 Most slides have been adapted from: Profs. Manning, Nayak & Raghavan (CS-276,

More information

P P P NP-Hard: L is NP-hard if for all L NP, L L. Thus, if we could solve L in polynomial. Cook's Theorem and Reductions

P P P NP-Hard: L is NP-hard if for all L NP, L L. Thus, if we could solve L in polynomial. Cook's Theorem and Reductions Summary of the previous lecture Recall that we mentioned the following topics: P: is the set of decision problems (or languages) that are solvable in polynomial time. NP: is the set of decision problems

More information

Introduction to Data Mining

Introduction to Data Mining Introduction to Data Mining Lecture #9: Link Analysis Seoul National University 1 In This Lecture Motivation for link analysis Pagerank: an important graph ranking algorithm Flow and random walk formulation

More information

Lecture 10. Lecturer: Aleksander Mądry Scribes: Mani Bastani Parizi and Christos Kalaitzis

Lecture 10. Lecturer: Aleksander Mądry Scribes: Mani Bastani Parizi and Christos Kalaitzis CS-621 Theory Gems October 18, 2012 Lecture 10 Lecturer: Aleksander Mądry Scribes: Mani Bastani Parizi and Christos Kalaitzis 1 Introduction In this lecture, we will see how one can use random walks to

More information

Certifying the Global Optimality of Graph Cuts via Semidefinite Programming: A Theoretic Guarantee for Spectral Clustering

Certifying the Global Optimality of Graph Cuts via Semidefinite Programming: A Theoretic Guarantee for Spectral Clustering Certifying the Global Optimality of Graph Cuts via Semidefinite Programming: A Theoretic Guarantee for Spectral Clustering Shuyang Ling Courant Institute of Mathematical Sciences, NYU Aug 13, 2018 Joint

More information

Spectral Graph Theory Tools. Analysis of Complex Networks

Spectral Graph Theory Tools. Analysis of Complex Networks Spectral Graph Theory Tools for the Department of Mathematics and Computer Science Emory University Atlanta, GA 30322, USA Acknowledgments Christine Klymko (Emory) Ernesto Estrada (Strathclyde, UK) Support:

More information

ON THE QUALITY OF SPECTRAL SEPARATORS

ON THE QUALITY OF SPECTRAL SEPARATORS ON THE QUALITY OF SPECTRAL SEPARATORS STEPHEN GUATTERY AND GARY L. MILLER Abstract. Computing graph separators is an important step in many graph algorithms. A popular technique for finding separators

More information

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University http://cs224w.stanford.edu How to organize/navigate it? First try: Human curated Web directories Yahoo, DMOZ, LookSmart

More information

A Statistical Look at Spectral Graph Analysis. Deep Mukhopadhyay

A Statistical Look at Spectral Graph Analysis. Deep Mukhopadhyay A Statistical Look at Spectral Graph Analysis Deep Mukhopadhyay Department of Statistics, Temple University Office: Speakman 335 deep@temple.edu http://sites.temple.edu/deepstat/ Graph Signal Processing

More information

Chapter 11. Matrix Algorithms and Graph Partitioning. M. E. J. Newman. June 10, M. E. J. Newman Chapter 11 June 10, / 43

Chapter 11. Matrix Algorithms and Graph Partitioning. M. E. J. Newman. June 10, M. E. J. Newman Chapter 11 June 10, / 43 Chapter 11 Matrix Algorithms and Graph Partitioning M. E. J. Newman June 10, 2016 M. E. J. Newman Chapter 11 June 10, 2016 1 / 43 Table of Contents 1 Eigenvalue and Eigenvector Eigenvector Centrality The

More information

Spectral Methods for Subgraph Detection

Spectral Methods for Subgraph Detection Spectral Methods for Subgraph Detection Nadya T. Bliss & Benjamin A. Miller Embedded and High Performance Computing Patrick J. Wolfe Statistics and Information Laboratory Harvard University 12 July 2010

More information