Data Mining and Analysis: Fundamental Concepts and Algorithms


dataminingbook.info

Mohammed J. Zaki, Department of Computer Science, Rensselaer Polytechnic Institute, Troy, NY, USA
Wagner Meira Jr., Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil

Chapter 16: Spectral & Graph Clustering

Graphs and Matrices: Adjacency Matrix

Given a dataset $D = \{x_i\}_{i=1}^n$ consisting of $n$ points in $\mathbb{R}^d$, let $A$ denote the $n \times n$ symmetric similarity matrix between the points, given as

$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix}$$

where $A(i,j) = a_{ij}$ denotes the similarity or affinity between points $x_i$ and $x_j$. We require the similarity to be symmetric and non-negative, that is, $a_{ij} = a_{ji}$ and $a_{ij} \ge 0$, respectively. The matrix $A$ is the weighted adjacency matrix for the data graph. If all affinities are 0 or 1, then $A$ represents the regular adjacency relationship between the vertices.
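The slides leave the construction of $A$ open; one common choice (an assumption here, not something the slides prescribe) is a Gaussian kernel on pairwise distances. A minimal numpy sketch, with the bandwidth sigma as an assumed parameter:

import numpy as np

def gaussian_similarity(X, sigma=1.0):
    # X: (n, d) array of points; sigma: assumed kernel bandwidth
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    A = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)  # zero the diagonal so the data graph has no self-loops
    return A

The resulting matrix is symmetric and non-negative, as required above.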

Iris Similarity Graph: Mutual Nearest Neighbors

$|V| = n = 150$, $|E| = m = 1730$

[Figure: mutual nearest-neighbor similarity graph over the Iris dataset]

Graphs and Matrices: Degree Matrix

For a vertex $x_i$, let $d_i$ denote the degree of the vertex, defined as

$$d_i = \sum_{j=1}^n a_{ij}$$

We define the degree matrix $\Delta$ of graph $G$ as the $n \times n$ diagonal matrix:

$$\Delta = \begin{pmatrix} d_1 & 0 & \cdots & 0 \\ 0 & d_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & d_n \end{pmatrix} = \begin{pmatrix} \sum_{j=1}^n a_{1j} & 0 & \cdots & 0 \\ 0 & \sum_{j=1}^n a_{2j} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sum_{j=1}^n a_{nj} \end{pmatrix}$$

$\Delta$ can be compactly written as $\Delta(i,i) = d_i$ for all $1 \le i \le n$.

Graphs and Matrices: Normalized Adjacency Matrix

The normalized adjacency matrix is obtained by dividing each row of the adjacency matrix by the degree of the corresponding node. Given the weighted adjacency matrix $A$ for a graph $G$, its normalized adjacency matrix is defined as

$$M = \Delta^{-1} A = \begin{pmatrix} \frac{a_{11}}{d_1} & \frac{a_{12}}{d_1} & \cdots & \frac{a_{1n}}{d_1} \\ \frac{a_{21}}{d_2} & \frac{a_{22}}{d_2} & \cdots & \frac{a_{2n}}{d_2} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{a_{n1}}{d_n} & \frac{a_{n2}}{d_n} & \cdots & \frac{a_{nn}}{d_n} \end{pmatrix}$$

Because $A$ is assumed to have non-negative elements, each element of $M$, namely $m_{ij}$, is also non-negative, as $m_{ij} = a_{ij}/d_i \ge 0$.

Each row in $M$ sums to 1, which implies that 1 is an eigenvalue of $M$. In fact, $\lambda_1 = 1$ is the largest eigenvalue of $M$, and the other eigenvalues satisfy $|\lambda_i| \le 1$. Because $M$ is not symmetric, its eigenvectors are not necessarily orthogonal.
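As a small sketch of these two definitions, assuming every vertex has positive degree (function and variable names are mine):

import numpy as np

def degree_and_normalized_adjacency(A):
    d = A.sum(axis=1)            # degrees d_i = sum_j a_ij
    Delta = np.diag(d)           # degree matrix
    M = A / d[:, None]           # M = Delta^{-1} A: row i divided by d_i
    return Delta, M

Each row of M sums to 1 when all degrees are positive, so the all-ones vector is an eigenvector of M with eigenvalue 1.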

Example Graph: Adjacency and Degree Matrices

[Figure: example graph on vertices 1-7]

Its adjacency and degree matrices are given as

$$A = \begin{pmatrix} 0 & 1 & 0 & 1 & 0 & 1 & 0 \\ 1 & 0 & 1 & 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 1 & 0 & 0 & 1 \\ 1 & 1 & 1 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 1 & 1 \\ 1 & 0 & 0 & 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 & 1 & 1 & 0 \end{pmatrix} \qquad \Delta = \begin{pmatrix} 3 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 3 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 3 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 4 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 3 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 3 \end{pmatrix}$$

Example Graph: Normalized Adjacency Matrix

[Figure: example graph on vertices 1-7]

The normalized adjacency matrix is as follows:

$$M = \Delta^{-1} A = \begin{pmatrix} 0 & 0.33 & 0 & 0.33 & 0 & 0.33 & 0 \\ 0.33 & 0 & 0.33 & 0.33 & 0 & 0 & 0 \\ 0 & 0.33 & 0 & 0.33 & 0 & 0 & 0.33 \\ 0.25 & 0.25 & 0.25 & 0 & 0.25 & 0 & 0 \\ 0 & 0 & 0 & 0.33 & 0 & 0.33 & 0.33 \\ 0.33 & 0 & 0 & 0 & 0.33 & 0 & 0.33 \\ 0 & 0 & 0.33 & 0 & 0.33 & 0.33 & 0 \end{pmatrix}$$

The eigenvalues of $M$ are: $\lambda_1 = 1$, $\lambda_2 = 0.483$, $\lambda_3 = 0.206$, $\lambda_4 = -0.045$, $\lambda_5 = -0.405$, $\lambda_6 = -0.539$, $\lambda_7 = -0.7$

Graph Laplacian Matrix

The Laplacian matrix of a graph is defined as

$$L = \Delta - A = \begin{pmatrix} \sum_{j=1}^n a_{1j} & 0 & \cdots & 0 \\ 0 & \sum_{j=1}^n a_{2j} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sum_{j=1}^n a_{nj} \end{pmatrix} - \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix} = \begin{pmatrix} \sum_{j \ne 1} a_{1j} & -a_{12} & \cdots & -a_{1n} \\ -a_{21} & \sum_{j \ne 2} a_{2j} & \cdots & -a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ -a_{n1} & -a_{n2} & \cdots & \sum_{j \ne n} a_{nj} \end{pmatrix}$$

$L$ is a symmetric, positive semidefinite matrix. This means that $L$ has $n$ real, non-negative eigenvalues, which can be arranged in decreasing order as follows: $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n \ge 0$. Because $L$ is symmetric, its eigenvectors are orthonormal. The rank of $L$ is at most $n - 1$, and the smallest eigenvalue is $\lambda_n = 0$.

Example Graph: Laplacian Matrix

[Figure: example graph on vertices 1-7]

The graph Laplacian is given as

$$L = \Delta - A = \begin{pmatrix} 3 & -1 & 0 & -1 & 0 & -1 & 0 \\ -1 & 3 & -1 & -1 & 0 & 0 & 0 \\ 0 & -1 & 3 & -1 & 0 & 0 & -1 \\ -1 & -1 & -1 & 4 & -1 & 0 & 0 \\ 0 & 0 & 0 & -1 & 3 & -1 & -1 \\ -1 & 0 & 0 & 0 & -1 & 3 & -1 \\ 0 & 0 & -1 & 0 & -1 & -1 & 3 \end{pmatrix}$$

The eigenvalues of $L$ are as follows: $\lambda_1 = 5.618$, $\lambda_2 = 4.618$, $\lambda_3 = 4.414$, $\lambda_4 = 3.382$, $\lambda_5 = 2.382$, $\lambda_6 = 1.586$, $\lambda_7 = 0$
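A quick numerical check of this example, a sketch assuming numpy is available:

import numpy as np

def graph_laplacian(A):
    return np.diag(A.sum(axis=1)) - A   # L = Delta - A

A = np.array([[0, 1, 0, 1, 0, 1, 0],
              [1, 0, 1, 1, 0, 0, 0],
              [0, 1, 0, 1, 0, 0, 1],
              [1, 1, 1, 0, 1, 0, 0],
              [0, 0, 0, 1, 0, 1, 1],
              [1, 0, 0, 0, 1, 0, 1],
              [0, 0, 1, 0, 1, 1, 0]], dtype=float)

L = graph_laplacian(A)
vals = np.sort(np.linalg.eigvalsh(L))[::-1]   # real, non-negative, smallest is 0
print(np.round(vals, 3))                      # should reproduce the eigenvalues listed above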

Normalized Laplacian Matrices

The normalized symmetric Laplacian matrix of a graph is defined as

$$L^s = \Delta^{-1/2} L \, \Delta^{-1/2} = \begin{pmatrix} \frac{\sum_{j \ne 1} a_{1j}}{\sqrt{d_1 d_1}} & -\frac{a_{12}}{\sqrt{d_1 d_2}} & \cdots & -\frac{a_{1n}}{\sqrt{d_1 d_n}} \\ -\frac{a_{21}}{\sqrt{d_2 d_1}} & \frac{\sum_{j \ne 2} a_{2j}}{\sqrt{d_2 d_2}} & \cdots & -\frac{a_{2n}}{\sqrt{d_2 d_n}} \\ \vdots & \vdots & \ddots & \vdots \\ -\frac{a_{n1}}{\sqrt{d_n d_1}} & -\frac{a_{n2}}{\sqrt{d_n d_2}} & \cdots & \frac{\sum_{j \ne n} a_{nj}}{\sqrt{d_n d_n}} \end{pmatrix}$$

$L^s$ is a symmetric, positive semidefinite matrix, with rank at most $n - 1$. The smallest eigenvalue is $\lambda_n = 0$.

The normalized asymmetric Laplacian matrix is defined as

$$L^a = \Delta^{-1} L = \begin{pmatrix} \frac{\sum_{j \ne 1} a_{1j}}{d_1} & -\frac{a_{12}}{d_1} & \cdots & -\frac{a_{1n}}{d_1} \\ -\frac{a_{21}}{d_2} & \frac{\sum_{j \ne 2} a_{2j}}{d_2} & \cdots & -\frac{a_{2n}}{d_2} \\ \vdots & \vdots & \ddots & \vdots \\ -\frac{a_{n1}}{d_n} & -\frac{a_{n2}}{d_n} & \cdots & \frac{\sum_{j \ne n} a_{nj}}{d_n} \end{pmatrix}$$

$L^a$ is also a positive semidefinite matrix with $n$ real eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n = 0$.
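A minimal sketch of both normalized Laplacians, again assuming all degrees are positive (names are mine):

import numpy as np

def normalized_laplacians(A):
    d = A.sum(axis=1)
    L = np.diag(d) - A
    inv_sqrt_d = 1.0 / np.sqrt(d)
    Ls = inv_sqrt_d[:, None] * L * inv_sqrt_d[None, :]   # Delta^{-1/2} L Delta^{-1/2}
    La = L / d[:, None]                                   # Delta^{-1} L
    return Ls, La

Note that $L^a = \Delta^{-1/2} L^s \Delta^{1/2}$ is a similarity transform of $L^s$, which is why the two matrices share the same eigenvalues (as the example slides below illustrate).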

Example Graph: Normalized Symmetric Laplacian Matrix

[Figure: example graph on vertices 1-7]

The normalized symmetric Laplacian is given as

$$L^s = \begin{pmatrix} 1 & -0.33 & 0 & -0.29 & 0 & -0.33 & 0 \\ -0.33 & 1 & -0.33 & -0.29 & 0 & 0 & 0 \\ 0 & -0.33 & 1 & -0.29 & 0 & 0 & -0.33 \\ -0.29 & -0.29 & -0.29 & 1 & -0.29 & 0 & 0 \\ 0 & 0 & 0 & -0.29 & 1 & -0.33 & -0.33 \\ -0.33 & 0 & 0 & 0 & -0.33 & 1 & -0.33 \\ 0 & 0 & -0.33 & 0 & -0.33 & -0.33 & 1 \end{pmatrix}$$

The eigenvalues of $L^s$ are as follows: $\lambda_1 = 1.7$, $\lambda_2 = 1.539$, $\lambda_3 = 1.405$, $\lambda_4 = 1.045$, $\lambda_5 = 0.794$, $\lambda_6 = 0.517$, $\lambda_7 = 0$

Example Graph: Normalized Asymmetric Laplacian Matrix

[Figure: example graph on vertices 1-7]

The normalized asymmetric Laplacian matrix is given as

$$L^a = \Delta^{-1} L = \begin{pmatrix} 1 & -0.33 & 0 & -0.33 & 0 & -0.33 & 0 \\ -0.33 & 1 & -0.33 & -0.33 & 0 & 0 & 0 \\ 0 & -0.33 & 1 & -0.33 & 0 & 0 & -0.33 \\ -0.25 & -0.25 & -0.25 & 1 & -0.25 & 0 & 0 \\ 0 & 0 & 0 & -0.33 & 1 & -0.33 & -0.33 \\ -0.33 & 0 & 0 & 0 & -0.33 & 1 & -0.33 \\ 0 & 0 & -0.33 & 0 & -0.33 & -0.33 & 1 \end{pmatrix}$$

The eigenvalues of $L^a$ are identical to those for $L^s$, namely $\lambda_1 = 1.7$, $\lambda_2 = 1.539$, $\lambda_3 = 1.405$, $\lambda_4 = 1.045$, $\lambda_5 = 0.794$, $\lambda_6 = 0.517$, $\lambda_7 = 0$

Clustering as Graph Cuts

A k-way cut in a graph is a partitioning or clustering of the vertex set, given as $C = \{C_1, \ldots, C_k\}$. We require $C$ to optimize some objective function that captures the intuition that nodes within a cluster should have high similarity, and nodes from different clusters should have low similarity.

Given a weighted graph $G$ defined by its similarity matrix $A$, let $S, T \subseteq V$ be any two subsets of the vertices. We denote by $W(S,T)$ the sum of the weights on all edges with one vertex in $S$ and the other in $T$, given as

$$W(S,T) = \sum_{v_i \in S} \sum_{v_j \in T} a_{ij}$$

Given $S \subseteq V$, we denote by $\bar{S}$ the complementary set of vertices, that is, $\bar{S} = V - S$. A (vertex) cut in a graph is defined as a partitioning of $V$ into $S \subset V$ and $\bar{S}$. The weight of the cut or cut weight is defined as the sum of all the weights on edges between vertices in $S$ and $\bar{S}$, given as $W(S, \bar{S})$.

Cuts and Matrix Operations

Given a clustering $C = \{C_1, \ldots, C_k\}$ comprising $k$ clusters, let $c_i \in \{0,1\}^n$ be the cluster indicator vector that records the cluster membership for cluster $C_i$, defined as

$$c_{ij} = \begin{cases} 1 & \text{if } v_j \in C_i \\ 0 & \text{if } v_j \notin C_i \end{cases}$$

The cluster size can be written as

$$|C_i| = c_i^T c_i = \|c_i\|^2$$

The volume of a cluster $C_i$ is defined as the sum of all the weights on edges with one end in cluster $C_i$:

$$vol(C_i) = W(C_i, V) = \sum_{v_r \in C_i} d_r = \sum_{v_r \in C_i} c_{ir} d_r c_{ir} = \sum_{r=1}^n \sum_{s=1}^n c_{ir} \Delta_{rs} c_{is} = c_i^T \Delta c_i$$

The sum of weights of all internal edges is:

$$W(C_i, C_i) = \sum_{v_r \in C_i} \sum_{v_s \in C_i} a_{rs} = \sum_{r=1}^n \sum_{s=1}^n c_{ir} a_{rs} c_{is} = c_i^T A c_i$$

We can get the sum of weights for all the external edges as follows:

$$W(C_i, \bar{C_i}) = \sum_{v_r \in C_i} \sum_{v_s \in V - C_i} a_{rs} = W(C_i, V) - W(C_i, C_i) = c_i^T (\Delta - A) c_i = c_i^T L c_i$$
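These identities map directly onto matrix-vector products; a small sketch with an assumed helper name cut_quantities:

import numpy as np

def cut_quantities(A, c):
    # c: 0/1 indicator vector for one cluster C_i
    d = A.sum(axis=1)
    L = np.diag(d) - A
    size = float(c @ c)          # |C_i| = c^T c
    vol = float(c @ (d * c))     # c^T Delta c: sum of degrees inside C_i
    internal = float(c @ A @ c)  # W(C_i, C_i) = c^T A c
    external = float(c @ L @ c)  # W(C_i, complement of C_i) = c^T L c
    return size, vol, internal, external

A useful sanity check is vol == internal + external, which is exactly the identity $L = \Delta - A$.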

Clustering Objective Functions: Ratio Cut

The clustering objective function can be formulated as an optimization problem over the k-way cut $C = \{C_1, \ldots, C_k\}$. The ratio cut objective is defined over a k-way cut as follows:

$$\min_C \; J_{rc}(C) = \sum_{i=1}^k \frac{W(C_i, \bar{C_i})}{|C_i|} = \sum_{i=1}^k \frac{c_i^T L c_i}{c_i^T c_i} = \sum_{i=1}^k \frac{c_i^T L c_i}{\|c_i\|^2}$$

Ratio cut tries to minimize the sum of the similarities from a cluster $C_i$ to points not in the cluster, taking into account the size of each cluster. Unfortunately, for binary cluster indicator vectors $c_i$, the ratio cut objective is NP-hard. An obvious relaxation is to allow $c_i$ to take on any real value. In this case, we can rewrite the objective as

$$\min_C \; J_{rc}(C) = \sum_{i=1}^k \frac{c_i^T L c_i}{\|c_i\|^2} = \sum_{i=1}^k \left( \frac{c_i}{\|c_i\|} \right)^T L \left( \frac{c_i}{\|c_i\|} \right) = \sum_{i=1}^k u_i^T L u_i$$

The optimal solution comprises the eigenvectors corresponding to the $k$ smallest eigenvalues of $L$, that is, the eigenvectors $u_n, u_{n-1}, \ldots, u_{n-k+1}$ represent the relaxed cluster indicator vectors.

Clustering Objective Functions: Normalized Cut

Normalized cut is similar to ratio cut, except that it divides the cut weight of each cluster by the volume of the cluster instead of its size. The objective function is given as

$$\min_C \; J_{nc}(C) = \sum_{i=1}^k \frac{W(C_i, \bar{C_i})}{vol(C_i)} = \sum_{i=1}^k \frac{c_i^T L c_i}{c_i^T \Delta c_i}$$

We can obtain an optimal solution by allowing $c_i$ to be an arbitrary real vector. The optimal solution comprises the eigenvectors corresponding to the $k$ smallest eigenvalues of either the normalized symmetric or asymmetric Laplacian matrices, $L^s$ and $L^a$.

Spectral Clustering Algorithm

The spectral clustering algorithm takes a dataset $D$ as input and computes the similarity matrix $A$. For normalized cut we choose either $L^s$ or $L^a$, whereas for ratio cut we choose $L$. Next, we compute the $k$ smallest eigenvalues and corresponding eigenvectors of the chosen matrix.

The main problem is that the eigenvectors $u_i$ are not binary, and thus it is not immediately clear how we can assign points to clusters. One solution is to treat the $n \times k$ matrix of eigenvectors as a new data matrix and normalize its rows:

$$U = \begin{pmatrix} u_n & u_{n-1} & \cdots & u_{n-k+1} \end{pmatrix} \;\xrightarrow{\text{normalize rows}}\; Y = \begin{pmatrix} y_1^T \\ y_2^T \\ \vdots \\ y_n^T \end{pmatrix}$$

We then cluster the new points in $Y$ into $k$ clusters via the K-means algorithm or any other fast clustering method to obtain binary cluster indicator vectors $c_i$.

Spectral Clustering Algorithm

SPECTRAL CLUSTERING (D, k):
1. Compute the similarity matrix A ∈ R^{n×n}
2. if ratio cut then B ← L
3. else if normalized cut then B ← L^s or L^a
4. Solve B u_i = λ_i u_i for i = n, ..., n−k+1, where λ_n ≤ λ_{n−1} ≤ ... ≤ λ_{n−k+1}
5. U ← (u_n  u_{n−1}  ...  u_{n−k+1})
6. Y ← normalize rows of U
7. C ← {C_1, ..., C_k} via K-means on Y
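A compact sketch of the pipeline above; using scikit-learn's KMeans for the last step is an implementation assumption, not something the slides prescribe:

import numpy as np
from sklearn.cluster import KMeans

def spectral_clustering(A, k, objective="ratio"):
    d = A.sum(axis=1)
    L = np.diag(d) - A
    B = L if objective == "ratio" else L / d[:, None]   # L for ratio cut, L^a for normalized cut
    vals, vecs = np.linalg.eig(B)                       # L^a is not symmetric, so use eig
    idx = np.argsort(vals.real)[:k]                     # k smallest eigenvalues
    U = vecs[:, idx].real
    norms = np.linalg.norm(U, axis=1, keepdims=True)
    Y = U / np.where(norms > 0, norms, 1.0)             # normalize rows of U
    return KMeans(n_clusters=k, n_init=10).fit_predict(Y)

Usage on the example graph would look like labels = spectral_clustering(A, k=2, objective="normalized").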

Spectral Clustering on Example Graph

k = 2, normalized cut (normalized asymmetric Laplacian)

[Figure: the example graph and the spectral embedding of its vertices in the $(u_1, u_2)$ plane; the plotted points are labeled 5; 6,7; 1,3; 4; and 2]

Normalized Cut on Iris Graph

k = 3, normalized asymmetric Laplacian

Contingency table (clusters vs. Iris types):

                 setosa   virginica   versicolor
C1 (triangle)      50          0           4
C2 (square)         0         36           0
C3 (circle)         0         14          46

Maximization Objectives: Average Cut

The average weight objective is defined as

$$\max_C \; J_{aw}(C) = \sum_{i=1}^k \frac{W(C_i, C_i)}{|C_i|} = \sum_{i=1}^k \frac{c_i^T A c_i}{c_i^T c_i} = \sum_{i=1}^k u_i^T A u_i$$

where $u_i$ is an arbitrary real vector, which is a relaxation of the binary cluster indicator vectors $c_i$. We can maximize the objective by selecting the $k$ largest eigenvalues of $A$ and the corresponding eigenvectors:

$$\max_C \; J_{aw}(C) = u_1^T A u_1 + \cdots + u_k^T A u_k = \lambda_1 + \cdots + \lambda_k$$

where $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$. In general, while $A$ is symmetric, it may not be positive semidefinite. This means that $A$ can have negative eigenvalues, and to maximize the objective we must consider only the positive eigenvalues and the corresponding eigenvectors.

Maximization Objectives: Modularity

Given $A$, the weighted adjacency matrix, the modularity of a clustering is the difference between the observed and expected fraction of weights on edges within the clusters. The clustering objective is given as

$$\max_C \; J_Q(C) = \sum_{i=1}^k \left( \frac{c_i^T A c_i}{tr(\Delta)} - \frac{(d^T c_i)^2}{tr(\Delta)^2} \right) = \sum_{i=1}^k c_i^T Q c_i$$

where $Q$ is the modularity matrix:

$$Q = \frac{1}{tr(\Delta)} \left( A - \frac{d \, d^T}{tr(\Delta)} \right)$$

The optimal solution comprises the eigenvectors corresponding to the $k$ largest eigenvalues of $Q$. Since $Q$ is symmetric, but not positive semidefinite, we use only the positive eigenvalues.
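A sketch of the modularity matrix as defined above, where tr(Δ) is simply the sum of all degrees (twice the total edge weight):

import numpy as np

def modularity_matrix(A):
    d = A.sum(axis=1)                       # degree vector
    total = d.sum()                         # tr(Delta)
    return (A - np.outer(d, d) / total) / total

Clusters are then read off the eigenvectors of Q associated with its largest positive eigenvalues, as stated above.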

Markov Chain Clustering

A Markov chain is a discrete-time stochastic process over a set of states, in our case the set of vertices $V$. The Markov chain makes a transition from one node to another at discrete timesteps $t = 1, 2, \ldots$, with the probability of making a transition from node $i$ to node $j$ given as $m_{ij}$.

Let the random variable $X_t$ denote the state at time $t$. The Markov property means that the probability distribution of $X_t$ over the states at time $t$ depends only on the probability distribution of $X_{t-1}$, that is,

$$P(X_t = i \mid X_0, X_1, \ldots, X_{t-1}) = P(X_t = i \mid X_{t-1})$$

Further, we assume that the Markov chain is homogeneous, that is, the transition probability is independent of the time step $t$:

$$P(X_t = j \mid X_{t-1} = i) = m_{ij}$$

Markov Chain Clustering: Markov Matrix

The normalized adjacency matrix $M = \Delta^{-1} A$ can be interpreted as the $n \times n$ transition matrix whose entry $m_{ij} = a_{ij}/d_i$ is the probability of transitioning or jumping from node $i$ to node $j$ in the graph $G$.

The matrix $M$ is thus the transition matrix for a Markov chain or a Markov random walk on graph $G$. That is, given node $i$, the transition matrix $M$ specifies the probabilities of reaching any other node $j$ in one time step. In general, the transition probability matrix for $t$ time steps is given as

$$M^{t-1} M = M^t$$

Markov Chain Clustering: Random Walk

A random walk on $G$ thus corresponds to taking successive powers of the transition matrix $M$. Let $\pi_0$ specify the initial state probability vector at time $t = 0$. The state probability vector after $t$ steps is

$$\pi_t^T = \pi_{t-1}^T M = \pi_{t-2}^T M^2 = \cdots = \pi_0^T M^t$$

Equivalently, taking the transpose on both sides, we get

$$\pi_t = (M^t)^T \pi_0 = (M^T)^t \pi_0$$

The state probability vector thus converges to the dominant eigenvector of $M^T$.
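A short sketch of the update $\pi_t = M^T \pi_{t-1}$, useful for observing the convergence claim numerically (the limit exists under the usual assumptions of a connected, aperiodic chain):

import numpy as np

def walk_distribution(M, pi0, t):
    pi = np.asarray(pi0, dtype=float)
    for _ in range(t):
        pi = M.T @ pi          # pi_t = M^T pi_{t-1}
    return pi

For large t the vector stops changing (up to numerical noise) and points along the dominant eigenvector of $M^T$, i.e. the stationary distribution of the walk.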

Markov Clustering Algorithm

Consider a variation of the random walk, where the probability of transitioning from node $i$ to $j$ is inflated by taking each element $m_{ij}$ to the power $r \ge 1$. Given a transition matrix $M$, define the inflation operator $\Upsilon$ as follows:

$$\Upsilon(M, r) = \left\{ \frac{(m_{ij})^r}{\sum_{a=1}^n (m_{ia})^r} \right\}_{i,j=1}^n$$

The net effect of the inflation operator is to increase the higher probability transitions and decrease the lower probability transitions.

The Markov clustering algorithm (MCL) is an iterative method that interleaves matrix expansion and inflation steps. Matrix expansion corresponds to taking successive powers of the transition matrix, leading to random walks of longer lengths. On the other hand, matrix inflation makes the higher probability transitions even more likely and reduces the lower probability transitions. MCL takes as input the inflation parameter $r \ge 1$. Higher values lead to more, smaller clusters, whereas smaller values lead to fewer, but larger clusters.
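A one-function sketch of the inflation operator $\Upsilon(M, r)$: raise every entry to the power r and renormalize each row so the matrix stays row-stochastic:

import numpy as np

def inflate(M, r):
    P = np.power(M, r)                       # raise every transition probability to power r
    return P / P.sum(axis=1, keepdims=True)  # renormalize rows to sum to 1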

Markov Clustering Algorithm: MCL

The final clusters are found by enumerating the weakly connected components in the directed graph induced by the converged transition matrix $M_t$, where the edges are defined as

$$E = \{ (i,j) \mid M_t(i,j) > 0 \}$$

A directed edge $(i,j)$ exists only if node $i$ can transition to node $j$ within $t$ steps of the expansion and inflation process. A node $j$ is called an attractor if $M_t(j,j) > 0$, and we say that node $i$ is attracted to attractor $j$ if $M_t(i,j) > 0$. The MCL process yields a set of attractor nodes $V_a \subseteq V$, such that other nodes are attracted to at least one attractor in $V_a$.

To extract the clusters from $G_t$, MCL first finds the strongly connected components $S_1, S_2, \ldots, S_q$ over the set of attractors $V_a$. Next, for each strongly connected set of attractors $S_j$, MCL finds the weakly connected components consisting of all nodes $i \in V_t - V_a$ attracted to an attractor in $S_j$. If a node $i$ is attracted to multiple strongly connected components, it is added to each such cluster, resulting in possibly overlapping clusters.

Algorithm MARKOV CLUSTERING

MARKOV CLUSTERING (A, r, ε):
1. t ← 0
2. Add self-edges to A if they do not exist
3. M_t ← Δ^{-1} A
4. repeat
5.   t ← t + 1
6.   M_t ← M_{t−1} · M_{t−1}
7.   M_t ← Υ(M_t, r)
8. until ||M_t − M_{t−1}||_F ≤ ε
9. G_t ← directed graph induced by M_t
10. C ← {weakly connected components in G_t}
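A sketch of the loop above; adding self-edges unconditionally and extracting clusters as weakly connected components via scipy are assumed implementation choices:

import numpy as np
from scipy.sparse.csgraph import connected_components

def markov_clustering(A, r=2.5, eps=1e-6, max_iter=100):
    A = A + np.eye(A.shape[0])                     # add self-edges (unconditionally here, for simplicity)
    M = A / A.sum(axis=1, keepdims=True)           # M_0 = Delta^{-1} A
    for _ in range(max_iter):
        M_prev = M
        M = M @ M                                  # expansion
        M = np.power(M, r)                         # inflation ...
        M = M / M.sum(axis=1, keepdims=True)       # ... with row renormalization
        if np.linalg.norm(M - M_prev, 'fro') <= eps:
            break
    G = (M > 0).astype(int)                        # directed graph induced by the converged matrix
    _, labels = connected_components(G, directed=True, connection='weak')
    return labels

Note that this simple component-based extraction assigns each node to one cluster, whereas the full MCL procedure described above can produce overlapping clusters.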

MCL Attractors and Clusters

r = 2.5

[Figure: the 7-vertex example graph and the directed graph induced by the converged MCL matrix, showing attractors and their clusters; the surviving edge weights are 0.5 and 1]

MCL on Iris Graph

[Figure: MCL clusterings of the Iris similarity graph; panel (a) r = 1.3, panel (b) r = 2]

Contingency Table: MCL Clusters versus Iris Types

r = 1.3

                 iris-setosa   iris-virginica   iris-versicolor
C1 (triangle)        50               0                1
C2 (square)           0              36                0
C3 (circle)           0              14               49