NORMALIZED CUTS ARE APPROXIMATELY INVERSE EXIT TIMES


MATAN GAVISH AND BOAZ NADLER

Abstract. The Normalized Cut is a widely used measure of separation between clusters in a graph. In this paper we provide a novel probabilistic perspective on this measure. We show that for a partition of a graph into two weakly connected sets, $V = A \cup B$, in fact

$$ \mathrm{Ncut}(V) \approx 1/\tau_{A\to B} + 1/\tau_{B\to A}, $$

where $\tau_{A\to B}$ is the uni-directional characteristic exit time of a random walk from subset $A$ to subset $B$. Using matrix perturbation theory, we show that this approximation is exact to first order in the connection strength between the two subsets $A$ and $B$, and derive an explicit bound for the approximation error. Our result implies that for a Markov chain composed of a reversible component $A$ that is weakly connected to an absorbing cluster $B$, the easy-to-compute Normalized Cut measure is a reliable proxy for the chain's spectral gap.

Key words. normalized cut, graph partitioning, Markov chain, characteristic exit time, generalized eigenvalues, perturbation bounds, matrix perturbation theory

AMS subject classifications. 6H30, 5A8, 5A4, 60J0, 60J0, 65H7

1. Introduction. Clustering of data is a fundamental problem in many fields. When the data is represented as a symmetric weighted similarity graph, clustering becomes a graph partitioning problem. In this paper we focus on the popular normalized cut criterion for graph partitioning. The normalized cut (Ncut) criterion, and its variants based on conductance, are widely used measures of separation between clusters in a graph in a variety of applications [7, 17]. Since optimizing the Ncut measure is in general an NP-hard problem, several researchers have suggested various spectral-based approximation methods, see for example [3, 5]. These spectral methods, in turn, have interesting connections to kernel K-means clustering and to non-negative matrix factorizations, see [4, 5].

In this paper we focus on the normalized cut measure itself. Similar to previous works [10, 14], we consider a random walk on the weighted graph, with the probability to jump from one node to another proportional to the edge weight between them. Our contribution is a novel Markov chain based perspective on this measure. Our main result is that for a partition of a graph $V$ into two disjoint sets, $V = A \cup B$,

$$ \mathrm{Ncut}(V) \approx 1/\tau_{A\to B} + 1/\tau_{B\to A}, \qquad (1.1) $$

where $\tau_{A\to B}$ is the characteristic exit time of a random walk from cluster $A$ to cluster $B$, as defined explicitly below. Furthermore, we provide an explicit bound for the approximation error in Eq. (1.1), which is second order in the connection strengths between the two subsets $A$ and $B$. Our proof is based on matrix perturbation theory for generalized eigenvalue problems [18]. Interestingly, (1.1) can be viewed as a discrete analog of a well-known result for continuous diffusion processes with two metastable states [9], see Section 4.

From a mathematical point of view, when the connection strengths between the two clusters are small, the two clusters become nearly uncoupled and the resulting Markov chain nearly decomposable. The properties of such singularly perturbed Markov chains have been the subject of several works, see the surveys [6, ] and the many references therein. Some works considered the behavior of the stationary distribution, others considered mean first passage times, typically focusing on singular Laurent series expansions, and yet others considered the important problem of dimensionality reduction and aggregation of states for such nearly reducible Markov chains, see for example [6, ].
Department of Statistics, Stanford University, Stanford, CA 94305 (gavish@stanford.edu). Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 76100, Israel (boaznadler@weizmann.ac.il).

Our focus, in contrast, is on the relation of such Markov chains to the normalized cut measure, an easy-to-compute quantity used in a variety of applied scenarios. To this end, our analysis involves the study of a singularly perturbed Markov chain with transient states, as we consider one of the clusters as absorbing with respect to a random walk that starts at the other cluster. Furthermore, rather than series expansions, we derive explicit error bounds for the approximation in (1.1).

Our result has several implications and potential applications. First, in the context of clustering, Eq. (1.1) provides a precise probabilistic statement regarding the relation between the algebraic, easy-to-compute Ncut measure of a given partition and the eigenvalues corresponding to local random walks on each of these partitions. Our result is different from previously derived global spectral bounds that depend on the eigenvalues of the random walk on the entire graph. In particular, our analysis implies that for weakly connected clusters, the normalized cut measure essentially considers only the time it takes for a random walk to exit a given cluster, without taking into account its internal structure. As discussed in more detail in [12], this leads to a fundamental limitation of the Ncut measure in detecting structures at different scales in a large graph, and motivates the use of other measures that also take into account the internal characteristics of potential clusters [3].

Another potential application of our result is in the study of stochastic dynamical systems and in particular their discretization into reversible finite Markov chains [3]. In this field, a fundamental problem is the identification of almost invariant sets, e.g. subsets of phase space where a stochastic system stays for a long time before exiting. Our observation above, in the context of clustering, is also relevant here. As complex dynamical systems may exhibit invariant sets at very different spatial and time scales, our analysis suggests that the Ncut measure or its various spectral relaxations may not be appropriate in such cases.

Yet another application of our result is to the numerical evaluation of the spectral gap of nearly decomposable Markov matrices. In the study of stochastic dynamical systems, the equivalent quantities of interest are the mean exit times from given or a-priori known metastable sets, and their dependence on various parameters, such as temperature, bond strengths, etc. Upon discretization of such systems, this amounts to the exit times of nearly uncoupled Markov chains. In this context, one may apply Eq. (1.1) in the reverse direction: rather than computing the second largest eigenvalue of a potentially very large matrix, and from it the characteristic exit time from this metastable state, one may approximate it by the easy-to-compute normalized cut measure, with explicit guarantees as to its accuracy.

The paper is organized as follows. The normalized cut and relevant notation are introduced in Section 2. Our main result is stated in Section 3, discussed in Section 4, and demonstrated using a simple numerical simulation in Section 5. The proof appears in Section 6, with some auxiliary results deferred to the Appendix.

2. Preliminaries. Let $(V, W)$ be a connected weighted graph with node set $V = \{1, \dots, n\}$ and an $n \times n$ affinity (edge weight) matrix $W = (W_{i,j})_{1 \le i,j \le n}$. The matrix $W$ is assumed to be symmetric with non-negative entries, and such that $W_{ii} > 0$ for all $i = 1, \dots, n$.

2.1. The normalized cut of a graph partition. Consider a partition of the node set into two disjoint clusters, $V = A \cup B$. Without loss of generality, we denote the clusters by $A = \{1, \dots, k\}$ and $B = \{k+1, \dots, n\}$.
As discussed in the introduction, the normalized cut [17] is a widely used, easy-to-calculate measure of separation for a given graph partition:

Definition 2.1. For a partition $V = A \cup B$, define

$$ \mathrm{Cut}(A, B) = \sum_{i \in A,\, j \in B} W_{i,j}, \qquad \mathrm{Assoc}(A, V) = \sum_{i \in A,\, j \in V} W_{i,j}. $$

Then the normalized cut of $A$ with respect to $B$ is

$$ \mathrm{Ncut}(A, B) = \frac{\mathrm{Cut}(A, B)}{\mathrm{Assoc}(A, V)}, $$

whereas the multiway normalized cut of the partition $V = A \cup B$ is defined as [8, 19]

$$ \mathrm{MNcut}(A, B) = \mathrm{Ncut}(A, B) + \mathrm{Ncut}(B, A) = \frac{\mathrm{Cut}(A, B)}{\mathrm{Assoc}(A, V)} + \frac{\mathrm{Cut}(B, A)}{\mathrm{Assoc}(B, V)}. $$

The goal of this paper is to provide a precise probabilistic interpretation of this measure, which holds when the clusters $A$ and $B$ are well separated in a sense defined below.

2.2. Notation. For any $i \in V$, denote by

$$ W_{i,A} = \sum_{j \in A} W_{i,j}, \qquad W_{i,B} = \sum_{j \in B} W_{i,j} \qquad (2.3) $$

the total affinity between the node $i$ and the clusters $A$ and $B$, respectively. Similarly, denote

$$ W_{A,A} = \sum_{i \in A} W_{i,A} = \sum_{i \in A,\, j \in A} W_{i,j}, \qquad W_{A,B} = \sum_{i \in A} W_{i,B} = \sum_{i \in A,\, j \in B} W_{i,j}. $$

In this notation, the normalized cut measure is

$$ \mathrm{Ncut}(A, B) = \frac{W_{A,B}}{W_{A,A} + W_{A,B}}. \qquad (2.4) $$

With an eye towards a matrix perturbation approach, we define the connectivity parameter

$$ \varepsilon = \max_{i \in A} \frac{W_{i,B}}{W_{i,A}}. \qquad (2.5) $$

In this paper we focus on situations where $W_{i,B} \ll W_{i,A}$ for all $i \in A$, namely all nodes $i \in A$ are weakly connected to $B$. Equivalently, $\varepsilon \ll 1$, which allows ε to be used as a small parameter in a matrix perturbation analysis.

2.3. The uni-directional exit time $\tau_{A\to B}$. Following [10, 14], we define a Markov random walk on our graph $(V, W)$ that jumps from node to node at unit time intervals. We define the probability to jump from node $i$ to node $j$ as $p_{i,j} = W_{i,j}/D_{i,i}$, where $D_{i,i} = \sum_j W_{i,j}$. For our purposes, it will also be useful to consider random walks on subsets of $V$, such as a random walk constrained to the cluster $A$, as well as random walks where some of the nodes are absorbing. In particular, for the uni-directional exit time from cluster $A$ to cluster $B$, we consider a simplification of the weighted graph $(V, W)$, in which the cluster $B$ is collapsed to a new, single absorbing state $b$, with no allowed transitions $b \to A$. Let $V^{A\cup b} = A \cup \{b\}$ be the node set of this simplified graph, whose corresponding affinity matrix is

$$ W^{A\cup b} = \begin{pmatrix} W_{1,1} & \cdots & W_{1,k} & W_{1,B} \\ \vdots & & \vdots & \vdots \\ W_{k,1} & \cdots & W_{k,k} & W_{k,B} \\ 0 & \cdots & 0 & 1 \end{pmatrix}. $$

Consider a random walk on this simplified graph $(V^{A\cup b}, W^{A\cup b})$, whose corresponding Markov matrix is $(D^{A\cup b})^{-1} W^{A\cup b}$, where $D^{A\cup b}$ is the row-normalization diagonal matrix, $D^{A\cup b}_{i,i} = \sum_j W^{A\cup b}_{i,j}$. Denote the eigenvalues of this matrix by

$$ 1 = \lambda_1^{A\cup b} \ge \lambda_2^{A\cup b} \ge \cdots \ge \lambda_{k+1}^{A\cup b}. $$
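To make the definitions above concrete, the following short Python sketch (ours, not part of the original paper; all function and variable names are illustrative) computes $\mathrm{Ncut}(A, B)$, the connectivity parameter ε of (2.5), and the Markov matrix of the collapsed graph $(V^{A\cup b}, W^{A\cup b})$ from a symmetric weight matrix W and an index set A.

```python
import numpy as np

def ncut_and_epsilon(W, A):
    """Ncut(A, B), connectivity parameter eps, and the Markov matrix of the
    collapsed graph A u {b}, where B is replaced by a single absorbing state b.
    W: symmetric non-negative (n, n) array with positive diagonal; A: indices of cluster A."""
    n = W.shape[0]
    A = np.asarray(A)
    B = np.setdiff1d(np.arange(n), A)
    W_iA = W[np.ix_(A, A)].sum(axis=1)               # W_{i,A} for i in A
    W_iB = W[np.ix_(A, B)].sum(axis=1)               # W_{i,B} for i in A
    ncut = W_iB.sum() / (W_iA.sum() + W_iB.sum())    # Eq. (2.4)
    eps = (W_iB / W_iA).max()                        # Eq. (2.5)
    k = len(A)
    W_Ab = np.zeros((k + 1, k + 1))                  # collapsed affinity matrix W^{A u b}
    W_Ab[:k, :k] = W[np.ix_(A, A)]
    W_Ab[:k, k] = W_iB
    W_Ab[k, k] = 1.0                                 # b is absorbing (no transitions b -> A)
    P_Ab = W_Ab / W_Ab.sum(axis=1, keepdims=True)    # row-stochastic Markov matrix
    return ncut, eps, P_Ab
```

The second largest eigenvalue of the returned matrix P_Ab is the $\lambda_2^{A\cup b}$ of the text, from which $\tau_{A\to B}$ follows via (2.6) below.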

Assume that the set $A$ is connected, so that $1 > \lambda_2^{A\cup b}$, and moreover that $\lambda_2^{A\cup b}$ is the second largest eigenvalue in absolute value. The characteristic exit time from $A$ to $B$ is defined as the relaxation time of this chain,

$$ \tau_{A\to B} = \frac{1}{1 - \lambda_2^{A\cup b}}. \qquad (2.6) $$

3. Main result. Our main result is a relation between the exit time $\tau_{A\to B}$ and the normalized cut measure of Eq. (2.4). To this end, it will be useful to consider in addition a random walk constrained to stay in the cluster $A$. Under the assumption that $A$ is connected, this chain is reversible. We denote by $\lambda_2^A$ its second largest eigenvalue and by

$$ \Gamma^A = 1 - \lambda_2^A \qquad (3.1) $$

its spectral gap. Again we assume that the second largest eigenvalue $\lambda_2^A$ is also the second largest in absolute value. We are finally ready to state our main result.

Theorem 3.1. Let $G$ be a symmetric graph with affinity matrix $W$. Consider a binary partition of the graph $V = A \cup B$. Assume that the subset $A$ is connected and that for the random walk constrained to remain in $A$, the second largest eigenvalue $\lambda_2^A$ is also second largest in absolute value. Denote by $\varepsilon = \max_{i \in A} W_{i,B}/W_{i,A}$ the connectivity parameter. For any $\varepsilon \ge 0$,

$$ \frac{1}{\tau_{A\to B}} \le \mathrm{Ncut}(A, B). \qquad (3.2) $$

Assume that

$$ 0 < \varepsilon < \frac{\Gamma^A}{8}. \qquad (3.3) $$

Then

$$ \mathrm{Ncut}(A, B) = \frac{1}{\tau_{A\to B}} + \varepsilon^2 R(\varepsilon), \qquad (3.4) $$

where $0 \le R(\varepsilon) \le 16/\Gamma^A$.

Corollary 3.2. Denote $\varepsilon = \max\left\{ \max_{i \in A} \frac{W_{i,B}}{W_{i,A}},\ \max_{i \in B} \frac{W_{i,A}}{W_{i,B}} \right\}$ and assume that

$$ 0 < \varepsilon < \tfrac{1}{8} \min\{\Gamma^A, \Gamma^B\}. $$

Then

$$ \mathrm{MNcut}(A, B) = \frac{1}{\tau_{A\to B}} + \frac{1}{\tau_{B\to A}} + \varepsilon^2 R(\varepsilon), \qquad (3.5) $$

where $0 \le R(\varepsilon) \le 16\left(\frac{1}{\Gamma^A} + \frac{1}{\Gamma^B}\right)$.
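Theorem 3.1 is easy to probe numerically. The sketch below (again ours, not the authors' Matlab script) builds a random two-cluster graph with weak inter-cluster coupling and compares $\mathrm{Ncut}(A, B)$ with $1/\tau_{A\to B} = 1 - \lambda_2^{A\cup b}$; it assumes the helper ncut_and_epsilon defined above, and the coupling scale delta is an arbitrary illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(0)
k, delta = 30, 1e-3                        # cluster size and inter-cluster weight scale (illustrative)

# symmetric weight matrix: two dense clusters with weak random coupling
W_AA = rng.uniform(0.5, 1.0, (k, k)); W_AA = (W_AA + W_AA.T) / 2
W_BB = rng.uniform(0.5, 1.0, (k, k)); W_BB = (W_BB + W_BB.T) / 2
W_AB = delta * rng.uniform(0.5, 1.0, (k, k))
W = np.block([[W_AA, W_AB], [W_AB.T, W_BB]])

A = np.arange(k)
ncut, eps, P_Ab = ncut_and_epsilon(W, A)

lam = np.sort(np.linalg.eigvals(P_Ab).real)[::-1]
inv_tau = 1.0 - lam[1]                     # 1/tau_{A->B}, Eq. (2.6)

print(f"eps = {eps:.2e}")
print(f"Ncut(A,B) = {ncut:.6e},  1/tau = {inv_tau:.6e}")
print(f"residual  = {ncut - inv_tau:.2e}   (Theorem 3.1: non-negative and O(eps^2))")
```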

4. Discussion. For a binary partition, Eq. (3.5) gives a novel probabilistic interpretation to the normalized cut measure, in terms of characteristic exit times from each of the two subsets. For a partition into $s$ disjoint subsets, $V = \cup_{j=1}^s A_j$, Eq. (3.4) generalizes to a relation between the multiway normalized cut measure and the exit times from each of the $s$ clusters,

$$ \mathrm{MNcut}(A_1, \dots, A_s) = \sum_{j=1}^s \frac{1}{\tau_{A_j \to A_j^c}} + \varepsilon^2 R(\varepsilon), \qquad (4.1) $$

where $A_j^c = V \setminus A_j$ and the remainder term $R(\varepsilon) \ge 0$ has a bound analogous to the one in the binary case.

4.1. Limitations. The key assumption limiting the applicability of our result is Eq. (3.3), which requires that $\varepsilon = \max_{i \in A} W_{i,B}/W_{i,A} < (1 - \lambda_2^A)/8$. Hence, our result holds only when $A$ and $B$ are weakly connected, compared to the internal spectral gap of a random walk constrained to remain in $A$. In addition, while a small value of ε implies a small normalized cut, since $\mathrm{Ncut}(A, B) \le \varepsilon$, the converse is in general false: it is possible to construct examples where the normalized cut is small yet ε is large. Indeed, ε is a worst-case, rather than an on-average, measure of separation between clusters. Informally, a small value of ε means that the boundary between the clusters $A$ and $B$ is pronounced, in that each node in $A$ has a much stronger overall connection to nodes in $A$ than to nodes in $B$.

4.2. Sharpness of the upper bound. A closer inspection of the proof shows that it in fact gives a stronger bound on the residual $R(\varepsilon)$ from Eq. (3.4). To this end, consider a Markov chain $X_t$ on $A \cup \{b\}$ that starts from the stationary distribution π inside $A$,

$$ \pi_i = \begin{cases} W_{i,A}/W_{A,A} & i \in A \\ 0 & i = b. \end{cases} $$

Then the one-step transition probability to the absorbing state $b$ is

$$ P_{A\to B} = \sum_{i \in A} P(X_1 = b \mid X_0 = i)\, \pi_i = \sum_{i \in A} \left[ \frac{W_{i,B}}{W_{i,A} + W_{i,B}} \right] \frac{W_{i,A}}{W_{A,A}}. \qquad (4.2) $$

To factor out the small connectivity magnitude ε, we write $P_{A\to B} = \varepsilon Q_{A\to B}$. By definition, this quantity satisfies $Q_{A\to B} \le 1$ and measures the one-step absorption probability in units of ε, when the Markov chain on $A$ is in its stationary distribution. Then an improved bound on the residual is

$$ R(\varepsilon) \le \frac{16}{\Gamma^A} \min\{Q_{A\to B}, 1\}. $$

Finally, let us remark that the coefficient 16 above is by no means sharp. For example, in the numerical experiment described below, the actual residual observed was approximately $0.2\, \varepsilon^2/\Gamma^A$.

4.3. Comparison to previous spectral bounds. Several authors derived the following lower bound on the multiway normalized cut of a partition into $s$ clusters $V = \cup_{j=1}^s A_j$:

$$ \mathrm{MNcut}(A_1, \dots, A_s) \ge \sum_{j=1}^s \left(1 - \lambda_j(V)\right), \qquad (4.3) $$

where $\lambda_j(V)$ are the eigenvalues of the global random walk matrix on the whole graph $G$ [8, ]. In contrast, our result Eq. (4.1) is different, as it provides, under suitable cluster separability conditions, both a lower and a precise upper bound on $\mathrm{MNcut}(V)$ in terms of local properties of the graph: the between-cluster connectivity, internal relaxation times within each cluster, and the characteristic exit times from them. These quantities depend on eigenvalues of various normalized submatrices of $G$, which are in general different from the global eigenvalues $\lambda_j(V)$.
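The one-step absorption probability of Eq. (4.2) is also straightforward to evaluate; the helper below (ours, with hypothetical names) returns $P_{A\to B}$ together with $Q_{A\to B} = \varepsilon^{-1} P_{A\to B}$, which enters the sharper residual bound above.

```python
import numpy as np

def one_step_absorption(W, A):
    """P_{A->B} of Eq. (4.2): probability that one step of the walk, started from
    the stationary distribution pi inside A, lands in B.  Returns (P_AB, Q_AB)."""
    n = W.shape[0]
    A = np.asarray(A)
    B = np.setdiff1d(np.arange(n), A)
    W_iA = W[np.ix_(A, A)].sum(axis=1)
    W_iB = W[np.ix_(A, B)].sum(axis=1)
    pi = W_iA / W_iA.sum()                      # stationary distribution inside A
    P_AB = np.sum(pi * W_iB / (W_iA + W_iB))    # Eq. (4.2)
    eps = (W_iB / W_iA).max()                   # connectivity parameter, Eq. (2.5)
    return P_AB, P_AB / eps
```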

4.4. Insensitivity to internal structure. One implication of our result is that when $A$ and $B$ are weakly connected, even though $\mathrm{Ncut}(A, B)$ contains a normalization of the cut by the volume of a cluster, it nonetheless considers mainly the exit time from $A$, without taking into account its internal characteristics. For example, the set $A$ may in fact contain a disconnected or weakly connected subgraph, but this structure is not captured by the normalized cut measure. As discussed in [12], this leads to some limitations of the normalized cut measure in discovering clusters in graphs.

4.5. Approximation of exit times by Ncut. In the study of weakly coupled stochastic dynamical systems, a quantity of interest is the characteristic exit time from a given set of nodes. At low temperatures, corresponding to extremely small values of ε, numerical methods for calculating $\lambda_2^{A\cup b}$, and hence $\tau_{A\to B}$, may be unstable as $\lambda_2^{A\cup b}$ becomes very close to 1. In contrast, the Ncut measure can still be calculated efficiently. Our result thus provides a-priori guarantees for the accuracy of this approximation to $\lambda_2^{A\cup b}$. Similarly, the Ncut approximation can be computed for extremely large graphs, for which eigenvalue computations may be quite challenging or even no longer feasible.

4.6. Analogy to continuum diffusion. Finally, we mention that our result can also be viewed as a discrete analog of known results regarding the second eigenvalue for a continuum diffusion process with two metastable states [9]. Consider a particle performing diffusion in a multi-dimensional potential with two wells separated by a high potential barrier. In this case, asymptotically as temperature tends to zero (equivalently, in our case, $\varepsilon \to 0$), the second non-trivial eigenvalue µ of the Fokker-Planck diffusion operator with two metastable states $A, B$, corresponding to the two potential wells, is approximately given by

$$ \mu \approx \frac{1}{\tau_{A\to B}} + \frac{1}{\tau_{B\to A}}. $$

By analogy, this suggests that when there are two well-defined clusters in a discrete finite graph, the spectral gap of the random walk on the whole graph is close to the right-hand side of Eq. (1.1). A detailed study of this issue, which is related to isoperimetric and Cheeger inequalities on graphs, is an interesting problem beyond the scope of this paper.

5. A numerical example. One way to evaluate the tightness of the upper bound (3.4) is to compare the residual $\mathrm{Ncut}(A, B) - 1/\tau_{A\to B}$ with our upper bound of $16\varepsilon^2/\Gamma^A$ in a numerical simulation. Unfortunately, as noted in Section 4.5, general-purpose numerical procedures do not allow us to measure $\lambda_2^{A\cup b}$ accurately when $\varepsilon \ll 1$. We now construct a simple class of examples where $\lambda_2^{A\cup b}$ can be calculated analytically, allowing us to test the true residual against our upper bound, for values of ε close to machine precision.

Assume that the cluster $A$ is a $d$-regular graph on $k$ nodes, drawn from the uniform distribution, so that the spectral gap $\Gamma^A$ is reasonably large. When all vertices of $A$ have the same exit probability α to the absorbing state $b$, it is easy to see that Theorem 3.1 holds with no approximation error, namely $\mathrm{Ncut}(A, b) = 1 - \lambda_2^{A\cup b} = 1/\tau_{A\to b}$. The simplest example where the exit probabilities are not all equal is when they are equal to α on one half of $A$ and to $r\alpha$ on the other half of $A$, for some $r > 0$. Consider then two $d$-regular graphs $(V_i, W_i)$, $i = 1, 2$, sampled from the uniform distribution on $d$-regular graphs. Denote $V_1 = \{1, \dots, k\}$ and $V_2 = \{k+1, \dots, 2k\}$, and let $b = 2k+1$ be the absorbing state. Define an affinity matrix $W$ on $V = \{1, \dots, 2k+1\}$ by

$$ W_{i,j} = \begin{cases} W^1_{i,j} \text{ or } W^2_{i,j} & i, j \in V_1 \text{ or } i, j \in V_2 \\ \alpha & 1 \le i \le k,\ j = 2k+1 \\ r\alpha & k+1 \le i \le 2k,\ j = 2k+1 \\ 1 & |i - j| = k, \end{cases} $$

so that each node in $V_1$ is connected to a corresponding node in $V_2$, to its $d$ neighbors in $V_1$, and to the absorbing state. Note that $\varepsilon = r\alpha/(d+1)$ and that $\lambda_2^{A\cup b}$ is the second eigenvalue of the $3 \times 3$ Markov matrix

$$ \begin{pmatrix} \frac{d}{d+1+\alpha} & \frac{1}{d+1+\alpha} & \frac{\alpha}{d+1+\alpha} \\ \frac{1}{d+1+r\alpha} & \frac{d}{d+1+r\alpha} & \frac{r\alpha}{d+1+r\alpha} \\ 0 & 0 & 1 \end{pmatrix}, $$

allowing us to calculate it analytically.

[Fig. 5.1. Numerical comparison of the normalized cut and the inverse exit time approximation. Left panel: $1-\lambda_2^{A\cup b}$, Ncut, and the upper bound $(1-\lambda_2^{A\cup b}) + 16\varepsilon^2/\Gamma^A$. Middle panel: the true residual and the upper bound on a log scale. Right panel: $1-\lambda_2^{A\cup b}$, Ncut, and the "better" upper bound $(1-\lambda_2^{A\cup b}) + \varepsilon^2/\Gamma^A$.]

Figure 5.1 summarizes the result of this experiment for $k = 50$, $d = 7$, $r = 10$, and ε ranging over many orders of magnitude, down to values close to machine precision. We compared the exact value of the residual

$$ \mathrm{Ncut}(A, b) - \frac{1}{\tau_{A\to b}} = \mathrm{Ncut}(A, b) - \left(1 - \lambda_2^{A\cup b}\right) $$

with the upper bound $16\varepsilon^2/\Gamma^A$. In the case under study, the upper bound is informative yet pessimistic (left panel), as the approximation $\mathrm{Ncut} \approx 1 - \lambda_2^{A\cup b}$ is quite exact. The true residual $\mathrm{Ncut}(A, b) - (1 - \lambda_2^{A\cup b})$ is in this case exactly quadratic in ε (middle panel, log scale), and our quadratic bound is pessimistic by a multiplicative factor of roughly 80. Replacing the constant 16 in our main result, which is presumably not tight due to limitations of our proof technique, with a much smaller constant, we also compared the residual $\mathrm{Ncut}(A, b) - (1 - \lambda_2^{A\cup b})$ with the quantity $\varepsilon^2/\Gamma^A$ (right panel). The reader is invited to re-create this experiment, replacing the values $k$, $d$, $r$ and α with arbitrary values. The Matlab script and the original 7-regular graph used here are available at the authors' homepage.

6. Proof of the main result. To simplify the proof, we first introduce the following notation. We write $w$ for the vector $(W_{i,A})_{i \in A}$ and $d$ for the vector $(\varepsilon^{-1} W_{i,B})_{i \in A}$. Hence $w_i$ is the in-degree of a node $i \in A$ (its total affinity to $A$), whereas $d_i$ is its affinity to $B$ rescaled by the connectivity parameter ε (that is, $\varepsilon d_i = W_{i,B}$), so that $d_i = O(1)$; indeed, by the definition (2.5) of ε, $d_i \le w_i$ for all $i \in A$. Similarly, we define $\bar w = \sum_{i \in A} w_i = W_{A,A}$ and $\bar d = \frac{1}{\bar w} \sum_{i \in A} d_i = \varepsilon^{-1} W_{A,B}/W_{A,A} \le 1$. Our main object of interest, $\lambda_2^{A\cup b}$, will be denoted simply by λ, whereas the Ncut measure is now given by

$$ \mathrm{Ncut}(A, B) = \frac{W_{A,B}}{W_{A,A} + W_{A,B}} = \frac{\varepsilon \bar d}{1 + \varepsilon \bar d}. $$

For future use, we denote $D_{in} = \mathrm{diag}(w_1, \dots, w_k)$, $D_1 = \mathrm{diag}(d_1, \dots, d_k)$ and $D_\varepsilon = D_{in} + \varepsilon D_1$, so that

$$ D^{A\cup b} = \begin{pmatrix} D_{in} + \varepsilon D_1 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} D_\varepsilon & 0 \\ 0 & 1 \end{pmatrix}. $$

Finally, let $W_A$ be the upper $k \times k$ minor of $W$, namely $(W_A)_{i,j} = W_{i,j}$ for $1 \le i, j \le k$.

6.1. Passing to a generalized eigenvalue problem. Recall that $\lambda_2^{A\cup b}$ is the second largest root of the characteristic polynomial $\det\left((D^{A\cup b})^{-1} W^{A\cup b} - \lambda I\right) = 0$, or equivalently of $\det\left(W^{A\cup b} - \lambda D^{A\cup b}\right) = 0$. Expanding the latter determinant by the last row of $W^{A\cup b}$ gives

$$ \det\left(W^{A\cup b} - \lambda D^{A\cup b}\right) = \pm (1 - \lambda) \det\left(W_A - \lambda D_\varepsilon\right). $$

Since $\lambda_2^{A\cup b} < 1$, it is enough to consider the smaller $k \times k$ matrices $D_\varepsilon$ and $W_A$. Obviously, $x \ne 0$ is an eigenvector of $D_\varepsilon^{-1} W_A$ corresponding to an eigenvalue λ if and only if

$$ W_A x = \lambda D_\varepsilon x. \qquad (6.1) $$

When (6.1) holds, we say that λ is a generalized eigenvalue of the matrix pair $(W_A, D_\varepsilon)$. Clearly, the solutions to (6.1) are $\lambda_2^{A\cup b} \ge \cdots \ge \lambda_{k+1}^{A\cup b}$.

6.2. Lower bound for $\mathrm{Ncut}(A, B)$. The lower bound (3.2) follows from a min-max characterization of generalized eigenvalues [18, part VI, corollary 6], whereby the largest eigenvalue of the matrix pair $(W_A, D_\varepsilon)$ satisfies

$$ \lambda_2^{A\cup b} = \max_{x \ne 0} \frac{x^T W_A x}{x^T D_\varepsilon x} \ge \frac{\mathbf{1}^T W_A \mathbf{1}}{\mathbf{1}^T D_\varepsilon \mathbf{1}} = \frac{1}{1 + \varepsilon \bar d} = 1 - \mathrm{Ncut}(A, B). \qquad (6.2) $$

We note that this simple inequality can also be derived by other methods; for example, from a lemma by Ky Fan [18] it follows that $\mathrm{Ncut}(A, B) \ge 1 - \lambda_2^{A\cup b}$.

6.3. Upper bound. The proof of the upper bound relies on perturbation theory for generalized eigenvalues [18, ch. VI], where we treat the matrix pair above as a perturbed pair:

$$ (W_A, D_\varepsilon) = (W_A, D_{in}) + \varepsilon\, (0, D_1). $$

Two facts simplify the perturbation analysis to follow. First, the perturbation does not affect the left matrix $W_A$. Second, the right-hand-side matrices of both the perturbed and the unperturbed pair are diagonal. We will need the following key lemma, which is proved in the Appendix. Recall the notation $Q_{A\to B} = \varepsilon^{-1} P_{A\to B}$, as defined in (4.2).

Lemma 6.1. If $\varepsilon < \Gamma^A/8$, then there exists $\beta \in \mathbb{R}$ with

$$ |\beta| \le \frac{8}{\Gamma^A} \min\{Q_{A\to B}, 1\} $$

such that

$$ \Lambda = \frac{1}{1 + \varepsilon(\bar d + \varepsilon \beta)} \qquad (6.3) $$

is a generalized eigenvalue of the matrix pair $(W_A, D_\varepsilon)$.

Given our assumption that ε satisfies (3.3), Lemma 6.1 holds. Hence let β be such that $\Lambda = 1/(1 + \varepsilon(\bar d + \varepsilon \beta))$ is a generalized eigenvalue of the matrix pair $(W_A, D_\varepsilon)$. We now conclude the proof in two steps:

1. We show that $\mathrm{Ncut}(A, B) = 1 - \Lambda + \varepsilon^2 R(\varepsilon)$.

2. We then show that Λ must in fact be this pair's largest eigenvalue, so that necessarily Λ = λ.

From these two steps the theorem readily follows. For the first step, we rewrite (6.3) as

$$ \Lambda = \frac{1}{1 + \varepsilon \bar d} \cdot \frac{1}{1 + \frac{\varepsilon^2 \beta}{1 + \varepsilon \bar d}}. \qquad (6.4) $$

Now, for any $\delta \in [-1/2, 1/2]$ we have $\left| \frac{1}{1+\delta} - 1 \right| \le 2|\delta|$. As $\varepsilon < \Gamma^A/8$ and $|\beta| \le 8/\Gamma^A$, obviously $\frac{\varepsilon^2 |\beta|}{1 + \varepsilon \bar d} \le \varepsilon^2 |\beta| \le \varepsilon \le 1/2$. Hence,

$$ \left| \frac{1}{1 + \frac{\varepsilon^2 \beta}{1 + \varepsilon \bar d}} - 1 \right| \le \frac{2 \varepsilon^2 |\beta|}{1 + \varepsilon \bar d} \le 2 \varepsilon^2 |\beta|. \qquad (6.5) $$

Combining (6.4) and (6.5) gives that

$$ 1 - \Lambda = \frac{\varepsilon \bar d}{1 + \varepsilon \bar d} + \varepsilon^2 R_1(\varepsilon) = \mathrm{Ncut}(A, B) + \varepsilon^2 R_1(\varepsilon), \qquad \text{with} \quad |R_1(\varepsilon)| \le \frac{16}{\Gamma^A} \min\{Q_{A\to B}, 1\}, $$

as required.

For the second step, we need to show that if $\varepsilon < \Gamma^A/8$, then the eigenvalue Λ is indeed the largest eigenvalue, λ. This is achieved by proving that $\lambda_3^{A\cup b}$, the next largest eigenvalue of $(W_A, D_\varepsilon)$, is strictly smaller than Λ. To this end, we appeal to [18, part VI, theorem 3], whereby if

$$ \zeta = \varepsilon \max_{\|x\| = 1} \frac{x^T D_1 x}{\sqrt{\left(x^T W_A x\right)^2 + \left(x^T D_{in} x\right)^2}} < 1, \qquad (6.6) $$

then

$$ \frac{\left| \lambda_2^A - \lambda_3^{A\cup b} \right|}{\sqrt{\left(1 + (\lambda_2^A)^2\right)\left(1 + (\lambda_3^{A\cup b})^2\right)}} \le \zeta. \qquad (6.7) $$

Here, as before, $\lambda_2^A$ is the second eigenvalue of the unperturbed pair $(W_A, D_{in})$. To see that the condition ζ < 1 holds, observe that from (6.6) we have

$$ \zeta \le \varepsilon \max_{\|x\| = 1} \frac{x^T D_1 x}{x^T D_{in} x} = \varepsilon \max_i \frac{d_i}{w_i} = \varepsilon < 1. $$

We thus deduce from (6.7) that

$$ \left| \lambda_2^A - \lambda_3^{A\cup b} \right| \le \zeta \sqrt{\left(1 + (\lambda_2^A)^2\right)\left(1 + (\lambda_3^{A\cup b})^2\right)} \le 2\varepsilon, $$

and in particular

$$ \lambda_3^{A\cup b} \le \lambda_2^A + 2\varepsilon. \qquad (6.8) $$

[Fig. 6.1. Localization intervals of the various eigenvalues.]

To complete the second step, as illustrated in Figure 6.1, we now localize Λ and $\lambda_3^{A\cup b}$ in two disjoint intervals, which implies that necessarily $\lambda_3^{A\cup b} < \Lambda$. To this end, first observe that for all $\delta \in [-1/2, 1/2]$, we have $\frac{1}{1+\delta} \ge 1 - \delta$. Similarly to (6.5), this implies that

$$ \Lambda = \frac{1}{1 + \varepsilon \bar d} \cdot \frac{1}{1 + \frac{\varepsilon^2 \beta}{1 + \varepsilon \bar d}} \ge \frac{1}{1 + \varepsilon \bar d} \left( 1 - \frac{\varepsilon^2 |\beta|}{1 + \varepsilon \bar d} \right), $$

so that

$$ \Lambda \ge \frac{1}{1 + \varepsilon \bar d} - \varepsilon^2 |\beta|. $$

Now, clearly

$$ \frac{1}{1 + \varepsilon \bar d} \ge 1 - \varepsilon \bar d \ge 1 - \varepsilon. $$

Finally, since $|\beta| \le 8/\Gamma^A$ and $\varepsilon < \Gamma^A/8$,

$$ \Lambda \ge \frac{1}{1 + \varepsilon \bar d} - \varepsilon^2 |\beta| \ge 1 - \varepsilon - \varepsilon = 1 - 2\varepsilon. \qquad (6.9) $$

Combining (6.9), (6.8) and the fact that $\Gamma^A = 1 - \lambda_2^A > 8\varepsilon$, we obtain

$$ \lambda_3^{A\cup b} \le \lambda_2^A + 2\varepsilon = 1 - \Gamma^A + 2\varepsilon < 1 - 8\varepsilon + 2\varepsilon = 1 - 6\varepsilon < 1 - 2\varepsilon \le \Lambda. $$

Hence, the eigenvalue Λ is strictly larger than $\lambda_3^{A\cup b}$, namely $\Lambda = \lambda_2^{A\cup b}$. The requested upper bound and Theorem 3.1 follow.

Acknowledgements. We thank the anonymous referees for their helpful comments and suggestions. MG is supported by a William R. and Sara Hart Kimball Stanford Graduate Fellowship. BN's research was partly supported by a grant from the ISF.

Appendix A. Lemma 6.1 finds an eigenvalue of the perturbed matrix pair

$$ (W_A, D_\varepsilon) = (W_A, D_{in}) + \varepsilon\, (0, D_1). $$

To prove the lemma, following [18], we introduce a tool to study the effect of a diagonal perturbation on generalized eigenvalues.

Let the spectrum of the reversible Markov matrix $D_{in}^{-1} W_A$ be

$$ 1 = \lambda_1^A > \lambda_2^A \ge \lambda_3^A \ge \cdots \ge \lambda_k^A, \qquad (A.1) $$

and define

$$ A_2 = \mathrm{diag}\left(\lambda_2^A, \dots, \lambda_k^A\right). \qquad (A.2) $$

Equip the product space $\mathbb{R}^{k-1} \times \mathbb{R}^{k-1}$ with the norm $\|(v, w)\|_F = \max\{\|v\|, \|w\|\}$, where $v, w \in \mathbb{R}^{k-1}$ and $\|\cdot\|$ is the usual Euclidean norm. We now define an operator $T : \mathbb{R}^{k-1} \times \mathbb{R}^{k-1} \to \mathbb{R}^{k-1} \times \mathbb{R}^{k-1}$ by

$$ T(v, w) = \left( w + A_2 v,\; w + v \right), $$

and its associated quantity $\mathrm{dif}[A_2]$ by

$$ \mathrm{dif}[A_2] = \inf_{\|(v, w)\|_F = 1} \|T(v, w)\|_F. \qquad (A.3) $$

The following auxiliary lemma relates this quantity to the spectral gap of $A$.

Lemma A.1. Assume that $\lambda_2^A = \max_j |\lambda_j^A|$. Then

$$ \mathrm{dif}[A_2] = \frac{1 - \lambda_2^A}{2} = \frac{\Gamma^A}{2}. $$

Proof. Let $(v, w)$ be a pair with $\|(v, w)\|_F = 1$. Therefore, either $\|w\| = 1$ or $\|v\| = 1$. When $\|w\| = 1$: since $\|v\| \le 1$ and $\|A_2\| = \lambda_2^A$, it follows that

$$ \|T(v, w)\|_F \ge \|w + A_2 v\| \ge \|w\| - \|A_2 v\| \ge 1 - \lambda_2^A = \Gamma^A \ge \frac{\Gamma^A}{2}. $$

Next, consider the case where $\|v\| = 1$. Then by definition $\|A_2 v\| \le \lambda_2^A < 1$. It is easy to see that inside the unit ball $\|w\| \le 1$, the minimizer of $\|T(v, w)\|_F$ is simply the negated average $w = -\tfrac{1}{2}(v + A_2 v)$, which gives $\|T(v, w)\|_F = \|v - A_2 v\|/2$. To conclude the proof we note that since $A_2$ is a diagonal matrix with largest entry $\lambda_2^A$ in position $(1, 1)$, the quantity $\|v - A_2 v\|$ is minimized over unit vectors at $v = e_1$. In this case

$$ \|e_1 - A_2 e_1\| = 1 - \lambda_2^A = \Gamma^A, $$

so that the infimum in (A.3) equals $\Gamma^A/2$.

Proof of Lemma 6.1. For the unperturbed pair $(W_A, D_{in})$, observe that its generalized eigenvalues are all real, since the matrix $D_{in}^{-1} W_A$ is similar to the symmetric matrix

$$ D_{in}^{1/2} \left( D_{in}^{-1} W_A \right) D_{in}^{-1/2} = D_{in}^{-1/2} W_A D_{in}^{-1/2}. $$

Furthermore, since $D_{in}^{-1} W_A$ is the transition matrix of an irreducible Markov chain, the largest eigenvalue in (A.1) is $\lambda_1^A = 1$ and the remaining eigenvalues are all strictly smaller than 1 in absolute value. Now, the vector of length $k$

$$ x_1 = \frac{1}{\sqrt{\bar w}} \left( \sqrt{w_1}, \dots, \sqrt{w_k} \right)^T $$

is an eigenvector of $D_{in}^{-1/2} W_A D_{in}^{-1/2}$ corresponding to the eigenvalue $\lambda_1^A = 1$. Let $U = \left( x_1\ u_2\ \cdots\ u_k \right)$ be the $k \times k$ orthogonal matrix that diagonalizes $D_{in}^{-1/2} W_A D_{in}^{-1/2}$, namely

$$ D_{in}^{-1/2} W_A D_{in}^{-1/2} = U\, \mathrm{diag}\left(1, \lambda_2^A, \dots, \lambda_k^A\right) U^T. $$

Now define $X = D_{in}^{-1/2} U$ and decompose $X$ into blocks as $X = (X_1\ X_2)$, where $X_1 = \frac{1}{\sqrt{\bar w}} (1, \dots, 1)^T$ is a $k \times 1$ vector and $X_2$ is the remaining $k \times (k-1)$ matrix, $X_2 = D_{in}^{-1/2} (u_2\ \cdots\ u_k)$. Clearly we have

$$ X_2^T W_A X_2 = \mathrm{diag}\left(\lambda_2^A, \dots, \lambda_k^A\right) = A_2 \quad \text{and} \quad X_2^T D_{in} X_2 = I_{k-1}. $$

In matrix block notation,

$$ (X_1\ X_2)^T\, W_A\, (X_1\ X_2) = \begin{pmatrix} 1 & 0 \\ 0 & A_2 \end{pmatrix} \quad \text{and} \quad (X_1\ X_2)^T\, D_{in}\, (X_1\ X_2) = \begin{pmatrix} 1 & 0 \\ 0 & I_{k-1} \end{pmatrix}. $$

With this simultaneous diagonalization of the unperturbed matrix pair $(W_A, D_{in})$ at hand, we turn to the perturbation pair $\varepsilon\, (0, D_1)$.
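The simultaneous diagonalization above is easy to verify numerically; the following sketch (ours, not from the paper) checks it for a small random symmetric block playing the role of $W_A$, with row sums $w_i$.

```python
import numpy as np

rng = np.random.default_rng(1)
k = 6
W_A = rng.uniform(0.2, 1.0, (k, k)); W_A = (W_A + W_A.T) / 2   # symmetric, positive entries

w = W_A.sum(axis=1)                          # in-degrees w_i = W_{i,A}
D_in_isqrt = np.diag(1.0 / np.sqrt(w))       # D_in^{-1/2}

S = D_in_isqrt @ W_A @ D_in_isqrt            # symmetric matrix similar to D_in^{-1} W_A
evals, U = np.linalg.eigh(S)
order = np.argsort(evals)[::-1]              # decreasing order: 1 = lam_1 > lam_2 >= ...
evals, U = evals[order], U[:, order]

X = D_in_isqrt @ U                           # X = D_in^{-1/2} U = (X_1, X_2)
print(np.allclose(X.T @ W_A @ X, np.diag(evals)))      # X^T W_A X = diag(1, lam_2, ..., lam_k)
print(np.allclose(X.T @ np.diag(w) @ X, np.eye(k)))    # X^T D_in X = I_k
print(np.isclose(evals[0], 1.0))                       # top eigenvalue of D_in^{-1} W_A is 1
```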

Again using matrix block notation, denote

$$ (X_1\ X_2)^T\, D_1\, (X_1\ X_2) = \begin{pmatrix} \bar d & d_2^T \\ d_2 & D_2 \end{pmatrix}, $$

where $d_2$ is a $(k-1)$-vector and $D_2$ is a $(k-1) \times (k-1)$ matrix. It is straightforward to verify that

$$ d_2 = \frac{1}{\sqrt{\bar w}}\, (u_2\ \cdots\ u_k)^T \left( \frac{d_1}{\sqrt{w_1}}, \dots, \frac{d_k}{\sqrt{w_k}} \right)^T. \qquad (A.4) $$

The amount by which the eigenvalue $\lambda_1^A = 1$ is perturbed as ε departs from 0 depends primarily on the norms of $d_2$ and $D_2$. For $d_2$ we have that

$$ \|d_2\|^2 \le \frac{1}{\bar w} \sum_{i=1}^k \frac{d_i^2}{w_i} = \sum_{i=1}^k \left( \frac{d_i}{w_i} \right)^2 \frac{w_i}{\bar w} = \sum_{i=1}^k \left( \frac{d_i}{w_i} \right)^2 \pi^A_i, $$

where $\pi^A_i = w_i/\bar w$ is the stationary distribution of a random walk inside $A$. Furthermore, since $\max_i d_i/w_i = 1$, we have $(d_i/w_i)^2 \le d_i/w_i$, and since for all $i \in A$, $w_i \ge \frac{1}{2}\left[ w_i + \varepsilon d_i \right]$, we obtain

$$ \|d_2\|^2 \le \sum_{i=1}^k \frac{d_i}{w_i}\, \pi^A_i \le 2 \sum_{i=1}^k \frac{d_i}{w_i + \varepsilon d_i}\, \pi^A_i = 2\, Q_{A\to B}. $$

Since clearly also $\sum_{i=1}^k (d_i/w_i)\, \pi^A_i \le 1$, we can write

$$ \|d_2\|^2 \le 2 \min\{Q_{A\to B}, 1\}. \qquad (A.5) $$

As for $D_2$, it is easy to verify that, since $U$ is orthogonal, $\|D_2\| \le 1$.

Now, as the generalized eigenvalues of the pair $(A_2, I_{k-1})$ are strictly smaller than 1 in absolute value, the column $X_1$ spans a simple eigenspace of the unperturbed pair $(W_A, D_{in})$. We can thus appeal to the main instrument in our proof, a theorem from matrix perturbation theory for generalized eigenvalue problems [18, part VI, theorem 5]. In our setting and our notation, this theorem states that if $\mathrm{dif}[A_2] > 0$ and if

$$ \frac{\varepsilon \gamma}{\mathrm{dif}[A_2] - \varepsilon \rho} < \frac{1}{2}, \qquad (A.6) $$

where

$$ \gamma = \|d_2\|, \qquad \rho = \|d_2\| + \|D_2\|, $$

then there exists $p \in \mathbb{R}^{k-1}$ with

$$ \|p\| \le \frac{\gamma}{\mathrm{dif}[A_2] - \varepsilon \rho} \qquad (A.7) $$

such that the pair $\left\langle 1,\ 1 + \varepsilon\left(\bar d + \varepsilon\, d_2^T p\right)\right\rangle$ is a generalized eigenvalue of the matrix pair $(W_A, D_\varepsilon)$; equivalently, $\Lambda = 1/\left(1 + \varepsilon(\bar d + \varepsilon\, d_2^T p)\right)$ is a generalized eigenvalue in the sense of (6.1).

The first condition in this theorem, namely $\mathrm{dif}[A_2] > 0$, follows directly from Lemma A.1. For the second condition, we need to bound the constants γ and ρ, as follows. For γ, we have seen above, in the derivation of (A.5), that $\gamma = \|d_2\| \le 1$. Similarly, we have $\rho = \|d_2\| + \|D_2\| \le 2$.

Suppose now that $\varepsilon < \Gamma^A/8$. Then we have $\varepsilon \rho < \Gamma^A/4$. Invoking Lemma A.1, whereby $\mathrm{dif}[A_2] = \Gamma^A/2$, we get that

$$ \mathrm{dif}[A_2] - \varepsilon \rho > \frac{\Gamma^A}{2} - \frac{\Gamma^A}{4} = \frac{\Gamma^A}{4}, \qquad (A.8) $$

and therefore

$$ \frac{\varepsilon \gamma}{\mathrm{dif}[A_2] - \varepsilon \rho} < \frac{\varepsilon}{\Gamma^A/4} < \frac{\Gamma^A}{8} \cdot \frac{4}{\Gamma^A} = \frac{1}{2}, $$

so the second condition of the theorem, condition (A.6), is satisfied. In conclusion, the theorem above holds, so a vector $p$ as above exists, with

$$ \|p\| \le \frac{\gamma}{\mathrm{dif}[A_2] - \varepsilon \rho}. \qquad (A.9) $$

Finally, using (A.9), (A.8) and (A.5) and the definition $\gamma = \|d_2\|$, we have

$$ \left| d_2^T p \right| \le \|p\|\, \|d_2\| \le \frac{\gamma \|d_2\|}{\mathrm{dif}[A_2] - \varepsilon \rho} = \frac{\|d_2\|^2}{\mathrm{dif}[A_2] - \varepsilon \rho} \le \frac{\|d_2\|^2}{\Gamma^A/4} \le \frac{8}{\Gamma^A} \min\{Q_{A\to B}, 1\}. $$

Setting $\beta = d_2^T p$, this completes the proof of Lemma 6.1.

REFERENCES

[1] K. Avrachenkov, J. Filar, and M. Haviv, Singular perturbations of Markov chains and decision processes: a survey, in Markov Decision Processes: Models, Methods, Directions and Open Problems, E. Feinberg and A. Schwartz, eds., Kluwer, 2002.
[2] F. Bach and M. I. Jordan, Learning spectral clustering with applications to speech separation, Journal of Machine Learning Research, 7 (2006).
[3] P. Deuflhard, W. Huisinga, A. Fischer, and C. Schütte, Identification of almost invariant aggregates in reversible nearly uncoupled Markov chains, Linear Algebra and its Applications, 315 (2000).
[4] I. Dhillon, Y. Guan, and B. Kulis, Kernel k-means: spectral clustering and normalized cuts, in Proceedings of the Tenth ACM SIGKDD Conference, ACM, New York, 2004.
[5] C. Ding, X. He, and H. Simon, On the equivalence of nonnegative matrix factorization and spectral clustering, in Proceedings of the Fifth SIAM International Conference on Data Mining, 2005.
[6] M. Haviv and Y. Ritov, On series expansions and stochastic matrices, SIAM J. Matrix Anal. Appl., 14 (1993).
[7] R. Kannan, S. Vempala, and A. Vetta, On clusterings: good, bad and spectral, J. ACM, 51 (2004).
[8] M. Meila and L. Xu, Regularized spectral learning, in Proceedings of the Artificial Intelligence and Statistics Workshop, R. Cowell and Z. Ghahramani, eds., 2005.
[9] B. Matkowsky and Z. Schuss, Eigenvalues of the Fokker-Planck operator and the approach to equilibrium for diffusions in potential fields, SIAM J. Appl. Math., 40 (1981).
[10] M. Meila and J. Shi, A random walks view of spectral segmentation, in Artificial Intelligence and Statistics Conference, 2001.
[11] C. Meyer, Stochastic complementation, uncoupling Markov chains, and the theory of nearly reducible systems, SIAM Review, 31 (1989), pp. 240-272.
[12] B. Nadler and M. Galun, Fundamental limitations of spectral clustering, in Advances in Neural Information Processing Systems 19, B. Schölkopf, J. Platt, and T. Hoffman, eds., MIT Press, Cambridge, MA, 2007.
[13] B. Nadler, M. Galun, and T. Gross, Clustering and multi-scale structure of graphs: a scale-invariant diffusion based approach, in preparation.
[14] B. Nadler, S. Lafon, R. Coifman, and I. Kevrekidis, Diffusion maps, spectral clustering and eigenfunctions of Fokker-Planck operators, in Advances in Neural Information Processing Systems 18, Y. Weiss, B. Schölkopf, and J. Platt, eds., MIT Press, Cambridge, MA, 2006.

[15] A. Ng, M. Jordan, and Y. Weiss, On spectral clustering: analysis and an algorithm, in Advances in Neural Information Processing Systems 14, MIT Press, Cambridge, MA, 2002.
[16] P. Schweitzer, A survey of aggregation/disaggregation in large Markov chains, in Numerical Solution of Markov Chains, W. Stewart, ed., Marcel Dekker, New York, 1991.
[17] J. Shi and J. Malik, Normalized cuts and image segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence, 22 (2000), pp. 888-905.
[18] G. Stewart and J.-G. Sun, Matrix Perturbation Theory, Academic Press, 1990.
[19] S. Yu and J. Shi, Multiclass spectral clustering, in Proceedings of the Ninth IEEE International Conference on Computer Vision, 2003.


Semi-Supervised Learning with the Graph Laplacian: The Limit of Infinite Unlabelled Data Semi-Supervised Learning with the Graph Laplacian: The Limit of Infinite Unlabelled Data Boaz Nadler Dept. of Computer Science and Applied Mathematics Weizmann Institute of Science Rehovot, Israel 76 boaz.nadler@weizmann.ac.il

More information

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra.

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra. DS-GA 1002 Lecture notes 0 Fall 2016 Linear Algebra These notes provide a review of basic concepts in linear algebra. 1 Vector spaces You are no doubt familiar with vectors in R 2 or R 3, i.e. [ ] 1.1

More information

Graphs in Machine Learning

Graphs in Machine Learning Graphs in Machine Learning Michal Valko INRIA Lille - Nord Europe, France Partially based on material by: Ulrike von Luxburg, Gary Miller, Doyle & Schnell, Daniel Spielman January 27, 2015 MVA 2014/2015

More information

Eigenvalues, random walks and Ramanujan graphs

Eigenvalues, random walks and Ramanujan graphs Eigenvalues, random walks and Ramanujan graphs David Ellis 1 The Expander Mixing lemma We have seen that a bounded-degree graph is a good edge-expander if and only if if has large spectral gap If G = (V,

More information

Distributed Randomized Algorithms for the PageRank Computation Hideaki Ishii, Member, IEEE, and Roberto Tempo, Fellow, IEEE

Distributed Randomized Algorithms for the PageRank Computation Hideaki Ishii, Member, IEEE, and Roberto Tempo, Fellow, IEEE IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 55, NO. 9, SEPTEMBER 2010 1987 Distributed Randomized Algorithms for the PageRank Computation Hideaki Ishii, Member, IEEE, and Roberto Tempo, Fellow, IEEE Abstract

More information

Improving the Asymptotic Performance of Markov Chain Monte-Carlo by Inserting Vortices

Improving the Asymptotic Performance of Markov Chain Monte-Carlo by Inserting Vortices Improving the Asymptotic Performance of Markov Chain Monte-Carlo by Inserting Vortices Yi Sun IDSIA Galleria, Manno CH-98, Switzerland yi@idsia.ch Faustino Gomez IDSIA Galleria, Manno CH-98, Switzerland

More information

Math 307 Learning Goals. March 23, 2010

Math 307 Learning Goals. March 23, 2010 Math 307 Learning Goals March 23, 2010 Course Description The course presents core concepts of linear algebra by focusing on applications in Science and Engineering. Examples of applications from recent

More information

arxiv: v1 [math.na] 1 Sep 2018

arxiv: v1 [math.na] 1 Sep 2018 On the perturbation of an L -orthogonal projection Xuefeng Xu arxiv:18090000v1 [mathna] 1 Sep 018 September 5 018 Abstract The L -orthogonal projection is an important mathematical tool in scientific computing

More information

Lecture 15 Perron-Frobenius Theory

Lecture 15 Perron-Frobenius Theory EE363 Winter 2005-06 Lecture 15 Perron-Frobenius Theory Positive and nonnegative matrices and vectors Perron-Frobenius theorems Markov chains Economic growth Population dynamics Max-min and min-max characterization

More information

Fast Angular Synchronization for Phase Retrieval via Incomplete Information

Fast Angular Synchronization for Phase Retrieval via Incomplete Information Fast Angular Synchronization for Phase Retrieval via Incomplete Information Aditya Viswanathan a and Mark Iwen b a Department of Mathematics, Michigan State University; b Department of Mathematics & Department

More information

Notes on Measure Theory and Markov Processes

Notes on Measure Theory and Markov Processes Notes on Measure Theory and Markov Processes Diego Daruich March 28, 2014 1 Preliminaries 1.1 Motivation The objective of these notes will be to develop tools from measure theory and probability to allow

More information

MATH 567: Mathematical Techniques in Data Science Clustering II

MATH 567: Mathematical Techniques in Data Science Clustering II Spectral clustering: overview MATH 567: Mathematical Techniques in Data Science Clustering II Dominique uillot Departments of Mathematical Sciences University of Delaware Overview of spectral clustering:

More information

1 Matrix notation and preliminaries from spectral graph theory

1 Matrix notation and preliminaries from spectral graph theory Graph clustering (or community detection or graph partitioning) is one of the most studied problems in network analysis. One reason for this is that there are a variety of ways to define a cluster or community.

More information

Laplacian Integral Graphs with Maximum Degree 3

Laplacian Integral Graphs with Maximum Degree 3 Laplacian Integral Graphs with Maximum Degree Steve Kirkland Department of Mathematics and Statistics University of Regina Regina, Saskatchewan, Canada S4S 0A kirkland@math.uregina.ca Submitted: Nov 5,

More information

HYPERGRAPH BASED SEMI-SUPERVISED LEARNING ALGORITHMS APPLIED TO SPEECH RECOGNITION PROBLEM: A NOVEL APPROACH

HYPERGRAPH BASED SEMI-SUPERVISED LEARNING ALGORITHMS APPLIED TO SPEECH RECOGNITION PROBLEM: A NOVEL APPROACH HYPERGRAPH BASED SEMI-SUPERVISED LEARNING ALGORITHMS APPLIED TO SPEECH RECOGNITION PROBLEM: A NOVEL APPROACH Hoang Trang 1, Tran Hoang Loc 1 1 Ho Chi Minh City University of Technology-VNU HCM, Ho Chi

More information

Note that in the example in Lecture 1, the state Home is recurrent (and even absorbing), but all other states are transient. f ii (n) f ii = n=1 < +

Note that in the example in Lecture 1, the state Home is recurrent (and even absorbing), but all other states are transient. f ii (n) f ii = n=1 < + Random Walks: WEEK 2 Recurrence and transience Consider the event {X n = i for some n > 0} by which we mean {X = i}or{x 2 = i,x i}or{x 3 = i,x 2 i,x i},. Definition.. A state i S is recurrent if P(X n

More information

Institute for Advanced Computer Studies. Department of Computer Science. On Markov Chains with Sluggish Transients. G. W. Stewart y.

Institute for Advanced Computer Studies. Department of Computer Science. On Markov Chains with Sluggish Transients. G. W. Stewart y. University of Maryland Institute for Advanced Computer Studies Department of Computer Science College Park TR{94{77 TR{3306 On Markov Chains with Sluggish Transients G. W. Stewart y June, 994 ABSTRACT

More information

On a Conjecture of Thomassen

On a Conjecture of Thomassen On a Conjecture of Thomassen Michelle Delcourt Department of Mathematics University of Illinois Urbana, Illinois 61801, U.S.A. delcour2@illinois.edu Asaf Ferber Department of Mathematics Yale University,

More information

Affine iterations on nonnegative vectors

Affine iterations on nonnegative vectors Affine iterations on nonnegative vectors V. Blondel L. Ninove P. Van Dooren CESAME Université catholique de Louvain Av. G. Lemaître 4 B-348 Louvain-la-Neuve Belgium Introduction In this paper we consider

More information