NORMALIZED CUTS ARE APPROXIMATELY INVERSE EXIT TIMES


MATAN GAVISH AND BOAZ NADLER

Abstract. The Normalized Cut is a widely used measure of separation between clusters in a graph. In this paper we provide a novel probabilistic perspective on this measure. We show that for a partition of a graph into two weakly connected sets, $V = A \cup B$, in fact

$$ \mathrm{Ncut}(V) \approx 1/\tau_{A\to B} + 1/\tau_{B\to A}, $$

where $\tau_{A\to B}$ is the uni-directional characteristic exit time of a random walk from subset $A$ to subset $B$. Using matrix perturbation theory, we show that this approximation is exact to first order in the connection strength between the two subsets $A$ and $B$, and derive an explicit bound for the approximation error. Our result implies that for a Markov chain composed of a reversible component $A$ that is weakly connected to an absorbing cluster $B$, the easy-to-compute Normalized Cut measure is a reliable proxy for the chain's spectral gap.

Key words. normalized cut, graph partitioning, Markov chain, characteristic exit time, generalized eigenvalues, perturbation bounds, matrix perturbation theory

AMS subject classifications. 6H30, 5A8, 5A4, 60J0, 60J0, 65H7

1. Introduction. Clustering of data is a fundamental problem in many fields. When the data is represented as a symmetric weighted similarity graph, clustering becomes a graph partitioning problem. In this paper we focus on the popular normalized cut criterion for graph partitioning. The normalized cut (Ncut) criterion, and its variants based on conductance, are widely used measures of separation between clusters in a graph in a variety of applications [7, 17]. Since optimizing the Ncut measure is in general an NP-hard problem, several researchers have suggested various spectral-based approximation methods, see for example [3, 5]. These spectral methods, in turn, have interesting connections to kernel K-means clustering and to non-negative matrix factorizations, see [4, 5].

In this paper we focus on the normalized cut measure itself. Similar to previous works [10, 14], we consider a random walk on the weighted graph, with the probability to jump from one node to another proportional to the edge weight between them. Our contribution is a novel Markov chain based perspective on this measure. Our main result is that for a partition of a graph $V$ into two disjoint sets, $V = A \cup B$,

$$ \mathrm{Ncut}(V) \approx 1/\tau_{A\to B} + 1/\tau_{B\to A}, \qquad (1.1) $$

where $\tau_{A\to B}$ is the characteristic exit time of a random walk from cluster $A$ to cluster $B$, as defined explicitly below. Furthermore, we provide an explicit bound for the approximation error in Eq. (1.1), which is second order in the connection strengths between the two subsets $A$ and $B$. Our proof is based on matrix perturbation theory for generalized eigenvalue problems [18]. Interestingly, (1.1) can be viewed as a discrete analog of a well-known result for continuous diffusion processes with two metastable states [9], see Section 4.

From a mathematical point of view, when the connection strengths between the two clusters are small, the two clusters become nearly uncoupled and the resulting Markov chain nearly decomposable. The properties of such singularly perturbed Markov chains have been the subject of several works, see the surveys [6, ] and the many references therein. Some works considered the behavior of the stationary distribution, others considered mean first passage times, typically focusing on singular Laurent series expansions, and yet others considered the important problem of dimensionality reduction and aggregation of states for such nearly reducible Markov chains, see for example [6, ].
Department of Statistics, Stanford University, Stanford, CA 94305 (gavish@stanford.edu). Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 76100, Israel (boaznadler@weizmann.ac.il).

Our focus, in contrast, is on the relation of such Markov chains to the normalized cut measure, an easy-to-compute quantity used in a variety of applied scenarios. To this end, our analysis involves the study of a singularly perturbed Markov chain with transient states, as we consider one of the clusters as absorbing with respect to a random walk that starts at the other cluster. Furthermore, rather than series expansions, we derive explicit error bounds for the approximation in (1.1).

Our result has several implications and potential applications. First, in the context of clustering, Eq. (1.1) provides a precise probabilistic statement regarding the relation between the algebraic, easy-to-compute Ncut measure of a given partition and the eigenvalues corresponding to local random walks on each of these partitions. Our result is different from previously derived global spectral bounds that depend on the eigenvalues of the random walk on the entire graph. In particular, our analysis implies that for weakly connected clusters, the normalized cut measure essentially considers only the time it takes for a random walk to exit a given cluster, without taking into account its internal structure. As discussed in more detail in [12], this leads to a fundamental limitation of the Ncut measure in detecting structures at different scales in a large graph, and motivates the use of other measures that also take into account the internal characteristics of potential clusters [3].

Another potential application of our result is in the study of stochastic dynamical systems and in particular their discretization into reversible finite Markov chains [3]. In this field, a fundamental problem is the identification of almost invariant sets, e.g. subsets of phase space where a stochastic system stays for a long time before exiting. Our observation above, in the context of clustering, is also relevant here. As complex dynamical systems may exhibit invariant sets at very different spatial and time scales, our analysis suggests that the Ncut measure or its various spectral relaxations may not be appropriate in such cases.

Yet another application of our result is to the numerical evaluation of the spectral gap of nearly decomposable Markov matrices. In the study of stochastic dynamical systems, the equivalent quantities of interest are the mean exit times from given or a-priori known metastable sets, and their dependence on various parameters, such as temperature, bond strengths, etc. Upon discretization of such systems, this amounts to the exit times of nearly uncoupled Markov chains. In this context, one may apply Eq. (1.1) in the reverse direction: rather than computing the second largest eigenvalue of a potentially very large matrix, and from it the characteristic exit time from this metastable state, one may approximate it by the easy-to-compute normalized cut measure, with explicit guarantees as to its accuracy.

The paper is organized as follows. The normalized cut and relevant notation are introduced in Section 2. Our main result is stated in Section 3, discussed in Section 4, and demonstrated using a simple numerical simulation in Section 5. The proof appears in Section 6, with some auxiliary results deferred to the Appendix.

2. Preliminaries. Let $(V, W)$ be a connected weighted graph with node set $V = \{1, \dots, n\}$ and an $n \times n$ affinity (edge weight) matrix $W = (W_{i,j})_{1 \le i,j \le n}$. The matrix $W$ is assumed to be symmetric with non-negative entries, and such that $W_{ii} > 0$ for all $i = 1, \dots, n$.

2.1. The normalized cut of a graph partition. Consider a partition of the node set into two disjoint clusters, $V = A \cup B$. Without loss of generality, we denote the clusters by $A = \{1, \dots, k\}$ and $B = \{k+1, \dots, n\}$.
As discussed in the introduction, the normalized cut [17] is a widely used, easy-to-calculate measure of separation for a given graph partition:

Definition 2.1. For a partition $V = A \cup B$, define

$$ \mathrm{Cut}(A, B) = \sum_{i \in A,\, j \in B} W_{i,j}, \qquad \mathrm{Assoc}(A, V) = \sum_{i \in A,\, j \in V} W_{i,j}. $$

Then the normalized cut of $A$ with respect to $B$ is

$$ \mathrm{Ncut}(A, B) = \frac{\mathrm{Cut}(A, B)}{\mathrm{Assoc}(A, V)}, $$

whereas the multiway normalized cut of the partition $V = A \cup B$ is defined as [8, 19]

$$ \mathrm{MNcut}(A, B) = \mathrm{Ncut}(A, B) + \mathrm{Ncut}(B, A) = \frac{\mathrm{Cut}(A, B)}{\mathrm{Assoc}(A, V)} + \frac{\mathrm{Cut}(B, A)}{\mathrm{Assoc}(B, V)}. $$

The goal of this paper is to provide a precise probabilistic interpretation of this measure, which holds when the clusters $A$ and $B$ are well separated in a sense defined below.

2.2. Notation. For any $i \in V$, denote by

$$ W_{i,A} = \sum_{j \in A} W_{i,j}, \qquad W_{i,B} = \sum_{j \in B} W_{i,j} \qquad (2.3) $$

the total affinity between the node $i$ and the clusters $A$ and $B$, respectively. Similarly, denote

$$ W_{A,A} = \sum_{i \in A} W_{i,A} = \sum_{i \in A,\, j \in A} W_{i,j}, \qquad W_{A,B} = \sum_{i \in A} W_{i,B} = \sum_{i \in A,\, j \in B} W_{i,j}. $$

In this notation, the normalized cut measure is

$$ \mathrm{Ncut}(A, B) = \frac{W_{A,B}}{W_{A,A} + W_{A,B}}. \qquad (2.4) $$

With an eye towards a matrix perturbation approach, we define the connectivity parameter

$$ \varepsilon = \max_{i \in A} \frac{W_{i,B}}{W_{i,A}}. \qquad (2.5) $$

In this paper we focus on situations where $W_{i,B} \ll W_{i,A}$ for all $i \in A$, namely all nodes $i \in A$ are weakly connected to $B$. Equivalently, $\varepsilon \ll 1$, which allows ε to be used as a small parameter in a matrix perturbation analysis.

2.3. The uni-directional exit time $\tau_{A\to B}$. Following [10, 14], we define a Markov random walk on our graph $(V, W)$ that jumps from node to node at unit time intervals. We define the probability to jump from node $i$ to node $j$ as $p_{i,j} = W_{i,j}/D_{i,i}$, where $D_{i,i} = \sum_j W_{i,j}$. For our purposes, it will also be useful to consider random walks on subsets of $V$, such as a random walk constrained to the cluster $A$, as well as random walks where some of the nodes are absorbing. In particular, for the uni-directional exit time from cluster $A$ to cluster $B$, we consider a simplification of the weighted graph $(V, W)$, in which the cluster $B$ is collapsed to a new, single absorbing state $b$, with no allowed transitions $b \to A$. Let $V^{A\cup b} = A \cup \{b\}$ be the node set of this simplified graph, whose corresponding affinity matrix is

$$ W^{A\cup b} = \begin{pmatrix} W_{1,1} & \cdots & W_{1,k} & W_{1,B} \\ \vdots & & \vdots & \vdots \\ W_{k,1} & \cdots & W_{k,k} & W_{k,B} \\ 0 & \cdots & 0 & 1 \end{pmatrix}. $$

Consider a random walk on this simplified graph $(V^{A\cup b}, W^{A\cup b})$, whose corresponding Markov matrix is $(D^{A\cup b})^{-1} W^{A\cup b}$, where $D^{A\cup b}$ is the row-normalization diagonal matrix, $D^{A\cup b}_{i,i} = \sum_j W^{A\cup b}_{i,j}$. Denote the eigenvalues of this matrix by

$$ 1 = \lambda_1^{A\cup b} \ge \lambda_2^{A\cup b} \ge \cdots \ge \lambda_{k+1}^{A\cup b}. $$
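To make the definitions above concrete, the following short Python sketch (ours, not part of the original paper; all function and variable names are illustrative) computes $\mathrm{Ncut}(A, B)$, the connectivity parameter ε of (2.5), and the Markov matrix of the collapsed graph $(V^{A\cup b}, W^{A\cup b})$ from a symmetric weight matrix W and an index set A.

```python
import numpy as np

def ncut_and_epsilon(W, A):
    """Ncut(A, B), connectivity parameter eps, and the Markov matrix of the
    collapsed graph A u {b}, where B is replaced by a single absorbing state b.
    W: symmetric non-negative (n, n) array with positive diagonal; A: indices of cluster A."""
    n = W.shape[0]
    A = np.asarray(A)
    B = np.setdiff1d(np.arange(n), A)
    W_iA = W[np.ix_(A, A)].sum(axis=1)               # W_{i,A} for i in A
    W_iB = W[np.ix_(A, B)].sum(axis=1)               # W_{i,B} for i in A
    ncut = W_iB.sum() / (W_iA.sum() + W_iB.sum())    # Eq. (2.4)
    eps = (W_iB / W_iA).max()                        # Eq. (2.5)
    k = len(A)
    W_Ab = np.zeros((k + 1, k + 1))                  # collapsed affinity matrix W^{A u b}
    W_Ab[:k, :k] = W[np.ix_(A, A)]
    W_Ab[:k, k] = W_iB
    W_Ab[k, k] = 1.0                                 # b is absorbing (no transitions b -> A)
    P_Ab = W_Ab / W_Ab.sum(axis=1, keepdims=True)    # row-stochastic Markov matrix
    return ncut, eps, P_Ab
```

The second largest eigenvalue of the returned matrix P_Ab is the $\lambda_2^{A\cup b}$ of the text, from which $\tau_{A\to B}$ follows via (2.6) below.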

Assume that the set $A$ is connected, so that $1 > \lambda_2^{A\cup b}$, and moreover that $\lambda_2^{A\cup b}$ is the second largest eigenvalue in absolute value. The characteristic exit time from $A$ to $B$ is defined as the relaxation time of this chain,

$$ \tau_{A\to B} = \frac{1}{1 - \lambda_2^{A\cup b}}. \qquad (2.6) $$

3. Main result. Our main result is a relation between the exit time $\tau_{A\to B}$ and the normalized cut measure of Eq. (2.4). To this end, it will be useful to consider in addition a random walk constrained to stay in the cluster $A$. Under the assumption that $A$ is connected, this chain is reversible. We denote by $\lambda_2^A$ its second largest eigenvalue and by

$$ \Gamma^A = 1 - \lambda_2^A \qquad (3.1) $$

its spectral gap. Again we assume that the second largest eigenvalue $\lambda_2^A$ is also the second largest in absolute value. We are finally ready to state our main result.

Theorem 3.1. Let $G$ be a symmetric graph with affinity matrix $W$. Consider a binary partition of the graph $V = A \cup B$. Assume that the subset $A$ is connected and that for the random walk constrained to remain in $A$, the second largest eigenvalue $\lambda_2^A$ is also second largest in absolute value. Denote by $\varepsilon = \max_{i \in A} W_{i,B}/W_{i,A}$ the connectivity parameter. For any $\varepsilon \ge 0$,

$$ \frac{1}{\tau_{A\to B}} \le \mathrm{Ncut}(A, B). \qquad (3.2) $$

Assume that

$$ 0 < \varepsilon < \frac{\Gamma^A}{8}. \qquad (3.3) $$

Then

$$ \mathrm{Ncut}(A, B) = \frac{1}{\tau_{A\to B}} + \varepsilon^2 R(\varepsilon), \qquad (3.4) $$

where $0 \le R(\varepsilon) \le 16/\Gamma^A$.

Corollary 3.2. Denote $\varepsilon = \max\left\{ \max_{i \in A} \frac{W_{i,B}}{W_{i,A}},\ \max_{i \in B} \frac{W_{i,A}}{W_{i,B}} \right\}$ and assume that

$$ 0 < \varepsilon < \tfrac{1}{8} \min\{\Gamma^A, \Gamma^B\}. $$

Then

$$ \mathrm{MNcut}(A, B) = \frac{1}{\tau_{A\to B}} + \frac{1}{\tau_{B\to A}} + \varepsilon^2 R(\varepsilon), \qquad (3.5) $$

where $0 \le R(\varepsilon) \le 16\left(\frac{1}{\Gamma^A} + \frac{1}{\Gamma^B}\right)$.
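Theorem 3.1 is easy to probe numerically. The sketch below (again ours, not the authors' Matlab script) builds a random two-cluster graph with weak inter-cluster coupling and compares $\mathrm{Ncut}(A, B)$ with $1/\tau_{A\to B} = 1 - \lambda_2^{A\cup b}$; it assumes the helper ncut_and_epsilon defined above, and the coupling scale delta is an arbitrary illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(0)
k, delta = 30, 1e-3                        # cluster size and inter-cluster weight scale (illustrative)

# symmetric weight matrix: two dense clusters with weak random coupling
W_AA = rng.uniform(0.5, 1.0, (k, k)); W_AA = (W_AA + W_AA.T) / 2
W_BB = rng.uniform(0.5, 1.0, (k, k)); W_BB = (W_BB + W_BB.T) / 2
W_AB = delta * rng.uniform(0.5, 1.0, (k, k))
W = np.block([[W_AA, W_AB], [W_AB.T, W_BB]])

A = np.arange(k)
ncut, eps, P_Ab = ncut_and_epsilon(W, A)

lam = np.sort(np.linalg.eigvals(P_Ab).real)[::-1]
inv_tau = 1.0 - lam[1]                     # 1/tau_{A->B}, Eq. (2.6)

print(f"eps = {eps:.2e}")
print(f"Ncut(A,B) = {ncut:.6e},  1/tau = {inv_tau:.6e}")
print(f"residual  = {ncut - inv_tau:.2e}   (Theorem 3.1: non-negative and O(eps^2))")
```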

4. Discussion. For a binary partition, Eq. (3.5) gives a novel probabilistic interpretation to the normalized cut measure, in terms of characteristic exit times from each of the two subsets. For a partition into $s$ disjoint subsets, $V = \cup_{j=1}^s A_j$, Eq. (3.4) generalizes to a relation between the multiway normalized cut measure and the exit times from each of the $s$ clusters,

$$ \mathrm{MNcut}(A_1, \dots, A_s) = \sum_{j=1}^s \frac{1}{\tau_{A_j \to A_j^c}} + \varepsilon^2 R(\varepsilon), \qquad (4.1) $$

where $A_j^c = V \setminus A_j$ and the remainder term $R(\varepsilon) \ge 0$ has a bound analogous to the one in the binary case.

4.1. Limitations. The key assumption limiting the applicability of our result is Eq. (3.3), which requires that $\varepsilon = \max_{i \in A} W_{i,B}/W_{i,A} < (1 - \lambda_2^A)/8$. Hence, our result holds only when $A$ and $B$ are weakly connected, compared to the internal spectral gap of a random walk constrained to remain in $A$. In addition, while a small value of ε implies a small normalized cut, since $\mathrm{Ncut}(A, B) \le \varepsilon$, the converse is in general false: it is possible to construct examples where the normalized cut is small yet ε is large. Indeed, ε is a worst-case, rather than an on-average, measure of separation between clusters. Informally, a small value of ε means that the boundary between the clusters $A$ and $B$ is pronounced, in that each node in $A$ has a much stronger overall connection to nodes in $A$ than to nodes in $B$.

4.2. Sharpness of the upper bound. A closer inspection of the proof shows that it in fact gives a stronger bound on the residual $R(\varepsilon)$ from Eq. (3.4). To this end, consider a Markov chain $X_t$ on $A \cup \{b\}$ that starts from the stationary distribution π inside $A$,

$$ \pi_i = \begin{cases} W_{i,A}/W_{A,A} & i \in A \\ 0 & i = b. \end{cases} $$

Then the one-step transition probability to the absorbing state $b$ is

$$ P_{A\to B} = \sum_{i \in A} P(X_1 = b \mid X_0 = i)\, \pi_i = \sum_{i \in A} \left[ \frac{W_{i,B}}{W_{i,A} + W_{i,B}} \right] \frac{W_{i,A}}{W_{A,A}}. \qquad (4.2) $$

To factor out the small connectivity magnitude ε, we write $P_{A\to B} = \varepsilon Q_{A\to B}$. By definition, this quantity satisfies $Q_{A\to B} \le 1$ and measures the one-step absorption probability in units of ε, when the Markov chain on $A$ is in its stationary distribution. Then an improved bound on the residual is

$$ R(\varepsilon) \le \frac{16}{\Gamma^A} \min\{Q_{A\to B}, 1\}. $$

Finally, let us remark that the coefficient 16 above is by no means sharp. For example, in the numerical experiment described below, the actual residual observed was approximately $0.2\, \varepsilon^2/\Gamma^A$.

4.3. Comparison to previous spectral bounds. Several authors derived the following lower bound on the multiway normalized cut of a partition into $s$ clusters $V = \cup_{j=1}^s A_j$:

$$ \mathrm{MNcut}(A_1, \dots, A_s) \ge \sum_{j=1}^s \left(1 - \lambda_j(V)\right), \qquad (4.3) $$

where $\lambda_j(V)$ are the eigenvalues of the global random walk matrix on the whole graph $G$ [8, ]. In contrast, our result Eq. (4.1) is different, as it provides, under suitable cluster separability conditions, both a lower and a precise upper bound on $\mathrm{MNcut}(V)$ in terms of local properties of the graph: the between-cluster connectivity, internal relaxation times within each cluster, and the characteristic exit times from them. These quantities depend on eigenvalues of various normalized submatrices of $G$, which are in general different from the global eigenvalues $\lambda_j(V)$.
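The one-step absorption probability of Eq. (4.2) is also straightforward to evaluate; the helper below (ours, with hypothetical names) returns $P_{A\to B}$ together with $Q_{A\to B} = \varepsilon^{-1} P_{A\to B}$, which enters the sharper residual bound above.

```python
import numpy as np

def one_step_absorption(W, A):
    """P_{A->B} of Eq. (4.2): probability that one step of the walk, started from
    the stationary distribution pi inside A, lands in B.  Returns (P_AB, Q_AB)."""
    n = W.shape[0]
    A = np.asarray(A)
    B = np.setdiff1d(np.arange(n), A)
    W_iA = W[np.ix_(A, A)].sum(axis=1)
    W_iB = W[np.ix_(A, B)].sum(axis=1)
    pi = W_iA / W_iA.sum()                      # stationary distribution inside A
    P_AB = np.sum(pi * W_iB / (W_iA + W_iB))    # Eq. (4.2)
    eps = (W_iB / W_iA).max()                   # connectivity parameter, Eq. (2.5)
    return P_AB, P_AB / eps
```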

4.4. Insensitivity to internal structure. One implication of our result is that when $A$ and $B$ are weakly connected, even though $\mathrm{Ncut}(A, B)$ contains a normalization of the cut by the volume of a cluster, it nonetheless considers mainly the exit time from $A$, without taking into account its internal characteristics. For example, the set $A$ may in fact contain a disconnected or weakly connected subgraph, but this structure is not captured by the normalized cut measure. As discussed in [12], this leads to some limitations of the normalized cut measure in discovering clusters in graphs.

4.5. Approximation of exit times by Ncut. In the study of weakly coupled stochastic dynamical systems, a quantity of interest is the characteristic exit time from a given set of nodes. At low temperatures, corresponding to extremely small values of ε, numerical methods for calculating $\lambda_2^{A\cup b}$, and hence $\tau_{A\to B}$, may be unstable as $\lambda_2^{A\cup b}$ becomes very close to 1. In contrast, the Ncut measure can still be calculated efficiently. Our result thus provides a-priori guarantees for the accuracy of this approximation to $\lambda_2^{A\cup b}$. Similarly, the Ncut approximation can be computed for extremely large graphs, for which eigenvalue computations may be quite challenging or even no longer feasible.

4.6. Analogy to continuum diffusion. Finally, we mention that our result can also be viewed as a discrete analog of known results regarding the second eigenvalue for a continuum diffusion process with two metastable states [9]. Consider a particle performing diffusion in a multi-dimensional potential with two wells separated by a high potential barrier. In this case, asymptotically as temperature tends to zero (equivalently, in our case, $\varepsilon \to 0$), the second non-trivial eigenvalue µ of the Fokker-Planck diffusion operator with two metastable states $A, B$, corresponding to the two potential wells, is approximately given by

$$ \mu \approx \frac{1}{\tau_{A\to B}} + \frac{1}{\tau_{B\to A}}. $$

By analogy, this suggests that when there are two well-defined clusters in a discrete finite graph, the spectral gap of the random walk on the whole graph is close to the right-hand side of Eq. (1.1). A detailed study of this issue, which is related to isoperimetric and Cheeger inequalities on graphs, is an interesting problem beyond the scope of this paper.

5. A numerical example. One way to evaluate the tightness of the upper bound (3.4) is to compare the residual $\mathrm{Ncut}(A, B) - 1/\tau_{A\to B}$ with our upper bound of $16\varepsilon^2/\Gamma^A$ in a numerical simulation. Unfortunately, as noted in Section 4.5, general-purpose numerical procedures do not allow us to measure $\lambda_2^{A\cup b}$ accurately when $\varepsilon \ll 1$. We now construct a simple class of examples where $\lambda_2^{A\cup b}$ can be calculated analytically, allowing us to test the true residual against our upper bound, for values of ε close to machine precision.

Assume that the cluster $A$ is a $d$-regular graph on $k$ nodes, drawn from the uniform distribution, so that the spectral gap $\Gamma^A$ is reasonably large. When all vertices of $A$ have the same exit probability α to the absorbing state $b$, it is easy to see that Theorem 3.1 holds with no approximation error, namely $\mathrm{Ncut}(A, b) = 1 - \lambda_2^{A\cup b} = 1/\tau_{A\to b}$. The simplest example where the exit probabilities are not all equal is when they are equal to α on one half of $A$ and to $r\alpha$ on the other half of $A$, for some $r > 0$. Consider then two $d$-regular graphs $(V_i, W_i)$, $i = 1, 2$, sampled from the uniform distribution on $d$-regular graphs. Denote $V_1 = \{1, \dots, k\}$ and $V_2 = \{k+1, \dots, 2k\}$, and let $b = 2k+1$ be the absorbing state. Define an affinity matrix $W$ on $V = \{1, \dots, 2k+1\}$ by

$$ W_{i,j} = \begin{cases} W^1_{i,j} \text{ or } W^2_{i,j} & i, j \in V_1 \text{ or } i, j \in V_2 \\ \alpha & 1 \le i \le k,\ j = 2k+1 \\ r\alpha & k+1 \le i \le 2k,\ j = 2k+1 \\ 1 & |i - j| = k, \end{cases} $$

so that each node in $V_1$ is connected to a corresponding node in $V_2$, to its $d$ neighbors in $V_1$, and to the absorbing state. Note that $\varepsilon = r\alpha/(d+1)$ and that $\lambda_2^{A\cup b}$ is the second eigenvalue of the $3 \times 3$ Markov matrix

$$ \begin{pmatrix} \frac{d}{d+1+\alpha} & \frac{1}{d+1+\alpha} & \frac{\alpha}{d+1+\alpha} \\ \frac{1}{d+1+r\alpha} & \frac{d}{d+1+r\alpha} & \frac{r\alpha}{d+1+r\alpha} \\ 0 & 0 & 1 \end{pmatrix}, $$

allowing us to calculate it analytically.

[Fig. 5.1. Numerical comparison of the normalized cut and the inverse exit time approximation. Left panel: $1-\lambda_2^{A\cup b}$, Ncut, and the upper bound $(1-\lambda_2^{A\cup b}) + 16\varepsilon^2/\Gamma^A$. Middle panel: the true residual and the upper bound on a log scale. Right panel: $1-\lambda_2^{A\cup b}$, Ncut, and the "better" upper bound $(1-\lambda_2^{A\cup b}) + \varepsilon^2/\Gamma^A$.]

Figure 5.1 summarizes the result of this experiment for $k = 50$, $d = 7$, $r = 10$, and ε ranging over many orders of magnitude, down to values close to machine precision. We compared the exact value of the residual

$$ \mathrm{Ncut}(A, b) - \frac{1}{\tau_{A\to b}} = \mathrm{Ncut}(A, b) - \left(1 - \lambda_2^{A\cup b}\right) $$

with the upper bound $16\varepsilon^2/\Gamma^A$. In the case under study, the upper bound is informative yet pessimistic (left panel), as the approximation $\mathrm{Ncut} \approx 1 - \lambda_2^{A\cup b}$ is quite exact. The true residual $\mathrm{Ncut}(A, b) - (1 - \lambda_2^{A\cup b})$ is in this case exactly quadratic in ε (middle panel, log scale), and our quadratic bound is pessimistic by a multiplicative factor of roughly 80. Replacing the constant 16 in our main result, which is presumably not tight due to limitations of our proof technique, with a much smaller constant, we also compared the residual $\mathrm{Ncut}(A, b) - (1 - \lambda_2^{A\cup b})$ with the quantity $\varepsilon^2/\Gamma^A$ (right panel). The reader is invited to re-create this experiment, replacing the values $k$, $d$, $r$ and α with arbitrary values. The Matlab script and the original 7-regular graph used here are available at the authors' homepage.

6. Proof of the main result. To simplify the proof, we first introduce the following notation. We write $w$ for the vector $(W_{i,A})_{i \in A}$ and $d$ for the vector $(\varepsilon^{-1} W_{i,B})_{i \in A}$. Hence $w_i$ is the in-degree of a node $i \in A$ (its total affinity to $A$), whereas $d_i$ is its affinity to $B$ rescaled by the connectivity parameter ε (that is, $\varepsilon d_i = W_{i,B}$), so that $d_i = O(1)$; indeed, by the definition (2.5) of ε, $d_i \le w_i$ for all $i \in A$. Similarly, we define $\bar w = \sum_{i \in A} w_i = W_{A,A}$ and $\bar d = \frac{1}{\bar w} \sum_{i \in A} d_i = \varepsilon^{-1} W_{A,B}/W_{A,A} \le 1$. Our main object of interest, $\lambda_2^{A\cup b}$, will be denoted simply by λ, whereas the Ncut measure is now given by

$$ \mathrm{Ncut}(A, B) = \frac{W_{A,B}}{W_{A,A} + W_{A,B}} = \frac{\varepsilon \bar d}{1 + \varepsilon \bar d}. $$

For future use, we denote $D_{in} = \mathrm{diag}(w_1, \dots, w_k)$, $D_1 = \mathrm{diag}(d_1, \dots, d_k)$ and $D_\varepsilon = D_{in} + \varepsilon D_1$, so that

$$ D^{A\cup b} = \begin{pmatrix} D_{in} + \varepsilon D_1 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} D_\varepsilon & 0 \\ 0 & 1 \end{pmatrix}. $$

Finally, let $W_A$ be the upper $k \times k$ minor of $W$, namely $(W_A)_{i,j} = W_{i,j}$ for $1 \le i, j \le k$.

6.1. Passing to a generalized eigenvalue problem. Recall that $\lambda_2^{A\cup b}$ is the second largest root of the characteristic polynomial $\det\left((D^{A\cup b})^{-1} W^{A\cup b} - \lambda I\right) = 0$, or equivalently of $\det\left(W^{A\cup b} - \lambda D^{A\cup b}\right) = 0$. Expanding the latter determinant by the last row of $W^{A\cup b}$ gives

$$ \det\left(W^{A\cup b} - \lambda D^{A\cup b}\right) = \pm (1 - \lambda) \det\left(W_A - \lambda D_\varepsilon\right). $$

Since $\lambda_2^{A\cup b} < 1$, it is enough to consider the smaller $k \times k$ matrices $D_\varepsilon$ and $W_A$. Obviously, $x \ne 0$ is an eigenvector of $D_\varepsilon^{-1} W_A$ corresponding to an eigenvalue λ if and only if

$$ W_A x = \lambda D_\varepsilon x. \qquad (6.1) $$

When (6.1) holds, we say that λ is a generalized eigenvalue of the matrix pair $(W_A, D_\varepsilon)$. Clearly, the solutions to (6.1) are $\lambda_2^{A\cup b} \ge \cdots \ge \lambda_{k+1}^{A\cup b}$.

6.2. Lower bound for $\mathrm{Ncut}(A, B)$. The lower bound (3.2) follows from a min-max characterization of generalized eigenvalues [18, part VI, corollary 6], whereby the largest eigenvalue of the matrix pair $(W_A, D_\varepsilon)$ satisfies

$$ \lambda_2^{A\cup b} = \max_{x \ne 0} \frac{x^T W_A x}{x^T D_\varepsilon x} \ge \frac{\mathbf{1}^T W_A \mathbf{1}}{\mathbf{1}^T D_\varepsilon \mathbf{1}} = \frac{1}{1 + \varepsilon \bar d} = 1 - \mathrm{Ncut}(A, B). \qquad (6.2) $$

We note that this simple inequality can also be derived by other methods; for example, from a lemma by Ky Fan [18] it follows that $\mathrm{Ncut}(A, B) \ge 1 - \lambda_2^{A\cup b}$.

6.3. Upper bound. The proof of the upper bound relies on perturbation theory for generalized eigenvalues [18, ch. VI], where we treat the matrix pair above as a perturbed pair:

$$ (W_A, D_\varepsilon) = (W_A, D_{in}) + \varepsilon\, (0, D_1). $$

Two facts simplify the perturbation analysis to follow. First, the perturbation does not affect the left matrix $W_A$. Second, the right-hand-side matrices of both the perturbed and the unperturbed pair are diagonal. We will need the following key lemma, which is proved in the Appendix. Recall the notation $Q_{A\to B} = \varepsilon^{-1} P_{A\to B}$, as defined in (4.2).

Lemma 6.1. If $\varepsilon < \Gamma^A/8$, then there exists $\beta \in \mathbb{R}$ with

$$ |\beta| \le \frac{8}{\Gamma^A} \min\{Q_{A\to B}, 1\} $$

such that

$$ \Lambda = \frac{1}{1 + \varepsilon(\bar d + \varepsilon \beta)} \qquad (6.3) $$

is a generalized eigenvalue of the matrix pair $(W_A, D_\varepsilon)$.

Given our assumption that ε satisfies (3.3), Lemma 6.1 holds. Hence let β be such that $\Lambda = 1/(1 + \varepsilon(\bar d + \varepsilon \beta))$ is a generalized eigenvalue of the matrix pair $(W_A, D_\varepsilon)$. We now conclude the proof in two steps:

1. We show that $\mathrm{Ncut}(A, B) = 1 - \Lambda + \varepsilon^2 R(\varepsilon)$.

2. We then show that Λ must in fact be this pair's largest eigenvalue, so that necessarily Λ = λ.

From these two steps the theorem readily follows. For the first step, we rewrite (6.3) as

$$ \Lambda = \frac{1}{1 + \varepsilon \bar d} \cdot \frac{1}{1 + \frac{\varepsilon^2 \beta}{1 + \varepsilon \bar d}}. \qquad (6.4) $$

Now, for any $\delta \in [-1/2, 1/2]$ we have $\left| \frac{1}{1+\delta} - 1 \right| \le 2|\delta|$. As $\varepsilon < \Gamma^A/8$ and $|\beta| \le 8/\Gamma^A$, obviously $\frac{\varepsilon^2 |\beta|}{1 + \varepsilon \bar d} \le \varepsilon^2 |\beta| \le \varepsilon \le 1/2$. Hence,

$$ \left| \frac{1}{1 + \frac{\varepsilon^2 \beta}{1 + \varepsilon \bar d}} - 1 \right| \le \frac{2 \varepsilon^2 |\beta|}{1 + \varepsilon \bar d} \le 2 \varepsilon^2 |\beta|. \qquad (6.5) $$

Combining (6.4) and (6.5) gives that

$$ 1 - \Lambda = \frac{\varepsilon \bar d}{1 + \varepsilon \bar d} + \varepsilon^2 R_1(\varepsilon) = \mathrm{Ncut}(A, B) + \varepsilon^2 R_1(\varepsilon), \qquad \text{with} \quad |R_1(\varepsilon)| \le \frac{16}{\Gamma^A} \min\{Q_{A\to B}, 1\}, $$

as required.

For the second step, we need to show that if $\varepsilon < \Gamma^A/8$, then the eigenvalue Λ is indeed the largest eigenvalue, λ. This is achieved by proving that $\lambda_3^{A\cup b}$, the next largest eigenvalue of $(W_A, D_\varepsilon)$, is strictly smaller than Λ. To this end, we appeal to [18, part VI, theorem 3], whereby if

$$ \zeta = \varepsilon \max_{\|x\| = 1} \frac{x^T D_1 x}{\sqrt{\left(x^T W_A x\right)^2 + \left(x^T D_{in} x\right)^2}} < 1, \qquad (6.6) $$

then

$$ \frac{\left| \lambda_2^A - \lambda_3^{A\cup b} \right|}{\sqrt{\left(1 + (\lambda_2^A)^2\right)\left(1 + (\lambda_3^{A\cup b})^2\right)}} \le \zeta. \qquad (6.7) $$

Here, as before, $\lambda_2^A$ is the second eigenvalue of the unperturbed pair $(W_A, D_{in})$. To see that the condition ζ < 1 holds, observe that from (6.6) we have

$$ \zeta \le \varepsilon \max_{\|x\| = 1} \frac{x^T D_1 x}{x^T D_{in} x} = \varepsilon \max_i \frac{d_i}{w_i} = \varepsilon < 1. $$

We thus deduce from (6.7) that

$$ \left| \lambda_2^A - \lambda_3^{A\cup b} \right| \le \zeta \sqrt{\left(1 + (\lambda_2^A)^2\right)\left(1 + (\lambda_3^{A\cup b})^2\right)} \le 2\varepsilon, $$

and in particular

$$ \lambda_3^{A\cup b} \le \lambda_2^A + 2\varepsilon. \qquad (6.8) $$

[Fig. 6.1. Localization intervals of the various eigenvalues.]

To complete the second step, as illustrated in Figure 6.1, we now localize Λ and $\lambda_3^{A\cup b}$ in two disjoint intervals, which implies that necessarily $\lambda_3^{A\cup b} < \Lambda$. To this end, first observe that for all $\delta \in [-1/2, 1/2]$, we have $\frac{1}{1+\delta} \ge 1 - \delta$. Similarly to (6.5), this implies that

$$ \Lambda = \frac{1}{1 + \varepsilon \bar d} \cdot \frac{1}{1 + \frac{\varepsilon^2 \beta}{1 + \varepsilon \bar d}} \ge \frac{1}{1 + \varepsilon \bar d} \left( 1 - \frac{\varepsilon^2 |\beta|}{1 + \varepsilon \bar d} \right), $$

so that

$$ \Lambda \ge \frac{1}{1 + \varepsilon \bar d} - \varepsilon^2 |\beta|. $$

Now, clearly

$$ \frac{1}{1 + \varepsilon \bar d} \ge 1 - \varepsilon \bar d \ge 1 - \varepsilon. $$

Finally, since $|\beta| \le 8/\Gamma^A$ and $\varepsilon < \Gamma^A/8$,

$$ \Lambda \ge \frac{1}{1 + \varepsilon \bar d} - \varepsilon^2 |\beta| \ge 1 - \varepsilon - \varepsilon = 1 - 2\varepsilon. \qquad (6.9) $$

Combining (6.9), (6.8) and the fact that $\Gamma^A = 1 - \lambda_2^A > 8\varepsilon$, we obtain

$$ \lambda_3^{A\cup b} \le \lambda_2^A + 2\varepsilon = 1 - \Gamma^A + 2\varepsilon < 1 - 8\varepsilon + 2\varepsilon = 1 - 6\varepsilon < 1 - 2\varepsilon \le \Lambda. $$

Hence, the eigenvalue Λ is strictly larger than $\lambda_3^{A\cup b}$, namely $\Lambda = \lambda_2^{A\cup b}$. The requested upper bound and Theorem 3.1 follow.

Acknowledgements. We thank the anonymous referees for their helpful comments and suggestions. MG is supported by a William R. and Sara Hart Kimball Stanford Graduate Fellowship. BN's research was partly supported by a grant from the ISF.

Appendix A. Lemma 6.1 finds an eigenvalue of the perturbed matrix pair

$$ (W_A, D_\varepsilon) = (W_A, D_{in}) + \varepsilon\, (0, D_1). $$

To prove the lemma, following [18], we introduce a tool to study the effect of a diagonal perturbation on generalized eigenvalues.

Let the spectrum of the reversible Markov matrix $D_{in}^{-1} W_A$ be

$$ 1 = \lambda_1^A > \lambda_2^A \ge \lambda_3^A \ge \cdots \ge \lambda_k^A, \qquad (A.1) $$

and define

$$ A_2 = \mathrm{diag}\left(\lambda_2^A, \dots, \lambda_k^A\right). \qquad (A.2) $$

Equip the product space $\mathbb{R}^{k-1} \times \mathbb{R}^{k-1}$ with the norm $\|(v, w)\|_F = \max\{\|v\|, \|w\|\}$, where $v, w \in \mathbb{R}^{k-1}$ and $\|\cdot\|$ is the usual Euclidean norm. We now define an operator $T : \mathbb{R}^{k-1} \times \mathbb{R}^{k-1} \to \mathbb{R}^{k-1} \times \mathbb{R}^{k-1}$ by

$$ T(v, w) = \left( w + A_2 v,\; w + v \right), $$

and its associated quantity $\mathrm{dif}[A_2]$ by

$$ \mathrm{dif}[A_2] = \inf_{\|(v, w)\|_F = 1} \|T(v, w)\|_F. \qquad (A.3) $$

The following auxiliary lemma relates this quantity to the spectral gap of $A$.

Lemma A.1. Assume that $\lambda_2^A = \max_j |\lambda_j^A|$. Then

$$ \mathrm{dif}[A_2] = \frac{1 - \lambda_2^A}{2} = \frac{\Gamma^A}{2}. $$

Proof. Let $(v, w)$ be a pair with $\|(v, w)\|_F = 1$. Therefore, either $\|w\| = 1$ or $\|v\| = 1$. When $\|w\| = 1$: since $\|v\| \le 1$ and $\|A_2\| = \lambda_2^A$, it follows that

$$ \|T(v, w)\|_F \ge \|w + A_2 v\| \ge \|w\| - \|A_2 v\| \ge 1 - \lambda_2^A = \Gamma^A \ge \frac{\Gamma^A}{2}. $$

Next, consider the case where $\|v\| = 1$. Then by definition $\|A_2 v\| \le \lambda_2^A < 1$. It is easy to see that inside the unit ball $\|w\| \le 1$, the minimizer of $\|T(v, w)\|_F$ is simply the negated average $w = -\tfrac{1}{2}(v + A_2 v)$, which gives $\|T(v, w)\|_F = \|v - A_2 v\|/2$. To conclude the proof we note that since $A_2$ is a diagonal matrix with largest entry $\lambda_2^A$ in position $(1, 1)$, the quantity $\|v - A_2 v\|$ is minimized over unit vectors at $v = e_1$. In this case

$$ \|e_1 - A_2 e_1\| = 1 - \lambda_2^A = \Gamma^A, $$

so that the infimum in (A.3) equals $\Gamma^A/2$.

Proof of Lemma 6.1. For the unperturbed pair $(W_A, D_{in})$, observe that its generalized eigenvalues are all real, since the matrix $D_{in}^{-1} W_A$ is similar to the symmetric matrix

$$ D_{in}^{1/2} \left( D_{in}^{-1} W_A \right) D_{in}^{-1/2} = D_{in}^{-1/2} W_A D_{in}^{-1/2}. $$

Furthermore, since $D_{in}^{-1} W_A$ is the transition matrix of an irreducible Markov chain, the largest eigenvalue in (A.1) is $\lambda_1^A = 1$ and the remaining eigenvalues are all strictly smaller than 1 in absolute value. Now, the vector of length $k$

$$ x_1 = \frac{1}{\sqrt{\bar w}} \left( \sqrt{w_1}, \dots, \sqrt{w_k} \right)^T $$

is an eigenvector of $D_{in}^{-1/2} W_A D_{in}^{-1/2}$ corresponding to the eigenvalue $\lambda_1^A = 1$. Let $U = \left( x_1\ u_2\ \cdots\ u_k \right)$ be the $k \times k$ orthogonal matrix that diagonalizes $D_{in}^{-1/2} W_A D_{in}^{-1/2}$, namely

$$ D_{in}^{-1/2} W_A D_{in}^{-1/2} = U\, \mathrm{diag}\left(1, \lambda_2^A, \dots, \lambda_k^A\right) U^T. $$

Now define $X = D_{in}^{-1/2} U$ and decompose $X$ into blocks as $X = (X_1\ X_2)$, where $X_1 = \frac{1}{\sqrt{\bar w}} (1, \dots, 1)^T$ is a $k \times 1$ vector and $X_2$ is the remaining $k \times (k-1)$ matrix, $X_2 = D_{in}^{-1/2} (u_2\ \cdots\ u_k)$. Clearly we have

$$ X_2^T W_A X_2 = \mathrm{diag}\left(\lambda_2^A, \dots, \lambda_k^A\right) = A_2 \quad \text{and} \quad X_2^T D_{in} X_2 = I_{k-1}. $$

In matrix block notation,

$$ (X_1\ X_2)^T\, W_A\, (X_1\ X_2) = \begin{pmatrix} 1 & 0 \\ 0 & A_2 \end{pmatrix} \quad \text{and} \quad (X_1\ X_2)^T\, D_{in}\, (X_1\ X_2) = \begin{pmatrix} 1 & 0 \\ 0 & I_{k-1} \end{pmatrix}. $$

With this simultaneous diagonalization of the unperturbed matrix pair $(W_A, D_{in})$ at hand, we turn to the perturbation pair $\varepsilon\, (0, D_1)$.
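The simultaneous diagonalization above is easy to verify numerically; the following sketch (ours, not from the paper) checks it for a small random symmetric block playing the role of $W_A$, with row sums $w_i$.

```python
import numpy as np

rng = np.random.default_rng(1)
k = 6
W_A = rng.uniform(0.2, 1.0, (k, k)); W_A = (W_A + W_A.T) / 2   # symmetric, positive entries

w = W_A.sum(axis=1)                          # in-degrees w_i = W_{i,A}
D_in_isqrt = np.diag(1.0 / np.sqrt(w))       # D_in^{-1/2}

S = D_in_isqrt @ W_A @ D_in_isqrt            # symmetric matrix similar to D_in^{-1} W_A
evals, U = np.linalg.eigh(S)
order = np.argsort(evals)[::-1]              # decreasing order: 1 = lam_1 > lam_2 >= ...
evals, U = evals[order], U[:, order]

X = D_in_isqrt @ U                           # X = D_in^{-1/2} U = (X_1, X_2)
print(np.allclose(X.T @ W_A @ X, np.diag(evals)))      # X^T W_A X = diag(1, lam_2, ..., lam_k)
print(np.allclose(X.T @ np.diag(w) @ X, np.eye(k)))    # X^T D_in X = I_k
print(np.isclose(evals[0], 1.0))                       # top eigenvalue of D_in^{-1} W_A is 1
```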

Again using matrix block notation, denote

$$ (X_1\ X_2)^T\, D_1\, (X_1\ X_2) = \begin{pmatrix} \bar d & d_2^T \\ d_2 & D_2 \end{pmatrix}, $$

where $d_2$ is a $(k-1)$-vector and $D_2$ is a $(k-1) \times (k-1)$ matrix. It is straightforward to verify that

$$ d_2 = \frac{1}{\sqrt{\bar w}}\, (u_2\ \cdots\ u_k)^T \left( \frac{d_1}{\sqrt{w_1}}, \dots, \frac{d_k}{\sqrt{w_k}} \right)^T. \qquad (A.4) $$

The amount by which the eigenvalue $\lambda_1^A = 1$ is perturbed as ε departs from 0 depends primarily on the norms of $d_2$ and $D_2$. For $d_2$ we have that

$$ \|d_2\|^2 \le \frac{1}{\bar w} \sum_{i=1}^k \frac{d_i^2}{w_i} = \sum_{i=1}^k \left( \frac{d_i}{w_i} \right)^2 \frac{w_i}{\bar w} = \sum_{i=1}^k \left( \frac{d_i}{w_i} \right)^2 \pi^A_i, $$

where $\pi^A_i = w_i/\bar w$ is the stationary distribution of a random walk inside $A$. Furthermore, since $\max_i d_i/w_i = 1$, we have $(d_i/w_i)^2 \le d_i/w_i$, and since for all $i \in A$, $w_i \ge \frac{1}{2}\left[ w_i + \varepsilon d_i \right]$, we obtain

$$ \|d_2\|^2 \le \sum_{i=1}^k \frac{d_i}{w_i}\, \pi^A_i \le 2 \sum_{i=1}^k \frac{d_i}{w_i + \varepsilon d_i}\, \pi^A_i = 2\, Q_{A\to B}. $$

Since clearly also $\sum_{i=1}^k (d_i/w_i)\, \pi^A_i \le 1$, we can write

$$ \|d_2\|^2 \le 2 \min\{Q_{A\to B}, 1\}. \qquad (A.5) $$

As for $D_2$, it is easy to verify that, since $U$ is orthogonal, $\|D_2\| \le 1$.

Now, as the generalized eigenvalues of the pair $(A_2, I_{k-1})$ are strictly smaller than 1 in absolute value, the column $X_1$ spans a simple eigenspace of the unperturbed pair $(W_A, D_{in})$. We can thus appeal to the main instrument in our proof, a theorem from matrix perturbation theory for generalized eigenvalue problems [18, part VI, theorem 5]. In our setting and our notation, this theorem states that if $\mathrm{dif}[A_2] > 0$ and if

$$ \frac{\varepsilon \gamma}{\mathrm{dif}[A_2] - \varepsilon \rho} < \frac{1}{2}, \qquad (A.6) $$

where

$$ \gamma = \|d_2\|, \qquad \rho = \|d_2\| + \|D_2\|, $$

then there exists $p \in \mathbb{R}^{k-1}$ with

$$ \|p\| \le \frac{\gamma}{\mathrm{dif}[A_2] - \varepsilon \rho} \qquad (A.7) $$

such that the pair $\left\langle 1,\ 1 + \varepsilon\left(\bar d + \varepsilon\, d_2^T p\right)\right\rangle$ is a generalized eigenvalue of the matrix pair $(W_A, D_\varepsilon)$; equivalently, $\Lambda = 1/\left(1 + \varepsilon(\bar d + \varepsilon\, d_2^T p)\right)$ is a generalized eigenvalue in the sense of (6.1).

The first condition in this theorem, namely $\mathrm{dif}[A_2] > 0$, follows directly from Lemma A.1. For the second condition, we need to bound the constants γ and ρ, as follows. For γ, we have seen above, in the derivation of (A.5), that $\gamma = \|d_2\| \le 1$. Similarly, we have $\rho = \|d_2\| + \|D_2\| \le 2$.

Suppose now that $\varepsilon < \Gamma^A/8$. Then we have $\varepsilon \rho < \Gamma^A/4$. Invoking Lemma A.1, whereby $\mathrm{dif}[A_2] = \Gamma^A/2$, we get that

$$ \mathrm{dif}[A_2] - \varepsilon \rho > \frac{\Gamma^A}{2} - \frac{\Gamma^A}{4} = \frac{\Gamma^A}{4}, \qquad (A.8) $$

and therefore

$$ \frac{\varepsilon \gamma}{\mathrm{dif}[A_2] - \varepsilon \rho} < \frac{\varepsilon}{\Gamma^A/4} < \frac{\Gamma^A}{8} \cdot \frac{4}{\Gamma^A} = \frac{1}{2}, $$

so the second condition of the theorem, condition (A.6), is satisfied. In conclusion, the theorem above holds, so a vector $p$ as above exists, with

$$ \|p\| \le \frac{\gamma}{\mathrm{dif}[A_2] - \varepsilon \rho}. \qquad (A.9) $$

Finally, using (A.9), (A.8) and (A.5) and the definition $\gamma = \|d_2\|$, we have

$$ \left| d_2^T p \right| \le \|p\|\, \|d_2\| \le \frac{\gamma \|d_2\|}{\mathrm{dif}[A_2] - \varepsilon \rho} = \frac{\|d_2\|^2}{\mathrm{dif}[A_2] - \varepsilon \rho} \le \frac{\|d_2\|^2}{\Gamma^A/4} \le \frac{8}{\Gamma^A} \min\{Q_{A\to B}, 1\}. $$

Setting $\beta = d_2^T p$, this completes the proof of Lemma 6.1.

REFERENCES

[1] K. Avrachenkov, J. Filar, and M. Haviv, Singular perturbations of Markov chains and decision processes: a survey, in Markov Decision Processes: Models, Methods, Directions and Open Problems, E. Feinberg and A. Schwartz, eds., Kluwer, 2002.
[2] F. Bach and M. I. Jordan, Learning spectral clustering with applications to speech separation, Journal of Machine Learning Research, 7 (2006).
[3] P. Deuflhard, W. Huisinga, A. Fischer, and C. Schütte, Identification of almost invariant aggregates in reversible nearly uncoupled Markov chains, Linear Algebra and its Applications, 315 (2000).
[4] I. Dhillon, Y. Guan, and B. Kulis, Kernel k-means: spectral clustering and normalized cuts, in Proceedings of the Tenth ACM SIGKDD Conference, ACM, New York, 2004.
[5] C. Ding, X. He, and H. Simon, On the equivalence of nonnegative matrix factorization and spectral clustering, in Proceedings of the Fifth SIAM International Conference on Data Mining, 2005.
[6] M. Haviv and Y. Ritov, On series expansions and stochastic matrices, SIAM J. Matrix Anal. Appl., 14 (1993).
[7] R. Kannan, S. Vempala, and A. Vetta, On clusterings: good, bad and spectral, J. ACM, 51 (2004).
[8] M. Meila and L. Xu, Regularized spectral learning, in Proceedings of the Artificial Intelligence and Statistics Workshop, R. Cowell and Z. Ghahramani, eds., 2005.
[9] B. Matkowsky and Z. Schuss, Eigenvalues of the Fokker-Planck operator and the approach to equilibrium for diffusions in potential fields, SIAM J. Appl. Math., 40 (1981).
[10] M. Meila and J. Shi, A random walks view of spectral segmentation, in Artificial Intelligence and Statistics Conference, 2001.
[11] C. Meyer, Stochastic complementation, uncoupling Markov chains, and the theory of nearly reducible systems, SIAM Review, 31 (1989), pp. 240-272.
[12] B. Nadler and M. Galun, Fundamental limitations of spectral clustering, in Advances in Neural Information Processing Systems 19, B. Schölkopf, J. Platt, and T. Hoffman, eds., MIT Press, Cambridge, MA, 2007.
[13] B. Nadler, M. Galun, and T. Gross, Clustering and multi-scale structure of graphs: a scale-invariant diffusion based approach, in preparation.
[14] B. Nadler, S. Lafon, R. Coifman, and I. Kevrekidis, Diffusion maps, spectral clustering and eigenfunctions of Fokker-Planck operators, in Advances in Neural Information Processing Systems 18, Y. Weiss, B. Schölkopf, and J. Platt, eds., MIT Press, Cambridge, MA, 2006.

[15] A. Ng, M. Jordan, and Y. Weiss, On spectral clustering: analysis and an algorithm, in Advances in Neural Information Processing Systems 14, MIT Press, Cambridge, MA, 2002.
[16] P. Schweitzer, A survey of aggregation/disaggregation in large Markov chains, in Numerical Solution of Markov Chains, W. Stewart, ed., Marcel Dekker, New York, 1991.
[17] J. Shi and J. Malik, Normalized cuts and image segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence, 22 (2000), pp. 888-905.
[18] G. Stewart and J.-G. Sun, Matrix Perturbation Theory, Academic Press, 1990.
[19] S. Yu and J. Shi, Multiclass spectral clustering, in Proceedings of the Ninth IEEE International Conference on Computer Vision, 2003.


Semi-Supervised Learning with the Graph Laplacian: The Limit of Infinite Unlabelled Data Semi-Supervised Learning with the Graph Laplacian: The Limit of Infinite Unlabelled Data Boaz Nadler Dept. of Computer Science and Applied Mathematics Weizmann Institute of Science Rehovot, Israel 76 boaz.nadler@weizmann.ac.il

More information

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra.

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra. DS-GA 1002 Lecture notes 0 Fall 2016 Linear Algebra These notes provide a review of basic concepts in linear algebra. 1 Vector spaces You are no doubt familiar with vectors in R 2 or R 3, i.e. [ ] 1.1

More information

Graphs in Machine Learning

Graphs in Machine Learning Graphs in Machine Learning Michal Valko INRIA Lille - Nord Europe, France Partially based on material by: Ulrike von Luxburg, Gary Miller, Doyle & Schnell, Daniel Spielman January 27, 2015 MVA 2014/2015

More information

Eigenvalues, random walks and Ramanujan graphs

Eigenvalues, random walks and Ramanujan graphs Eigenvalues, random walks and Ramanujan graphs David Ellis 1 The Expander Mixing lemma We have seen that a bounded-degree graph is a good edge-expander if and only if if has large spectral gap If G = (V,

More information

Distributed Randomized Algorithms for the PageRank Computation Hideaki Ishii, Member, IEEE, and Roberto Tempo, Fellow, IEEE

Distributed Randomized Algorithms for the PageRank Computation Hideaki Ishii, Member, IEEE, and Roberto Tempo, Fellow, IEEE IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 55, NO. 9, SEPTEMBER 2010 1987 Distributed Randomized Algorithms for the PageRank Computation Hideaki Ishii, Member, IEEE, and Roberto Tempo, Fellow, IEEE Abstract

More information

Improving the Asymptotic Performance of Markov Chain Monte-Carlo by Inserting Vortices

Improving the Asymptotic Performance of Markov Chain Monte-Carlo by Inserting Vortices Improving the Asymptotic Performance of Markov Chain Monte-Carlo by Inserting Vortices Yi Sun IDSIA Galleria, Manno CH-98, Switzerland yi@idsia.ch Faustino Gomez IDSIA Galleria, Manno CH-98, Switzerland

More information

Math 307 Learning Goals. March 23, 2010

Math 307 Learning Goals. March 23, 2010 Math 307 Learning Goals March 23, 2010 Course Description The course presents core concepts of linear algebra by focusing on applications in Science and Engineering. Examples of applications from recent

More information

arxiv: v1 [math.na] 1 Sep 2018

arxiv: v1 [math.na] 1 Sep 2018 On the perturbation of an L -orthogonal projection Xuefeng Xu arxiv:18090000v1 [mathna] 1 Sep 018 September 5 018 Abstract The L -orthogonal projection is an important mathematical tool in scientific computing

More information

Lecture 15 Perron-Frobenius Theory

Lecture 15 Perron-Frobenius Theory EE363 Winter 2005-06 Lecture 15 Perron-Frobenius Theory Positive and nonnegative matrices and vectors Perron-Frobenius theorems Markov chains Economic growth Population dynamics Max-min and min-max characterization

More information

Fast Angular Synchronization for Phase Retrieval via Incomplete Information

Fast Angular Synchronization for Phase Retrieval via Incomplete Information Fast Angular Synchronization for Phase Retrieval via Incomplete Information Aditya Viswanathan a and Mark Iwen b a Department of Mathematics, Michigan State University; b Department of Mathematics & Department

More information

Notes on Measure Theory and Markov Processes

Notes on Measure Theory and Markov Processes Notes on Measure Theory and Markov Processes Diego Daruich March 28, 2014 1 Preliminaries 1.1 Motivation The objective of these notes will be to develop tools from measure theory and probability to allow

More information

MATH 567: Mathematical Techniques in Data Science Clustering II

MATH 567: Mathematical Techniques in Data Science Clustering II Spectral clustering: overview MATH 567: Mathematical Techniques in Data Science Clustering II Dominique uillot Departments of Mathematical Sciences University of Delaware Overview of spectral clustering:

More information

1 Matrix notation and preliminaries from spectral graph theory

1 Matrix notation and preliminaries from spectral graph theory Graph clustering (or community detection or graph partitioning) is one of the most studied problems in network analysis. One reason for this is that there are a variety of ways to define a cluster or community.

More information

Laplacian Integral Graphs with Maximum Degree 3

Laplacian Integral Graphs with Maximum Degree 3 Laplacian Integral Graphs with Maximum Degree Steve Kirkland Department of Mathematics and Statistics University of Regina Regina, Saskatchewan, Canada S4S 0A kirkland@math.uregina.ca Submitted: Nov 5,

More information

HYPERGRAPH BASED SEMI-SUPERVISED LEARNING ALGORITHMS APPLIED TO SPEECH RECOGNITION PROBLEM: A NOVEL APPROACH

HYPERGRAPH BASED SEMI-SUPERVISED LEARNING ALGORITHMS APPLIED TO SPEECH RECOGNITION PROBLEM: A NOVEL APPROACH HYPERGRAPH BASED SEMI-SUPERVISED LEARNING ALGORITHMS APPLIED TO SPEECH RECOGNITION PROBLEM: A NOVEL APPROACH Hoang Trang 1, Tran Hoang Loc 1 1 Ho Chi Minh City University of Technology-VNU HCM, Ho Chi

More information

Note that in the example in Lecture 1, the state Home is recurrent (and even absorbing), but all other states are transient. f ii (n) f ii = n=1 < +

Note that in the example in Lecture 1, the state Home is recurrent (and even absorbing), but all other states are transient. f ii (n) f ii = n=1 < + Random Walks: WEEK 2 Recurrence and transience Consider the event {X n = i for some n > 0} by which we mean {X = i}or{x 2 = i,x i}or{x 3 = i,x 2 i,x i},. Definition.. A state i S is recurrent if P(X n

More information

Institute for Advanced Computer Studies. Department of Computer Science. On Markov Chains with Sluggish Transients. G. W. Stewart y.

Institute for Advanced Computer Studies. Department of Computer Science. On Markov Chains with Sluggish Transients. G. W. Stewart y. University of Maryland Institute for Advanced Computer Studies Department of Computer Science College Park TR{94{77 TR{3306 On Markov Chains with Sluggish Transients G. W. Stewart y June, 994 ABSTRACT

More information

On a Conjecture of Thomassen

On a Conjecture of Thomassen On a Conjecture of Thomassen Michelle Delcourt Department of Mathematics University of Illinois Urbana, Illinois 61801, U.S.A. delcour2@illinois.edu Asaf Ferber Department of Mathematics Yale University,

More information

Affine iterations on nonnegative vectors

Affine iterations on nonnegative vectors Affine iterations on nonnegative vectors V. Blondel L. Ninove P. Van Dooren CESAME Université catholique de Louvain Av. G. Lemaître 4 B-348 Louvain-la-Neuve Belgium Introduction In this paper we consider

More information