Summary: A Random Walks View of Spectral Segmentation, by Marina Meila (University of Washington) and Jianbo Shi (Carnegie Mellon University)



The authors explain how the NCut algorithm for graph bisection can be viewed as an algorithm that attempts to find regions of the graph where a random walk is likely to linger before moving on to other regions.

The normalized cut, or NCut, criterion was introduced in Shi and Malik's "Normalized Cuts and Image Segmentation" as an optimality criterion for dividing the vertices $v_i \in V$ of a graph $G(V, E)$ into two partitions $A$ and $\bar{A}$. The NCut between two such partitions is defined as:

$$\mathrm{NCut}(A, \bar{A}) = \left( \frac{1}{\mathrm{vol}\,A} + \frac{1}{\mathrm{vol}\,\bar{A}} \right) \sum_{v_i \in A,\, v_j \in \bar{A}} S_{ij}$$

where $\mathrm{vol}\,A$ is the sum of the degrees $d_i$ of the vertices $v_i$ in partition $A$:

$$\mathrm{vol}\,A = \sum_{v_i \in A} d_i$$

$S_{ij}$ is a measure of similarity between vertices $v_i$ and $v_j$, given by a similarity function $s(v_i, v_j)$, where $s(v_i, v_j) > 0$ if $(v_i, v_j) \in E$, and $d_i$ is the sum of the weights of all edges incident on $v_i$:

$$d_i = \sum_{(v_i, v_j) \in E} S_{ij}$$

Thus, NCut is small when a partitioning of the graph produces two partitions that are both high in volume (each contains many vertices with a high sum of similarities to other vertices) while the sum of the similarities between vertices in different partitions is small.

The NCut problem is to find a partition of the graph that minimizes the NCut criterion. Minimizing the NCut for a graph, however, has been shown to be NP-hard by Shi and Malik. The NCut algorithm was devised to find an approximation to the optimal solution of the NCut problem.

In the NCut algorithm, an eigenvector $x^L$ is found, corresponding to the second-smallest eigenvalue $\lambda^L$ of the generalized eigenproblem $Lx = \lambda D x$ for the graph Laplacian matrix $L = D - S$, where $D$ is the diagonal matrix of vertex degrees, $D_{ii} = d_i$, and $S$ is the similarity matrix whose entry $S_{ij}$ is the value of the similarity function for vertices $v_i$ and $v_j$. The entries $x^L_i$ of the eigenvector $x^L$ are then divided into two parts, and the vertices are partitioned accordingly: if $x^L_i$ falls in the first part, $v_i$ is placed in partition $A$, and otherwise $v_i$ is placed in partition $\bar{A}$. This produces a bisection of the graph into two partitions $A$ and $\bar{A}$.

In "Normalized Cuts and Image Segmentation" it is shown that an optimal NCut corresponds to the case when the entries in $x^L$ are piecewise constant, with some entries $x^L_i = \alpha$ and some entries $x^L_j = \beta$, and that in this case the value of the normalized cut is equal to $\lambda^L$. The hope is that this eigenvector $x^L$ will be approximately piecewise constant, with some entries approximately equal to $\alpha$ and some entries approximately equal to $\beta$. Then it is easy to divide the entries of $x^L$ into two parts.

Meila and Shi show that this algorithm has a probabilistic interpretation, giving intuition as to why the NCut algorithm should find a good bisection of a graph. They begin by looking at the probability transition matrix $P = D^{-1} S$ of a random walk. This row-stochastic matrix gives the probability $P_{ij}$ that a random walk will transition from a vertex $v_i$ to a vertex $v_j$ in one step of the random walk on the graph ($D$ and $S$ are the matrices defined above). In Proposition 1, they state that $P$ and $L$ share the same eigenvectors with reversed eigenvalues: if $(\lambda, x)$ solves $Lx = \lambda D x$, then $x$ is an eigenvector of $P$ with eigenvalue $1 - \lambda$. This is easy to prove:

$$\begin{aligned}
Lx = (D - S)x &= \lambda D x \\
D^{-1}(D - S)x &= \lambda x \\
x - D^{-1} S x &= \lambda x \\
(1 - \lambda) x &= D^{-1} S x = P x
\end{aligned}$$

The benefit of viewing the problem this way is that we get a better understanding of why the NCut algorithm should work. If we start a random walk at a vertex $v_i$ with probability $q^0_i$, we have an initial probability distribution over all the vertices in the row vector $q^0$, and one step of the random walk corresponds to multiplying this distribution by the transition probability matrix $P$. The probability that we are at a given vertex at timestep $t$ of the random walk is given by $q^t$, where

$$q^t = q^{t-1} P = q^0 P^t$$

It is known that an irreducible, aperiodic Markov chain has a limiting or stationary distribution $\pi$, which is the distribution that does not change after multiplication by $P$; in other words, as $t \to \infty$, $q^t \to \pi$, where $\pi P = \pi$.

Now, we know what $\pi$ is for the graph with edge weights defined by the similarity matrix $S$. In this case,

$$\pi_i = \frac{d_i}{\mathrm{vol}\,V}, \qquad \mathrm{vol}\,V = \sum_{v_i \in V} d_i$$

Proof:

$$\pi P = \frac{1}{\mathrm{vol}\,V} (d_1, d_2, \ldots, d_n)
\begin{pmatrix}
S_{11}/d_1 & S_{12}/d_1 & \cdots & S_{1n}/d_1 \\
S_{21}/d_2 & S_{22}/d_2 & \cdots & S_{2n}/d_2 \\
\vdots & \vdots & & \vdots \\
S_{n1}/d_n & S_{n2}/d_n & \cdots & S_{nn}/d_n
\end{pmatrix}
= \frac{1}{\mathrm{vol}\,V} \left( \sum_{i} S_{i1}, \sum_{i} S_{i2}, \ldots, \sum_{i} S_{in} \right)
= \frac{1}{\mathrm{vol}\,V} (d_1, d_2, \ldots, d_n) = \pi$$

The last step uses the symmetry of $S$: since $S_{ij} = S_{ji}$, the $j$-th column sum equals the degree $d_j$.

We can look at a random walk that is already in the limiting (stationary) distribution, calculate the probability that the walk will transition from one set of vertices $A$ to another set $B$, and call this probability $P_{AB}$. This is given by:

$$\begin{aligned}
P_{AB} &= \Pr(\text{the walk transitions to } B \mid \text{it is in } A) \\
&= \frac{\Pr(\text{the walk is in } A \text{ and transitions to } B \text{ in the next step})}{\Pr(\text{the walk is in } A)} \\
&= \frac{\sum_{v_i \in A,\, v_j \in B} \pi_i P_{ij}}{\sum_{v_i \in A} \pi_i}
= \frac{\sum_{v_i \in A,\, v_j \in B} \frac{d_i}{\mathrm{vol}\,V} \cdot \frac{S_{ij}}{d_i}}{\sum_{v_i \in A} \frac{d_i}{\mathrm{vol}\,V}} \\
&= \frac{\sum_{v_i \in A,\, v_j \in B} S_{ij}}{\sum_{v_i \in A} d_i}
= \frac{\sum_{v_i \in A,\, v_j \in B} S_{ij}}{\mathrm{vol}\,A}
\end{aligned}$$

Therefore, if we partition the graph into $A$ and $\bar{A}$, the two probabilities that a walk in the limiting distribution crosses from one partition to the other add up to:

$$P_{A\bar{A}} + P_{\bar{A}A}
= \frac{\sum_{v_i \in A,\, v_j \in \bar{A}} S_{ij}}{\mathrm{vol}\,A} + \frac{\sum_{v_i \in \bar{A},\, v_j \in A} S_{ij}}{\mathrm{vol}\,\bar{A}}
= \left( \frac{1}{\mathrm{vol}\,A} + \frac{1}{\mathrm{vol}\,\bar{A}} \right) \sum_{v_i \in A,\, v_j \in \bar{A}} S_{ij}
= \mathrm{NCut}(A, \bar{A})$$

Here we see that by minimizing $\mathrm{NCut}(A, \bar{A})$, we are minimizing the sum of the probabilities that a random walk in the stationary distribution crosses from $A$ to $\bar{A}$ and from $\bar{A}$ to $A$. In other words, it is very likely that the random walk will stay in whichever of the two partitions it is currently in.

This view of the NCut criterion is presented in Sections 1-3 of the paper. In Section 4, Meila and Shi use this view to describe why the NCut algorithm works: they show why the Laplacian of a graph with two clear partitions should have a piecewise-constant second eigenvector. The important question answered by the probabilistic view of spectral bisection is: when does $L$ have piecewise-constant eigenvectors?
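Before turning to that question, the identities derived so far can be checked numerically. The following is a minimal sketch in Python with NumPy, not code from the paper: the 6-vertex similarity matrix `S`, the split `A` / `A_bar`, and the helper `transition_prob` are all assumptions made up for illustration. It checks Proposition 1, the stationary distribution, and the identity between NCut and the two crossing probabilities, and then splits the vertices by the sign of the second eigenvector (one simple way to divide its entries into two parts).

```python
import numpy as np

# Hypothetical similarity matrix (an assumption for illustration): two
# triangles {0,1,2} and {3,4,5} with unit weights, joined by one weak edge.
S = np.array([
    [0.0, 1.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.1, 0.0, 0.0],
    [0.0, 0.0, 0.1, 0.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 1.0, 0.0],
])

d = S.sum(axis=1)        # vertex degrees d_i
D = np.diag(d)
P = S / d[:, None]       # P = D^{-1} S, row-stochastic
L = D - S                # graph Laplacian

# Eigenvectors of P via the symmetric matrix D^{-1/2} S D^{-1/2}, which has
# the same eigenvalues as P; eigh returns them in ascending order.
d_isqrt = 1.0 / np.sqrt(d)
mu, Z = np.linalg.eigh(S * d_isqrt[:, None] * d_isqrt[None, :])
X = Z * d_isqrt[:, None]  # column j satisfies P x = mu[j] x

# Proposition 1: P x = mu x  is equivalent to  L x = (1 - mu) D x.
res_P = np.abs(P @ X - X * mu).max()
res_L = np.abs(L @ X - (D @ X) * (1.0 - mu)).max()

# Stationary distribution pi_i = d_i / vol V.
pi = d / d.sum()

def transition_prob(P, pi, A, B):
    """P_AB: probability of stepping into B, given the stationary walk is in A."""
    joint = (pi[A][:, None] * P[np.ix_(A, B)]).sum()
    return joint / pi[A].sum()

A, A_bar = [0, 1, 2], [3, 4, 5]
cut = S[np.ix_(A, A_bar)].sum()
ncut = (1.0 / d[A].sum() + 1.0 / d[A_bar].sum()) * cut
crossing = transition_prob(P, pi, A, A_bar) + transition_prob(P, pi, A_bar, A)

# Bisection from the eigenvector of the second-largest eigenvalue of P
# (the second-smallest generalized eigenvalue of L), split by sign.
labels = X[:, -2] > 0

print("residuals:", res_P, res_L)
print("NCut:", ncut, "P_AA' + P_A'A:", crossing)
print("sign split:", labels)
```

On this toy graph the sign split separates the two triangles, which is also the partition with the small NCut value. Real similarity matrices are rarely this clean, so the threshold used to split the eigenvector is a design choice rather than something fixed by the derivation.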
The answer is found in Proposition 2.

Proposition 2: Let $P$ be a matrix with rows and columns indexed by $v_i \in V$ that has independent eigenvectors, and let $\Delta = (A_1, A_2, \ldots, A_k)$ be a partition of $V$. Then $P$ has $k$ eigenvectors that are piecewise constant with respect to $\Delta$ and correspond to non-zero eigenvalues if and only if the sums $P_{is} = \sum_{v_j \in A_s} P_{ij}$ are constant for all $v_i \in A_{s'}$ and all $s, s' = 1, \ldots, k$, and the matrix $R = [P_{ss'}]$ is nonsingular, where $P_{ss'} = \sum_{v_j \in A_{s'}} P_{ij}$ for $v_i \in A_s$.

Proof (as given in the paper, with some details filled in):

Forward direction: Assume that $P$ has $k$ independent eigenvectors $x^1, \ldots, x^k$ that are piecewise constant with respect to the partition $\Delta$ and correspond to non-zero eigenvalues $\lambda_1, \ldots, \lambda_k$. For a vector $x$ that is piecewise constant with respect to $\Delta$, let $y(x)$ be the injective mapping from $x$ to a $k$-dimensional vector $y$, where $y(x)_s = x_i$ for $v_i \in A_s$ and $s = 1, \ldots, k$. For an arbitrary $s' \in \{1, 2, \ldots, k\}$, look at two vertices $v_i \neq v_j$ with $v_i, v_j \in A_{s'}$. Then, for each eigenvector $x^l$, $l = 1, \ldots, k$, we have:

$$(P x^l)_i = \sum_{m=1}^{n} P_{im} x^l_m = \sum_{s=1}^{k} \Big( \sum_{v_r \in A_s} P_{ir} \Big) y(x^l)_s = \lambda_l x^l_i \tag{1.1}$$

$$(P x^l)_j = \sum_{m=1}^{n} P_{jm} x^l_m = \sum_{s=1}^{k} \Big( \sum_{v_r \in A_s} P_{jr} \Big) y(x^l)_s = \lambda_l x^l_j \tag{1.2}$$

Now let $P_{is}$ be the probability of transitioning to partition $s$ from vertex $i$, in other words:

$$P_{is} = \sum_{v_r \in A_s} P_{ir}$$

Then, if we subtract (1.2) from (1.1), we get:

$$(P x^l)_i - (P x^l)_j = \sum_{s=1}^{k} P_{is}\, y(x^l)_s - \sum_{s=1}^{k} P_{js}\, y(x^l)_s = \lambda_l x^l_i - \lambda_l x^l_j$$

Since every eigenvector is piecewise constant with respect to the partition, and $v_i$ and $v_j$ are both in the same part $A_{s'}$, we have $x^l_i = y(x^l)_{s'} = x^l_j$ and therefore $\lambda_l x^l_i - \lambda_l x^l_j = 0$. Thus:

$$(P x^l)_i - (P x^l)_j = \sum_{s=1}^{k} (P_{is} - P_{js})\, y(x^l)_s = 0 \quad \text{for } l = 1, \ldots, k$$

We have a system of $k$ linear equations in the $k$ unknowns $(P_{is} - P_{js})$, with coefficients $y(x^l)_s$. For a single $l$, the equation reads:

$$(P_{i1} - P_{j1})\, y(x^l)_1 + (P_{i2} - P_{j2})\, y(x^l)_2 + \cdots + (P_{ik} - P_{jk})\, y(x^l)_k = 0$$

Thus, for $l = 1, \ldots, k$ we have the system:

$$\begin{pmatrix}
y(x^1)_1 & y(x^1)_2 & \cdots & y(x^1)_k \\
y(x^2)_1 & y(x^2)_2 & \cdots & y(x^2)_k \\
\vdots & \vdots & & \vdots \\
y(x^k)_1 & y(x^k)_2 & \cdots & y(x^k)_k
\end{pmatrix}
\begin{pmatrix}
P_{i1} - P_{j1} \\
P_{i2} - P_{j2} \\
\vdots \\
P_{ik} - P_{jk}
\end{pmatrix}
= \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}$$

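Let $C$ be the matrix of coefficients above, and $\rho$ the vector of probability differences, so that the system can be written as $C_{k \times k}\, \rho_{k \times 1} = 0_{k \times 1}$. Recall that $y(x^l)_s = x^l_i$ for $v_i \in A_s$. The entries of the coefficient matrix are therefore made up of the entries of the eigenvectors $x^l$ of $P$. Furthermore, because each eigenvector $x^l$ is piecewise constant (it takes the single value $y(x^l)_s$ on all vertices that belong to the same part $A_s$), and because the eigenvectors $x^1, \ldots, x^k$ are independent, the vectors $y(x^1), \ldots, y(x^k)$ are independent as well. The matrix $C$ is therefore non-singular, meaning that the only solution to $C\rho = 0$ is the trivial solution $\rho = 0$. Thus, we have $P_{is} = P_{js}$ for all $s \in \{1, \ldots, k\}$. In other words:

$$\begin{aligned}
P_{i1} - P_{j1} = 0 &\;\Rightarrow\; P_{i1} = P_{j1} \\
P_{i2} - P_{j2} = 0 &\;\Rightarrow\; P_{i2} = P_{j2} \\
&\;\;\vdots \\
P_{ik} - P_{jk} = 0 &\;\Rightarrow\; P_{ik} = P_{jk}
\end{aligned}$$

Thus, the probability of moving from vertex $v_i$ into partition $A_s$ is the same as the probability of moving from vertex $v_j$ into partition $A_s$ in one step of the random walk. We chose the part $A_{s'}$ and the vertices $v_i, v_j \in A_{s'}$ arbitrarily. Therefore, the probability of moving from a vertex in a part $A_s$ into a part $A_{s'}$ does not depend on which vertex of $A_s$ the walk starts from. This constant probability is denoted by $P_{ss'}$ in Proposition 2, and represents the probability of moving from an arbitrary vertex in $A_s$ to any vertex in $A_{s'}$ in one step of the random walk.

Now, if we take the matrix $\hat{P} = [P_{ss'}]_{s, s' = 1, \ldots, k}$ (the matrix $R$ of Proposition 2), its eigenvalues will be $\lambda_1, \ldots, \lambda_k$ and its corresponding eigenvectors $y(x^1), \ldots, y(x^k)$.

Proof:

$$\hat{P}\, y(x^l) =
\begin{pmatrix}
P_{11} & P_{12} & \cdots & P_{1k} \\
P_{21} & P_{22} & \cdots & P_{2k} \\
\vdots & \vdots & & \vdots \\
P_{k1} & P_{k2} & \cdots & P_{kk}
\end{pmatrix}
\begin{pmatrix} y(x^l)_1 \\ y(x^l)_2 \\ \vdots \\ y(x^l)_k \end{pmatrix}
= \begin{pmatrix}
\sum_{s=1}^{k} P_{1s}\, y(x^l)_s \\
\sum_{s=1}^{k} P_{2s}\, y(x^l)_s \\
\vdots \\
\sum_{s=1}^{k} P_{ks}\, y(x^l)_s
\end{pmatrix}
= \begin{pmatrix}
\sum_{s=1}^{k} \sum_{v_j \in A_s} P_{ij}\, x^l_j, \; v_i \in A_1 \\
\sum_{s=1}^{k} \sum_{v_j \in A_s} P_{ij}\, x^l_j, \; v_i \in A_2 \\
\vdots \\
\sum_{s=1}^{k} \sum_{v_j \in A_s} P_{ij}\, x^l_j, \; v_i \in A_k
\end{pmatrix}
= \begin{pmatrix}
\lambda_l x^l_i, \; v_i \in A_1 \\
\lambda_l x^l_i, \; v_i \in A_2 \\
\vdots \\
\lambda_l x^l_i, \; v_i \in A_k
\end{pmatrix}
= \begin{pmatrix}
\lambda_l\, y(x^l)_1 \\
\lambda_l\, y(x^l)_2 \\
\vdots \\
\lambda_l\, y(x^l)_k
\end{pmatrix}
= \lambda_l\, y(x^l)$$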
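As a small illustration of this aggregation step, here is a worked example in LaTeX. It is a hypothetical 4-vertex chain chosen for this summary, not an example from the paper: the matrix $P$, the partition $A_1 = \{v_1, v_2\}$, $A_2 = \{v_3, v_4\}$, and the eigenvector $x$ are all assumptions. Every row of the first part sends probability 0.7 into $A_1$ and 0.3 into $A_2$, and every row of the second part sends 0.4 into $A_1$ and 0.6 into $A_2$, so the block sums are constant within each part.

```latex
\[
P = \begin{pmatrix}
0.5 & 0.2 & 0.2 & 0.1 \\
0.3 & 0.4 & 0.1 & 0.2 \\
0.1 & 0.3 & 0.4 & 0.2 \\
0.2 & 0.2 & 0.1 & 0.5
\end{pmatrix},
\qquad
\hat{P} = \begin{pmatrix} 0.7 & 0.3 \\ 0.4 & 0.6 \end{pmatrix}
\]
% x is piecewise constant on (A_1, A_2) and is an eigenvector of P:
\[
x = (3,\, 3,\, -4,\, -4)^{T},
\qquad
P x = (0.9,\, 0.9,\, -1.2,\, -1.2)^{T} = 0.3\, x
\]
% Its compressed form y(x) is an eigenvector of \hat{P} with the same eigenvalue:
\[
y(x) = (3,\, -4)^{T},
\qquad
\hat{P}\, y(x) =
\begin{pmatrix} 0.7 \cdot 3 + 0.3 \cdot (-4) \\ 0.4 \cdot 3 + 0.6 \cdot (-4) \end{pmatrix}
= \begin{pmatrix} 0.9 \\ -1.2 \end{pmatrix}
= 0.3\, y(x)
\]
```

The other eigenpair of $\hat{P}$ in this example is the eigenvalue 1 with $y = (1, 1)^T$, which lifts to the constant vector on all four vertices.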

Since the eigenvalues $\lambda_1, \ldots, \lambda_k$ of $\hat{P}$ are nonzero by assumption, $\hat{P}$ is nonsingular. This completes the forward direction of the proof.

Backward direction: Suppose $\hat{P}$ as defined above exists and is nonsingular. Let $y^l$ and $\lambda_l$, $l = 1, \ldots, k$, be the eigenvectors and eigenvalues of $\hat{P}$. Let $x^l = x(y^l)$ be the vector defined by $x^l_i = y^l_s$ if $v_i$ is in part $A_s$. Then $x^l$ is piecewise constant. Furthermore, for $v_i \in A_s$:

$$(P x^l)_i = \sum_{j=1}^{n} P_{ij}\, x^l_j = \sum_{s'=1}^{k} \Big( \sum_{v_j \in A_{s'}} P_{ij} \Big) y^l_{s'} = \sum_{s'=1}^{k} P_{ss'}\, y^l_{s'} = \lambda_l\, y^l_s = \lambda_l\, x^l_i$$

Therefore, each eigenvector $y^l$ of $\hat{P}$ lifts to a piecewise-constant eigenvector $x^l$ of $P$ with the same eigenvalue $\lambda_l$, and these eigenvalues are non-zero because $\hat{P}$ is nonsingular.

The implications of Proposition 2 are explored further in Meila and Shi's "Learning Segmentation by Random Walks", but the main point is that $P$ has piecewise-constant eigenvectors, corresponding to piecewise-constant eigenvectors of $L$, if the random walk on all the vertices in the graph can be aggregated into a random walk on the discrete state space $\{A_1, \ldots, A_k\}$ with transition probability matrix $\hat{P}$. This gives an intuitive and interesting interpretation of the NCut algorithm.

Finally, Meila and Shi propose a variation of NCut that partitions the graph into $k$ parts by selecting the $k$ largest eigenvalues of $P$ and finding the approximately equal elements in the corresponding $k$ eigenvectors. This is in contrast to NCut, which proceeds recursively and uses only the eigenvector corresponding to the second-largest eigenvalue of $P$ (equivalently, the second-smallest generalized eigenvalue $\lambda^L$ of $L$) at each recursive step. The authors point out that the number of partitions $k$ can be determined automatically if the first $k$ eigenvalues are well separated from the $(k+1)$-st to $n$-th eigenvalues. This is what Lin and Cohen's Power Iteration Clustering algorithm relies on in order to achieve fast spectral clustering.
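The multiway variant described above can be sketched in a few lines of Python with NumPy. This is an illustrative reading of the summary, not the authors' implementation: the helper names `random_walk_eigs`, `choose_k_by_eigengap`, and `group_rows`, the greedy grouping rule, the tolerance, and the 9-vertex test graph are all assumptions. The test graph is built so that its random walk aggregates exactly onto three parts, which by Proposition 2 gives exactly piecewise-constant leading eigenvectors.

```python
import numpy as np

def random_walk_eigs(S):
    """Eigenvalues (descending) and eigenvectors of P = D^{-1} S, symmetric S."""
    d = S.sum(axis=1)
    d_isqrt = 1.0 / np.sqrt(d)
    # D^{-1/2} S D^{-1/2} is symmetric and shares its eigenvalues with P.
    mu, Z = np.linalg.eigh(S * d_isqrt[:, None] * d_isqrt[None, :])
    order = np.argsort(mu)[::-1]
    # Rescaling by D^{-1/2} turns the symmetric eigenvectors into those of P.
    return mu[order], (Z * d_isqrt[:, None])[:, order]

def choose_k_by_eigengap(mu):
    """Pick k at the largest gap between consecutive sorted eigenvalues."""
    gaps = mu[:-1] - mu[1:]
    return int(np.argmax(gaps)) + 1

def group_rows(X, tol):
    """Greedy grouping: vertices whose rows agree within tol share a label."""
    labels = np.full(len(X), -1)
    reps = []  # one representative row per group found so far
    for i, row in enumerate(X):
        for c, rep in enumerate(reps):
            if np.abs(row - rep).max() <= tol:
                labels[i] = c
                break
        else:
            reps.append(row)
            labels[i] = len(reps) - 1
    return labels

# Hypothetical graph: three groups of three vertices, weight 1 inside a group
# and weight 0.01 between groups, no self-loops.
n, size = 9, 3
S = np.full((n, n), 0.01)
for start in range(0, n, size):
    S[start:start + size, start:start + size] = 1.0
np.fill_diagonal(S, 0.0)

mu, X = random_walk_eigs(S)
k = choose_k_by_eigengap(mu)
labels = group_rows(X[:, :k], tol=1e-6)
print("k =", k, "labels =", labels.tolist())
```

On real data the eigenvectors are only approximately piecewise constant, so the tight tolerance used here would have to be loosened or the greedy grouping replaced by a proper clustering step such as k-means on the rows.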

Introduction to Spectral Graph Theory and Graph Clustering

Chengming Jiang, ECS 231, Spring 2016, University of California, Davis

Motivation: Image partitioning in computer vision

Motivation