Filtering and Sampling Graph Signals, and its Application to Compressive Spectral Clustering Nicolas Tremblay (1,2), Gilles Puy (1), Rémi Gribonval (1), Pierre Vandergheynst (1,2) (1) PANAMA Team, INRIA Rennes, France (2) Signal Processing Laboratory 2, EPFL, Switzerland
Why graph signal processing?
N. Tremblay Compressive Spectral Clustering Gdr ISIS, 17th of June 2016
- Introduction to GSP
  - Graph Fourier Transform
  - Graph filtering
- Graph sampling
- Application to clustering
  - What is Spectral Clustering?
  - Compressive Spectral Clustering
  - A toy experiment
  - Experiments on the SBM
- Conclusion
Introduction to graph signal processing: graph Fourier transform
What's a graph signal?
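Three useful matrices (unweighted example)
The adjacency matrix and the degree matrix:
$$W = \begin{pmatrix} 0 & 1 & 1 & 0 \\ 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix}, \qquad S = \begin{pmatrix} 2 & 0 & 0 & 0 \\ 0 & 3 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$
The Laplacian matrix:
$$L = S - W = \begin{pmatrix} 2 & -1 & -1 & 0 \\ -1 & 3 & -1 & -1 \\ -1 & -1 & 2 & 0 \\ 0 & -1 & 0 & 1 \end{pmatrix}$$
Three useful matrices (weighted example)
The adjacency matrix and the degree matrix:
$$W = \begin{pmatrix} 0 & .5 & .5 & 0 \\ .5 & 0 & .5 & 4 \\ .5 & .5 & 0 & 0 \\ 0 & 4 & 0 & 0 \end{pmatrix}, \qquad S = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 5 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 4 \end{pmatrix}$$
The Laplacian matrix:
$$L = S - W = \begin{pmatrix} 1 & -.5 & -.5 & 0 \\ -.5 & 5 & -.5 & -4 \\ -.5 & -.5 & 1 & 0 \\ 0 & -4 & 0 & 4 \end{pmatrix}$$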
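The two examples above can be reproduced with a few lines of plain Python. This is only an illustrative sketch: the helper name `laplacian` is hypothetical, and dense nested lists stand in for the sparse storage one would use on a large graph.

```python
def laplacian(W):
    """Return (S, L): S is the diagonal degree matrix, L = S - W."""
    n = len(W)
    degrees = [sum(row) for row in W]  # degree of node i = sum of row i of W
    S = [[degrees[i] if i == j else 0 for j in range(n)] for i in range(n)]
    L = [[S[i][j] - W[i][j] for j in range(n)] for i in range(n)]
    return S, L


# Unweighted and weighted adjacency matrices from the slides.
W_unweighted = [[0, 1, 1, 0], [1, 0, 1, 1], [1, 1, 0, 0], [0, 1, 0, 0]]
W_weighted = [[0, .5, .5, 0], [.5, 0, .5, 4], [.5, .5, 0, 0], [0, 4, 0, 0]]

S1, L1 = laplacian(W_unweighted)
S2, L2 = laplacian(W_weighted)
```

Each row of a Laplacian built this way sums to zero, which is why the constant signal is always an eigenvector with eigenvalue 0.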
What's a graph Fourier transform? [Hammond 11]
$L = S - W = U \Lambda U^\top$
- $U$ is the Fourier basis of the graph
- the Fourier transform of a signal $x$ reads: $\hat{x} = U^\top x$
- $\Lambda = \mathrm{Diag}(\lambda_1, \lambda_2, \dots, \lambda_N)$ is the spectrum
[Figures: a low-frequency Fourier mode; a high-frequency Fourier mode]
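As a small illustration of the transform defined above, the following sketch diagonalises the unweighted 4-node Laplacian with NumPy and applies the forward and inverse graph Fourier transforms. The test signal `x` is arbitrary, and using `numpy.linalg.eigh` is an assumption made for the example, not something the slides prescribe.

```python
import numpy as np

# Laplacian of the unweighted 4-node example graph.
L = np.array([[2, -1, -1, 0],
              [-1, 3, -1, -1],
              [-1, -1, 2, 0],
              [0, -1, 0, 1]], dtype=float)

# eigh returns eigenvalues in ascending order and orthonormal eigenvectors
# as columns: L = U diag(lam) U^T.
lam, U = np.linalg.eigh(L)

x = np.array([1.0, 2.0, 3.0, 4.0])  # an arbitrary graph signal
x_hat = U.T @ x                     # graph Fourier transform
x_back = U @ x_hat                  # inverse transform
```

Because `U` is orthonormal, the transform preserves the energy of the signal, and the smallest eigenvalue is 0 (the constant mode).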
The graph Fourier transform encodes the structure of the graph
Slide courtesy of D. Shuman
Introduction to graph signal processing: filtering graph signals
Graph filtering
Given a filter function $h$ defined in the Fourier space.
[Figure: filter response $h(\lambda)$ for $\lambda \in [0, 2]$]
In the node space, the signal $x$ filtered by $h$ reads:
$$x_h = U\, h(\Lambda)\, U^\top x = Hx$$
Problem: this costs $L$'s diagonalisation [$O(N^3)$].
Solution: we use a polynomial approximation of order $p$ of $h$:
$$\tilde{h}(\lambda) = \sum_{l=1}^{p} \alpha_l \lambda^l \simeq h(\lambda).$$
Indeed, in this case:
$$\tilde{H}x = U\, \tilde{h}(\Lambda)\, U^\top x = U \sum_{l=1}^{p} \alpha_l \Lambda^l\, U^\top x = \sum_{l=1}^{p} \alpha_l L^l x \simeq Hx$$
Only involves matrix-vector multiplications [costs $O(pN)$].
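The polynomial filtering step can be sketched as below, assuming the coefficients $\alpha_1, \dots, \alpha_p$ are already known (how to fit them is not covered here). The function names are hypothetical, and the dense `matvec` is only for readability: with a sparse Laplacian each product costs a number of operations proportional to the number of edges.

```python
def matvec(L, x):
    """Dense matrix-vector product L x (a sparse product would be used in practice)."""
    return [sum(Lij * xj for Lij, xj in zip(row, x)) for row in L]


def poly_filter(L, x, alphas):
    """Return sum_{l=1}^{p} alphas[l-1] * L^l x using p matrix-vector products."""
    out = [0.0] * len(x)
    power = list(x)
    for alpha in alphas:
        power = matvec(L, power)  # power now holds L^l x
        out = [o + alpha * v for o, v in zip(out, power)]
    return out


L = [[2, -1, -1, 0], [-1, 3, -1, -1], [-1, -1, 2, 0], [0, -1, 0, 1]]
filtered_constant = poly_filter(L, [1, 1, 1, 1], [0.3, -0.2, 0.1])
first_power = poly_filter(L, [1, 0, 0, 0], [1.0])
second_power = poly_filter(L, [1, 0, 0, 0], [0.0, 1.0])
```

No eigendecomposition appears anywhere: the filter is applied with `p` successive products by `L`, which is the point of the approximation.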
A few applications
- Tikhonov regularization for denoising: $\operatorname{argmin}_f \{ \|f - y\|_2^2 + \gamma f^\top L f \}$
- Wavelet denoising: $\operatorname{argmin}_a \{ \|f - W a\|_2^2 + \gamma \|a\|_{1,\mu} \}$
- Compression via filterbanks, etc.
Slide courtesy of D. Shuman
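The Tikhonov problem above is itself a graph filter, which can be seen with a short worked derivation. It only assumes that $L$ is symmetric positive semi-definite with $L = U \Lambda U^\top$, as defined earlier in the talk.

```latex
% Setting the gradient of the objective to zero:
\nabla_f \left( \|f - y\|_2^2 + \gamma f^\top L f \right)
  = 2 (f - y) + 2 \gamma L f = 0
\quad\Longrightarrow\quad
f^\star = (I + \gamma L)^{-1} y .

% Writing L = U \Lambda U^\top, the solution is a filtering of y:
f^\star = U \, \mathrm{Diag}\!\left( \frac{1}{1 + \gamma \lambda_1}, \dots,
          \frac{1}{1 + \gamma \lambda_N} \right) U^\top y
        = U \, h(\Lambda) \, U^\top y ,
\qquad h(\lambda) = \frac{1}{1 + \gamma \lambda} .
```

Since $h$ decreases with $\lambda$, denoising by Tikhonov regularization amounts to a low-pass graph filter, with $\gamma$ controlling how strongly high frequencies are attenuated.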
Sampling a graph signal consists in:
1. choosing a subset of nodes
2. measuring the signal on these nodes only
How to reconstruct the original signal? Basically, we need:
1. a (low-dimensional) model for the signal to sample
2. a method to choose the nodes to sample
3. a decoder that exactly recovers the signal given its samples
Smoothness assumption
In 1D signal processing, a smooth signal has most of its energy at low frequencies.
[Figures: a smooth signal in time; its Fourier transform]
Definition (Bandlimited graph signal [Puy 15, Chen 15, Anis 16, Segarra 15])
A $k$-bandlimited signal $x \in \mathbb{R}^N$ on $G$ is a signal that satisfies, for some $\hat{\alpha} \in \mathbb{R}^k$,
$$x = U_k \hat{\alpha},$$
where $U_k = (u_1 | u_2 | \dots | u_k)$ contains the first $k$ eigenvectors of $L$.
Sampling band-limited graph signals
Preparation: associate to each node $i$ a probability $p_i$ to draw this node. This defines a probability distribution $p \in \mathbb{R}^N$.
Sampling procedure: draw $n$ nodes according to $p$: $\{\omega_i\}_{i \in [1,n]}$.
We create a matrix $M$ that measures the signal $x$ only on the selected nodes:
$$M_{ij} := \begin{cases} 1 & \text{if } j = \omega_i \\ 0 & \text{otherwise.} \end{cases}$$
For any signal $x \in \mathbb{R}^N$ on $G$, its sampled version is $y = Mx$ (it has size $n < N$).
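The sampling procedure can be sketched with the standard library only. The function names are hypothetical, and drawing the nodes independently with replacement is an assumption of this sketch (the slide only says that $n$ nodes are drawn according to $p$).

```python
import random


def draw_nodes(p, n, seed=0):
    """Draw n node indices i.i.d. according to the distribution p (with replacement)."""
    rng = random.Random(seed)
    return rng.choices(range(len(p)), weights=p, k=n)


def sampling_matrix(omega, N):
    """M[i][j] = 1 if j == omega[i], 0 otherwise."""
    return [[1 if j == w else 0 for j in range(N)] for w in omega]


def measure(M, x):
    """Sampled signal y = M x."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


p = [0.1, 0.2, 0.3, 0.4]
x = [10, 20, 30, 40]
omega = draw_nodes(p, n=3)
M = sampling_matrix(omega, N=len(x))
y = measure(M, x)
```

In practice one never forms `M` explicitly: `y` is simply `x` restricted to the drawn indices, which is what the matrix product computes.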
Optimizing the sampling distribution
Some nodes are more important to sample than others.
For any signal $x$, remember that $\|U_k^\top x\|_2$ is the energy of $x$ on the first $k$ frequencies. Then:
1. For each node $i$, construct the Dirac $\delta_i$ centered at node $i$.
2. Compute $\|U_k^\top \delta_i\|_2$ (we have $0 \le \|U_k^\top \delta_i\|_2 \le 1$).
If $\|U_k^\top \delta_i\|_2 \approx 1$: there exists a smooth signal concentrated on node $i$. Node $i$ is important.
If $\|U_k^\top \delta_i\|_2 \approx 0$: no smooth signal has energy concentrated on node $i$. Node $i$ can be sampled with less probability.
The graph weighted coherence
We measure the quality of $p$ with the graph weighted coherence.
Definition (Graph weighted coherence)
Let $p \in \mathbb{R}^N$ represent a sampling distribution on $\{1, \dots, N\}$. The graph weighted coherence of order $k$ for the pair $(G, p)$ is
$$\nu_p^k := \max_{1 \le i \le N} \left\{ p_i^{-1/2} \, \|U_k^\top \delta_i\|_2 \right\}.$$
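For a small graph, the coherence in the definition above can be evaluated directly from the eigenvectors. The sketch below does so with NumPy for the unweighted 4-node example, with $k = 2$ and a uniform distribution; the function name is hypothetical, and the full eigendecomposition is only acceptable because the graph is tiny.

```python
import numpy as np


def graph_weighted_coherence(L, k, p):
    """Return (nu_p^k, local), where local[i] = ||U_k^T delta_i||_2."""
    _, U = np.linalg.eigh(L)            # eigenvectors sorted by ascending eigenvalue
    Uk = U[:, :k]
    local = np.linalg.norm(Uk, axis=1)  # U_k^T delta_i is the i-th row of U_k
    return np.max(local / np.sqrt(p)), local


L = np.array([[2, -1, -1, 0],
              [-1, 3, -1, -1],
              [-1, -1, 2, 0],
              [0, -1, 0, 1]], dtype=float)
p_uniform = np.full(4, 0.25)
nu_uniform, local = graph_weighted_coherence(L, 2, p_uniform)
```

The squared local norms always sum to $k$ (the squared Frobenius norm of $U_k$), so whatever the distribution $p$, the coherence cannot be smaller than $\sqrt{k}$.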
How many nodes to select?
Theorem (Restricted isometry property)
Let $M$ be a random subsampling matrix constructed using the sampling distribution $p$, and let $P := \mathrm{diag}(p)$. For any $\delta, \epsilon \in (0, 1)$, with probability at least $1 - \epsilon$,
$$(1 - \delta)\, \|x_1 - x_2\|_2^2 \;\le\; \frac{1}{n} \left\| M P^{-1/2} (x_1 - x_2) \right\|_2^2 \;\le\; (1 + \delta)\, \|x_1 - x_2\|_2^2$$
for all $x_1, x_2 \in \mathrm{span}(U_k)$, provided that
$$n \ge \frac{3}{\delta^2}\, (\nu_p^k)^2 \log\left( \frac{2k}{\epsilon} \right).$$
Let's minimize $\nu_p^k$! Its lower bound, $\sqrt{k}$, may always be reached for $p^*$:
$$\forall i \in [1, N] \qquad p_i^* = \|U_k^\top \delta_i\|_2^2 / k$$
With $p^*$, one needs $n \sim k \log(k)$: up to the log factor, it is optimal!
We have an efficient algorithm that estimates $p^*$ in $O(pN \log N)$!
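The optimal distribution and the sample bound of the theorem can be checked numerically on a toy graph. In this sketch $p^*$ is computed exactly from the eigenvectors, which is a stand-in for the fast estimation algorithm mentioned above (not reproduced here); the function names and the values of $\delta$ and $\epsilon$ are illustrative assumptions.

```python
import math
import numpy as np


def optimal_distribution(L, k):
    """Return (p_star, local_sq) with p_star[i] = ||U_k^T delta_i||_2^2 / k."""
    _, U = np.linalg.eigh(L)
    local_sq = np.sum(U[:, :k] ** 2, axis=1)
    return local_sq / k, local_sq


def required_samples(nu, k, delta, eps):
    """Smallest integer n with n >= (3 / delta^2) * nu^2 * log(2k / eps)."""
    return math.ceil(3.0 / delta ** 2 * nu ** 2 * math.log(2.0 * k / eps))


L = np.array([[2, -1, -1, 0],
              [-1, 3, -1, -1],
              [-1, -1, 2, 0],
              [0, -1, 0, 1]], dtype=float)
k = 2
p_star, local_sq = optimal_distribution(L, k)
nu_star = np.max(np.sqrt(local_sq / p_star))  # coherence reached by p_star
n_min = required_samples(nu_star, k, delta=0.5, eps=0.1)
```

On such a tiny graph the constants of the bound dominate and `n_min` exceeds the number of nodes; the statement becomes informative for large $N$, where $n \sim k \log k$ is much smaller than $N$.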
Reconstruction
We sampled the signal $x \in \mathbb{R}^N$, i.e., we measured $y = Mx + n$ ($n \in \mathbb{R}^n$ models noise). The goal is to estimate $x$ from $y$.
We propose to solve (links with the SSL literature [Chapelle 10, Fu 12])
$$\min_{z \in \mathbb{R}^N} \left\| P_\Omega^{-1/2} (Mz - y) \right\|_2^2 + \gamma\, z^\top g(L)\, z,$$
where $\gamma > 0$ and $g : \mathbb{R} \to \mathbb{R}$ is a nonnegative and nondecreasing polynomial function.
Reconstruction
Solving
$$\min_{z \in \mathbb{R}^N} \left\| P_\Omega^{-1/2} (Mz - y) \right\|_2^2 + \gamma\, z^\top g(L)\, z$$
can be done, e.g., by gradient descent or conjugate gradient.
It is fast as it involves only matrix-vector multiplications with sparse matrices.
We proved that the result is accurate and stable to noise:
- The quality of the reconstruction depends on the eigengap ratio $g(\lambda_k)/g(\lambda_{k+1})$.
- $\gamma$ should be adjusted with the signal-to-noise ratio.
- In absence of noise, the reconstruction quality improves when $g(\lambda_k)/g(\lambda_{k+1}) \to 0$ and $\gamma \to 0$.
- If $g(\lambda_k) = 0$ and $g(\lambda_{k+1}) > 0$, we have exact recovery.
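A minimal sketch of the decoder, under several stated assumptions: $g(L) = L$, $P_\Omega$ is taken to be the diagonal matrix of the probabilities $p_{\omega_i}$ of the sampled nodes, and the normal equations are solved with a dense direct solver instead of the gradient or conjugate gradient iterations mentioned above (reasonable only for a toy graph). The graph is made of two disconnected triangles, so that $\lambda_2 = 0 < \lambda_3$ and the exact-recovery case applies to $k = 2$.

```python
import numpy as np


def reconstruct(L, omega, p, y, gamma):
    """Minimise ||P_Omega^{-1/2}(Mz - y)||^2 + gamma * z^T L z (i.e. g(L) = L)."""
    N, n = L.shape[0], len(omega)
    M = np.zeros((n, N))
    M[np.arange(n), omega] = 1.0
    P_inv = np.diag(1.0 / p[omega])   # assumed P_Omega = diag(p[omega])
    # Zero gradient: (M^T P^-1 M + gamma L) z = M^T P^-1 y
    A = M.T @ P_inv @ M + gamma * L
    b = M.T @ P_inv @ y
    return np.linalg.solve(A, b)


# Two disconnected triangles: nodes {0, 1, 2} and {3, 4, 5}.
W = np.zeros((6, 6))
for a, b in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
    W[a, b] = W[b, a] = 1.0
L = np.diag(W.sum(axis=1)) - W

x = np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0])  # indicator of the first triangle
p = np.full(6, 1.0 / 6.0)
omega = np.array([1, 4])                      # one sampled node per triangle
y = x[omega]                                  # noiseless measurements
x_rec = reconstruct(L, omega, p, y, gamma=1e-3)
```

Two samples are enough here to recover all six values, because the signal is constant on each connected component and one node of each component was measured.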
Recap
Given a graph and its Laplacian matrix $L$.
Given any graph signal $x$ defined on this graph, one can:
1. filter this signal with any filter $h(\lambda)$: $x_h = U\, h(\Lambda)\, U^\top x$ [$O(N^3)$],
2. fast filter it with the polynomial approximation $\tilde{h}(\lambda) = \sum_{l=1}^{p} \alpha_l \lambda^l$: $x_h \simeq \sum_{l=1}^{p} \alpha_l L^l x$ [$O(pN)$].
Given a $k$-bandlimited graph signal $x$ defined on this graph, one can:
1. estimate the optimal probability distribution $p_i^* = \|U_k^\top \delta_i\|_2^2 / k$ [$O(pN \log N)$]
2. sample $n = O(k \log k)$ nodes from this distribution
3. measure the signal $y = Mx \in \mathbb{R}^n$
4. reconstruct the signal [$O(pN)$]:
$$x_{rec} = \operatorname{argmin}_{z \in \mathbb{R}^N} \left\| P_\Omega^{-1/2} (Mz - y) \right\|_2^2 + \gamma\, z^\top g(L)\, z$$
Application to clustering: What is Spectral Clustering?
Given a series of $N$ objects:
1/ Find adapted descriptors
2/ Cluster
From the $N$ objects, one creates:
- $N$ vectors $x_1, x_2, \dots, x_N$
- and their distance matrix $\Delta \in \mathbb{R}^{N \times N}$.
Goal of clustering: assign a label $c(i) \in \{1, \dots, k\}$ to each object $i$ in order to organize / simplify / analyze the data.
There exist two general types of methods:
- methods directly based on the $x_i$ and/or $\Delta$, like k-means or hierarchical clustering,
- graph-based methods.
Graph construction from the distance matrix
Create a graph $G = (V, E)$:
- each node in $V$ is one of the $N$ objects
- each pair of nodes $(i, j)$ is connected if $\Delta(i, j)$ is small enough.
For example, two connectivity possibilities:
- Gaussian kernel:
  1. all pairs of nodes are connected with links of weights $\exp(-\Delta(i, j)/\sigma)$
  2. remove all links of weight inferior to $\epsilon$
- $k$ nearest neighbors: connect each node to its $k$ nearest neighbors.
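The Gaussian-kernel construction can be sketched as follows in plain Python. The function name, the toy distance matrix and the values of `sigma` and `eps` are illustrative assumptions; the double loop is quadratic in $N$ and would be replaced by a nearest-neighbour search on large datasets.

```python
import math


def gaussian_kernel_graph(Delta, sigma, eps):
    """Weighted adjacency matrix: w_ij = exp(-Delta_ij / sigma), dropped if below eps."""
    N = len(Delta)
    W = [[0.0] * N for _ in range(N)]
    for i in range(N):
        for j in range(i + 1, N):      # no self-loops, symmetric weights
            w = math.exp(-Delta[i][j] / sigma)
            if w >= eps:               # remove links of weight inferior to eps
                W[i][j] = W[j][i] = w
    return W


# Toy distance matrix: objects 0 and 1 are close, object 2 is far from both.
Delta = [[0, 1, 5],
         [1, 0, 4],
         [5, 4, 0]]
W = gaussian_kernel_graph(Delta, sigma=1.0, eps=0.1)
```

Thresholding keeps the graph sparse, which is what makes the matrix-vector products used for fast filtering cheap.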
The clustering problem now states:
Given the graph $G$ representing the similarity between the $N$ objects, find a partition of all nodes into $k$ clusters.
Many methods exist [Fortunato 10]:
- Modularity (or other cost-function) optimisation methods [Newman 06]
- Random walk methods [Schaub 12]
- Methods inspired from statistical physics [Krzakala 13], information theory [Rosvall 08]...
- spectral methods
- ...
The classical spectral clustering (SC) algorithm [Von Luxburg 06]:
Given the $N$-node graph $G$ of Laplacian matrix $L$:
1. Compute $L$'s first $k$ eigenvectors: $U_k = (u_1 | u_2 | \dots | u_k)$.
2. Consider each node $i$ as a point in $\mathbb{R}^k$: $f_i = U_k^\top \delta_i$.
3. Run k-means with the Euclidean distance $D_{ij} = \|f_i - f_j\|$ and obtain $k$ clusters.
Definition: let us call $D_{ij}$ the spectral clustering distance.
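The three steps of the classical algorithm can be sketched with NumPy on a toy graph made of two 4-node cliques joined by a single edge. The k-means below is a bare Lloyd iteration with a deterministic farthest-point initialisation; that initialisation, the function name and the fixed number of iterations are assumptions made to keep the example short and reproducible.

```python
import numpy as np


def spectral_clustering(L, k, n_iter=20):
    # Step 1: first k eigenvectors of L (full eigh, fine for a toy graph).
    _, U = np.linalg.eigh(L)
    F = U[:, :k]                       # Step 2: row i is f_i = U_k^T delta_i
    # Step 3: k-means on the rows of F, farthest-point initialisation.
    centers = [F[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(F - c, axis=1) for c in centers], axis=0)
        centers.append(F[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = np.linalg.norm(F[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(dist, axis=1)
        centers = np.array([F[labels == c].mean(axis=0) if np.any(labels == c)
                            else centers[c] for c in range(k)])
    return labels


# Two 4-node cliques joined by the single edge (3, 4).
W = np.zeros((8, 8))
W[:4, :4] = 1.0
W[4:, 4:] = 1.0
np.fill_diagonal(W, 0.0)
W[3, 4] = W[4, 3] = 1.0
L = np.diag(W.sum(axis=1)) - W

labels = spectral_clustering(L, k=2)
```

The full eigendecomposition in the first step is exactly the cost that the compressive approach of the following slides avoids.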
What's the point of using a graph?
$N$ points in $d = 2$ dimensions.
Result with k-means (k=2) on $\Delta$: [Figure]
After creating a graph, partial diagonalisation of $L$ and running k-means (k=2) on $D$: [Figure]
Application to clustering: Compressive Spectral Clustering
Our goal
Problem: $N$ and/or $k$ large, two main bottlenecks:
1. partial eigendecomposition of the (sparse) Laplacian (e.g. restarted Arnoldi) [at least $O(k^3 + Nk^2)$] [Chen 11a]
2. high-dimensional k-means [$O(Nk^2)$].
Goal: SC in high dimensions, with $N$ of the order of $10^6$ nodes and/or $k$ of the order of 100.
Contribution: an algorithm that approximates the true SC solution with controlled relative error, with a running time in $O(k^2 \log^2 k + pN(\log(N) + k))$.
Main ideas of Compressive Spectral Clustering (CSC):
CSC is based on two main observations:
1. SC does not need explicitly $f_i = U_k^\top \delta_i$, but only $D_{ij} = \|f_i - f_j\|$
2. each cluster indicator function $c_j \in \mathbb{R}^N$ is in fact approximately $k$-bandlimited: $\forall j \in [1, k]$, $c_j$ is close to $\mathrm{span}(U_k)$
CSC follows 4 steps:
1. Estimate $D_{ij}$ by filtering $d$ random graph signals,
2. Sample $n$ nodes out of the $N$ available ones,
3. Run low-dimensional k-means on these $n$ nodes to obtain $c_j^r \in \mathbb{R}^n$,
4. Reconstruct each reduced cluster indicator function $c_j^r$ back on the whole graph to obtain $c_j$, as desired.
(Steps 2 to 4 already covered!)
Step 1: How to estimate $D_{ij}$ without computing $U_k$?
Remember: the classical spectral clustering algorithm
Given the $N$-node graph $G$ of Laplacian matrix $L$:
1. Compute $L$'s first $k$ eigenvectors: $U_k = (u_1 | u_2 | \dots | u_k)$.
2. Consider each node $i$ as a point in $\mathbb{R}^k$: $f_i = U_k^\top \delta_i$.
3. Run k-means with $D_{ij} = \|f_i - f_j\|$ and obtain $k$ clusters.
Our goal: estimate $D_{ij}$ without computing exactly $U_k$. With $\delta_{ij} := \delta_i - \delta_j$:
$$D_{ij} = \|U_k^\top (\delta_i - \delta_j)\| = \|U_k^\top \delta_{ij}\| = \|U_k U_k^\top \delta_{ij}\| = \|U\, h_{\lambda_k}(\Lambda)\, U^\top \delta_{ij}\| = \|H_{\lambda_k} \delta_{ij}\|.$$
[Figure: ideal low-pass response $h_{\lambda_k}(\lambda)$ for $\lambda \in [0, 2]$, equal to 1 up to $\lambda_k$ and 0 above]
Fast filtering [Hammond, ACHA 11]
In practice, we use a polynomial approximation of order $p$ of $h_{\lambda_k}$:
$$\tilde{h}_{\lambda_k} = \sum_{l=1}^{p} \alpha_l \lambda^l \simeq h_{\lambda_k}.$$
[Figure: $h_{\lambda_k}(\lambda)$ for $\lambda \in [0, 2]$; legend entries: ideal, m=100, m=20, m=5]
Such that:
$$D_{ij} = \|H_{\lambda_k} \delta_{ij}\| = \lim_{p \to \infty} \|\tilde{H}_{\lambda_k} \delta_{ij}\|$$
Norm conservation result [Tremblay 16a, Ramasamy 15]
The spectral distance reads: $D_{ij} = \|H_{\lambda_k} \delta_{ij}\| = \lim_{p \to \infty} \|\tilde{H}_{\lambda_k} \delta_{ij}\|$
Let $R = (r_1 | r_2 | \dots | r_d) \in \mathbb{R}^{N \times d}$ be a random Gaussian matrix, i.e. a collection of $d$ random graph signals, with mean 0 and variance $1/d$.
We define $\tilde{f}_i = (\tilde{H}_{\lambda_k} R)^\top \delta_i \in \mathbb{R}^d$ and $\tilde{D}_{ij} = \|\tilde{f}_i - \tilde{f}_j\|$.
Theorem (Norm conservation theorem in the case of infinite $p$)
Let $\epsilon > 0$. If $d > d_0 \sim \log N / \epsilon^2$, then, with probability $> 1 - 1/N$, we have:
$$\forall (i, j) \in [1, N]^2 \qquad (1 - \epsilon)\, D_{ij} \le \tilde{D}_{ij} \le (1 + \epsilon)\, D_{ij}.$$
Consequence: to estimate $D_{ij}$ with no partial diagonalisation of $L$, fast filter only $d \sim \log N$ random signals!
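The norm conservation result can be observed numerically on a toy graph. The sketch below uses the ideal filter $H_{\lambda_k} = U_k U_k^\top$ (the infinite-$p$ case of the theorem), computed from the eigenvectors only to have a reference to compare against; the two-clique graph, the seed, and the deliberately large `d` (chosen so that the fluctuations are small and the check is reproducible) are assumptions of this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two 4-node cliques joined by the single edge (3, 4).
W = np.zeros((8, 8))
W[:4, :4] = 1.0
W[4:, 4:] = 1.0
np.fill_diagonal(W, 0.0)
W[3, 4] = W[4, 3] = 1.0
L = np.diag(W.sum(axis=1)) - W

N, k, d = 8, 2, 2000
_, U = np.linalg.eigh(L)
Uk = U[:, :k]
H = Uk @ Uk.T                                   # ideal low-pass filter H_{lambda_k}

# d random Gaussian graph signals with mean 0 and variance 1/d.
R = rng.normal(0.0, np.sqrt(1.0 / d), size=(N, d))


def pairwise_distances(F):
    """Matrix of Euclidean distances between the rows of F."""
    return np.linalg.norm(F[:, None, :] - F[None, :, :], axis=2)


D = pairwise_distances(Uk)            # exact distances, rows are f_i
D_tilde = pairwise_distances(H @ R)   # estimated distances, rows are (H R)^T delta_i
```

In the actual algorithm `H @ R` is never formed through `U`: each column of `R` is filtered with the polynomial approximation, using matrix-vector products with `L` only.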
How to quickly estimate $\lambda_k$, the sole unknown of the fast filtering operation?
Goal: given a positive semi-definite matrix $L$, estimate its $k$-th eigenvalue as fast as possible.
We use eigencount techniques [Napoli 13] (also based on polynomial filtering of random vectors!): given the interval $[0, b]$, get an approximation of the number of enclosed eigenvalues. Then find $\lambda_k$ by dichotomy on $b$.
Done in $O(pN \log N)$.
The CSC algorithm [Tremblay 16b, Puy 16]
1. Estimate $\lambda_k$, the $k$-th eigenvalue of $L$.
2. Generate $d$ random graph signals in matrix $R \in \mathbb{R}^{N \times d}$.
3. Filter them with $\tilde{H}_{\lambda_k}$ and treat each node $i$ as a point in $\mathbb{R}^d$: $\tilde{f}_i = (\tilde{H}_{\lambda_k} R)^\top \delta_i$.
If $d \sim \log N$, we prove that $\tilde{D}_{ij} = \|\tilde{f}_i - \tilde{f}_j\| \simeq D_{ij}$.
Next steps (sampling):
4. sample $n$ nodes from $p^*$
5. run k-means on the $n$ associated feature vectors and obtain $\{c_j^r\}_{j=1:k}$
6. reconstruct all $k$ indicator functions $\{c_j\}_{j=1:k}$
If $n \sim k \log k$ and $c_j^r = M c_j$, we prove that we control the reconstruction error.
Application to clustering: A toy experiment
SC on a toy example
$N = 1000$, $k = 2$. Com 1: 300 nodes; Com 2: 700 nodes.
[Figure: adjacency matrix of the graph, nodes 1 to 1000 on both axes]
Compute $U_2 = (u_1, u_2)$.
$f_i = U_2^\top \delta_i \in \mathbb{R}^2$: [Figure: scatter plot of the $f_i$]
$D_{ij}$: [Figure: $N \times N$ matrix of spectral clustering distances]
k-means: perf = 0.996
CSC on the same toy example
1. estimate $\lambda_2$ and $p^*$
2. generate $d = 3$ random graph signals
3. low-pass filter them: $\tilde{f}_i \in \mathbb{R}^3$ [Figure: 3D scatter plot of the $\tilde{f}_i$]
4. sample $n = 3$ nodes from $p^*$
5. run low-dimensional k-means
6. reconstruct the result: perf = 0.951 after interpolation
$\tilde{D}_{ij} \simeq D_{ij}$: [Figure: $N \times N$ matrix of estimated distances]
Application to clustering: Experiments on the SBM
Experiments
The Stochastic Block Model (SBM): $N$ nodes and $k$ communities $C_1, C_2, \dots, C_k$ of equal size $N/k$. Edge probabilities between communities:
$$\begin{pmatrix} q_1 & q_2 & \cdots & q_2 \\ q_2 & q_1 & \cdots & q_2 \\ \vdots & & \ddots & \vdots \\ q_2 & q_2 & \cdots & q_1 \end{pmatrix}$$
- probability $q_1$ if in the same community
- probability $q_2$ if not.
Define the ratio $\epsilon = q_2/q_1$. The SBM is fully defined by $\epsilon$ and the average degree $s$.
Define the critical ratio $\epsilon_c = (s - \sqrt{s})/(s + \sqrt{s}(k - 1))$ [Decelle 11].
Experiments with $N = 10^3$, $k = 20$, $s = 16$, with respect to different parameters:
[Figure: three panels of recovery performance versus $\epsilon$ (with $\epsilon_c$ marked), comparing SC with CSC for $n \in \{k \log k, 2k \log k, 3k \log k, 4k \log k\}$, for $d \in \{2 \log n, 3 \log n, 4 \log n, 5 \log n\}$, and for $p \in \{10, 20, 50, 100\}$]
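The SBM parametrisation above can be turned into a small generator. The critical ratio follows the formula of the slide; the conversion from $(\epsilon, s)$ to $(q_1, q_2)$ is an assumption of this sketch, obtained by requiring the expected degree $q_1 (N/k - 1) + q_2 (N - N/k)$ to equal $s$. All function names are hypothetical.

```python
import math
import random


def critical_ratio(s, k):
    """eps_c = (s - sqrt(s)) / (s + sqrt(s) * (k - 1))."""
    return (s - math.sqrt(s)) / (s + math.sqrt(s) * (k - 1))


def sbm_probabilities(N, k, s, eps):
    """(q1, q2) with q2 = eps * q1 and expected degree s (equal-size communities)."""
    size = N // k
    q1 = s / ((size - 1) + eps * (N - size))
    return q1, eps * q1


def sample_sbm(N, k, q1, q2, seed=0):
    """Edge list of one SBM draw; node i belongs to community i // (N // k)."""
    rng = random.Random(seed)
    size = N // k
    edges = []
    for i in range(N):
        for j in range(i + 1, N):
            q = q1 if i // size == j // size else q2
            if rng.random() < q:
                edges.append((i, j))
    return edges


eps_c = critical_ratio(s=16, k=20)
q1, q2 = sbm_probabilities(N=1000, k=20, s=16, eps=eps_c / 4)
edges = sample_sbm(N=200, k=4, q1=0.3, q2=0.01)
```

For the setting of the slide ($s = 16$, $k = 20$) the critical ratio is $12/92 \approx 0.13$, which matches the position of $\epsilon_c$ on the horizontal axis of the plots.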
Experiments
With parameters $d = 4 \log(k)$, $n = 2k \log(k)$, $p = 50$, $\gamma = 10^{-3}$, and $\epsilon = \epsilon_c/4$:
[Figure: recovery performance (left) and computation time in seconds (right) versus the number of classes $k \in \{20, 50, 100, 200\}$, for CSC, PM and SC with $N = 10^4$, $10^5$ and $10^6$]
PM = Power Method [Lin 10, Boutsidis 15]
On a real-world graph: Amazon graph with 335000 nodes and 926000 edges:
k       SC                                                    CSC
250     7h17m, 0.84                                           1h20m, 0.83
500     15h29m, 0.84                                          3h34m, 0.84
1000    17h36m (eigs) + at least 21 h for k-means, unknown    10h18m, 0.84
Conclusion
Two main ideas
- Low-pass graph fast filtering of random signals: a way to by-pass the Laplacian's diagonalisation for learning tasks.
- Cluster indicator functions live in a low-dimensional space (are $k$-bandlimited): we can use sampling schemes to recover them efficiently.
Details of this work are found in:
- (Sampling part) Random sampling of bandlimited signals on graphs, ACHA 2016. A MATLAB toolbox is available at: grsamplingbox.gforge.inria.fr
- (Clustering part) Compressive Spectral Clustering, ICML 2016. A MATLAB toolbox is available at: cscbox.gforge.inria.fr
Links with literature
- Low-rank approximation: Nyström methods [Sun 15], leverage scores [Mahoney 11]
- Machine Learning: semi-supervised learning [Chapelle 10], active learning [Fu 12, Gadde 14], coresets [Har-Peled 04, Frahling 08]
- Compressed sensing: variable density sampling [Puy 11]
- Other fast approximate SC algorithms: [Lin 10, Fowlkes 04, Wang 09, Chen 11a, Chen 11b]
Perspectives and difficult questions
Two difficult questions (among others):
1. Given a positive semi-definite matrix, how to estimate its $k$-th eigenvalue as fast as possible, and only that one?
2. How to choose automatically the appropriate polynomial order $p$?
Perspectives
1. Rational filters instead of polynomial filters? [Shi 15, Isufi 16]
2. Smoother filters for better approximation? [Sakiyama 16]
3. How about if nodes are added one by one?
4. SBMO! [cf. E. Kaufmann]
5. Experiments shown were done with $L = I - D^{-1/2} W D^{-1/2}$. Test for $L = D^{1 - 2\hat{\alpha}} - D^{-\hat{\alpha}} W D^{-\hat{\alpha}}$! [cf. R. Couillet]
[Ramasamy 15] Compressive spectral embedding: sidestepping..., NIPS
[Fortunato 10] Community detection in graphs, Physics Reports
[Newman 06] Modularity and community structure in networks, PNAS
[Schaub 12] Markov dynamics as a zooming lens for multiscale..., PLoS One
[Krzakala 13] Spectral redemption: clustering sparse networks, PNAS
[Rosvall 08] Maps of random walks on complex networks reveal..., PLoS One
[Von Luxburg 06] A tutorial on spectral clustering, Statistics and Computing
[Chen 11a] Parallel spectral clustering in distributed systems, IEEE TPAMI
[Lin 10] Power iteration clustering, ICML
[Boutsidis 15] Spectral clustering via the power method - provably, ICML
[Fowlkes 04] Spectral grouping using the Nyström method, IEEE TPAMI
[Wang 09] Approximate spectral clustering, AKDDM
[Chen 11b] Large scale spectral clustering with landmark-based..., CAI
[Shuman 13] The emerging field of signal processing on graphs..., SPMag
[Hammond 11] Wavelets on graphs via spectral graph theory, ACHA
[Napoli 13] Efficient estimation of eigenvalue counts in an interval, arXiv
[Tremblay 16a] Accelerated spectral clustering using graph..., ICASSP
[Tremblay 16b] Compressive spectral clustering, ICML
[Puy 16] Random sampling of bandlimited signals..., ACHA
[Shi 15] Infinite impulse response graph filters in wireless sensor networks, SPL
[Chen 15] Discrete Signal Processing on Graphs: Sampling Theory, TSP
[Anis 16] Efficient Sampling Set Selection for Bandlimited Graph..., TSP
[Segarra 15] Sampling of graph signals with successive local aggregations, TSP
[Chapelle 10] Semi-Supervised Learning, The MIT Press
[Fu 12] A survey on instance selection for active learning, KIS
[Mahoney 11] Randomized algorithms for matrices and data, Found. and Trends in ML
[Sun 15] A review of Nyström methods for large-scale machine learning, Inf. Fus.
[Puy 11] On variable density compressive sampling, SPL
[Gadde 14] Active Semi-supervised Learning Using Sampling Theory..., SIGKDD
[Isufi 16] Distributed Time-Varying Graph Filtering, arXiv
[Sakiyama 16] Sp. Gr. Wav. and Filter Banks with Low Approximation Error, not published yet