Filtering and Sampling Graph Signals, and its Application to Compressive Spectral Clustering


1 Filtering and Sampling Graph Signals, and its Application to Compressive Spectral Clustering Nicolas Tremblay (1,2), Gilles Puy (1), Rémi Gribonval (1), Pierre Vandergheynst (1,2) (1) PANAMA Team, INRIA Rennes, France (2) Signal Processing Laboratory 2, EPFL, Switzerland

2 Outline: Introduction to GSP · Graph sampling · Application to clustering · Conclusion. Why graph signal processing?

3 Outline: Introduction to GSP (Graph Fourier Transform · Graph filtering) · Graph sampling · Application to clustering (What is Spectral Clustering? · Compressive Spectral Clustering · A toy experiment · Experiments on the SBM) · Conclusion

4 Introduction to graph signal processing: graph Fourier transform

5 What's a graph signal?

6 Three useful matrices: the adjacency matrix W, the degree matrix S (diagonal, with S_ii = Σ_j W_ij), and the Laplacian matrix L = S - W. [Example matrices shown on slide.]
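To make the three matrices concrete, here is a minimal numpy sketch on a hypothetical 4-node graph (this toy graph is an illustrative assumption, not the one drawn on the slide):

```python
import numpy as np

# Hypothetical 4-node undirected graph: W is the adjacency matrix,
# S the (diagonal) degree matrix, and L = S - W the combinatorial Laplacian.
W = np.array([[0., 1., 1., 0.],
              [1., 0., 1., 0.],
              [1., 1., 0., 1.],
              [0., 0., 1., 0.]])
S = np.diag(W.sum(axis=1))   # S_ii = sum_j W_ij
L = S - W                    # Laplacian: L = S - W
```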

8 What's a graph Fourier transform? [Hammond 11] L = S - W = U Λ U^T. U is the Fourier basis of the graph; the Fourier transform of a signal x reads x̂ = U^T x; Λ = diag(λ_1, λ_2, ..., λ_N) is the spectrum. [Slide shows a low-frequency and a high-frequency Fourier mode.]
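A short sketch of the graph Fourier transform, reusing the Laplacian L built above; for large graphs one would avoid this full diagonalisation (see the filtering slides below):

```python
# Full eigendecomposition (small graphs only): U is the Fourier basis,
# lam the spectrum lambda_1 <= ... <= lambda_N.
lam, U = np.linalg.eigh(L)
x = np.random.randn(L.shape[0])   # an arbitrary graph signal
x_hat = U.T @ x                   # graph Fourier transform: x_hat = U^T x
x_rec = U @ x_hat                 # inverse transform recovers x
```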

9 The graph Fourier transform encodes the structure of the graph. Slide courtesy of D. Shuman.

10 Introduction to graph signal processing: filtering graph signals

13 Graph filtering. Given a filter function h defined in the Fourier space [plot of h(λ)].
In the node space, the signal x filtered by h reads: x_h = U h(Λ) U^T x = H x.
Problem: this costs L's diagonalisation [O(N^3)].
Solution: we use a polynomial approximation of order p of h, h̃(λ) = Σ_{l=1}^{p} α_l λ^l ≃ h(λ). Indeed, in this case:
H̃ x = U h̃(Λ) U^T x = Σ_{l=1}^{p} α_l U Λ^l U^T x = Σ_{l=1}^{p} α_l L^l x ≃ H x.
This only involves matrix-vector multiplications [costs O(pN)].
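A minimal sketch of the fast polynomial filtering, assuming coefficients alpha (including a constant term alpha_0, added here for generality) that have been fitted to the desired filter h:

```python
def poly_filter(L, x, alpha):
    """Apply the polynomial filter sum_l alpha[l] L^l to the signal x.

    Only matrix-vector products with L are used, so the cost is O(p |E|)
    for a sparse Laplacian; no diagonalisation is needed.
    """
    out = alpha[0] * x
    Lx = x
    for a in alpha[1:]:
        Lx = L @ Lx        # next power of L applied to x
        out = out + a * Lx
    return out
```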

14 A few applications. Tikhonov regularization for denoising: argmin_f { ||f - y||_2^2 + γ f^T L f }. Wavelet denoising: argmin_a { ||f - W a||_2^2 + γ ||a||_{1,µ} }. Compression via filterbanks, etc. Slide courtesy of D. Shuman.

18 Sampling a graph signal consists in: 1. choosing a subset of nodes; 2. measuring the signal on these nodes only.
How to reconstruct the original signal? Basically, we need:
1. a (low-dimensional) model for the signal to sample,
2. a method to choose the nodes to sample,
3. a decoder that exactly recovers the signal given its samples.

20 Smoothness assumption. In 1D signal processing, a smooth signal has most of its energy at low frequencies. [Plots: a smooth signal in time and its Fourier transform.]
Definition (Bandlimited graph signal [Puy 15, Chen 15, Anis 16, Segarra 15]). A k-bandlimited signal x ∈ R^N on G is a signal that satisfies, for some α̂ ∈ R^k: x = U_k α̂, where U_k = (u_1 | ... | u_k) contains the first k Fourier modes.
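A k-bandlimited signal is easy to synthesise once U_k is available; a small sketch continuing the numpy example above (the band limit k is an arbitrary choice here):

```python
k = 2                            # hypothetical band limit
U_k = U[:, :k]                   # first k Fourier modes (lowest frequencies)
alpha_hat = np.random.randn(k)
x = U_k @ alpha_hat              # x = U_k alpha_hat lies in span(U_k)
```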

23 Sampling band-limited graph signals.
Preparation: associate to each node i a probability p_i to draw this node. This defines a probability distribution p ∈ R^N.
Sampling procedure: draw n nodes according to p: {ω_i}, i ∈ [1, n].
We create a matrix M that measures the signal x only on the selected nodes: M_ij := 1 if j = ω_i, 0 otherwise.
For any signal x ∈ R^N on G, its sampled version is y = Mx (it has size n < N).
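In code, the sampling step is a draw from p followed by a restriction; a sketch under the assumption that a distribution p over the N nodes is given (a uniform p works as a default):

```python
rng = np.random.default_rng(0)
N = p.size
n = 2                                 # hypothetical number of samples, n < N
omega = rng.choice(N, size=n, p=p)    # draw n nodes i.i.d. from p
M = np.zeros((n, N))
M[np.arange(n), omega] = 1.0          # M_ij = 1 iff j = omega_i
y = M @ x                             # sampled version of the signal x
```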

27 Optimizing the sampling distribution. Some nodes are more important to sample than others. For any signal x, remember that ||U_k^T x||_2^2 is the energy of x on the first k frequencies. Then:
1. For each node i, construct the Dirac δ_i centered at node i.
2. Compute ||U_k^T δ_i||_2 (we have 0 ≤ ||U_k^T δ_i||_2 ≤ 1).
If ||U_k^T δ_i||_2 ≃ 1: there exists a smooth signal concentrated on node i; node i is important.
If ||U_k^T δ_i||_2 ≃ 0: no smooth signal has energy concentrated on node i; node i can be sampled with lower probability.

28 The graph weighted coherence. We measure the quality of p with the graph weighted coherence.
Definition (Graph weighted coherence). Let p ∈ R^N represent a sampling distribution on {1, ..., N}. The graph weighted coherence of order k for the pair (G, p) is ν_p^k := max_{1≤i≤N} { p_i^{-1/2} ||U_k^T δ_i||_2 }.
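Given U_k and a strictly positive distribution p, the coherence is two lines of numpy; note that ||U_k^T δ_i||_2 is simply the norm of the i-th row of U_k:

```python
local_energy = np.linalg.norm(U_k, axis=1)   # ||U_k^T delta_i||_2 for each node i
nu_p_k = np.max(local_energy / np.sqrt(p))   # graph weighted coherence nu_p^k
```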

32 How many nodes to select?
Theorem (Restricted isometry property). Let M be a random subsampling matrix constructed using the sampling distribution p. For any δ, ε ∈ (0, 1), with probability at least 1 - ε,
(1 - δ) ||x_1 - x_2||_2^2 ≤ (1/n) ||M P^{-1/2} (x_1 - x_2)||_2^2 ≤ (1 + δ) ||x_1 - x_2||_2^2
for all x_1, x_2 ∈ span(U_k), provided that n ≥ (3/δ^2) (ν_p^k)^2 log(2k/ε).
Let's minimize ν_p^k! Its lower bound, (ν_p^k)^2 = k, is always reached for p*: ∀i ∈ [1, N], p*_i = ||U_k^T δ_i||_2^2 / k.
With p*, one needs n ∼ k log(k): up to the log factor, this is optimal!
We have an efficient algorithm that estimates p* in O(pN log N)!
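A sketch of the optimal distribution p*, computed here directly from U_k for clarity (the paper's fast algorithm estimates it by polynomial filtering of random vectors, without ever forming U_k):

```python
p_star = np.linalg.norm(U_k, axis=1)**2 / k   # p*_i = ||U_k^T delta_i||_2^2 / k
assert np.isclose(p_star.sum(), 1.0)          # ||U_k||_F^2 = k, so p* sums to 1
```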

34 Reconstruction. We sampled the signal x ∈ R^N, i.e., we measured y = Mx + n (n ∈ R^n models noise). The goal is to estimate x from y.
We propose to solve (links with the SSL literature [Chapelle 10, Fu 12]):
min_{z ∈ R^N} ||P_Ω^{-1/2} (Mz - y)||_2^2 + γ z^T g(L) z,
where γ > 0 and g : R → R is a nonnegative and nondecreasing polynomial function.

37 Reconstruction. Solving min_{z ∈ R^N} ||P_Ω^{-1/2} (Mz - y)||_2^2 + γ z^T g(L) z can be done, e.g., by gradient descent or conjugate gradient. It is fast, as it involves only matrix-vector multiplications with sparse matrices.
We proved that the result is accurate and stable to noise:
- The quality of the reconstruction depends on the eigengap ratio g(λ_k)/g(λ_{k+1}).
- γ should be adjusted to the signal-to-noise ratio.
- In the absence of noise, the reconstruction quality improves as g(λ_k)/g(λ_{k+1}) → 0 and γ → 0.
- If g(λ_k) = 0 and g(λ_{k+1}) > 0, we have exact recovery.
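A sketch of the decoder via conjugate gradient, continuing the sampling sketch above: the minimiser satisfies the normal equations (M^T P^{-1} M + γ g(L)) z = M^T P^{-1} y, shown here with the simple (hypothetical, but nonnegative and nondecreasing) choice g(L) = L:

```python
from scipy.sparse.linalg import cg, LinearOperator

gamma = 1e-3                                # hypothetical regularization weight
Pinv = 1.0 / p[omega]                       # diagonal of P_Omega^{-1}

def apply_A(z):
    """Apply z -> (M^T P^{-1} M + gamma * L) z using only matvecs."""
    out = np.zeros_like(z)
    np.add.at(out, omega, Pinv * z[omega])  # M^T P^{-1} M z (handles repeated nodes)
    return out + gamma * (L @ z)

A = LinearOperator((N, N), matvec=apply_A)  # symmetric positive semidefinite
b = np.zeros(N)
np.add.at(b, omega, Pinv * y)               # right-hand side M^T P^{-1} y
z_rec, info = cg(A, b)                      # info == 0 on convergence
```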

39 Recap. Given a graph and its Laplacian matrix L.
Given any graph signal x defined on this graph, one can:
1. filter this signal with any filter h(λ): x_h = U h(Λ) U^T x [O(N^3)];
2. fast filter it with the polynomial approximation h̃(λ) = Σ_{l=1}^{p} α_l λ^l: x_h ≃ Σ_{l=1}^{p} α_l L^l x [O(pN)].
Given a k-bandlimited graph signal x defined on this graph, one can:
1. estimate the optimal probability distribution p*_i = ||U_k^T δ_i||_2^2 / k [O(pN log N)];
2. sample n = O(k log k) nodes from this distribution;
3. measure the signal y = Mx ∈ R^n;
4. reconstruct the signal [O(pN)]: x_rec = argmin_{z ∈ R^N} ||P_Ω^{-1/2} (Mz - y)||_2^2 + γ z^T g(L) z.

41 Application to clustering: What is Spectral Clustering?

44 Given a series of N objects: 1/ Find adapted descriptors. 2/ Cluster.

47 From the N objects, one creates N vectors x_1, x_2, ..., x_N and their distance matrix Δ ∈ R^{N×N}.
Goal of clustering: assign a label c(i) ∈ {1, ..., k} to each object i in order to organize / simplify / analyze the data.
There exist two general types of methods: methods directly based on the x_i and/or Δ, like k-means or hierarchical clustering; and graph-based methods.

48 Graph construction from the distance matrix. Create a graph G = (V, E): each node in V is one of the N objects; each pair of nodes (i, j) is connected if Δ(i, j) is small enough. For example, two connectivity possibilities:
- Gaussian kernel: 1. connect all pairs of nodes with links of weight exp(-Δ(i, j)/σ); 2. remove all links of weight smaller than ε.
- k nearest neighbors: connect each node to its k nearest neighbors.
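A small sketch of the Gaussian-kernel construction, assuming a precomputed distance matrix Delta (sigma and eps are hypothetical parameters to tune):

```python
def gaussian_graph(Delta, sigma=1.0, eps=1e-3):
    W = np.exp(-Delta / sigma)    # kernel weights for all pairs of nodes
    np.fill_diagonal(W, 0.0)      # no self-loops
    W[W < eps] = 0.0              # remove links of weight smaller than eps
    return W
```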

49 The clustering problem now reads: given the graph G representing the similarity between the N objects, find a partition of all nodes into k clusters. Many methods exist [Fortunato 10]: modularity (or other cost-function) optimisation methods [Newman 06]; random-walk methods [Schaub 12]; methods inspired from statistical physics [Krzakala 13] or information theory [Rosvall 08]; ... and spectral methods.

51 The classical spectral clustering (SC) algorithm [Von Luxburg 06]. Given the N-node graph G with Laplacian matrix L:
1. Compute L's first k eigenvectors: U_k = (u_1 | u_2 | ... | u_k).
2. Consider each node i as a point in R^k: f_i = U_k^T δ_i.
3. Run k-means with the Euclidean distance D_ij = ||f_i - f_j|| and obtain k clusters.
Definition: let us call D_ij the spectral clustering distance.
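The three steps fit in a few lines of scipy; a sketch for small graphs (eigsh with which='SM' converges slowly on large ones, which is precisely the bottleneck CSC removes):

```python
from scipy.sparse.linalg import eigsh
from scipy.cluster.vq import kmeans2

def spectral_clustering(L, k):
    lam, U_k = eigsh(L, k=k, which='SM')             # first k eigenvectors of L
    _, labels = kmeans2(U_k, k, minit='++', seed=0)  # k-means on the rows f_i
    return labels
```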

52 What's the point of using a graph? N points in d = 2 dimensions. [Figures: result of k-means (k = 2) directly on the points, vs. the result after creating a graph, partially diagonalising L and running k-means (k = 2) on D.]

53 Application to clustering: Compressive Spectral Clustering

56 Our goal.
Problem: N and/or k large; two main bottlenecks:
1. partial eigendecomposition of the (sparse) Laplacian (e.g., restarted Arnoldi) [at least O(k^3 + Nk^2)] [Chen 11a];
2. high-dimensional k-means [O(Nk^2)].
Goal: SC in high dimensions, with N ∼ 10^6 nodes and/or k ∼ 100.
Contribution: an algorithm that approximates the true SC solution with controlled relative error, with a running time in O(k^2 log^2 k + pN(log(N) + k)).

57 Main ideas of Compressive Spectral Clustering (CSC). CSC is based on two main observations:
1. SC does not need the features f_i = U_k^T δ_i explicitly, but only the distances D_ij = ||f_i - f_j||;
2. each cluster indicator function c_j ∈ R^N is in fact approximately k-bandlimited: ∀j ∈ [1, k], c_j is close to span(U_k).
CSC follows 4 steps:
1. estimate D_ij by filtering d random graph signals;
2. sample n nodes out of the N available ones;
3. run low-dimensional k-means on these n nodes to obtain c_j^r ∈ R^n;
4. reconstruct each reduced cluster indicator function c_j^r back on the whole graph to obtain c_j, as desired.
(Steps 2 to 4 are already covered!) Step 1: how to estimate D_ij without computing U_k?

58 Remember: the classical spectral clustering algorithm. Given the N-node graph G with Laplacian matrix L:
1. Compute L's first k eigenvectors: U_k = (u_1 | u_2 | ... | u_k).
2. Consider each node i as a point in R^k: f_i = U_k^T δ_i.
3. Run k-means with D_ij = ||f_i - f_j|| and obtain k clusters.
Our goal: estimate D_ij without computing U_k exactly. Writing δ_ij = δ_i - δ_j:
D_ij = ||U_k^T (δ_i - δ_j)|| = ||U_k^T δ_ij|| = ||U_k U_k^T δ_ij|| (since U_k^T U_k = I) = ||U h_{λ_k}(Λ) U^T δ_ij|| = ||H_{λ_k} δ_ij||,
where h_{λ_k} is the ideal low-pass filter with cut-off λ_k [plot of h_{λ_k}(λ)].

59 Fast filtering [Hammond, ACHA 11]. In practice, we use a polynomial approximation of order p of h_{λ_k}: h̃_{λ_k}(λ) = Σ_{l=1}^{p} α_l λ^l ≃ h_{λ_k}(λ). [Plot: the ideal filter vs. approximations of order p = 5, 20, 100.] Such that: D_ij = ||H_{λ_k} δ_ij|| = lim_{p→∞} ||H̃_{λ_k} δ_ij||.

63 Norm conservation result [Tremblay 16a, Ramasamy 15]. The spectral distance reads: D_ij = ||H_{λ_k} δ_ij|| = lim_{p→∞} ||H̃_{λ_k} δ_ij||.
Let R = (r_1 | r_2 | ... | r_d) ∈ R^{N×d} be a random Gaussian matrix, i.e., a collection of d random graph signals, with zero mean and variance 1/d. We define f̃_i = (H̃_{λ_k} R)^T δ_i ∈ R^d and D̃_ij = ||f̃_i - f̃_j||.
Theorem (Norm conservation theorem, in the case of infinite p). Let ε > 0. If d > d_0 log N / ε^2, then, with probability > 1 - 1/N, we have:
∀(i, j) ∈ [1, N]^2, (1 - ε) D_ij ≤ D̃_ij ≤ (1 + ε) D_ij.
Consequence: to estimate D_ij with no partial diagonalisation of L, fast filter only d ∼ log N random signals!
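A sketch of this distance estimation, reusing the poly_filter sketch above with coefficients alpha assumed to approximate the ideal low-pass filter h_{λ_k}:

```python
d = int(np.ceil(4 * np.log(N)))           # hypothetical constant in d ~ log N
R = rng.normal(0.0, 1.0 / np.sqrt(d), size=(N, d))   # d random signals, variance 1/d
F = np.column_stack([poly_filter(L, R[:, j], alpha) for j in range(d)])
# Row i of F is f_tilde_i; ||F[i] - F[j]|| approximates the spectral distance D_ij.
```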

64 How to quickly estimate λ_k, the sole unknown of the fast filtering operation? Goal: given a symmetric positive semidefinite matrix L, estimate its k-th eigenvalue as fast as possible. We use eigencount techniques [Napoli 13] (also based on polynomial filtering of random vectors!): given the interval [0, b], get an approximation of the number of enclosed eigenvalues; then find λ_k by dichotomy on b. Done in [O(pN log N)].
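A sketch of the dichotomy, assuming a routine eigencount(L, b) that estimates the number of eigenvalues of L in [0, b] by polynomial filtering of random vectors (as in [Napoli 13]); lmax is an upper bound on the spectrum (lmax = 2 for the normalized Laplacian):

```python
def estimate_lambda_k(L, k, eigencount, lmax=2.0, tol=1e-3):
    """Locate lambda_k by bisection on the interval upper bound b."""
    lo, hi = 0.0, lmax
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if eigencount(L, mid) >= k:   # at least k eigenvalues in [0, mid]:
            hi = mid                  # lambda_k is at or below mid
        else:
            lo = mid                  # lambda_k is above mid
    return hi
```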

66 The CSC algorithm [Tremblay 16b, Puy 16].
1. Estimate λ_k, the k-th eigenvalue of L.
2. Generate d random graph signals in a matrix R ∈ R^{N×d}.
3. Filter them with H̃_{λ_k} and treat each node i as a point in R^d: f̃_i = (H̃_{λ_k} R)^T δ_i. If d ∼ log N, we prove that D̃_ij = ||f̃_i - f̃_j|| ≃ D_ij.
Next steps (sampling):
4. Sample n nodes from p*.
5. Run k-means on the n associated feature vectors and obtain {c_j^r}, j = 1, ..., k.
6. Reconstruct all k indicator functions {c_j}, j = 1, ..., k.
If n ∼ k log k and c_j^r ≃ M c_j, we prove that we control the reconstruction error.

67 Application to clustering: A toy experiment

71 SC on a toy example. N = 1000, k = 2; community 1: 300 nodes, community 2: 700 nodes. Compute U_2 = (u_1, u_2); take f_i = U_2^T δ_i ∈ R^2; compute D_ij; run k-means. [Figures: the graph, the points f_i, the matrix D_ij and the resulting partition; perf shown on slide.]

78 CSC on the same toy example.
1. Estimate λ_2 and p*.
2. Generate d = 3 random graph signals.
3. Low-pass filter them: f̃_i ∈ R^3.
4. Sample n = 3 nodes from p*.
5. Run low-dimensional k-means.
6. Reconstruct the result.
[Figures: the points f̃_i, the distances D̃_ij ≃ D_ij, and the partition after interpolation; perf shown on slide.]

79 Application to clustering: Experiments on the SBM

81 Experiments. The Stochastic Block Model (SBM): N nodes and k communities of equal size N/k; two nodes are connected with probability q_1 if they are in the same community, q_2 if not [block probability matrix shown on slide]. Define the ratio ε = q_2/q_1; the SBM is then fully defined by ε and the average degree s. Define the critical ratio ε_c = (s - √s)/(s + √s(k - 1)) [Decelle 11].
Experiments with N = 10^3, k = 20, s = 16, with respect to different parameters. [Figures: recovery performance vs. ε, compared to SC, for n ∈ {k log k, 2k log k, 3k log k, 4k log k}, for d ∈ {2 log N, 3 log N, 4 log N, 5 log N}, and for p ∈ {10, 20, 50, ...}; the critical ratio ε_c ≃ 0.15 is marked.]
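For reference, a sketch of how such an SBM graph can be sampled (assumes k divides N; the seed and rng are arbitrary choices):

```python
def sample_sbm(N, k, q1, q2, rng=np.random.default_rng(0)):
    labels = np.repeat(np.arange(k), N // k)   # k communities of size N/k
    same = labels[:, None] == labels[None, :]
    P = np.where(same, q1, q2)                 # pairwise edge probabilities
    A = (rng.random((N, N)) < P).astype(float)
    A = np.triu(A, 1)
    return A + A.T, labels                     # symmetric, no self-loops
```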

83 Experiments. With parameters d = 4 log(k), n = 2k log(k), p = 50, γ = 10^{-3}, and ε = ε_c/4. [Figures: recovery performance and computation time (s) vs. number of classes k, for N = 10^4, 10^5, 10^6, comparing CSC, SC and PM, where PM = Power Method [Lin 10, Boutsidis 15].]
On a real-world graph (the Amazon graph; node and edge counts given on slide):
k = 250: SC 7h17m | CSC [?]h20m, perf 0.83
k = 500: SC 15h29m | CSC [?]h34m ([?]h36m for eigs)
k = [?]: SC at least 21h for eigs, unknown for k-means | CSC 10h18m, perf 0.84

84 Conclusion

85 Two main ideas.
- Low-pass fast graph filtering of random signals: a way to bypass the Laplacian's diagonalisation for learning tasks.
- Cluster indicator functions live in a low-dimensional space (they are approximately k-bandlimited): we can use sampling schemes to recover them efficiently.
Details of this work are found in:
- (Sampling part) Random sampling of bandlimited signals on graphs, ACHA. A MATLAB toolbox is available at grsamplingbox.gforge.inria.fr.
- (Clustering part) Compressive Spectral Clustering, ICML. A MATLAB toolbox is available at cscbox.gforge.inria.fr.

86 Links with the literature.
- Low-rank approximation: Nyström methods [Sun 15], leverage scores [Mahoney 11].
- Machine learning: semi-supervised learning [Chapelle 10], active learning [Fu 12, Gadde 14], coresets [Har-Peled 04, Frahling 08].
- Compressed sensing: variable density sampling [Puy 11].
- Other fast approximate SC algorithms: [Lin 10, Fowlkes 04, Wang 09, Chen 11a, Chen 11b].

88 Perspectives and difficult questions. Two difficult questions (among others):
1. Given a symmetric positive semidefinite matrix, how to estimate its k-th eigenvalue, and only that one, as fast as possible?
2. How to automatically choose the appropriate polynomial order p?
Perspectives:
1. Rational filters instead of polynomial filters? [Shi 15, Isufi 16]
2. Smoother filters for a better approximation? [Sakiyama 16]
3. What happens if nodes are added one by one?
4. The overlapping SBM (SBMO)! [cf. E. Kaufmann]
5. The experiments shown were done with L = I - D^{-1/2} W D^{-1/2}; test L = D^{1-2α̂} - D^{-α̂} W D^{-α̂}! [cf. R. Couillet]

89 References
[Ramasamy 15] Compressive spectral embedding: sidestepping..., NIPS.
[Fortunato 10] Community detection in graphs, Physics Reports.
[Newman 06] Modularity and community structure in networks, PNAS.
[Schaub 12] Markov dynamics as a zooming lens for multiscale..., PLoS One.
[Krzakala 13] Spectral redemption: clustering sparse networks, PNAS.
[Rosvall 08] Maps of random walks on complex networks reveal..., PLoS One.
[Von Luxburg 06] A tutorial on spectral clustering, Statistics and Computing.
[Chen 11a] Parallel spectral clustering in distributed systems, IEEE TPAMI.
[Lin 10] Power iteration clustering, ICML.
[Boutsidis 15] Spectral clustering via the power method - provably, ICML.
[Fowlkes 04] Spectral grouping using the Nyström method, IEEE TPAMI.
[Wang 09] Approximate spectral clustering, AKDDM.
[Chen 11b] Large scale spectral clustering with landmark-based..., CAI.
[Shuman 13] The emerging field of signal processing on graphs..., IEEE SPMag.
[Hammond 11] Wavelets on graphs via spectral graph theory, ACHA.
[Napoli 13] Efficient estimation of eigenvalue counts in an interval, arXiv.
[Tremblay 16a] Accelerated spectral clustering using graph..., ICASSP.
[Tremblay 16b] Compressive spectral clustering, ICML.
[Puy 16] Random sampling of bandlimited signals..., ACHA.
[Shi 15] Infinite impulse response graph filters in wireless sensor networks, SPL.
[Chen 15] Discrete signal processing on graphs: sampling theory, IEEE TSP.
[Anis 16] Efficient sampling set selection for bandlimited graph..., IEEE TSP.
[Segarra 15] Sampling of graph signals with successive local aggregations, IEEE TSP.
[Chapelle 10] Semi-Supervised Learning, The MIT Press.
[Fu 12] A survey on instance selection for active learning, KIS.
[Mahoney 11] Randomized algorithms for matrices and data, Foundations and Trends in ML.
[Sun 15] A review of Nyström methods for large-scale machine learning, Information Fusion.
[Puy 11] On variable density compressive sampling, SPL.
[Gadde 14] Active semi-supervised learning using sampling theory..., SIGKDD.
[Isufi 16] Distributed time-varying graph filtering, arXiv.
[Sakiyama 16] Spectral graph wavelets and filter banks with low approximation error, not yet published.


More information

Introduction to Machine Learning. PCA and Spectral Clustering. Introduction to Machine Learning, Slides: Eran Halperin

Introduction to Machine Learning. PCA and Spectral Clustering. Introduction to Machine Learning, Slides: Eran Halperin 1 Introduction to Machine Learning PCA and Spectral Clustering Introduction to Machine Learning, 2013-14 Slides: Eran Halperin Singular Value Decomposition (SVD) The singular value decomposition (SVD)

More information

Machine Learning for Data Science (CS4786) Lecture 11

Machine Learning for Data Science (CS4786) Lecture 11 Machine Learning for Data Science (CS4786) Lecture 11 Spectral clustering Course Webpage : http://www.cs.cornell.edu/courses/cs4786/2016sp/ ANNOUNCEMENT 1 Assignment P1 the Diagnostic assignment 1 will

More information

Communities, Spectral Clustering, and Random Walks

Communities, Spectral Clustering, and Random Walks Communities, Spectral Clustering, and Random Walks David Bindel Department of Computer Science Cornell University 26 Sep 2011 20 21 19 16 22 28 17 18 29 26 27 30 23 1 25 5 8 24 2 4 14 3 9 13 15 11 10 12

More information

PCA and admixture models

PCA and admixture models PCA and admixture models CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar, Alkes Price PCA and admixture models 1 / 57 Announcements HW1

More information

Preprocessing & dimensionality reduction

Preprocessing & dimensionality reduction Introduction to Data Mining Preprocessing & dimensionality reduction CPSC/AMTH 445a/545a Guy Wolf guy.wolf@yale.edu Yale University Fall 2016 CPSC 445 (Guy Wolf) Dimensionality reduction Yale - Fall 2016

More information

Beyond Scalar Affinities for Network Analysis or Vector Diffusion Maps and the Connection Laplacian

Beyond Scalar Affinities for Network Analysis or Vector Diffusion Maps and the Connection Laplacian Beyond Scalar Affinities for Network Analysis or Vector Diffusion Maps and the Connection Laplacian Amit Singer Princeton University Department of Mathematics and Program in Applied and Computational Mathematics

More information

COMS 4721: Machine Learning for Data Science Lecture 19, 4/6/2017

COMS 4721: Machine Learning for Data Science Lecture 19, 4/6/2017 COMS 4721: Machine Learning for Data Science Lecture 19, 4/6/2017 Prof. John Paisley Department of Electrical Engineering & Data Science Institute Columbia University PRINCIPAL COMPONENT ANALYSIS DIMENSIONALITY

More information

Design of graph filters and filterbanks

Design of graph filters and filterbanks Design of graph filters and filterbanks Nicolas Tremblay, Paulo Gonçalves, Pierre Borgnat To cite this version: Nicolas Tremblay, Paulo Gonçalves, Pierre Borgnat. Design of graph filters and filterbanks.

More information

A Modified Method Using the Bethe Hessian Matrix to Estimate the Number of Communities

A Modified Method Using the Bethe Hessian Matrix to Estimate the Number of Communities Journal of Advanced Statistics, Vol. 3, No. 2, June 2018 https://dx.doi.org/10.22606/jas.2018.32001 15 A Modified Method Using the Bethe Hessian Matrix to Estimate the Number of Communities Laala Zeyneb

More information

Sketched Ridge Regression:

Sketched Ridge Regression: Sketched Ridge Regression: Optimization and Statistical Perspectives Shusen Wang UC Berkeley Alex Gittens RPI Michael Mahoney UC Berkeley Overview Ridge Regression min w f w = 1 n Xw y + γ w Over-determined:

More information

Learning Eigenfunctions: Links with Spectral Clustering and Kernel PCA

Learning Eigenfunctions: Links with Spectral Clustering and Kernel PCA Learning Eigenfunctions: Links with Spectral Clustering and Kernel PCA Yoshua Bengio Pascal Vincent Jean-François Paiement University of Montreal April 2, Snowbird Learning 2003 Learning Modal Structures

More information

Spectral Feature Selection for Supervised and Unsupervised Learning

Spectral Feature Selection for Supervised and Unsupervised Learning Spectral Feature Selection for Supervised and Unsupervised Learning Zheng Zhao Huan Liu Department of Computer Science and Engineering, Arizona State University zhaozheng@asu.edu huan.liu@asu.edu Abstract

More information

Predicting Graph Labels using Perceptron. Shuang Song

Predicting Graph Labels using Perceptron. Shuang Song Predicting Graph Labels using Perceptron Shuang Song shs037@eng.ucsd.edu Online learning over graphs M. Herbster, M. Pontil, and L. Wainer, Proc. 22nd Int. Conf. Machine Learning (ICML'05), 2005 Prediction

More information

A Local Non-Negative Pursuit Method for Intrinsic Manifold Structure Preservation

A Local Non-Negative Pursuit Method for Intrinsic Manifold Structure Preservation Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence A Local Non-Negative Pursuit Method for Intrinsic Manifold Structure Preservation Dongdong Chen and Jian Cheng Lv and Zhang Yi

More information

Cheng Soon Ong & Christian Walder. Canberra February June 2018

Cheng Soon Ong & Christian Walder. Canberra February June 2018 Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 Outlines Overview Introduction Linear Algebra Probability Linear Regression

More information

Data Mining Techniques

Data Mining Techniques Data Mining Techniques CS 622 - Section 2 - Spring 27 Pre-final Review Jan-Willem van de Meent Feedback Feedback https://goo.gl/er7eo8 (also posted on Piazza) Also, please fill out your TRACE evaluations!

More information

Laplacian Eigenmaps for Dimensionality Reduction and Data Representation

Laplacian Eigenmaps for Dimensionality Reduction and Data Representation Introduction and Data Representation Mikhail Belkin & Partha Niyogi Department of Electrical Engieering University of Minnesota Mar 21, 2017 1/22 Outline Introduction 1 Introduction 2 3 4 Connections to

More information