Random Sampling of Bandlimited Signals on Graphs
Pierre Vandergheynst, École Polytechnique Fédérale de Lausanne (EPFL), School of Engineering & School of Computer and Communication Sciences
Joint work with Gilles Puy (INRIA), Nicolas Tremblay (INRIA) and Rémi Gribonval (INRIA)
NIPS 2015 Workshop "Multiresolution Methods for Large Scale Learning"
Motivation: energy networks, social networks, transportation networks, biological networks, point clouds.
Goal: given partially observed information at the nodes of a graph, can we robustly and efficiently infer the missing information? What signal model? How many observations? What is the influence of the structure of the graph?
Notations: $G = \{V, E, W\}$, weighted and undirected; $V$ is the set of $n$ nodes, $E$ is the set of edges, and $W \in \mathbb{R}^{n \times n}$ is the weighted adjacency matrix. Combinatorial graph Laplacian $L := D - W \in \mathbb{R}^{n \times n}$; normalised Laplacian $\mathcal{L} := I - D^{-1/2} W D^{-1/2}$; the diagonal degree matrix $D$ has entries $d_i := \sum_{j \neq i} W_{ij}$.
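As a concrete reference point, here is a minimal NumPy/SciPy sketch of these two Laplacians (the function name `laplacians` and the small guard against isolated nodes are my additions):

```python
import numpy as np
from scipy import sparse

def laplacians(W):
    """Combinatorial (D - W) and normalised (I - D^{-1/2} W D^{-1/2}) Laplacians.

    W: symmetric scipy.sparse weight matrix (n x n) with zero diagonal.
    """
    d = np.asarray(W.sum(axis=1)).ravel()            # degrees d_i = sum_{j != i} W_ij
    L = sparse.diags(d) - W                          # combinatorial Laplacian
    d_isqrt = sparse.diags(1.0 / np.sqrt(np.maximum(d, 1e-12)))  # guard isolated nodes
    L_norm = sparse.eye(W.shape[0]) - d_isqrt @ W @ d_isqrt
    return L.tocsr(), L_norm.tocsr()
```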
Notations: $L$ is real, symmetric and PSD, hence it has orthonormal eigenvectors $U \in \mathbb{R}^{n \times n}$ (the Graph Fourier Matrix) and non-negative eigenvalues $\lambda_1 \leq \lambda_2 \leq \cdots \leq \lambda_n$, so that $L = U \Lambda U^\top$.
$k$-bandlimited signals: $x = U_k \hat{x}_k \in \mathbb{R}^n$, with Fourier coefficients $\hat{x} = U^\top x$, $\hat{x}_k \in \mathbb{R}^k$, and $U_k := (u_1, \ldots, u_k) \in \mathbb{R}^{n \times k}$ the first $k$ eigenvectors only.
Sampling Model: pick $p \in \mathbb{R}^n$ with $p_i > 0$ and $\|p\|_1 = \sum_{i=1}^n p_i = 1$, and set $P := \mathrm{diag}(p) \in \mathbb{R}^{n \times n}$.
Draw $m$ samples independently (random sampling): $\mathbb{P}(\omega_j = i) = p_i$, $\forall j \in \{1, \ldots, m\}$ and $\forall i \in \{1, \ldots, n\}$.
Observe $y_j := x_{\omega_j}$, $\forall j \in \{1, \ldots, m\}$, i.e. $y = Mx$ for the associated sampling matrix $M$.
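A minimal sketch of this sampling step, assuming NumPy (the helper name `sample_signal` is mine; $M$ is never formed explicitly, only the drawn indices):

```python
import numpy as np

def sample_signal(x, p, m, seed=0):
    """Draw omega_1..omega_m i.i.d. with P(omega_j = i) = p_i and observe y = M x."""
    rng = np.random.default_rng(seed)
    omega = rng.choice(len(p), size=m, replace=True, p=p)
    return x[omega], omega        # y_j = x_{omega_j}; omega defines the rows of M
```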
Sampling Model: since $\|\delta_i\|_2 = 1$, the quantity $\|U_k^\top \delta_i\|_2 / \|\delta_i\|_2 = \|U_k^\top \delta_i\|_2$ measures how much a perfect impulse at node $i$ can be concentrated on the first $k$ eigenvectors; it carries interesting information about the graph.
Ideally: $p_i$ large wherever $\|U_k^\top \delta_i\|_2$ is large.
Graph coherence: $\nu_p^k := \max_{1 \leq i \leq n} \left\{ p_i^{-1/2} \, \|U_k^\top \delta_i\|_2 \right\}$. Rem: $\nu_p^k \geq \sqrt{k}$.
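A possible one-liner for this coherence, assuming `Uk` holds the first $k$ eigenvectors as columns (the helper name is mine):

```python
import numpy as np

def graph_coherence(Uk, p):
    """nu_p^k = max_i p_i^{-1/2} ||Uk^T delta_i||_2.

    ||Uk^T delta_i||_2 is the norm of the i-th row of Uk; since those squared
    norms sum to k and p sums to 1, the maximum is always >= sqrt(k).
    """
    local = np.linalg.norm(Uk, axis=1)
    return np.max(local / np.sqrt(p))
```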
Stable Embedding
Theorem 1 (Restricted isometry property). Let $M$ be a random subsampling matrix with the sampling distribution $p$. For any $\delta, \epsilon \in (0, 1)$, with probability at least $1 - \epsilon$,
$$(1 - \delta) \, \|x\|_2^2 \;\leq\; \frac{1}{m} \, \|M P^{-1/2} x\|_2^2 \;\leq\; (1 + \delta) \, \|x\|_2^2 \qquad (1)$$
for all $x \in \mathrm{span}(U_k)$, provided that $m \geq \frac{3}{\delta^2} \, (\nu_p^k)^2 \, \log\!\left(\frac{2k}{\epsilon}\right)$. (2)
$M P^{-1/2} x = P_\Omega^{-1/2} (M x)$: only need $M$; the re-weighting can be done offline.
$(\nu_p^k)^2 \geq k$: need to sample at least $k$ nodes.
Proof similar to CS in a bounded ONB, but simpler since the model is a subspace (not a union of subspaces).
Stable Embedding: $(\nu_p^k)^2 \geq k$, so we need to sample at least $k$ nodes. Can we reduce $m$ to this optimal amount?
Variable Density Sampling: $p_i := \|U_k^\top \delta_i\|_2^2 / k$, $i = 1, \ldots, n$, is such that $(\nu_p^k)^2 = k$; it depends on the structure of the graph.
Corollary 1. Let $M$ be a random subsampling matrix constructed with the sampling distribution $p$ above. For any $\delta, \epsilon \in (0, 1)$, with probability at least $1 - \epsilon$,
$$(1 - \delta) \, \|x\|_2^2 \;\leq\; \frac{1}{m} \, \|M P^{-1/2} x\|_2^2 \;\leq\; (1 + \delta) \, \|x\|_2^2$$
for all $x \in \mathrm{span}(U_k)$, provided that $m \geq \frac{3}{\delta^2} \, k \, \log\!\left(\frac{2k}{\epsilon}\right)$.
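A sketch of the optimal distribution and the resulting sample budget, assuming `Uk` is available (helper names and the default constants are mine):

```python
import numpy as np

def optimal_distribution(Uk):
    """p_i = ||Uk^T delta_i||_2^2 / k; sums to 1 because Uk has orthonormal columns."""
    return np.linalg.norm(Uk, axis=1) ** 2 / Uk.shape[1]

def sample_budget(k, delta=0.5, eps=0.1):
    """Smallest m satisfying m >= (3 / delta^2) k log(2 k / eps), as in Corollary 1."""
    return int(np.ceil(3.0 / delta**2 * k * np.log(2.0 * k / eps)))
```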
Recovery Procedures: observe $y = Mx + n \in \mathbb{R}^m$ with $x \in \mathrm{span}(U_k)$ and the stable embedding in force.
Standard Decoder: $\min_{z \in \mathrm{span}(U_k)} \|P_\Omega^{-1/2} (Mz - y)\|_2$.
Needs a projector onto $\mathrm{span}(U_k)$; the re-weighting is required for the RIP.
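When the projector (equivalently, `Uk`) is available, the standard decoder reduces to a small weighted least-squares problem; a minimal sketch (names are mine):

```python
import numpy as np

def standard_decoder(y, omega, p, Uk):
    """min_{z in span(Uk)} ||P^{-1/2}(M z - y)||_2 via least squares on z = Uk a."""
    w = 1.0 / np.sqrt(p[omega])                 # re-weighting P_Omega^{-1/2}
    A = w[:, None] * Uk[omega, :]               # P_Omega^{-1/2} M Uk
    a, *_ = np.linalg.lstsq(A, w * y, rcond=None)
    return Uk @ a
```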
Recovery Procedures: observe $y = Mx + n \in \mathbb{R}^m$ with $x \in \mathrm{span}(U_k)$ and the stable embedding in force.
Efficient Decoder: $\min_{z \in \mathbb{R}^n} \|P_\Omega^{-1/2} (Mz - y)\|_2^2 + \gamma \, z^\top g(L) \, z$.
Soft constraint on frequencies; efficient implementation.
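A minimal sketch of one possible implementation via the normal equations and conjugate gradient, using the simplest admissible penalty $g(\lambda) = \lambda$, i.e. $g(L) = L$ (this choice of $g$ and all helper names are mine):

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def efficient_decoder(y, omega, p, L, gamma):
    """Solve min_z ||P^{-1/2}(Mz - y)||_2^2 + gamma z^T g(L) z via its normal
    equations (M^T P^{-1} M + gamma g(L)) z = M^T P^{-1} y, here with g(L) = L."""
    n = L.shape[0]
    w = 1.0 / p[omega]                          # diagonal of P_Omega^{-1}

    def matvec(z):
        out = gamma * (L @ z)                   # gamma g(L) z, with g(lambda) = lambda
        np.add.at(out, omega, w * z[omega])     # + M^T P^{-1} M z (repeats accumulate)
        return out

    b = np.zeros(n)
    np.add.at(b, omega, w * y)                  # M^T P^{-1} y
    z, info = cg(LinearOperator((n, n), matvec=matvec), b)
    return z
```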
Analysis of Standard Decoder
Standard Decoder: $\min_{z \in \mathrm{span}(U_k)} \|P_\Omega^{-1/2} (Mz - y)\|_2$
Theorem 2. Let $\Omega$ be a set of $m$ indices selected independently from $\{1, \ldots, n\}$ with sampling distribution $p \in \mathbb{R}^n$, and $M$ the associated sampling matrix. Let $\delta, \epsilon \in (0, 1)$ and $m \geq \frac{3}{\delta^2} (\nu_p^k)^2 \log(\frac{2k}{\epsilon})$. With probability at least $1 - \epsilon$, the following holds for all $x \in \mathrm{span}(U_k)$ and all $n \in \mathbb{R}^m$.
i) Let $x^*$ be the solution of the Standard Decoder with $y = Mx + n$. Then
$$\|x^* - x\|_2 \;\leq\; \frac{2}{\sqrt{m (1 - \delta)}} \, \|P_\Omega^{-1/2} n\|_2. \qquad (1)$$
In particular, exact recovery in the noiseless case.
ii) There exist particular vectors $n_0 \in \mathbb{R}^m$ such that the solution $x^*$ of the Standard Decoder with $y = Mx + n_0$ satisfies
$$\|x^* - x\|_2 \;\geq\; \frac{1}{\sqrt{m (1 + \delta)}} \, \|P_\Omega^{-1/2} n_0\|_2. \qquad (2)$$
Analysis of Efficient Decoder
Efficient Decoder: $\min_{z \in \mathbb{R}^n} \|P_\Omega^{-1/2} (Mz - y)\|_2^2 + \gamma \, z^\top g(L) \, z$, with $g$ non-negative.
A filter reshapes the Fourier coefficients: for $h : \mathbb{R} \to \mathbb{R}$, $x_h := U \, \mathrm{diag}(\hat{h}) \, U^\top x \in \mathbb{R}^n$, with $\hat{h} = (h(\lambda_1), \ldots, h(\lambda_n)) \in \mathbb{R}^n$.
For a polynomial $p(t) = \sum_{i=0}^{d} \alpha_i t^i$: $x_p = U \, \mathrm{diag}(\hat{p}) \, U^\top x = \sum_{i=0}^{d} \alpha_i L^i x$. Pick special polynomials and use e.g. recurrence relations for fast filtering, with sparse matrix-vector multiplies only (see the sketch below).
$g$ non-negative and non-decreasing = penalizes high frequencies; favours the reconstruction of approximately band-limited signals.
The ideal filter yields the Standard Decoder: $i_{\lambda_k}(t) := 0$ if $t \in [0, \lambda_k]$, $+\infty$ otherwise.
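For instance, Chebyshev polynomials admit a three-term recurrence; a minimal sketch, assuming the coefficients `coeffs` (e.g. fitted to the desired filter response) and an upper bound `lmax` on the spectrum are given:

```python
import numpy as np

def chebyshev_filter(L, x, coeffs, lmax):
    """Apply h(L) x with h given by Chebyshev coefficients on [0, lmax]:
    only sparse matrix-vector products, via the recurrence
    T_{j+1}(t) = 2 t T_j(t) - T_{j-1}(t) on the rescaled Lt = (2/lmax) L - I."""
    Lt = lambda v: (2.0 / lmax) * (L @ v) - v
    t_prev, t_curr = x, Lt(x)                              # T_0 x and T_1 x
    out = 0.5 * coeffs[0] * t_prev + coeffs[1] * t_curr
    for c in coeffs[2:]:
        t_prev, t_curr = t_curr, 2.0 * Lt(t_curr) - t_prev
        out = out + c * t_curr
    return out
```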
Analysis of Efficient Decoder
Theorem 3. Let $\Omega$, $M$, $P$, $m$ be as before, and let $M_{\max} > 0$ be a constant such that $\|M P_\Omega^{-1/2}\|_2 \leq M_{\max}$. Let $\delta, \epsilon \in (0, 1)$. With probability at least $1 - \epsilon$, the following holds for all $x \in \mathrm{span}(U_k)$, all $n \in \mathbb{R}^m$, all $\gamma > 0$, and all non-negative and non-decreasing polynomial functions $g$ such that $g(\lambda_{k+1}) > 0$. Let $x^*$ be the solution of the Efficient Decoder with $y = Mx + n$. Then,
$$\|\alpha^* - x\|_2 \;\leq\; \frac{1}{\sqrt{m(1-\delta)}} \left[ \left( 2 + \frac{M_{\max}}{\sqrt{\gamma \, g(\lambda_{k+1})}} \right) \|P_\Omega^{-1/2} n\|_2 + \left( M_{\max} \sqrt{\frac{g(\lambda_k)}{g(\lambda_{k+1})}} + \sqrt{\gamma \, g(\lambda_k)} \right) \|x\|_2 \right], \qquad (1)$$
and
$$\|\beta^*\|_2 \;\leq\; \frac{1}{\sqrt{\gamma \, g(\lambda_{k+1})}} \, \|P_\Omega^{-1/2} n\|_2 + \sqrt{\frac{g(\lambda_k)}{g(\lambda_{k+1})}} \, \|x\|_2, \qquad (2)$$
where $\alpha^* := U_k U_k^\top x^*$ and $\beta^* := (I - U_k U_k^\top) x^*$.
Analysis of Efficient Decoder — noiseless case:
$$\|x^* - x\|_2 \;\leq\; \frac{1}{\sqrt{m(1-\delta)}} \left( M_{\max} \sqrt{\frac{g(\lambda_k)}{g(\lambda_{k+1})}} + \sqrt{\gamma \, g(\lambda_k)} \right) \|x\|_2 + \sqrt{\frac{g(\lambda_k)}{g(\lambda_{k+1})}} \, \|x\|_2$$
$g(\lambda_k) = 0$ together with $g$ non-decreasing implies perfect reconstruction.
Otherwise: choose $\gamma$ as close as possible to $0$ and seek to minimise the ratio $g(\lambda_k) / g(\lambda_{k+1})$.
With noise, the relevant level is $\|P_\Omega^{-1/2} n\|_2 / \|x\|_2$. Choose the filter to increase the spectral gap? Clusters are of course good.
Estimating the Optimal Distribution: we need to estimate $\|U_k^\top \delta_i\|_2$ without computing $U_k$.
Filter random signals with the ideal low-pass filter: $r_{b_k} = U \, \mathrm{diag}(1, \ldots, 1, 0, \ldots, 0) \, U^\top r = U_k U_k^\top r$, and
$$\mathbb{E}\left[(r_{b_k})_i^2\right] = \delta_i^\top U_k U_k^\top \, \mathbb{E}(r r^\top) \, U_k U_k^\top \delta_i = \|U_k^\top \delta_i\|_2^2.$$
In practice, one may use a polynomial approximation of the ideal filter and set
$$\tilde{p}_i := \frac{\sum_{l=1}^{L} (r^l_{c_k})_i^2}{\sum_{i=1}^{n} \sum_{l=1}^{L} (r^l_{c_k})_i^2}, \qquad L \geq C \log n.$$
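A sketch of this estimator, assuming a `lowpass` callable such as `chebyshev_filter` above with coefficients fitted to the ideal low-pass at $\lambda_k$ (names and constants are mine):

```python
import numpy as np

def estimate_distribution(lowpass, n, n_signals):
    """Estimate p_i ~ ||Uk^T delta_i||_2^2 by filtering L >= C log n random
    Gaussian signals with a polynomial approximation of the ideal low-pass."""
    rng = np.random.default_rng(0)
    acc = np.zeros(n)
    for _ in range(n_signals):
        r = rng.standard_normal(n)
        acc += lowpass(r) ** 2                  # accumulate (r^l_{c_k})_i^2
    return acc / acc.sum()                      # normalise into a distribution
```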
Estimating the Eigengap: again, low-pass filter random signals. For an ideal low-pass filter containing the first $j$ eigenvalues,
$$(1 - \epsilon) \sum_{i=1}^{n} \|U_j^\top \delta_i\|_2^2 \;\leq\; \sum_{i=1}^{n} \sum_{l=1}^{L} (r^l_{b_\lambda})_i^2 \;\leq\; (1 + \epsilon) \sum_{i=1}^{n} \|U_j^\top \delta_i\|_2^2.$$
Since $\sum_{i=1}^{n} \|U_j^\top \delta_i\|_2^2 = \|U_j\|_{\mathrm{Frob}}^2 = j$, we have
$$(1 - \epsilon) \, j \;\leq\; \sum_{i=1}^{n} \sum_{l=1}^{L} (r^l_{b_\lambda})_i^2 \;\leq\; (1 + \epsilon) \, j.$$
Dichotomy on the filter bandwidth then locates $\lambda_k$.
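A sketch of the dichotomy, assuming a `lowpass_at(b, R)` callable that applies an approximate ideal low-pass with cutoff `b` to the columns of `R` (all names and the stopping rule are my additions):

```python
import numpy as np

def estimate_lambda_k(lowpass_at, n, k, lmax, n_signals=20, iters=20):
    """Binary search on the bandwidth b: the energy of low-pass-filtered random
    signals concentrates around j = #{eigenvalues <= b}; shrink [lo, hi] until
    that count crosses k."""
    rng = np.random.default_rng(0)
    R = rng.standard_normal((n, n_signals)) / np.sqrt(n_signals)
    lo, hi = 0.0, lmax
    for _ in range(iters):
        b = 0.5 * (lo + hi)
        j_est = np.sum(lowpass_at(b, R) ** 2)   # ~ number of eigenvalues below b
        if j_est >= k:
            hi = b
        else:
            lo = b
    return hi                                    # approximate location of lambda_k
```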
Experiments: unbalanced clusters [figures]
Experiments: [result figures]
Compressive Spectral Clustering
Clustering is equivalent to the recovery of cluster assignment functions; well-defined clusters -> band-limited assignment functions!
Generate features by filtering random signals: by Johnson-Lindenstrauss, $d \geq \frac{4 + 2\beta}{\epsilon^2/2 - \epsilon^3/3} \log n$ random signals suffice.
Each feature map is smooth, therefore keep only $m = O(k \log k)$ sampled feature values.
Use k-means on the compressed data and feed the result into the Efficient Decoder (see the sketch below).
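A high-level sketch of the pipeline, with scikit-learn's KMeans standing in for the k-means step; uniform node sampling is used here for simplicity, and the final decoding step is only indicated:

```python
import numpy as np
from sklearn.cluster import KMeans

def compressive_spectral_clustering(lowpass, n, k, m, d, seed=0):
    """CSC sketch: d = O(log n) filtered random signals give JL-type spectral
    features; keep m = O(k log k) sampled nodes, run k-means on those only,
    then extend each cluster indicator to all nodes with the efficient decoder."""
    rng = np.random.default_rng(seed)
    F = lowpass(rng.standard_normal((n, d)) / np.sqrt(d))  # n x d feature map
    omega = rng.choice(n, size=m, replace=True)            # sampled nodes (uniform here)
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(F[omega])
    return omega, labels   # decode: interpolate 1{label==c} on omega to the full graph
```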
Compressive Spectral Clustering: [figure: complexity comparison]
Conclusion
Stable, robust and universal random sampling of smoothly varying information on graphs.
Tractable decoder with guarantees.
Optimal sampling distribution depends on the graph structure.
Can be used for inference and (SVD-less) compressive clustering.
Thank you!