Random Sampling of Bandlimited Signals on Graphs
Pierre Vandergheynst, École Polytechnique Fédérale de Lausanne (EPFL), School of Engineering & School of Computer and Communication Sciences
Joint work with Gilles Puy (INRIA), Nicolas Tremblay (INRIA) and Rémi Gribonval (INRIA)
NIPS 2015 Workshop "Multiresolution Methods for Large Scale Learning"
Motivation: energy networks, social networks, transportation networks, biological networks, point clouds.
Goal: given partially observed information at the nodes of a graph, can we robustly and efficiently infer the missing information? What signal model? How many observations? What is the influence of the structure of the graph?
Notations: $G = \{V, E, W\}$, weighted and undirected; $V$ is the set of $n$ nodes, $E$ is the set of edges, and $W \in \mathbb{R}^{n \times n}$ is the weighted adjacency matrix. Combinatorial graph Laplacian $L := D - W \in \mathbb{R}^{n \times n}$; normalised Laplacian $\mathcal{L} := I - D^{-1/2} W D^{-1/2}$; the diagonal degree matrix $D$ has entries $d_i := \sum_{j \neq i} W_{ij}$.
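As a concrete reference point, here is a minimal NumPy/SciPy sketch of these two Laplacians (the function name `laplacians` and the small guard against isolated nodes are my additions):

```python
import numpy as np
from scipy import sparse

def laplacians(W):
    """Combinatorial (D - W) and normalised (I - D^{-1/2} W D^{-1/2}) Laplacians.

    W: symmetric scipy.sparse weight matrix (n x n) with zero diagonal.
    """
    d = np.asarray(W.sum(axis=1)).ravel()            # degrees d_i = sum_{j != i} W_ij
    L = sparse.diags(d) - W                          # combinatorial Laplacian
    d_isqrt = sparse.diags(1.0 / np.sqrt(np.maximum(d, 1e-12)))  # guard isolated nodes
    L_norm = sparse.eye(W.shape[0]) - d_isqrt @ W @ d_isqrt
    return L.tocsr(), L_norm.tocsr()
```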
Notations: $L$ is real, symmetric and PSD, hence it has orthonormal eigenvectors $U \in \mathbb{R}^{n \times n}$ (the Graph Fourier Matrix) and non-negative eigenvalues $\lambda_1 \leq \lambda_2 \leq \cdots \leq \lambda_n$, so that $L = U \Lambda U^\top$.
$k$-bandlimited signals: $x = U_k \hat{x}_k \in \mathbb{R}^n$, with Fourier coefficients $\hat{x} = U^\top x$, $\hat{x}_k \in \mathbb{R}^k$, and $U_k := (u_1, \ldots, u_k) \in \mathbb{R}^{n \times k}$ the first $k$ eigenvectors only.
Sampling Model: pick $p \in \mathbb{R}^n$ with $p_i > 0$ and $\|p\|_1 = \sum_{i=1}^n p_i = 1$, and set $P := \mathrm{diag}(p) \in \mathbb{R}^{n \times n}$.
Draw $m$ samples independently (random sampling): $\mathbb{P}(\omega_j = i) = p_i$, $\forall j \in \{1, \ldots, m\}$ and $\forall i \in \{1, \ldots, n\}$.
Observe $y_j := x_{\omega_j}$, $\forall j \in \{1, \ldots, m\}$, i.e. $y = Mx$ for the associated sampling matrix $M$.
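A minimal sketch of this sampling step, assuming NumPy (the helper name `sample_signal` is mine; $M$ is never formed explicitly, only the drawn indices):

```python
import numpy as np

def sample_signal(x, p, m, seed=0):
    """Draw omega_1..omega_m i.i.d. with P(omega_j = i) = p_i and observe y = M x."""
    rng = np.random.default_rng(seed)
    omega = rng.choice(len(p), size=m, replace=True, p=p)
    return x[omega], omega        # y_j = x_{omega_j}; omega defines the rows of M
```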
Sampling Model: since $\|\delta_i\|_2 = 1$, the quantity $\|U_k^\top \delta_i\|_2 / \|\delta_i\|_2 = \|U_k^\top \delta_i\|_2$ measures how much a perfect impulse at node $i$ can be concentrated on the first $k$ eigenvectors; it carries interesting information about the graph.
Ideally: $p_i$ large wherever $\|U_k^\top \delta_i\|_2$ is large.
Graph coherence: $\nu_p^k := \max_{1 \leq i \leq n} \left\{ p_i^{-1/2} \, \|U_k^\top \delta_i\|_2 \right\}$. Rem: $\nu_p^k \geq \sqrt{k}$.
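A possible one-liner for this coherence, assuming `Uk` holds the first $k$ eigenvectors as columns (the helper name is mine):

```python
import numpy as np

def graph_coherence(Uk, p):
    """nu_p^k = max_i p_i^{-1/2} ||Uk^T delta_i||_2.

    ||Uk^T delta_i||_2 is the norm of the i-th row of Uk; since those squared
    norms sum to k and p sums to 1, the maximum is always >= sqrt(k).
    """
    local = np.linalg.norm(Uk, axis=1)
    return np.max(local / np.sqrt(p))
```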
Stable Embedding
Theorem 1 (Restricted isometry property). Let $M$ be a random subsampling matrix with the sampling distribution $p$. For any $\delta, \epsilon \in (0, 1)$, with probability at least $1 - \epsilon$,
$$(1 - \delta) \, \|x\|_2^2 \;\leq\; \frac{1}{m} \, \|M P^{-1/2} x\|_2^2 \;\leq\; (1 + \delta) \, \|x\|_2^2 \qquad (1)$$
for all $x \in \mathrm{span}(U_k)$, provided that $m \geq \frac{3}{\delta^2} \, (\nu_p^k)^2 \, \log\!\left(\frac{2k}{\epsilon}\right)$. (2)
$M P^{-1/2} x = P_\Omega^{-1/2} (M x)$: only need $M$; the re-weighting can be done offline.
$(\nu_p^k)^2 \geq k$: need to sample at least $k$ nodes.
Proof similar to CS in a bounded ONB, but simpler since the model is a subspace (not a union of subspaces).
Stable Embedding: $(\nu_p^k)^2 \geq k$, so we need to sample at least $k$ nodes. Can we reduce $m$ to this optimal amount?
Variable Density Sampling: $p_i := \|U_k^\top \delta_i\|_2^2 / k$, $i = 1, \ldots, n$, is such that $(\nu_p^k)^2 = k$; it depends on the structure of the graph.
Corollary 1. Let $M$ be a random subsampling matrix constructed with the sampling distribution $p$ above. For any $\delta, \epsilon \in (0, 1)$, with probability at least $1 - \epsilon$,
$$(1 - \delta) \, \|x\|_2^2 \;\leq\; \frac{1}{m} \, \|M P^{-1/2} x\|_2^2 \;\leq\; (1 + \delta) \, \|x\|_2^2$$
for all $x \in \mathrm{span}(U_k)$, provided that $m \geq \frac{3}{\delta^2} \, k \, \log\!\left(\frac{2k}{\epsilon}\right)$.
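A sketch of the optimal distribution and the resulting sample budget, assuming `Uk` is available (helper names and the default constants are mine):

```python
import numpy as np

def optimal_distribution(Uk):
    """p_i = ||Uk^T delta_i||_2^2 / k; sums to 1 because Uk has orthonormal columns."""
    return np.linalg.norm(Uk, axis=1) ** 2 / Uk.shape[1]

def sample_budget(k, delta=0.5, eps=0.1):
    """Smallest m satisfying m >= (3 / delta^2) k log(2 k / eps), as in Corollary 1."""
    return int(np.ceil(3.0 / delta**2 * k * np.log(2.0 * k / eps)))
```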
Recovery Procedures: observe $y = Mx + n \in \mathbb{R}^m$ with $x \in \mathrm{span}(U_k)$ and the stable embedding in force.
Standard Decoder: $\min_{z \in \mathrm{span}(U_k)} \|P_\Omega^{-1/2} (Mz - y)\|_2$.
Needs a projector onto $\mathrm{span}(U_k)$; the re-weighting is required for the RIP.
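When the projector (equivalently, `Uk`) is available, the standard decoder reduces to a small weighted least-squares problem; a minimal sketch (names are mine):

```python
import numpy as np

def standard_decoder(y, omega, p, Uk):
    """min_{z in span(Uk)} ||P^{-1/2}(M z - y)||_2 via least squares on z = Uk a."""
    w = 1.0 / np.sqrt(p[omega])                 # re-weighting P_Omega^{-1/2}
    A = w[:, None] * Uk[omega, :]               # P_Omega^{-1/2} M Uk
    a, *_ = np.linalg.lstsq(A, w * y, rcond=None)
    return Uk @ a
```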
Recovery Procedures: observe $y = Mx + n \in \mathbb{R}^m$ with $x \in \mathrm{span}(U_k)$ and the stable embedding in force.
Efficient Decoder: $\min_{z \in \mathbb{R}^n} \|P_\Omega^{-1/2} (Mz - y)\|_2^2 + \gamma \, z^\top g(L) \, z$.
Soft constraint on frequencies; efficient implementation.
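A minimal sketch of one possible implementation via the normal equations and conjugate gradient, using the simplest admissible penalty $g(\lambda) = \lambda$, i.e. $g(L) = L$ (this choice of $g$ and all helper names are mine):

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def efficient_decoder(y, omega, p, L, gamma):
    """Solve min_z ||P^{-1/2}(Mz - y)||_2^2 + gamma z^T g(L) z via its normal
    equations (M^T P^{-1} M + gamma g(L)) z = M^T P^{-1} y, here with g(L) = L."""
    n = L.shape[0]
    w = 1.0 / p[omega]                          # diagonal of P_Omega^{-1}

    def matvec(z):
        out = gamma * (L @ z)                   # gamma g(L) z, with g(lambda) = lambda
        np.add.at(out, omega, w * z[omega])     # + M^T P^{-1} M z (repeats accumulate)
        return out

    b = np.zeros(n)
    np.add.at(b, omega, w * y)                  # M^T P^{-1} y
    z, info = cg(LinearOperator((n, n), matvec=matvec), b)
    return z
```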
Analysis of Standard Decoder
Standard Decoder: $\min_{z \in \mathrm{span}(U_k)} \|P_\Omega^{-1/2} (Mz - y)\|_2$
Theorem 2. Let $\Omega$ be a set of $m$ indices selected independently from $\{1, \ldots, n\}$ with sampling distribution $p \in \mathbb{R}^n$, and $M$ the associated sampling matrix. Let $\delta, \epsilon \in (0, 1)$ and $m \geq \frac{3}{\delta^2} (\nu_p^k)^2 \log(\frac{2k}{\epsilon})$. With probability at least $1 - \epsilon$, the following holds for all $x \in \mathrm{span}(U_k)$ and all $n \in \mathbb{R}^m$.
i) Let $x^*$ be the solution of the Standard Decoder with $y = Mx + n$. Then
$$\|x^* - x\|_2 \;\leq\; \frac{2}{\sqrt{m (1 - \delta)}} \, \|P_\Omega^{-1/2} n\|_2. \qquad (1)$$
In particular, exact recovery in the noiseless case.
ii) There exist particular vectors $n_0 \in \mathbb{R}^m$ such that the solution $x^*$ of the Standard Decoder with $y = Mx + n_0$ satisfies
$$\|x^* - x\|_2 \;\geq\; \frac{1}{\sqrt{m (1 + \delta)}} \, \|P_\Omega^{-1/2} n_0\|_2. \qquad (2)$$
Analysis of Efficient Decoder
Efficient Decoder: $\min_{z \in \mathbb{R}^n} \|P_\Omega^{-1/2} (Mz - y)\|_2^2 + \gamma \, z^\top g(L) \, z$, with $g$ non-negative.
A filter reshapes the Fourier coefficients: for $h : \mathbb{R} \to \mathbb{R}$, $x_h := U \, \mathrm{diag}(\hat{h}) \, U^\top x \in \mathbb{R}^n$, with $\hat{h} = (h(\lambda_1), \ldots, h(\lambda_n)) \in \mathbb{R}^n$.
For a polynomial $p(t) = \sum_{i=0}^{d} \alpha_i t^i$: $x_p = U \, \mathrm{diag}(\hat{p}) \, U^\top x = \sum_{i=0}^{d} \alpha_i L^i x$. Pick special polynomials and use e.g. recurrence relations for fast filtering, with sparse matrix-vector multiplies only (see the sketch below).
$g$ non-negative and non-decreasing = penalizes high frequencies; favours the reconstruction of approximately band-limited signals.
The ideal filter yields the Standard Decoder: $i_{\lambda_k}(t) := 0$ if $t \in [0, \lambda_k]$, $+\infty$ otherwise.
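For instance, Chebyshev polynomials admit a three-term recurrence; a minimal sketch, assuming the coefficients `coeffs` (e.g. fitted to the desired filter response) and an upper bound `lmax` on the spectrum are given:

```python
import numpy as np

def chebyshev_filter(L, x, coeffs, lmax):
    """Apply h(L) x with h given by Chebyshev coefficients on [0, lmax]:
    only sparse matrix-vector products, via the recurrence
    T_{j+1}(t) = 2 t T_j(t) - T_{j-1}(t) on the rescaled Lt = (2/lmax) L - I."""
    Lt = lambda v: (2.0 / lmax) * (L @ v) - v
    t_prev, t_curr = x, Lt(x)                              # T_0 x and T_1 x
    out = 0.5 * coeffs[0] * t_prev + coeffs[1] * t_curr
    for c in coeffs[2:]:
        t_prev, t_curr = t_curr, 2.0 * Lt(t_curr) - t_prev
        out = out + c * t_curr
    return out
```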
Analysis of Efficient Decoder
Theorem 3. Let $\Omega$, $M$, $P$, $m$ be as before, and let $M_{\max} > 0$ be a constant such that $\|M P_\Omega^{-1/2}\|_2 \leq M_{\max}$. Let $\delta, \epsilon \in (0, 1)$. With probability at least $1 - \epsilon$, the following holds for all $x \in \mathrm{span}(U_k)$, all $n \in \mathbb{R}^m$, all $\gamma > 0$, and all non-negative and non-decreasing polynomial functions $g$ such that $g(\lambda_{k+1}) > 0$. Let $x^*$ be the solution of the Efficient Decoder with $y = Mx + n$. Then,
$$\|\alpha^* - x\|_2 \;\leq\; \frac{1}{\sqrt{m(1-\delta)}} \left[ \left( 2 + \frac{M_{\max}}{\sqrt{\gamma \, g(\lambda_{k+1})}} \right) \|P_\Omega^{-1/2} n\|_2 + \left( M_{\max} \sqrt{\frac{g(\lambda_k)}{g(\lambda_{k+1})}} + \sqrt{\gamma \, g(\lambda_k)} \right) \|x\|_2 \right], \qquad (1)$$
and
$$\|\beta^*\|_2 \;\leq\; \frac{1}{\sqrt{\gamma \, g(\lambda_{k+1})}} \, \|P_\Omega^{-1/2} n\|_2 + \sqrt{\frac{g(\lambda_k)}{g(\lambda_{k+1})}} \, \|x\|_2, \qquad (2)$$
where $\alpha^* := U_k U_k^\top x^*$ and $\beta^* := (I - U_k U_k^\top) x^*$.
Analysis of Efficient Decoder — noiseless case:
$$\|x^* - x\|_2 \;\leq\; \frac{1}{\sqrt{m(1-\delta)}} \left( M_{\max} \sqrt{\frac{g(\lambda_k)}{g(\lambda_{k+1})}} + \sqrt{\gamma \, g(\lambda_k)} \right) \|x\|_2 + \sqrt{\frac{g(\lambda_k)}{g(\lambda_{k+1})}} \, \|x\|_2$$
$g(\lambda_k) = 0$ together with $g$ non-decreasing implies perfect reconstruction.
Otherwise: choose $\gamma$ as close as possible to $0$ and seek to minimise the ratio $g(\lambda_k) / g(\lambda_{k+1})$.
With noise, the relevant level is $\|P_\Omega^{-1/2} n\|_2 / \|x\|_2$. Choose the filter to increase the spectral gap? Clusters are of course good.
Estimating the Optimal Distribution: we need to estimate $\|U_k^\top \delta_i\|_2$ without computing $U_k$.
Filter random signals with the ideal low-pass filter: $r_{b_k} = U \, \mathrm{diag}(1, \ldots, 1, 0, \ldots, 0) \, U^\top r = U_k U_k^\top r$, and
$$\mathbb{E}\left[(r_{b_k})_i^2\right] = \delta_i^\top U_k U_k^\top \, \mathbb{E}(r r^\top) \, U_k U_k^\top \delta_i = \|U_k^\top \delta_i\|_2^2.$$
In practice, one may use a polynomial approximation of the ideal filter and set
$$\tilde{p}_i := \frac{\sum_{l=1}^{L} (r^l_{c_k})_i^2}{\sum_{i=1}^{n} \sum_{l=1}^{L} (r^l_{c_k})_i^2}, \qquad L \geq C \log n.$$
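A sketch of this estimator, assuming a `lowpass` callable such as `chebyshev_filter` above with coefficients fitted to the ideal low-pass at $\lambda_k$ (names and constants are mine):

```python
import numpy as np

def estimate_distribution(lowpass, n, n_signals):
    """Estimate p_i ~ ||Uk^T delta_i||_2^2 by filtering L >= C log n random
    Gaussian signals with a polynomial approximation of the ideal low-pass."""
    rng = np.random.default_rng(0)
    acc = np.zeros(n)
    for _ in range(n_signals):
        r = rng.standard_normal(n)
        acc += lowpass(r) ** 2                  # accumulate (r^l_{c_k})_i^2
    return acc / acc.sum()                      # normalise into a distribution
```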
Estimating the Eigengap: again, low-pass filter random signals. For an ideal low-pass filter containing the first $j$ eigenvalues,
$$(1 - \epsilon) \sum_{i=1}^{n} \|U_j^\top \delta_i\|_2^2 \;\leq\; \sum_{i=1}^{n} \sum_{l=1}^{L} (r^l_{b_\lambda})_i^2 \;\leq\; (1 + \epsilon) \sum_{i=1}^{n} \|U_j^\top \delta_i\|_2^2.$$
Since $\sum_{i=1}^{n} \|U_j^\top \delta_i\|_2^2 = \|U_j\|_{\mathrm{Frob}}^2 = j$, we have
$$(1 - \epsilon) \, j \;\leq\; \sum_{i=1}^{n} \sum_{l=1}^{L} (r^l_{b_\lambda})_i^2 \;\leq\; (1 + \epsilon) \, j.$$
Dichotomy on the filter bandwidth then locates $\lambda_k$.
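A sketch of the dichotomy, assuming a `lowpass_at(b, R)` callable that applies an approximate ideal low-pass with cutoff `b` to the columns of `R` (all names and the stopping rule are my additions):

```python
import numpy as np

def estimate_lambda_k(lowpass_at, n, k, lmax, n_signals=20, iters=20):
    """Binary search on the bandwidth b: the energy of low-pass-filtered random
    signals concentrates around j = #{eigenvalues <= b}; shrink [lo, hi] until
    that count crosses k."""
    rng = np.random.default_rng(0)
    R = rng.standard_normal((n, n_signals)) / np.sqrt(n_signals)
    lo, hi = 0.0, lmax
    for _ in range(iters):
        b = 0.5 * (lo + hi)
        j_est = np.sum(lowpass_at(b, R) ** 2)   # ~ number of eigenvalues below b
        if j_est >= k:
            hi = b
        else:
            lo = b
    return hi                                    # approximate location of lambda_k
```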
Experiments: unbalanced clusters [figures]
Experiments: [result figures]
Compressive Spectral Clustering
Clustering is equivalent to the recovery of cluster assignment functions; well-defined clusters -> band-limited assignment functions!
Generate features by filtering random signals: by Johnson-Lindenstrauss, $d \geq \frac{4 + 2\beta}{\epsilon^2/2 - \epsilon^3/3} \log n$ random signals suffice.
Each feature map is smooth, therefore keep only $m = O(k \log k)$ sampled feature values.
Use k-means on the compressed data and feed the result into the Efficient Decoder (see the sketch below).
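A high-level sketch of the pipeline, with scikit-learn's KMeans standing in for the k-means step; uniform node sampling is used here for simplicity, and the final decoding step is only indicated:

```python
import numpy as np
from sklearn.cluster import KMeans

def compressive_spectral_clustering(lowpass, n, k, m, d, seed=0):
    """CSC sketch: d = O(log n) filtered random signals give JL-type spectral
    features; keep m = O(k log k) sampled nodes, run k-means on those only,
    then extend each cluster indicator to all nodes with the efficient decoder."""
    rng = np.random.default_rng(seed)
    F = lowpass(rng.standard_normal((n, d)) / np.sqrt(d))  # n x d feature map
    omega = rng.choice(n, size=m, replace=True)            # sampled nodes (uniform here)
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(F[omega])
    return omega, labels   # decode: interpolate 1{label==c} on omega to the full graph
```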
Compressive Spectral Clustering: [figure: complexity comparison]
Conclusion
Stable, robust and universal random sampling of smoothly varying information on graphs.
Tractable decoder with guarantees.
Optimal sampling distribution depends on the graph structure.
Can be used for inference and (SVD-less) compressive clustering.
Thank you!