Random Sampling of Bandlimited Signals on Graphs

Transcription:

Random Sampling of Bandlimited Signals on Graphs. Pierre Vandergheynst, École Polytechnique Fédérale de Lausanne (EPFL), School of Engineering & School of Computer and Communication Sciences. Joint work with Gilles Puy (INRIA), Nicolas Tremblay (INRIA) and Rémi Gribonval (INRIA). NIPS 2015 Workshop on Multiresolution Methods for Large Scale Learning.

Motivation. Energy networks, social networks, transportation networks, biological networks, point clouds.

Goal. Given partially observed information at the nodes of a graph, can we robustly and efficiently infer the missing information? What signal model? How many observations? What is the influence of the structure of the graph?

Notations. G = {V, E, W} is a weighted, undirected graph: V is the set of n nodes, E is the set of edges, and W ∈ R^{n×n} is the weighted adjacency matrix. The diagonal degree matrix D has entries d_i := Σ_{j≠i} W_{ij}. Combinatorial graph Laplacian: L := D − W ∈ R^{n×n}. Normalised Laplacian: L := I − D^{-1/2} W D^{-1/2}.

Notations. L is real, symmetric and PSD, with orthonormal eigenvectors U ∈ R^{n×n} (the Graph Fourier Matrix) and non-negative eigenvalues λ_1 ≤ λ_2 ≤ ... ≤ λ_n, so that L = U Λ U^T.
k-bandlimited signals: x = U_k x̂_k, with x ∈ R^n, Fourier coefficients x̂ = U^T x, x̂_k ∈ R^k, and U_k := (u_1, ..., u_k) ∈ R^{n×k} the first k eigenvectors only.
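As a concrete illustration of these notations, here is a minimal NumPy sketch on a hypothetical toy graph (all sizes, weights and seeds are illustrative, not from the talk): build the combinatorial Laplacian, diagonalise it, and synthesise a k-bandlimited signal.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 20, 3

# Hypothetical toy graph: symmetric weighted adjacency matrix with zero diagonal.
A = rng.random((n, n))
W = np.triu(A, 1)
W = W + W.T

D = np.diag(W.sum(axis=1))        # degree matrix, d_i = sum_j W_ij
L = D - W                         # combinatorial Laplacian

# Graph Fourier basis: L = U diag(lam) U^T with eigenvalues sorted increasingly.
lam, U = np.linalg.eigh(L)
U_k = U[:, :k]                    # first k eigenvectors

# A k-bandlimited signal x = U_k x_hat_k, for some Fourier coefficients x_hat_k.
x_hat_k = rng.standard_normal(k)
x = U_k @ x_hat_k

# Its Fourier transform x_hat = U^T x is supported on the first k entries.
x_hat = U.T @ x
print(np.allclose(x_hat[k:], 0.0))   # True, up to numerical precision
```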

Sampling Model. Take p ∈ R^n with p_i > 0 and ||p||_1 = Σ_{i=1}^n p_i = 1, and set P := diag(p) ∈ R^{n×n}.
Draw m samples independently (random sampling): P(ω_j = i) = p_i, for all j ∈ {1, ..., m} and all i ∈ {1, ..., n}.
Observe y_j := x_{ω_j} for all j ∈ {1, ..., m}, i.e. y = Mx with M the associated sampling matrix.
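A minimal sketch of this sampling model, again on illustrative data: draw m node indices i.i.d. from p and read off the corresponding signal values.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 20, 8

x = rng.standard_normal(n)            # stand-in for a k-bandlimited signal

p = np.full(n, 1.0 / n)               # any distribution with p_i > 0, sum(p) = 1
omega = rng.choice(n, size=m, replace=True, p=p)   # P(omega_j = i) = p_i, i.i.d.

# The sampling matrix M selects the drawn entries: y = M x, i.e. y_j = x_{omega_j}.
M = np.zeros((m, n))
M[np.arange(m), omega] = 1.0
y = M @ x
print(np.allclose(y, x[omega]))
```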

Sampling Model. The quantity ||U_k^T δ_i||_2 / ||δ_i||_2 = ||U_k^T δ_i||_2 measures how much a perfect impulse can be concentrated on the first k eigenvectors; it carries interesting information about the graph.
Ideally: p_i should be large wherever ||U_k^T δ_i||_2 is large.
Graph coherence: ν_p^k := max_{1≤i≤n} { p_i^{-1/2} ||U_k^T δ_i||_2 }. Remark: ν_p^k ≥ √k.
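The local quantity ||U_k^T δ_i||_2 is just the norm of the i-th row of U_k, so the coherence is cheap to evaluate once U_k is known. A small sketch on an illustrative toy graph, using a uniform p:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 50, 5

# Illustrative toy graph and its first k Laplacian eigenvectors.
W = rng.random((n, n)); W = np.triu(W, 1); W = W + W.T
L = np.diag(W.sum(axis=1)) - W
_, U = np.linalg.eigh(L)
U_k = U[:, :k]

# ||U_k^T delta_i|| is simply the norm of the i-th row of U_k.
local = np.linalg.norm(U_k, axis=1)

p = np.full(n, 1.0 / n)                      # e.g. uniform sampling distribution
nu = np.max(local / np.sqrt(p))              # graph coherence nu_p^k
print(nu**2, ">=", k)                        # (nu_p^k)^2 >= k always holds
```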

Stable Embedding.
Theorem 1 (Restricted isometry property). Let M be a random subsampling matrix with the sampling distribution p. For any δ, ε ∈ (0, 1), with probability at least 1 − ε,
(1 − δ) ||x||_2^2 ≤ (1/m) ||M P^{-1/2} x||_2^2 ≤ (1 + δ) ||x||_2^2    (1)
for all x ∈ span(U_k), provided that
m ≥ (3/δ^2) (ν_p^k)^2 log(2k/ε).    (2)
Remarks:
- M P^{-1/2} x = P_Ω^{-1/2} M x, where P_Ω is the restriction of P to the sampled indices: only M is needed at acquisition, the re-weighting can be done offline.
- (ν_p^k)^2 ≥ k: we need to sample at least k nodes.
- The proof is similar to compressed sensing in a bounded orthonormal basis, but simpler since the model is a subspace (not a union of subspaces).
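A quick numerical illustration (not a proof) of the theorem: for x = U_k a one has ||x||_2 = ||a||_2, so the RIP is equivalent to the squared singular values of (1/√m) P_Ω^{-1/2} M U_k lying in [1 − δ, 1 + δ]. Sketch with hypothetical sizes and a uniform sampling distribution:

```python
import numpy as np

rng = np.random.default_rng(9)
n, k, m = 300, 5, 120

# Illustrative random graph and its first k Laplacian eigenvectors.
W = rng.random((n, n)); W = np.triu(W, 1); W = W + W.T
L = np.diag(W.sum(axis=1)) - W
_, U = np.linalg.eigh(L)
U_k = U[:, :k]

p = np.full(n, 1.0 / n)                        # uniform sampling distribution
omega = rng.choice(n, size=m, replace=True, p=p)

# A = (1/sqrt(m)) P_omega^{-1/2} M U_k ; the RIP says its squared singular
# values lie in [1 - delta, 1 + delta] for all of span(U_k) at once.
A = (U_k[omega, :] / np.sqrt(p[omega])[:, None]) / np.sqrt(m)
s = np.linalg.svd(A, compute_uv=False)
print(s.min()**2, s.max()**2)                  # both should be close to 1
```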

Stable Embedding. Since (ν_p^k)^2 ≥ k, we need to sample at least k nodes. Can we reduce the sample size to this optimal amount?
Variable Density Sampling: p_i := ||U_k^T δ_i||_2^2 / k, for i = 1, ..., n, is such that (ν_p^k)^2 = k; it depends on the structure of the graph.
Corollary 1. Let M be a random subsampling matrix constructed with this sampling distribution p. For any δ, ε ∈ (0, 1), with probability at least 1 − ε,
(1 − δ) ||x||_2^2 ≤ (1/m) ||M P^{-1/2} x||_2^2 ≤ (1 + δ) ||x||_2^2
for all x ∈ span(U_k), provided that m ≥ (3/δ^2) k log(2k/ε).
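A short sketch checking, on an illustrative toy graph, that p_i = ||U_k^T δ_i||_2^2 / k is indeed a probability distribution and attains (ν_p^k)^2 = k:

```python
import numpy as np

rng = np.random.default_rng(10)
n, k = 50, 5

# Illustrative toy graph, as in the previous sketches.
W = rng.random((n, n)); W = np.triu(W, 1); W = W + W.T
L = np.diag(W.sum(axis=1)) - W
_, U = np.linalg.eigh(L)
U_k = U[:, :k]

local = np.linalg.norm(U_k, axis=1)            # ||U_k^T delta_i|| per node
p_opt = local**2 / k                           # variable-density distribution

# ||U_k||_F^2 = k (orthonormal columns), so p_opt is a probability distribution,
# and the graph coherence squared attains its lower bound k.
print(np.isclose(p_opt.sum(), 1.0))
print(np.isclose(np.max(local / np.sqrt(p_opt))**2, k))
```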

Recovery Procedures. We observe y = Mx + n, with y ∈ R^m, x ∈ span(U_k), and a stable embedding.
Standard Decoder: min_{z ∈ span(U_k)} ||P_Ω^{-1/2} (Mz − y)||_2.
It needs a projector onto span(U_k); the re-weighting by P_Ω^{-1/2} is there for the RIP.
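A minimal sketch of the standard decoder: since z ∈ span(U_k) can be written z = U_k a, the problem reduces to a re-weighted least-squares problem in a. The toy graph, noise level and sample size below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k, m = 60, 4, 20

# Illustrative toy graph and a k-bandlimited signal on it.
W = rng.random((n, n)); W = np.triu(W, 1); W = W + W.T
L = np.diag(W.sum(axis=1)) - W
_, U = np.linalg.eigh(L)
U_k = U[:, :k]
x = U_k @ rng.standard_normal(k)

p = np.linalg.norm(U_k, axis=1)**2 / k           # variable-density distribution
omega = rng.choice(n, size=m, replace=True, p=p)
y = x[omega] + 1e-3 * rng.standard_normal(m)     # noisy samples y = Mx + noise

# Standard decoder: min_{z in span(U_k)} ||P_omega^{-1/2} (M z - y)||_2,
# solved by writing z = U_k a and doing weighted least squares in a.
w = 1.0 / np.sqrt(p[omega])                      # re-weighting P_omega^{-1/2}
A = w[:, None] * U_k[omega, :]
a, *_ = np.linalg.lstsq(A, w * y, rcond=None)
x_star = U_k @ a

print(np.linalg.norm(x_star - x) / np.linalg.norm(x))   # small relative error
```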

Recovery Procedures. We observe y = Mx + n, with y ∈ R^m, x ∈ span(U_k), and a stable embedding.
Efficient Decoder: min_{z ∈ R^n} ||P_Ω^{-1/2} (Mz − y)||_2^2 + γ z^T g(L) z.
The regulariser is a soft constraint on frequencies and has an efficient implementation.
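A minimal sketch of the efficient decoder with the particular choice g(L) = L (so z^T g(L) z = z^T L z penalises high graph frequencies); its minimiser solves the normal equations (M^T P_Ω^{-1} M + γ L) z = M^T P_Ω^{-1} y. The community-structured toy graph and all constants are illustrative; on large graphs this solve would be done iteratively with the fast polynomial filtering discussed below, not with a dense solver.

```python
import numpy as np

rng = np.random.default_rng(4)
k, size = 4, 25
n = k * size
gamma = 1e-2

# Illustrative graph with k communities, so there is a clear spectral gap.
labels = np.repeat(np.arange(k), size)
prob = np.where(labels[:, None] == labels[None, :], 0.3, 0.01)
W = (rng.random((n, n)) < prob).astype(float)
W = np.triu(W, 1); W = W + W.T
L = np.diag(W.sum(axis=1)) - W
_, U = np.linalg.eigh(L)
U_k = U[:, :k]
x = U_k @ rng.standard_normal(k)                 # k-bandlimited signal

p = np.linalg.norm(U_k, axis=1)**2 / k           # variable-density distribution
m = 30
omega = rng.choice(n, size=m, replace=True, p=p)
y = x[omega] + 1e-3 * rng.standard_normal(m)

# Normal equations of the efficient decoder with g(L) = L:
# (M^T P_omega^{-1} M + gamma L) z = M^T P_omega^{-1} y.
MtPM = np.zeros((n, n))
MtPy = np.zeros(n)
for j, i in enumerate(omega):
    MtPM[i, i] += 1.0 / p[i]
    MtPy[i] += y[j] / p[i]

z_star = np.linalg.solve(MtPM + gamma * L, MtPy)
print(np.linalg.norm(z_star - x) / np.linalg.norm(x))   # relative recovery error
```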

Analysis of Standard Decoder.
Standard Decoder: min_{z ∈ span(U_k)} ||P_Ω^{-1/2} (Mz − y)||_2.
Theorem 2. Let Ω be a set of m indices selected independently from {1, ..., n} with sampling distribution p ∈ R^n, and M the associated sampling matrix. Let δ, ε ∈ (0, 1) and m ≥ (3/δ^2) (ν_p^k)^2 log(2k/ε). With probability at least 1 − ε, the following holds for all x ∈ span(U_k) and all n ∈ R^m.
i) Let x* be the solution of the Standard Decoder with y = Mx + n. Then
||x* − x||_2 ≤ 2 ||P_Ω^{-1/2} n||_2 / √(m (1 − δ)).    (1)
ii) There exist particular vectors n_0 ∈ R^m such that the solution x* of the Standard Decoder with y = Mx + n_0 satisfies
||x* − x||_2 ≥ ||P_Ω^{-1/2} n_0||_2 / √(m (1 + δ)).    (2)
In particular, recovery is exact in the noiseless case.

Analysis of Efficient Decoder.
Efficient Decoder: min_{z ∈ R^n} ||P_Ω^{-1/2} (Mz − y)||_2^2 + γ z^T g(L) z, with g non-negative.
A filter reshapes the Fourier coefficients: for h : R → R, x_h := U diag(ĥ) U^T x ∈ R^n, with ĥ = (h(λ_1), ..., h(λ_n)) ∈ R^n.
For a polynomial p(t) = Σ_{i=0}^d α_i t^i, this becomes x_p = U diag(p̂) U^T x = Σ_{i=0}^d α_i L^i x.
Pick special polynomials and use e.g. recurrence relations for fast filtering (with sparse matrix-vector multiplies only), as in the sketch below.
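A sketch of polynomial filtering with sparse matrix-vector products only, directly instantiating x_p = Σ_i α_i L^i x via the recurrence v_{i+1} = L v_i (in practice one would prefer e.g. a Chebyshev recurrence for numerical stability). The sparse random graph and the filter coefficients are illustrative.

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(5)
n = 200

# Illustrative sparse random graph Laplacian.
W = sp.random(n, n, density=0.02, random_state=5)
W = sp.triu(W, 1)
W = W + W.T
L = sp.diags(np.asarray(W.sum(axis=1)).ravel()) - W

def poly_filter(L, x, alpha):
    """Apply p(L) x = sum_i alpha[i] * L^i x using sparse matvecs only."""
    v = x.copy()             # v = L^0 x
    out = alpha[0] * v
    for a in alpha[1:]:
        v = L @ v            # v = L^i x
        out = out + a * v
    return out

x = rng.standard_normal(n)
alpha = [1.0, -0.5, 0.05]    # illustrative coefficients: p(t) = 1 - 0.5 t + 0.05 t^2
x_p = poly_filter(L, x, alpha)

# Check against the exact spectral definition x_p = U diag(p(lam)) U^T x.
lam, U = np.linalg.eigh(L.toarray())
x_p_exact = U @ (np.polyval(alpha[::-1], lam) * (U.T @ x))
print(np.allclose(x_p, x_p_exact))
```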

Analysis of Efficient Decoder.
Efficient Decoder: min_{z ∈ R^n} ||P_Ω^{-1/2} (Mz − y)||_2^2 + γ z^T g(L) z, with g non-negative and non-decreasing, which penalizes high frequencies.
This favours the reconstruction of approximately band-limited signals. The ideal filter
i_{λ_k}(t) := 0 if t ∈ [0, λ_k], +∞ otherwise,
yields the Standard Decoder.

Analysis of Efficient Decoder.
Theorem 3. Let Ω, M, P, m be as before, and let M_max > 0 be a constant such that ||M P^{-1/2}||_2 ≤ M_max. Let δ, ε ∈ (0, 1). With probability at least 1 − ε, the following holds for all x ∈ span(U_k), all n ∈ R^m, all γ > 0, and all non-negative and non-decreasing polynomial functions g such that g(λ_{k+1}) > 0.
Let x* be the solution of the Efficient Decoder with y = Mx + n, and write α* := U_k U_k^T x* and β* := (I − U_k U_k^T) x*. Then
||α* − x||_2 ≤ (1/√(m (1 − δ))) [ (2 + M_max / √(γ g(λ_{k+1}))) ||P_Ω^{-1/2} n||_2 + ( M_max √(g(λ_k)/g(λ_{k+1})) + √(γ g(λ_k)) ) ||x||_2 ],    (1)
and
||β*||_2 ≤ ||P_Ω^{-1/2} n||_2 / √(γ g(λ_{k+1})) + √(g(λ_k)/g(λ_{k+1})) ||x||_2.    (2)

Analysis of Efficient Decoder. Noiseless case:
||x* − x||_2 ≤ (1/√(m (1 − δ))) ( M_max √(g(λ_k)/g(λ_{k+1})) + √(γ g(λ_k)) ) ||x||_2 + √(g(λ_k)/g(λ_{k+1})) ||x||_2.
g(λ_k) = 0 together with g non-decreasing implies perfect reconstruction.
Otherwise: choose γ as close as possible to 0 and seek to minimise the ratio g(λ_k)/g(λ_{k+1}).
With noise, the error scales with ||P_Ω^{-1/2} n||_2 / ||x||_2. Choose the filter to increase the spectral gap? Clusters are of course good.

Estimating the Optimal Distribution.
We need to estimate ||U_k^T δ_i||_2. Filter random signals with the ideal low-pass filter:
r_{b_k} = U diag(1, ..., 1, 0, ..., 0) U^T r = U_k U_k^T r   (first k diagonal entries equal to 1),
so that
E[(r_{b_k})_i^2] = δ_i^T U_k U_k^T E(r r^T) U_k U_k^T δ_i = ||U_k^T δ_i||_2^2.
In practice, one may use a polynomial approximation c_k of the ideal filter and estimate
p̃_i := Σ_{l=1}^L (r^l_{c_k})_i^2 / Σ_{i=1}^n Σ_{l=1}^L (r^l_{c_k})_i^2, with L ≥ C log n random signals.
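A sketch of this estimation procedure on an illustrative toy graph. For clarity the ideal projector U_k U_k^T is applied explicitly; in the intended use it would be replaced by the polynomial approximation c_k so that no eigendecomposition is ever computed.

```python
import numpy as np

rng = np.random.default_rng(6)
n, k = 200, 6
num_signals = int(np.ceil(4 * np.log(n)))       # L ~ C log n signals (C = 4 is illustrative)

# Illustrative toy graph and its first k eigenvectors (ground truth for comparison).
W = rng.random((n, n)); W = np.triu(W, 1); W = W + W.T
Lap = np.diag(W.sum(axis=1)) - W
_, U = np.linalg.eigh(Lap)
U_k = U[:, :k]

# Low-pass filter the random signals (ideal projector, for clarity only).
R = rng.standard_normal((n, num_signals))
R_filt = U_k @ (U_k.T @ R)

energy = np.sum(R_filt**2, axis=1)              # ~ L * ||U_k^T delta_i||^2 per node
p_est = energy / energy.sum()                   # estimated sampling distribution

p_true = np.linalg.norm(U_k, axis=1)**2 / k
print(np.max(np.abs(p_est - p_true)))           # error shrinks as num_signals grows
```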

Estimating the Eigengap.
Again, low-pass filtering random signals gives
(1 − ε) Σ_{i=1}^n ||U_j^T δ_i||_2^2 ≤ Σ_{i=1}^n Σ_{l=1}^L (r^l_{b_λ})_i^2 ≤ (1 + ε) Σ_{i=1}^n ||U_j^T δ_i||_2^2.
Since Σ_{i=1}^n ||U_j^T δ_i||_2^2 = ||U_j||_Frob^2 = j, we have
(1 − ε) j ≤ Σ_{i=1}^n Σ_{l=1}^L (r^l_{b_λ})_i^2 ≤ (1 + ε) j,
i.e. the total energy of the filtered random signals counts the number j of eigenvalues below the filter cutoff λ. Estimate λ_k by dichotomy on the filter bandwidth, as sketched below.
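A sketch of the dichotomy on the filter bandwidth, with illustrative constants: bisect the cutoff until the estimated eigenvalue count reaches k. The ideal filter is applied through the eigendecomposition purely for clarity; in practice a polynomial filter with the prescribed cutoff would be used.

```python
import numpy as np

rng = np.random.default_rng(7)
n, k = 200, 6
num_signals = int(np.ceil(10 * np.log(n)))      # illustrative constant

# Illustrative toy graph.
W = rng.random((n, n)); W = np.triu(W, 1); W = W + W.T
Lap = np.diag(W.sum(axis=1)) - W
lam, U = np.linalg.eigh(Lap)

R = rng.standard_normal((n, num_signals)) / np.sqrt(num_signals)

def estimated_count(cutoff):
    """Estimate j = #{eigenvalues <= cutoff} from low-pass filtered random signals."""
    U_c = U[:, lam <= cutoff]
    return np.sum((U_c @ (U_c.T @ R))**2)

lo, hi = 0.0, float(lam[-1])
for _ in range(50):                              # dichotomy on the bandwidth
    mid = 0.5 * (lo + hi)
    if estimated_count(mid) < k - 0.5:
        lo = mid
    else:
        hi = mid

print(hi, "~ estimated cutoff; true lambda_k =", lam[k - 1])
```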

Experiments (figures: results on graphs with unbalanced clusters).

Compressive Spectral Clustering.
Clustering is equivalent to recovering the cluster assignment functions, and well-defined clusters give band-limited assignment functions!
Generate features by filtering random signals: by Johnson–Lindenstrauss, d ≥ (4 + 2β) / (ε^2/2 − ε^3/3) log n random signals suffice.
Each feature map is smooth, therefore keep only m ≥ (6/δ^2) k log(2k/ε) sampled nodes.
Use k-means on the compressed data and feed the result into the Efficient Decoder, as in the pipeline sketch below.
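A compact, purely illustrative sketch of this pipeline on a toy two-cluster graph (ideal filter instead of its polynomial approximation, arbitrary constants, and the final interpolation step by the efficient decoder omitted):

```python
import numpy as np
from scipy.cluster.vq import kmeans2

rng = np.random.default_rng(8)
n, k = 200, 2
sizes = [120, 80]                               # unbalanced clusters

# Toy two-community graph (stochastic-block-model-like), purely illustrative.
labels = np.repeat(np.arange(k), sizes)
prob = np.where(labels[:, None] == labels[None, :], 0.2, 0.01)
W = (rng.random((n, n)) < prob).astype(float)
W = np.triu(W, 1); W = W + W.T
L = np.diag(W.sum(axis=1)) - W
_, U = np.linalg.eigh(L)
U_k = U[:, :k]

# (1) d ~ O(log n) random signals, low-pass filtered to get spectral-like features
#     (ideal filter used here instead of its polynomial approximation).
d = int(np.ceil(4 * np.log(n)))
features = U_k @ (U_k.T @ rng.standard_normal((n, d)))

# (2) keep the features of only m ~ O(k log k) randomly chosen nodes, (3) k-means.
m = int(np.ceil(10 * k * np.log(k + 1)))        # illustrative constant
omega = rng.choice(n, size=m, replace=False)
_, assign = kmeans2(features[omega], k, minit="++")

# (4) in the full pipeline the sampled indicators are interpolated to all nodes
# with the efficient decoder; here we only sanity-check the sampled assignments.
agree = np.mean(assign == labels[omega])
print(max(agree, 1 - agree))                    # close to 1 up to a label swap
```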

Compressive Spectral Clustering (figure).

Conclusion.
Stable, robust and universal random sampling of smoothly varying information on graphs.
Tractable decoder with guarantees.
The optimal sampling distribution depends on the graph structure.
Can be used for inference and (SVD-less) compressive clustering.

Thank you!