Filtering and Sampling Graph Signals, and its Application to Compressive Spectral Clustering


Filtering and Sampling Graph Signals, and its Application to Compressive Spectral Clustering
Nicolas Tremblay (1,2), Gilles Puy (1), Rémi Gribonval (1), Pierre Vandergheynst (1,2)
(1) PANAMA Team, INRIA Rennes, France
(2) Signal Processing Laboratory 2, EPFL, Switzerland
GdR ISIS, 17th of June 2016

Outline: Introduction to GSP; Graph sampling; Application to clustering; Conclusion.

Why graph signal processing?

Detailed outline: Introduction to GSP (Graph Fourier Transform, Graph filtering); Graph sampling; Application to clustering (What is Spectral Clustering?, Compressive Spectral Clustering, A toy experiment, Experiments on the SBM); Conclusion.

Introduction to graph signal processing: graph Fourier transform

What's a graph signal?

Three useful matrices (unweighted example). The adjacency matrix and the degree matrix:

W = [0 1 1 0; 1 0 1 1; 1 1 0 0; 0 1 0 0],   S = diag(2, 3, 2, 1)

The Laplacian matrix:

L = S − W = [2 −1 −1 0; −1 3 −1 −1; −1 −1 2 0; 0 −1 0 1]

Three useful matrices (weighted example). The adjacency matrix and the degree matrix:

W = [0 .5 .5 0; .5 0 .5 4; .5 .5 0 0; 0 4 0 0],   S = diag(1, 5, 1, 4)

The Laplacian matrix:

L = S − W = [1 −.5 −.5 0; −.5 5 −.5 −4; −.5 −.5 1 0; 0 −4 0 4]

What's a graph Fourier transform? [Hammond 11]

L = S − W = U Λ U^T

U is the Fourier basis of the graph; the Fourier transform of a signal x reads x̂ = U^T x; Λ = diag(λ_1, λ_2, ..., λ_N) is the spectrum.

[Figure: a low-frequency Fourier mode and a high-frequency Fourier mode.]
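To make these definitions concrete, here is a minimal numpy sketch computing S, L and the graph Fourier transform on the unweighted toy graph above (the signal x is an arbitrary illustrative choice):

```python
import numpy as np

# Toy graph from the slides: 4 nodes, unweighted adjacency W.
W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float)
S = np.diag(W.sum(axis=1))       # degree matrix
L = S - W                        # combinatorial Laplacian

# Graph Fourier basis: L = U diag(lam) U^T (L is symmetric).
lam, U = np.linalg.eigh(L)       # eigenvalues in ascending order

x = np.array([1.0, 0.5, -0.2, 0.8])   # an arbitrary graph signal
x_hat = U.T @ x                       # graph Fourier transform
x_back = U @ x_hat                    # inverse transform recovers x
assert np.allclose(x, x_back)
```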

The graph Fourier transform encodes the structure of the graph. (Slide courtesy of D. Shuman.)

Introduction to graph signal processing: filtering graph signals

Graph filtering. Given a filter function h defined in the Fourier space. [Figure: a low-pass filter h(λ), decaying from 1 to 0 over the spectrum [0, 2].]

In the node space, the signal x filtered by h reads:

x_h = U h(Λ) U^T x = H x

Problem: this costs L's diagonalisation [O(N³)].

Solution: we use a polynomial approximation of order p of h,

h̃(λ) = Σ_{l=1}^p α_l λ^l ≈ h(λ).

Indeed, in this case:

H̃ x = U h̃(Λ) U^T x = Σ_{l=1}^p α_l U Λ^l U^T x = Σ_{l=1}^p α_l L^l x ≈ H x,

which only involves matrix-vector multiplications [costs O(pN)].
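A sketch of this fast filtering step, assuming a Chebyshev least-squares fit of h on [0, λ_max] stands in for the truncated Chebyshev expansion of [Hammond 11]; `fast_filter` and its parameters are illustrative names, and λ_max = 2 is valid for the normalized Laplacian:

```python
import numpy as np
from numpy.polynomial import Chebyshev

def fast_filter(L, x, h, p=50, lam_max=2.0):
    """Approximate x_h = U h(Lambda) U^T x using only p matrix-vector
    products with L, via a Chebyshev fit of h on [0, lam_max]."""
    grid = np.linspace(0.0, lam_max, 500)
    c = Chebyshev.fit(grid, h(grid), deg=p, domain=[0.0, lam_max]).coef

    # Chebyshev recurrence on the rescaled operator t(L) = 2L/lam_max - I.
    def op(v):
        return (2.0 / lam_max) * (L @ v) - v

    t_prev, t_cur = x, op(x)
    y = c[0] * t_prev + c[1] * t_cur
    for l in range(2, p + 1):
        t_prev, t_cur = t_cur, 2.0 * op(t_cur) - t_prev
        y = y + c[l] * t_cur
    return y
```

The loop performs only p (sparse) matrix-vector products, which is where the O(pN) cost quoted above comes from.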

A few applications. Tikhonov regularization for denoising:

argmin_f { ||f − y||_2^2 + γ f^T L f }

Wavelet denoising:

argmin_a { ||f − W a||_2^2 + γ ||a||_{1,µ} }

Compression via filterbanks, etc. (Slide courtesy of D. Shuman.)

Graph sampling

Sampling a graph signal consists in: 1. choosing a subset of nodes; 2. measuring the signal on these nodes only.

How to reconstruct the original signal? Basically, we need:
1. a (low-dimensional) model for the signal to sample,
2. a method to choose the nodes to sample,
3. a decoder that exactly recovers the signal given its samples.

Smoothness assumption. In 1D signal processing, a smooth signal has most of its energy at low frequencies. [Figure: a smooth signal in time and its Fourier transform, concentrated near zero frequency.]

Definition (k-bandlimited graph signal [Puy 15, Chen 15, Anis 16, Segarra 15]). A k-bandlimited signal x ∈ R^N on G is a signal that satisfies x = U_k α̂ for some α̂ ∈ R^k, where U_k = (u_1 | u_2 | ... | u_k) ∈ R^{N×k} gathers the first k Fourier modes.

Sampling band-limited graph signals.

Preparation: associate to each node i a probability p_i to draw this node. This defines a probability distribution p ∈ R^N.

Sampling procedure: draw n nodes i.i.d. according to p: {ω_i}_{i ∈ [1,n]}. We create a matrix M that measures the signal x only on the selected nodes:

M_ij := 1 if j = ω_i, 0 otherwise.

For any signal x ∈ R^N on G, its sampled version is y = Mx (it has size n < N).
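A sketch of this sampling step, assuming scipy sparse matrices; `sampling_matrix` is an illustrative helper, not part of the released toolbox:

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(0)

def sampling_matrix(p, n):
    """Draw n nodes i.i.d. from the distribution p (length N) and build
    the n x N measurement matrix M with M[i, omega_i] = 1."""
    N = len(p)
    omega = rng.choice(N, size=n, replace=True, p=p)
    M = sp.csr_matrix((np.ones(n), (np.arange(n), omega)), shape=(n, N))
    return M, omega

# For a signal x of length N, y = M @ x is its sampled version of size n < N.
```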

Optimizing the sampling distribution. Some nodes are more important to sample than others. For any signal x, remember that ||U_k^T x||_2 is the energy of x on the first k frequencies. Then:
1. For each node i, construct the Dirac δ_i centered at node i.
2. Compute ||U_k^T δ_i||_2 (we have 0 ≤ ||U_k^T δ_i||_2 ≤ 1).

If ||U_k^T δ_i||_2 ≈ 1: there exists a smooth signal concentrated on node i, so node i is important. If ||U_k^T δ_i||_2 ≈ 0: no smooth signal has energy concentrated on node i, so node i can be sampled with lower probability.

The graph weighted coherence. We measure the quality of p with the graph weighted coherence.

Definition (Graph weighted coherence). Let p ∈ R^N represent a sampling distribution on {1, ..., N}. The graph weighted coherence of order k for the pair (G, p) is

ν_p^k := max_{1 ≤ i ≤ N} { p_i^{−1/2} ||U_k^T δ_i||_2 }.

How many nodes to select?

Theorem (Restricted isometry property). Let M be a random subsampling matrix constructed using the sampling distribution p, and let P = diag(p). For any δ, ε ∈ (0, 1), with probability at least 1 − ε,

(1 − δ) ||x_1 − x_2||_2^2 ≤ (1/n) ||M P^{−1/2} (x_1 − x_2)||_2^2 ≤ (1 + δ) ||x_1 − x_2||_2^2

for all x_1, x_2 ∈ span(U_k), provided that

n ≥ (3/δ²) (ν_p^k)² log(2k/ε).

Let's minimize ν_p^k! The lower bound of (ν_p^k)², namely k, may always be reached with the choice

p*_i = ||U_k^T δ_i||_2^2 / k,   for all i ∈ [1, N].

With p*, one needs n ≥ (3/δ²) k log(2k/ε), i.e. n ~ k log(k): up to the log factor, it is optimal! And we have an efficient algorithm that estimates p* in O(pN log N)!
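A sketch of the optimal distribution, computed here exactly from the first k eigenvectors; this brute-force version is only viable for small graphs, whereas the O(pN log N) algorithm mentioned above estimates p* by filtering random signals instead:

```python
import numpy as np

def optimal_distribution(L, k):
    """p*_i = ||U_k^T delta_i||_2^2 / k, computed from a full
    eigendecomposition (fine for small toy graphs only)."""
    lam, U = np.linalg.eigh(L)
    Uk = U[:, :k]                       # columns = first k eigenvectors
    p_star = (Uk ** 2).sum(axis=1) / k  # squared norm of row i, over k
    return p_star  # sums to 1, since ||U_k||_F^2 = k
```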

Reconstruction. We sampled the signal x ∈ R^N, i.e., we measured y = Mx + n (n ∈ R^n models noise). The goal is to estimate x from y. We propose to solve (links with the SSL literature [Chapelle 10, Fu 12]):

min_{z ∈ R^N} ||P_Ω^{−1/2} (Mz − y)||_2^2 + γ z^T g(L) z,

where γ > 0, P_Ω = diag(p_{ω_1}, ..., p_{ω_n}), and g: R → R is a nonnegative and nondecreasing polynomial function.

Solving this problem can be done, e.g., by gradient descent or conjugate gradient. It is fast, as it involves only matrix-vector multiplications with sparse matrices.

We proved that the result is accurate and stable to noise:
1. The quality of the reconstruction depends on the ratio g(λ_k)/g(λ_{k+1}).
2. γ should be adjusted with the signal-to-noise ratio.
3. In the absence of noise, the reconstruction quality improves when g(λ_k)/g(λ_{k+1}) → 0 and γ → 0.
4. If g(λ_k) = 0 and g(λ_{k+1}) > 0, we have exact recovery.
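A sketch of such a decoder, assuming g(L) = L (the simplest admissible nonnegative, nondecreasing polynomial) and solving the normal equations with scipy's conjugate gradient; `decode` and its arguments are illustrative names:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def decode(L, M, y, p_omega, gamma=1e-3):
    """Solve min_z ||P_Omega^{-1/2}(Mz - y)||^2 + gamma z^T L z by CG
    on the normal equations (M^T P^{-1} M + gamma L) z = M^T P^{-1} y."""
    N = L.shape[0]
    w = 1.0 / p_omega                      # diagonal of P_Omega^{-1}

    def matvec(z):
        return M.T @ (w * (M @ z)) + gamma * (L @ z)

    A = LinearOperator((N, N), matvec=matvec, dtype=float)
    b = M.T @ (w * y)
    z, info = cg(A, b)                     # info == 0 on convergence
    return z
```

Every application of `A` touches only sparse matrices, which is what keeps the decoder fast.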

Recap. Given a graph and its Laplacian matrix L, and any graph signal x defined on this graph, one can:
1. filter this signal with any filter h(λ): x_h = U h(Λ) U^T x [O(N³)];
2. fast filter it with the polynomial approximation h̃(λ) = Σ_{l=1}^p α_l λ^l: x_h ≈ Σ_{l=1}^p α_l L^l x [O(pN)].

Given a k-bandlimited graph signal x defined on this graph, one can:
1. estimate the optimal probability distribution p*_i = ||U_k^T δ_i||_2^2 / k [O(pN log N)];
2. sample n = O(k log k) nodes from this distribution;
3. measure the signal y = Mx ∈ R^n;
4. reconstruct the signal [O(pN)]:

x_rec = argmin_{z ∈ R^N} ||P_Ω^{−1/2} (Mz − y)||_2^2 + γ z^T g(L) z


Application to clustering: What is Spectral Clustering?

Given a series of N objects: 1/ find adapted descriptors; 2/ cluster.

From the N objects, one creates N vectors x_1, x_2, ..., x_N and their distance matrix Δ ∈ R^{N×N}.

Goal of clustering: assign a label c(i) ∈ {1, ..., k} to each object i in order to organize / simplify / analyze the data.

There exist two general types of methods: methods based directly on the x_i and/or Δ, like k-means or hierarchical clustering; and graph-based methods.

Graph construction from the distance matrix. Create a graph G = (V, E): each node in V is one of the N objects, and each pair of nodes (i, j) is connected if the distance Δ(i, j) is small enough. For example, two connectivity possibilities:

Gaussian kernel: 1. connect all pairs of nodes with links of weight exp(−Δ(i, j)/σ); 2. remove all links of weight smaller than ε.

k nearest neighbours: connect each node to its k nearest neighbours.
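A sketch of the Gaussian-kernel k-nearest-neighbour construction; `knn_graph` is an illustrative helper using dense O(N²) distances, which is fine for moderate N:

```python
import numpy as np
import scipy.sparse as sp

def knn_graph(X, k=10, sigma=1.0):
    """Symmetrized k-NN graph with weights exp(-dist/sigma), built
    from an N x d data matrix X."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    dist = np.sqrt(d2)
    np.fill_diagonal(dist, np.inf)      # no self-loops
    W = np.zeros_like(dist)
    for i in range(len(X)):
        nn = np.argsort(dist[i])[:k]    # k nearest neighbours of node i
        W[i, nn] = np.exp(-dist[i, nn] / sigma)
    W = np.maximum(W, W.T)              # symmetrize
    return sp.csr_matrix(W)
```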

The clustering problem now states: given the graph G representing the similarity between the N objects, find a partition of all nodes into k clusters. Many methods exist [Fortunato 10]: modularity (or other cost-function) optimisation methods [Newman 06]; random walk methods [Schaub 12]; methods inspired from statistical physics [Krzakala 13] or information theory [Rosvall 08]; and spectral methods.

The classical spectral clustering (SC) algorithm [Von Luxburg 06]. Given the N-node graph G with Laplacian matrix L:
1. Compute L's first k eigenvectors: U_k = (u_1 | u_2 | ... | u_k).
2. Consider each node i as a point in R^k: f_i = U_k^T δ_i.
3. Run k-means with the Euclidean distance D_ij = ||f_i − f_j|| and obtain k clusters.

Definition: let us call D_ij the spectral clustering distance.
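A compact sketch of classical SC with scipy and scikit-learn; note that `eigsh` with `which='SM'` is a simple (if slow) way to get the smallest eigenpairs:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh
from sklearn.cluster import KMeans

def spectral_clustering(W, k):
    """Classical SC [Von Luxburg 06]: embed each node with the first k
    eigenvectors of the Laplacian, then k-means on the rows."""
    deg = np.asarray(W.sum(axis=1)).ravel()
    L = sp.diags(deg) - W                 # combinatorial Laplacian
    _, Uk = eigsh(L, k=k, which='SM')     # k smallest eigenpairs
    return KMeans(n_clusters=k, n_init=10).fit_predict(Uk)
```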

What's the point of using a graph? [Figure: N points in d = 2 dimensions; the result of k-means (k = 2) run directly on the points, versus the result obtained after creating a graph, partially diagonalising L, and running k-means (k = 2) on D.]

Application to clustering: Compressive Spectral Clustering

Our goal. Problem: when N and/or k is large, SC has two main bottlenecks:
1. the partial eigendecomposition of the (sparse) Laplacian (e.g. by restarted Arnoldi) [at least O(k³ + Nk²)] [Chen 11a];
2. the high-dimensional k-means [O(Nk²)].

Goal: SC in high dimensions, with N ≈ 10^6 nodes and/or k ≈ 100.

Contribution: an algorithm that approximates the true SC solution with controlled relative error, with a running time in O(k² log² k + pN(log N + k)).

Main ideas of Compressive Spectral Clustering (CSC). CSC is based on two main observations:
1. SC does not need the f_i = U_k^T δ_i explicitly, but only D_ij = ||f_i − f_j||;
2. each cluster indicator function c_j ∈ R^N is in fact approximately k-bandlimited: for all j ∈ [1, k], c_j is close to span(U_k).

CSC follows 4 steps:
1. Estimate D_ij by filtering d random graph signals.
2. Sample n nodes out of the N available ones.
3. Run low-dimensional k-means on these n nodes to obtain c_j^r ∈ R^n.
4. Reconstruct each reduced cluster indicator function c_j^r back on the whole graph to obtain c̃_j, as desired.

(Steps 2 to 4 are already covered!) Step 1: how to estimate D_ij without computing U_k?

Remember the classical spectral clustering algorithm. Given the N-node graph G with Laplacian matrix L:
1. Compute L's first k eigenvectors: U_k = (u_1 | u_2 | ... | u_k).
2. Consider each node i as a point in R^k: f_i = U_k^T δ_i.
3. Run k-means with D_ij = ||f_i − f_j|| and obtain k clusters.

Our goal: estimate D_ij without computing U_k exactly. Writing δ_ij = δ_i − δ_j, and letting h_{λ_k} be the ideal low-pass filter with cutoff λ_k (h_{λ_k}(λ) = 1 if λ ≤ λ_k, 0 otherwise):

D_ij = ||U_k^T (δ_i − δ_j)|| = ||U_k^T δ_ij|| = ||U_k U_k^T δ_ij|| = ||U h_{λ_k}(Λ) U^T δ_ij|| = ||H_{λ_k} δ_ij||.

[Figure: the ideal low-pass filter h_{λ_k}(λ) over the spectrum.]

Fast filtering [Hammond, ACHA 11]. In practice, we use a polynomial approximation of order p of h_{λ_k}:

h̃_{λ_k}(λ) = Σ_{l=1}^p α_l λ^l ≈ h_{λ_k}(λ).

[Figure: the ideal filter and its polynomial approximations of increasing order p = 5, 20, 100.] Such that:

D_ij = ||H_{λ_k} δ_ij|| = lim_{p→∞} ||H̃_{λ_k} δ_ij||.

Norm conservation result [Tremblay 16a, Ramasamy 15]. The spectral distance reads D_ij = ||H_{λ_k} δ_ij|| = lim_{p→∞} ||H̃_{λ_k} δ_ij||.

Let R = (r_1 | r_2 | ... | r_d) ∈ R^{N×d} be a random Gaussian matrix, i.e. a collection of d random graph signals, with mean 0 and variance 1/d. We define

f̃_i = (H̃_{λ_k} R)^T δ_i ∈ R^d   and   D̃_ij = ||f̃_i − f̃_j||.

Theorem (Norm conservation, in the case of infinite p). Let ε > 0. If d > d_0 log N / ε², then, with probability > 1 − 1/N, we have:

(1 − ε) D_ij ≤ D̃_ij ≤ (1 + ε) D_ij   for all (i, j) ∈ [1, N]².

Consequence: to estimate D_ij with no partial diagonalisation of L, fast filter only d ~ log N random signals!
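A sketch of the feature computation, reusing the `fast_filter` sketch above with the ideal low-pass filter h_{λ_k} (whose polynomial fit oscillates near the cutoff; the order p controls how closely the ideal filter is approached):

```python
import numpy as np

def csc_features(L, lam_k, d, p=50, lam_max=2.0):
    """Rows of the returned matrix are the features f~_i in R^d:
    d random Gaussian signals filtered by an order-p approximation
    of the ideal low-pass filter with cutoff lam_k."""
    N = L.shape[0]
    rng = np.random.default_rng(0)
    R = rng.normal(0.0, 1.0 / np.sqrt(d), size=(N, d))   # variance 1/d
    h = lambda lam: (lam <= lam_k).astype(float)         # ideal low-pass
    F = np.column_stack([fast_filter(L, R[:, j], h, p, lam_max)
                         for j in range(d)])
    return F   # D~_ij = ||F[i] - F[j]|| approximates D_ij
```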

How to quickly estimate λ_k, the sole unknown of the fast filtering operation? Goal: given a symmetric positive semi-definite matrix L, estimate its k-th eigenvalue as fast as possible. We use eigencount techniques [Napoli 13] (also based on polynomial filtering of random vectors!): given an interval [0, b], they approximate the number of eigenvalues it encloses; λ_k is then found by dichotomy on b. This is done in O(pN log N).

The CSC algorithm [Tremblay 16b, Puy 16].
1. Estimate λ_k, the k-th eigenvalue of L.
2. Generate d random graph signals, gathered in a matrix R ∈ R^{N×d}.
3. Filter them with H̃_{λ_k} and treat each node i as a point in R^d: f̃_i = (H̃_{λ_k} R)^T δ_i. If d ~ log N, we prove that D̃_ij = ||f̃_i − f̃_j|| ≈ D_ij.

Next steps (sampling):
4. Sample n nodes from p*.
5. Run k-means on the n associated feature vectors and obtain {c_j^r}_{j=1:k}.
6. Reconstruct all k indicator functions {c̃_j}_{j=1:k}. If n ~ k log k and c_j^r = M c_j, we prove that we control the reconstruction error.
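Chaining the sketches above gives a toy end-to-end pipeline; it assumes λ_k is already known (in practice, estimated by the eigencount dichotomy above), computes p* exactly rather than by fast estimation, and omits the refinements of the actual algorithm, so it is an illustration rather than a faithful reimplementation:

```python
import numpy as np
from sklearn.cluster import KMeans

def csc(L, k, lam_k, d, n, p=50, gamma=1e-3):
    """Toy CSC pipeline built from the earlier sketches: csc_features,
    optimal_distribution, sampling_matrix, decode."""
    F = csc_features(L, lam_k, d, p)            # step 3: f~_i in R^d
    p_star = optimal_distribution(L, k)         # exact p* (small graphs)
    M, omega = sampling_matrix(p_star, n)       # step 4: sample n nodes
    labels_r = KMeans(n_clusters=k, n_init=10).fit_predict(F[omega])
    # Step 6: reconstruct each reduced indicator function on the whole
    # graph, then assign each node to the cluster with largest value.
    C = np.column_stack([decode(L, M, (labels_r == j).astype(float),
                                p_star[omega], gamma) for j in range(k)])
    return C.argmax(axis=1)
```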

Application to clustering: A toy experiment

SC on a toy example. N = 1000, k = 2; community 1 has 300 nodes, community 2 has 700 nodes. [Figure: the 1000 × 1000 adjacency matrix, showing two diagonal blocks.]

Compute U_2 = (u_1, u_2) and map each node to f_i = U_2^T δ_i ∈ R². [Figure: the nodes seen through their features f_i, and the resulting spectral clustering distance matrix D_ij.] Running k-means on these features recovers the communities with performance 0.996.

CSC on the same toy example.
1. Estimate λ_2 and p*.
2. Generate d = 3 random graph signals.
3. Low-pass filter them: each node becomes f̃_i ∈ R³. [Figure: the nodes seen through their features f̃_i, and the estimated distance matrix D̃_ij ≈ D_ij.]
4. Sample n = 3 nodes from p*.
5. Run low-dimensional k-means on the sampled features.
6. Reconstruct the result on the whole graph by interpolation: performance 0.951.

Application to clustering: Experiments on the SBM

Experiments. The Stochastic Block Model (SBM): N nodes and k communities C_1, ..., C_k of equal size N/k; two nodes are connected with probability q_1 if they are in the same community, q_2 if not. Define the ratio ε = q_2/q_1; the SBM is then fully defined by ε and the average degree s. Define the critical ratio (the detectability threshold) ε_c = (s − √s) / (s + √s (k − 1)) [Decelle 11].

Experiments with N = 10³, k = 20, s = 16, with respect to different parameters. [Figure: recovery performance versus ε, compared to SC, for n ∈ {k log k, 2k log k, 3k log k, 4k log k}, for d ∈ {2 log N, 3 log N, 4 log N, 5 log N}, and for p ∈ {10, 20, 50, 100}; ε_c is marked on each plot.]
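A sketch of SBM generation; recovering (q_1, q_2) from (ε, s) via s = q_1(N/k − 1) + q_2(N − N/k) is one natural parametrization, assuming s is the expected average degree:

```python
import numpy as np
import scipy.sparse as sp

def sbm(N, k, s, eps, seed=0):
    """Adjacency matrix of a k-community SBM with equal community
    sizes N // k, average degree s and ratio eps = q2/q1."""
    rng = np.random.default_rng(seed)
    labels = np.repeat(np.arange(k), N // k)
    # Expected degree: s = q1*(N/k - 1) + q2*(N - N/k), with q2 = eps*q1.
    q1 = s / ((N / k - 1) + eps * (N - N / k))
    q2 = eps * q1
    P = np.where(labels[:, None] == labels[None, :], q1, q2)
    A = rng.random((N, N)) < P
    A = np.triu(A, 1)
    A = A + A.T                     # symmetric, no self-loops
    return sp.csr_matrix(A.astype(float)), labels
```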

Experiments. With parameters d = 4 log(k), n = 2k log(k), p = 50, γ = 10⁻³, and ε = ε_c/4. [Figure: recovery performance and computation time versus the number of classes k ∈ {20, 50, 100, 200}, for N ∈ {10⁴, 10⁵, 10⁶}, comparing CSC, SC and PM. PM = Power Method [Lin 10, Boutsidis 15].]

On a real-world graph, the Amazon graph with 335,000 nodes and 926,000 edges (running time, recovery performance):

k = 250  :  SC 7h17m, 0.84  |  CSC 1h20m, 0.83
k = 500  :  SC 15h29m, 0.84  |  CSC 3h34m, 0.84
k = 1000 :  SC 17h36m (eigs) + at least 21h for k-means, performance unknown  |  CSC 10h18m, 0.84

Conclusion

Two main ideas:
1. Low-pass fast graph filtering of random signals is a way to bypass the Laplacian's diagonalisation for learning tasks.
2. Cluster indicator functions live in a low-dimensional space (they are approximately k-bandlimited), so we can use sampling schemes to recover them efficiently.

Details of this work can be found in:
(Sampling part) Random sampling of bandlimited signals on graphs, ACHA 2016. A MATLAB toolbox is available at grsamplingbox.gforge.inria.fr.
(Clustering part) Compressive spectral clustering, ICML 2016. A MATLAB toolbox is available at cscbox.gforge.inria.fr.

Links with the literature:
Low-rank approximation: Nyström methods [Sun 15], leverage scores [Mahoney 11].
Machine learning: semi-supervised learning [Chapelle 10], active learning [Fu 12, Gadde 14], coresets [Har-Peled 04, Frahling 08].
Compressed sensing: variable density sampling [Puy 11].
Other fast approximate SC algorithms: [Lin 10, Fowlkes 04, Wang 09, Chen 11a, Chen 11b].

Perspectives and difficult questions. Two difficult questions (among others):
1. Given a symmetric positive semi-definite matrix, how to estimate its k-th eigenvalue, and only that one, as fast as possible?
2. How to choose the appropriate polynomial order p automatically?

Perspectives:
1. Rational filters instead of polynomial filters? [Shi 15, Isufi 16]
2. Smoother filters for better approximation? [Sakiyama 16]
3. What if nodes are added one by one?
4. SBM with overlapping communities (SBMO)! [cf. E. Kaufmann]
5. The experiments shown were done with L = I − D^{−1/2} W D^{−1/2}. Test with L = D^{1−2α̂} − D^{−α̂} W D^{−α̂}! [cf. R. Couillet]

References
[Ramasamy 15] Compressive spectral embedding: sidestepping..., NIPS.
[Fortunato 10] Community detection in graphs, Physics Reports.
[Newman 06] Modularity and community structure in networks, PNAS.
[Schaub 12] Markov dynamics as a zooming lens for multiscale..., PLOS One.
[Krzakala 13] Spectral redemption: clustering sparse networks, PNAS.
[Rosvall 08] Maps of random walks on complex networks reveal..., PLOS One.
[Von Luxburg 06] A tutorial on spectral clustering, Statistics and Computing.
[Chen 11a] Parallel spectral clustering in distributed systems, IEEE TPAMI.
[Lin 10] Power iteration clustering, ICML.
[Boutsidis 15] Spectral clustering via the power method - provably, ICML.
[Fowlkes 04] Spectral grouping using the Nyström method, IEEE TPAMI.
[Wang 09] Approximate spectral clustering, AKDDM.
[Chen 11b] Large scale spectral clustering with landmark-based..., CAI.
[Shuman 13] The emerging field of signal processing on graphs..., IEEE Signal Processing Magazine.
[Hammond 11] Wavelets on graphs via spectral graph theory, ACHA.
[Napoli 13] Efficient estimation of eigenvalue counts in an interval, arXiv.
[Tremblay 16a] Accelerated spectral clustering using graph..., ICASSP.
[Tremblay 16b] Compressive spectral clustering, ICML.
[Puy 16] Random sampling of bandlimited signals..., ACHA.
[Shi 15] Infinite impulse response graph filters in wireless sensor networks, SPL.
[Chen 15] Discrete signal processing on graphs: sampling theory, IEEE TSP.
[Anis 16] Efficient sampling set selection for bandlimited graph..., IEEE TSP.
[Segarra 15] Sampling of graph signals with successive local aggregations, IEEE TSP.
[Chapelle 10] Semi-Supervised Learning, The MIT Press.
[Fu 12] A survey on instance selection for active learning, KIS.
[Mahoney 11] Randomized algorithms for matrices and data, Foundations and Trends in ML.
[Sun 15] A review of Nyström methods for large-scale machine learning, Information Fusion.
[Puy 11] On variable density compressive sampling, SPL.
[Gadde 14] Active semi-supervised learning using sampling theory..., SIGKDD.
[Isufi 16] Distributed time-varying graph filtering, arXiv.
[Sakiyama 16] Spectral graph wavelets and filter banks with low approximation error, not published yet.