Diffuse interface methods on graphs: Data clustering and Gamma-limits


Diffuse interface methods on graphs: Data clustering and Gamma-limits. Yves van Gennip, joint work with Andrea Bertozzi, Jeff Brantingham, Blake Hunter. Department of Mathematics, UCLA. Research made possible by ONR grants N141214 and N1411221. Graph Exploitation Symposium, MIT Lincoln Laboratory, April 18, 2012. Yves van Gennip (UCLA), Diffuse interface methods on graphs, April 17-18, 2012, 1 / 36

Overview: Graphs and data clustering; Graph Laplacian and clustering; Spectral clustering and street gangs; Ginzburg-Landau for nonlinear clustering; Γ-limits and nonlocal means; Future work.


The setting: graphs. Data points are represented by nodes in an undirected graph. Similarity is encoded in edge weights $\omega_{ij}$. Data clustering is represented by node labels $u_i$. (Figure: two nodes $u_i$ and $u_j$ joined by an edge with weight $\omega_{ij}$.)

The goal. Clustering: label all the nodes with a $k$-valued label, $u_i \in \{1, \dots, k\}$, such that an appropriate quantity is optimized. We choose to minimize the normalized cut (Shi, Malik, 2000). If $V$, the set of all nodes, is divided into $k$ disjoint clusters $C_i$, then
$$\mathrm{ratio}(C_i) := \frac{\sum_{i \in C_i,\, j \in V \setminus C_i} \omega_{i,j}}{\sum_{i \in C_i,\, j \in V} \omega_{i,j}} \qquad \text{and} \qquad \mathrm{Ncut} := \sum_{i=1}^{k} \mathrm{ratio}(C_i).$$
The normalization serves to avoid small clusters. Note: here we prescribe the number of clusters $k$.
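The quantities above can be checked directly on a small weight matrix. The sketch below (the helpers `ratio` and `ncut` are mine, not code from the talk) shows that for two triangles joined by one weak edge, the natural two-cluster partition has a much smaller Ncut than a partition that cuts through a triangle.

```python
import numpy as np

def ratio(W, C, V):
    """Weight of edges leaving cluster C, divided by total weight touching C."""
    comp = [j for j in V if j not in C]
    cut = sum(W[i, j] for i in C for j in comp)
    vol = sum(W[i, j] for i in C for j in V)
    return cut / vol

def ncut(W, clusters):
    V = range(W.shape[0])
    return sum(ratio(W, C, V) for C in clusters)

# Two triangles (nodes 0-2 and 3-5) joined by a single weak edge.
W = np.zeros((6, 6))
for a, b in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[a, b] = W[b, a] = 1.0
W[2, 3] = W[3, 2] = 0.1  # weak link between the clusters

good = ncut(W, [{0, 1, 2}, {3, 4, 5}])   # cut along the weak edge
bad  = ncut(W, [{0, 1}, {2, 3, 4, 5}])   # cut through a triangle
print(good < bad)  # the natural partition has the smaller Ncut
```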

Problem! Minimizing Ncut is an NP-complete problem (Papadimitriou, 1997). Practical solution: solve a relaxed version of the problem.


Graph Laplacians in the literature.
Unnormalized graph Laplacian: $(\Delta u)_i = \sum_j \omega_{ij}(u_i - u_j)$.
Random walk graph Laplacian: $(\Delta u)_i = \frac{1}{d_i} \sum_j \omega_{ij}(u_i - u_j)$, with degree $d_i = \sum_j \omega_{ij}$.
Symmetric normalized graph Laplacian: $(\Delta u)_i = \sum_j \frac{\omega_{ij}}{\sqrt{d_i}} \left( \frac{u_i}{\sqrt{d_i}} - \frac{u_j}{\sqrt{d_j}} \right)$.
For an overview, see e.g. Von Luxburg (2007).
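In matrix form the three operators are $D - W$, $I - D^{-1}W$, and $I - D^{-1/2} W D^{-1/2}$. A minimal numpy sketch (my own helper, not code from the talk) that builds all three from a symmetric weight matrix:

```python
import numpy as np

def graph_laplacians(W):
    """Return the unnormalized, random walk, and symmetric normalized Laplacians."""
    d = W.sum(axis=1)                       # degrees d_i = sum_j w_ij
    D = np.diag(d)
    L_un = D - W                            # (Lu)_i = sum_j w_ij (u_i - u_j)
    L_rw = np.eye(len(d)) - W / d[:, None]  # I - D^{-1} W
    d_isqrt = 1.0 / np.sqrt(d)
    L_sym = np.eye(len(d)) - d_isqrt[:, None] * W * d_isqrt[None, :]  # I - D^{-1/2} W D^{-1/2}
    return L_un, L_rw, L_sym

# A single triangle with unit weights.
W = np.array([[0., 1., 1.],
              [1., 0., 1.],
              [1., 1., 0.]])
L_un, L_rw, L_sym = graph_laplacians(W)
print(L_un @ np.ones(3))  # constant vectors lie in the kernel of L_un and L_rw
```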

Focus on the random walk Laplacian. Adjacency matrix $A$ and degree matrix $D$:
$$A = \begin{pmatrix} \omega_{11} & \cdots & \omega_{1m} \\ \vdots & & \vdots \\ \omega_{m1} & \cdots & \omega_{mm} \end{pmatrix}, \qquad D = \begin{pmatrix} d_1 & & & \\ & d_2 & & \\ & & \ddots & \\ & & & d_m \end{pmatrix}.$$
The random walk Laplacian is then given by $\Delta u = (I - D^{-1}A)u$.

Eigenvectors as solutions to a relaxed problem (Shi, Malik, 2000; Von Luxburg, 2007). The solution to the Ncut problem is given by the minimization of $\mathrm{Tr}\big(H^T (I - D^{-1}A) H\big)$ over all $D$-orthonormal matrices $H$ whose columns are indicator functions of clusters. If we relax the condition on $H$ to allow any real-valued $D$-orthonormal matrix, then the minimization of $\mathrm{Tr}(H^T \Delta H)$ is solved by the matrix $H$ whose columns are the first $k$ eigenvectors (ordered according to increasing eigenvalue) of $\Delta = I - D^{-1}A$. So: use the first $k$ eigenvectors of the random walk Laplacian as approximations to the cluster indicator functions.

Trivial, clean, example.
$$A = \begin{pmatrix} 1&1&1&0&0&0 \\ 1&1&1&0&0&0 \\ 1&1&1&0&0&0 \\ 0&0&0&1&1&1 \\ 0&0&0&1&1&1 \\ 0&0&0&1&1&1 \end{pmatrix}, \qquad \Delta = \frac{1}{3} \begin{pmatrix} 2&-1&-1&0&0&0 \\ -1&2&-1&0&0&0 \\ -1&-1&2&0&0&0 \\ 0&0&0&2&-1&-1 \\ 0&0&0&-1&2&-1 \\ 0&0&0&-1&-1&2 \end{pmatrix}.$$
The eigenvalues are $0, 0, 1, 1, 1, 1$, with (unnormalized) eigenvectors (next page)

Trivial, clean, example, cont'd.
$$v_1 = \begin{pmatrix}1\\1\\1\\0\\0\\0\end{pmatrix},\; v_2 = \begin{pmatrix}0\\0\\0\\1\\1\\1\end{pmatrix},\; v_3 = \begin{pmatrix}-2\\1\\1\\0\\0\\0\end{pmatrix},\; v_4 = \begin{pmatrix}0\\0\\0\\-2\\1\\1\end{pmatrix},\; v_5 = \begin{pmatrix}0\\1\\-1\\0\\0\\0\end{pmatrix},\; v_6 = \begin{pmatrix}0\\0\\0\\0\\1\\-1\end{pmatrix}.$$
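The spectrum of this trivial example can be verified numerically; the sketch below rebuilds the block matrix $A$ (including the unit self-weights on the diagonal, so every degree is 3) and checks that the random walk Laplacian has eigenvalues $0, 0, 1, 1, 1, 1$.

```python
import numpy as np

# Two disconnected clusters of three nodes each; A is block diagonal with
# all-ones blocks (self-weights included, as on the slide).
block = np.ones((3, 3))
A = np.block([[block, np.zeros((3, 3))],
              [np.zeros((3, 3)), block]])
D_inv = np.diag(1.0 / A.sum(axis=1))   # each degree is 3
L_rw = np.eye(6) - D_inv @ A           # random walk Laplacian I - D^{-1} A

eigvals = np.sort(np.linalg.eigvals(L_rw).real)
print(eigvals)  # two zero eigenvalues (one per connected component), four ones
```

The multiplicity of the eigenvalue 0 equals the number of connected components, which is exactly why the first eigenvectors act as cluster indicators here.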

Noisy example. The same two clusters, now with some additional noise edges connecting them, so that $A$ is no longer block diagonal. The eigenvalues of the random walk Laplacian are $0$, $0.2649$, $0.8748$, $1.112$, $1.2599$, $1.3826$. The (unnormalized) eigenvector for eigenvalue $0$ is the constant vector $v_1 = (1, 1, 1, 1, 1, 1)^T$; the eigenvector $v_2$ for eigenvalue $0.2649$ is no longer an exact indicator function, but thresholding it still approximately recovers the two clusters.

From approximation to indicator function. How do we go from eigenvector to indicator function? Two often-used options: (1) threshold the values in the eigenvector; (2) spectral clustering (Ng, Jordan, Weiss, 2002): use the eigenvectors as a basis for the data points, then use k-means to separate the clusters in this space.
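The thresholding option can be sketched on a noisy two-triangle graph like the one above (my own toy graph, not the data from the talk): take the eigenvector of the second-smallest eigenvalue of the random walk Laplacian and threshold it at its median.

```python
import numpy as np

# Two triangles with one noisy cross edge between them.
A = np.zeros((6, 6))
for a, b in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    A[a, b] = A[b, a] = 1.0
A[0, 3] = A[3, 0] = 0.2   # noise edge between the clusters

L_rw = np.eye(6) - A / A.sum(axis=1)[:, None]  # random walk Laplacian
vals, vecs = np.linalg.eig(L_rw)
order = np.argsort(vals.real)
v2 = vecs[:, order[1]].real        # eigenvector of second-smallest eigenvalue

labels = (v2 > np.median(v2)).astype(int)  # threshold -> cluster indicator
print(labels)
```

The resulting labels are constant on each triangle and differ between them (up to the arbitrary overall sign of the eigenvector).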


Identify gang membership based on geosocial information. This work with Jeff Brantingham and Blake Hunter started out as an REU project in the summer of 2011.

From gangs to graphs. Given data: average location of (nonviolent) stops, with pairwise distances $d_{i,j}$; (anonymized) individuals involved in the stops, with pairwise social interactions $S_{i,j}$. Gang affiliation is available as ground truth. Construct a graph via the adjacency/weight matrix
$$A_{i,j} = \alpha S_{i,j} + (1 - \alpha) e^{-d_{i,j}^2 / \sigma^2},$$
with parameters $\alpha$ and $\sigma$.
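A hedged sketch of this weight-matrix construction, using made-up stand-ins for the social and distance data (the real study's data is not reproduced here):

```python
import numpy as np

def build_weights(S, d, alpha, sigma):
    """A_ij = alpha * S_ij + (1 - alpha) * exp(-d_ij^2 / sigma^2)."""
    return alpha * S + (1 - alpha) * np.exp(-d**2 / sigma**2)

n = 4
rng = np.random.default_rng(1)
pts = rng.random((n, 2))                                        # stand-in stop locations
d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)   # pairwise distances
S = (rng.random((n, n)) < 0.3).astype(float)                    # stand-in social contacts
S = np.maximum(S, S.T)                                          # symmetrize

A = build_weights(S, d, alpha=0.4, sigma=0.5)
print(np.allclose(A, A.T))  # True: both terms are symmetric
```

Note the convex combination: $\alpha$ interpolates between purely social ($\alpha = 1$) and purely geographic ($\alpha = 0$) similarity.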

Eigenvectors for a specific choice of $S$. (Figure: plots of eigenvectors 2, 3, and 4.) Hotspots!

But... (Figure: map with pie charts.) The pies are the clusters we find, the colors indicate gang affiliation. Pie charts made with code from Traud, Frost, Mucha, Porter (2009).

Possible explanations: Sparse data? Social structures in reality do not follow gang affiliations? Wrong method?


The Ginzburg-Landau functional: phase separation. From materials science and image analysis, the Ginzburg-Landau functional
$$GL_\varepsilon(u) = \varepsilon \int |\nabla u|^2 + \frac{1}{\varepsilon} \int W(u),$$
where $W$ is a double well potential. When minimized (under some extra constraint, e.g. fixed mass or fidelity to data): phase separation. Its $L^2$ gradient flow is the Allen-Cahn equation:
$$u_t = \varepsilon \Delta u - \frac{1}{\varepsilon} W'(u) \quad (+\ \text{constraint}).$$
Use the graph Laplacian to formulate the Allen-Cahn equation on graphs. (Figure: the double well $W$.)

Allen-Cahn as nonlinear relaxation. The Allen-Cahn equation on graphs,
$$(u_i)_t = \varepsilon (\Delta u)_i - \frac{1}{\varepsilon} W'(u_i) \quad (+\ \text{constraint}),$$
can be seen as a more stringent relaxation of the Ncut minimization problem. (N.B. Sign convention: here $\Delta$ is taken to be a negative operator on graphs, matching the continuum Laplacian.) The double well term involving $W$ forces the solution to be close to binary.

Application: machine learning (Bertozzi, Flenner, 2012).
$$(u_i)_t = \varepsilon (\Delta u)_i - \frac{1}{\varepsilon} W'(u_i) - \lambda_i \big(u_i - (u_{\mathrm{orig}})_i\big), \qquad \lambda_i = \begin{cases} 1 & i \text{ in original image}, \\ 0 & \text{otherwise.} \end{cases}$$
(Figures: Original Image, Image to Segment, Training Region, Segmented Image.) Vertices: the pixels from both images. Edge weight: $\omega_{ij} = e^{-|x_i - x_j|^2 / \tau}$, where $\tau$ is a scale parameter and $x_i$ is the feature vector of pixel $i$. Constraint: fidelity to data in the original image.
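This semi-supervised dynamic can be sketched with a plain explicit Euler step on a toy graph. Everything below (the graph, step size, fidelity strength, wells of $W$ at $\pm 1$) is my own illustrative choice, not the talk's setup, and $L$ is the unnormalized positive-semidefinite Laplacian, so the equation is written as $\dot u = -\varepsilon L u - W'(u)/\varepsilon - \lambda (u - u_{\mathrm{orig}})$; Bertozzi and Flenner use more sophisticated semi-implicit spectral schemes in practice.

```python
import numpy as np

# Graph Allen-Cahn with fidelity, W(u) = (u^2 - 1)^2 / 4, so W'(u) = u^3 - u.
# Toy graph: two triangles (0-2 and 3-5) joined by the edge (2, 3).
A = np.zeros((6, 6))
for a, b in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[a, b] = A[b, a] = 1.0
L = np.diag(A.sum(axis=1)) - A          # unnormalized graph Laplacian

eps, dt = 1.0, 0.01
u_orig = np.array([1.0, 0, 0, 0, 0, -1.0])              # two labeled seed nodes
lam = 10.0 * np.array([1.0, 0, 0, 0, 0, 1.0])           # fidelity only where labeled

u = np.zeros(6)
for _ in range(5000):                    # explicit Euler time stepping
    u = u + dt * (-eps * (L @ u) - (u**3 - u) / eps - lam * (u - u_orig))

print(np.sign(u))  # nodes 0-2 follow the +1 seed, nodes 3-5 the -1 seed
```

The double well drives each $u_i$ toward $\pm 1$, diffusion spreads the two seeded labels, and the fidelity term pins the labeled nodes, exactly the mechanism the slide describes.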

Application: data inpainting (Bertozzi, Flenner, 2012).
$$(u_i)_t = \varepsilon (\Delta u)_i - \frac{1}{\varepsilon} W'(u_i) - \lambda_i \big(u_i - (u_{\mathrm{orig}})_i\big), \qquad \lambda_i = \begin{cases} 1 & \text{known data}, \\ 0 & \text{otherwise.} \end{cases}$$
Vertices: 435 members of the 1984 US House of Representatives. Feature vector: $n$ votes, each yes ($1$), no ($-1$), or did not vote ($0$), with $n \in \{8, 10, 12, 14, 16\}$. Edge weight: $\omega_{ij} = e^{-|x_i - x_j|^2 / \tau}$, where $\tau$ is a scale parameter and $x_i$ is the voting vector of individual $i$. Constraint: fidelity to 5 individuals of known party affiliation (dem. or rep.).

Application: data clustering (Bertozzi, Flenner, 2012).
$$(u_i)_t = \varepsilon (\Delta u)_i - \frac{1}{\varepsilon} W'(u_i) \quad \text{with mass constraint } \sum_i u_i = 0 \text{ (wells at } \pm 1).$$
Vertices: points forming two 2D moons embedded in $\mathbb{R}^{100}$ (with noise). Edge weight dependent on the distance between points. Constraint: the mass constraint.


What about small $\varepsilon$? In the continuum case it is known that $GL_\varepsilon$ $\Gamma$-converges to the total variation functional on binary functions $u$:
$$GL_\varepsilon(u) \xrightarrow{\Gamma} \sigma_W \int |\nabla u|, \quad \text{as } \varepsilon \to 0.$$
This measures the interface between the two phases $u = 0$ and $u = 1$, with surface tension $\sigma_W$ (Modica, Mortola, 1977). What happens on graphs?
$$GL_\varepsilon(u) = \frac{\varepsilon^?}{4} \sum_{ij} \omega_{ij} (u_i - u_j)^2 + \frac{1}{\varepsilon^?} \sum_i W(u_i).$$
What scaling in $\varepsilon$ to use? By going to a graph we have lost the intrinsic length scale in the gradient. Do we want to keep $\varepsilon$ in the first term?

Why is $\Gamma$-convergence interesting? If $GL_\varepsilon \xrightarrow{\Gamma} GL_0$ and a compactness property holds, then: if $u_\varepsilon$ minimizes $GL_\varepsilon$ and $u_\varepsilon \to u$, then $u$ minimizes $GL_0$. Definitions, if you are interested:
$\Gamma$-limit of a sequence of functionals: a sequence $\{F_n\}$ $\Gamma$-converges to $F$ as $n \to \infty$ if, for all $u \in \mathrm{Dom}(F)$,
1. for every sequence $u_n \to u$: $\liminf_{n \to \infty} F_n(u_n) \ge F(u)$, and
2. there exists a sequence $u_n \to u$ with $\limsup_{n \to \infty} F_n(u_n) \le F(u)$.
Compactness property: $F_n(u_n) < C$ implies $\{u_n\}$ has a convergent subsequence.

From GL to TV. For a fixed general graph with edge weights $\omega_{ij}$ independent of $\varepsilon$:
$$\frac{1}{4} \sum_{ij} \omega_{ij} (u_i - u_j)^2 + \frac{1}{\varepsilon} \sum_i W(u_i) \xrightarrow{\Gamma} C_W \sum_{i,j} \omega_{ij} |u_i - u_j|, \quad \text{as } \varepsilon \to 0,$$
where the limit functional is defined on binary functions $u$ taking values in $\{0, 1\}$, and $C_W$ is a constant depending only on $W$. The total variation $\sum_{i,j} \omega_{ij} |u_i - u_j|$ is related to graph cuts. The Dirichlet energy $\frac{1}{4} \sum_{ij} \omega_{ij} (u_i - u_j)^2$ does not scale with $\varepsilon$.

Graph based functional vs numerical discretization. For a regular square grid (4-regular graph) of $N \times N$ nodes on the torus $T^2$ we compare the graph based functional with uniform edge weights $\omega_{ij} = \frac{1}{N}$:
$$GL^g_\varepsilon(u) = N^{-1} \sum_{i,j=1}^{N} \left[ (u_{i+1,j} - u_{i,j})^2 + (u_{i,j+1} - u_{i,j})^2 \right] + \varepsilon^{-1} \sum_{i,j=1}^{N} W(u_{i,j})$$
(the double subscripts denote horizontal and vertical directions), and the functional given by the discretization of the continuum GL functional:
$$GL^d_\varepsilon(u) = \varepsilon \sum_{i,j=1}^{N} \left[ (u_{i+1,j} - u_{i,j})^2 + (u_{i,j+1} - u_{i,j})^2 \right] + \varepsilon^{-1} N^{-2} \sum_{i,j=1}^{N} W(u_{i,j}).$$

Results: graph based functional. By the earlier result, for fixed $N$ and $\varepsilon \to 0$:
$$GL^g_\varepsilon(u) \xrightarrow{\Gamma} N^{-1} \sum_{i,j=1}^{N} \left( |u_{i+1,j} - u_{i,j}| + |u_{i,j+1} - u_{i,j}| \right), \quad \text{with } u_{i,j} \in \{0, 1\}.$$
Then let $N \to \infty$:
$$\dots \xrightarrow{\Gamma} \int |u_x| + |u_y|, \quad u(x) \in \{0, 1\}.$$
Combine both limits via the scaling $\varepsilon = N^{-\alpha}$, $N \to \infty$:
$$N^{-1} \sum_{i,j=1}^{N} \left[ (u_{i+1,j} - u_{i,j})^2 + (u_{i,j+1} - u_{i,j})^2 \right] + N^{\alpha} \sum_{i,j=1}^{N} W(u_{i,j}) \xrightarrow{\Gamma} \int |u_x| + |u_y|,$$
with $u(x) \in \{0, 1\}$, if $\alpha$ is large enough.

Results: discretized functional. Fix $\varepsilon$ and let $N \to \infty$:
$$GL^d_\varepsilon(u) \xrightarrow{\Gamma} \varepsilon \int |\nabla u|^2 + \frac{1}{\varepsilon} \int W(u).$$
Then, by Modica-Mortola, if $\varepsilon \to 0$:
$$\dots \xrightarrow{\Gamma} \hat\sigma_W \int |\nabla u|, \quad \text{with } u(x) \in \{0, 1\}.$$
Combine both limits via $\varepsilon = N^{-\alpha}$, $N \to \infty$:
$$N^{-\alpha} \sum_{i,j=1}^{N} \left[ (u_{i+1,j} - u_{i,j})^2 + (u_{i,j+1} - u_{i,j})^2 \right] + N^{\alpha - 2} \sum_{i,j=1}^{N} W(u_{i,j}) \xrightarrow{\Gamma} \hat\sigma_W \int |\nabla u|,$$
with $u(x) \in \{0, 1\}$, if $\alpha$ is small enough.

Nonlocal means type functional. Still consider the square grid on $T^2$. Given $f \in C(T^2)$, create a completely connected graph with weights $\omega = e^{-d^2/\sigma^2}$, where $\sigma > 0$ and $d$ compares patches of $L$ nodes by $L$ nodes sampled from $f$. Then, for binary valued $u$,
$$N^{-4} \sum_{i,j,k,l} \omega_{i,j,k,l} |u_{i,j} - u_{k,l}| \xrightarrow{\Gamma} \int_{T^2} \int_{T^2} \omega(x, y) |u(x) - u(y)| \, dx \, dy,$$
where
$$\omega(x, y) = e^{-\frac{4L^2}{\sigma^2} \left( f(x) - f(y) \right)^2} \quad \text{if } \sigma, L \text{ are fixed and } N \to \infty,$$
$$\omega(x, y) = e^{-c^2 \int_{S_l} \left( f(x+z) - f(y+z) \right)^2 dz} \quad \text{if } \sigma = N/c \text{ and } L/N \to l \text{ as } N \to \infty,$$
where $S_l = \{z \in T^2 : |z_1| + |z_2| \le l\}$.


Some future work. Generalize the asymptotic (graph to continuum) results, e.g. on graphs sampled from a manifold. Apply Ginzburg-Landau to other data clustering or image segmentation problems. See if and how other clustering/community detection methods fit into this framework, e.g. the multiplex method of Mucha, Richardson, Macon, Porter, Onnela (2010). Look at other PDEs and PDE questions translated onto graphs. Thank you for your connectivity.
