Diffuse interface methods on graphs: Data clustering and Gamma-limits


Diffuse interface methods on graphs: Data clustering and Gamma-limits. Yves van Gennip, joint work with Andrea Bertozzi, Jeff Brantingham, Blake Hunter. Department of Mathematics, UCLA. Research made possible by ONR grants N141214 and N1411221. Graph Exploitation Symposium, MIT Lincoln Laboratory, April 18, 2012. Yves van Gennip (UCLA), Diffuse interface methods on graphs, April 17-18, 2012, 1 / 36

Overview: Graphs and data clustering; Graph Laplacian and clustering; Spectral clustering and street gangs; Ginzburg-Landau for nonlinear clustering; Γ-limits and nonlocal means; Future work.


The setting: graphs. Data points are represented by nodes in an undirected graph. Similarity is encoded in edge weights $\omega_{ij}$. Data clustering is represented by node labels $u_i$. (Figure: two nodes $u_i$ and $u_j$ joined by an edge with weight $\omega_{ij}$.)

The goal. Clustering: label all the nodes with a $k$-valued label, $u_i \in \{1, \dots, k\}$, such that an appropriate quantity is optimized. We choose to minimize the normalized cut (Shi, Malik, 2000). If $V$, the set of all nodes, is divided into $k$ disjoint clusters $C_i$, then
$$\mathrm{ratio}(C_i) := \frac{\sum_{i \in C_i,\, j \in V \setminus C_i} \omega_{i,j}}{\sum_{i \in C_i,\, j \in V} \omega_{i,j}} \qquad \text{and} \qquad \mathrm{Ncut} := \sum_{i=1}^{k} \mathrm{ratio}(C_i).$$
The normalization serves to avoid small clusters. Note: here we prescribe the number of clusters $k$.
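The quantities above can be checked directly on a small weight matrix. The sketch below (the helpers `ratio` and `ncut` are mine, not code from the talk) shows that for two triangles joined by one weak edge, the natural two-cluster partition has a much smaller Ncut than a partition that cuts through a triangle.

```python
import numpy as np

def ratio(W, C, V):
    """Weight of edges leaving cluster C, divided by total weight touching C."""
    comp = [j for j in V if j not in C]
    cut = sum(W[i, j] for i in C for j in comp)
    vol = sum(W[i, j] for i in C for j in V)
    return cut / vol

def ncut(W, clusters):
    V = range(W.shape[0])
    return sum(ratio(W, C, V) for C in clusters)

# Two triangles (nodes 0-2 and 3-5) joined by a single weak edge.
W = np.zeros((6, 6))
for a, b in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[a, b] = W[b, a] = 1.0
W[2, 3] = W[3, 2] = 0.1  # weak link between the clusters

good = ncut(W, [{0, 1, 2}, {3, 4, 5}])   # cut along the weak edge
bad  = ncut(W, [{0, 1}, {2, 3, 4, 5}])   # cut through a triangle
print(good < bad)  # the natural partition has the smaller Ncut
```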

Problem! Minimizing Ncut is an NP-complete problem (Papadimitriou, 1997). Practical solution: solve a relaxed version of the problem.


Graph Laplacians in the literature.
Unnormalized graph Laplacian: $(\Delta u)_i = \sum_j \omega_{ij}(u_i - u_j)$.
Random walk graph Laplacian: $(\Delta u)_i = \frac{1}{d_i} \sum_j \omega_{ij}(u_i - u_j)$, with degree $d_i = \sum_j \omega_{ij}$.
Symmetric normalized graph Laplacian: $(\Delta u)_i = \sum_j \frac{\omega_{ij}}{\sqrt{d_i}} \left( \frac{u_i}{\sqrt{d_i}} - \frac{u_j}{\sqrt{d_j}} \right)$.
For an overview, see e.g. Von Luxburg (2007).
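In matrix form the three operators are $D - W$, $I - D^{-1}W$, and $I - D^{-1/2} W D^{-1/2}$. A minimal numpy sketch (my own helper, not code from the talk) that builds all three from a symmetric weight matrix:

```python
import numpy as np

def graph_laplacians(W):
    """Return the unnormalized, random walk, and symmetric normalized Laplacians."""
    d = W.sum(axis=1)                       # degrees d_i = sum_j w_ij
    D = np.diag(d)
    L_un = D - W                            # (Lu)_i = sum_j w_ij (u_i - u_j)
    L_rw = np.eye(len(d)) - W / d[:, None]  # I - D^{-1} W
    d_isqrt = 1.0 / np.sqrt(d)
    L_sym = np.eye(len(d)) - d_isqrt[:, None] * W * d_isqrt[None, :]  # I - D^{-1/2} W D^{-1/2}
    return L_un, L_rw, L_sym

# A single triangle with unit weights.
W = np.array([[0., 1., 1.],
              [1., 0., 1.],
              [1., 1., 0.]])
L_un, L_rw, L_sym = graph_laplacians(W)
print(L_un @ np.ones(3))  # constant vectors lie in the kernel of L_un and L_rw
```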

Focus on the random walk Laplacian. Adjacency matrix $A$ and degree matrix $D$:
$$A = \begin{pmatrix} \omega_{11} & \cdots & \omega_{1m} \\ \vdots & & \vdots \\ \omega_{m1} & \cdots & \omega_{mm} \end{pmatrix}, \qquad D = \begin{pmatrix} d_1 & & & \\ & d_2 & & \\ & & \ddots & \\ & & & d_m \end{pmatrix}.$$
The random walk Laplacian is then given by $\Delta u = (I - D^{-1}A)u$.

Eigenvectors as solutions to a relaxed problem (Shi, Malik, 2000; Von Luxburg, 2007). The solution to the Ncut problem is given by the minimization of $\mathrm{Tr}\big(H^T (I - D^{-1}A) H\big)$ over all $D$-orthonormal matrices $H$ whose columns are indicator functions of clusters. If we relax the condition on $H$ to allow any real-valued $D$-orthonormal matrix, then the minimization of $\mathrm{Tr}(H^T \Delta H)$ is solved by the matrix $H$ whose columns are the first $k$ eigenvectors (ordered according to increasing eigenvalue) of $\Delta = I - D^{-1}A$. So: use the first $k$ eigenvectors of the random walk Laplacian as approximations to the cluster indicator functions.

Trivial, clean, example.
$$A = \begin{pmatrix} 1&1&1&0&0&0 \\ 1&1&1&0&0&0 \\ 1&1&1&0&0&0 \\ 0&0&0&1&1&1 \\ 0&0&0&1&1&1 \\ 0&0&0&1&1&1 \end{pmatrix}, \qquad \Delta = \frac{1}{3} \begin{pmatrix} 2&-1&-1&0&0&0 \\ -1&2&-1&0&0&0 \\ -1&-1&2&0&0&0 \\ 0&0&0&2&-1&-1 \\ 0&0&0&-1&2&-1 \\ 0&0&0&-1&-1&2 \end{pmatrix}.$$
The eigenvalues are $0, 0, 1, 1, 1, 1$, with (unnormalized) eigenvectors (next page)

Trivial, clean, example, cont'd.
$$v_1 = \begin{pmatrix}1\\1\\1\\0\\0\\0\end{pmatrix},\; v_2 = \begin{pmatrix}0\\0\\0\\1\\1\\1\end{pmatrix},\; v_3 = \begin{pmatrix}-2\\1\\1\\0\\0\\0\end{pmatrix},\; v_4 = \begin{pmatrix}0\\0\\0\\-2\\1\\1\end{pmatrix},\; v_5 = \begin{pmatrix}0\\1\\-1\\0\\0\\0\end{pmatrix},\; v_6 = \begin{pmatrix}0\\0\\0\\0\\1\\-1\end{pmatrix}.$$
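The spectrum of this trivial example can be verified numerically; the sketch below rebuilds the block matrix $A$ (including the unit self-weights on the diagonal, so every degree is 3) and checks that the random walk Laplacian has eigenvalues $0, 0, 1, 1, 1, 1$.

```python
import numpy as np

# Two disconnected clusters of three nodes each; A is block diagonal with
# all-ones blocks (self-weights included, as on the slide).
block = np.ones((3, 3))
A = np.block([[block, np.zeros((3, 3))],
              [np.zeros((3, 3)), block]])
D_inv = np.diag(1.0 / A.sum(axis=1))   # each degree is 3
L_rw = np.eye(6) - D_inv @ A           # random walk Laplacian I - D^{-1} A

eigvals = np.sort(np.linalg.eigvals(L_rw).real)
print(eigvals)  # two zero eigenvalues (one per connected component), four ones
```

The multiplicity of the eigenvalue 0 equals the number of connected components, which is exactly why the first eigenvectors act as cluster indicators here.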

Noisy example. The same two clusters, now with some additional noise edges connecting them, so that $A$ is no longer block diagonal. The eigenvalues of the random walk Laplacian are $0$, $0.2649$, $0.8748$, $1.112$, $1.2599$, $1.3826$. The (unnormalized) eigenvector for eigenvalue $0$ is the constant vector $v_1 = (1, 1, 1, 1, 1, 1)^T$; the eigenvector $v_2$ for eigenvalue $0.2649$ is no longer an exact indicator function, but thresholding it still approximately recovers the two clusters.

From approximation to indicator function. How do we go from eigenvector to indicator function? Two often-used options: (1) threshold the values in the eigenvector; (2) spectral clustering (Ng, Jordan, Weiss, 2002): use the eigenvectors as a basis for the data points, then use k-means to separate the clusters in this space.
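The thresholding option can be sketched on a noisy two-triangle graph like the one above (my own toy graph, not the data from the talk): take the eigenvector of the second-smallest eigenvalue of the random walk Laplacian and threshold it at its median.

```python
import numpy as np

# Two triangles with one noisy cross edge between them.
A = np.zeros((6, 6))
for a, b in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    A[a, b] = A[b, a] = 1.0
A[0, 3] = A[3, 0] = 0.2   # noise edge between the clusters

L_rw = np.eye(6) - A / A.sum(axis=1)[:, None]  # random walk Laplacian
vals, vecs = np.linalg.eig(L_rw)
order = np.argsort(vals.real)
v2 = vecs[:, order[1]].real        # eigenvector of second-smallest eigenvalue

labels = (v2 > np.median(v2)).astype(int)  # threshold -> cluster indicator
print(labels)
```

The resulting labels are constant on each triangle and differ between them (up to the arbitrary overall sign of the eigenvector).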


Identify gang membership based on geosocial information. This work with Jeff Brantingham and Blake Hunter started out as an REU project in the summer of 2011.

From gangs to graphs. Given data: average location of (nonviolent) stops, with pairwise distances $d_{i,j}$; (anonymized) individuals involved in the stops, with pairwise social interactions $S_{i,j}$. Gang affiliation is available as ground truth. Construct a graph via the adjacency/weight matrix
$$A_{i,j} = \alpha S_{i,j} + (1 - \alpha) e^{-d_{i,j}^2 / \sigma^2},$$
with parameters $\alpha$ and $\sigma$.
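A hedged sketch of this weight-matrix construction, using made-up stand-ins for the social and distance data (the real study's data is not reproduced here):

```python
import numpy as np

def build_weights(S, d, alpha, sigma):
    """A_ij = alpha * S_ij + (1 - alpha) * exp(-d_ij^2 / sigma^2)."""
    return alpha * S + (1 - alpha) * np.exp(-d**2 / sigma**2)

n = 4
rng = np.random.default_rng(1)
pts = rng.random((n, 2))                                        # stand-in stop locations
d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)   # pairwise distances
S = (rng.random((n, n)) < 0.3).astype(float)                    # stand-in social contacts
S = np.maximum(S, S.T)                                          # symmetrize

A = build_weights(S, d, alpha=0.4, sigma=0.5)
print(np.allclose(A, A.T))  # True: both terms are symmetric
```

Note the convex combination: $\alpha$ interpolates between purely social ($\alpha = 1$) and purely geographic ($\alpha = 0$) similarity.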

Eigenvectors for a specific choice of $S$. (Figure: plots of eigenvectors 2, 3, and 4.) Hotspots!

But... (Figure: map with pie charts.) The pies are the clusters we find, the colors indicate gang affiliation. Pie charts made with code from Traud, Frost, Mucha, Porter (2009).

Possible explanations: Sparse data? Social structures in reality do not follow gang affiliations? Wrong method?


The Ginzburg-Landau functional: phase separation. From materials science and image analysis, the Ginzburg-Landau functional
$$GL_\varepsilon(u) = \varepsilon \int |\nabla u|^2 + \frac{1}{\varepsilon} \int W(u),$$
where $W$ is a double well potential. When minimized (under some extra constraint, e.g. fixed mass or fidelity to data): phase separation. Its $L^2$ gradient flow is the Allen-Cahn equation:
$$u_t = \varepsilon \Delta u - \frac{1}{\varepsilon} W'(u) \quad (+\ \text{constraint}).$$
Use the graph Laplacian to formulate the Allen-Cahn equation on graphs. (Figure: the double well $W$.)

Allen-Cahn as nonlinear relaxation. The Allen-Cahn equation on graphs,
$$(u_i)_t = \varepsilon (\Delta u)_i - \frac{1}{\varepsilon} W'(u_i) \quad (+\ \text{constraint}),$$
can be seen as a more stringent relaxation of the Ncut minimization problem. (N.B. Sign convention: here $\Delta$ is taken to be a negative operator on graphs, matching the continuum Laplacian.) The double well term involving $W$ forces the solution to be close to binary.

Application: machine learning (Bertozzi, Flenner, 2012).
$$(u_i)_t = \varepsilon (\Delta u)_i - \frac{1}{\varepsilon} W'(u_i) - \lambda_i \big(u_i - (u_{\mathrm{orig}})_i\big), \qquad \lambda_i = \begin{cases} 1 & i \text{ in original image}, \\ 0 & \text{otherwise.} \end{cases}$$
(Figures: Original Image, Image to Segment, Training Region, Segmented Image.) Vertices: the pixels from both images. Edge weight: $\omega_{ij} = e^{-|x_i - x_j|^2 / \tau}$, where $\tau$ is a scale parameter and $x_i$ is the feature vector of pixel $i$. Constraint: fidelity to data in the original image.
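This semi-supervised dynamic can be sketched with a plain explicit Euler step on a toy graph. Everything below (the graph, step size, fidelity strength, wells of $W$ at $\pm 1$) is my own illustrative choice, not the talk's setup, and $L$ is the unnormalized positive-semidefinite Laplacian, so the equation is written as $\dot u = -\varepsilon L u - W'(u)/\varepsilon - \lambda (u - u_{\mathrm{orig}})$; Bertozzi and Flenner use more sophisticated semi-implicit spectral schemes in practice.

```python
import numpy as np

# Graph Allen-Cahn with fidelity, W(u) = (u^2 - 1)^2 / 4, so W'(u) = u^3 - u.
# Toy graph: two triangles (0-2 and 3-5) joined by the edge (2, 3).
A = np.zeros((6, 6))
for a, b in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[a, b] = A[b, a] = 1.0
L = np.diag(A.sum(axis=1)) - A          # unnormalized graph Laplacian

eps, dt = 1.0, 0.01
u_orig = np.array([1.0, 0, 0, 0, 0, -1.0])              # two labeled seed nodes
lam = 10.0 * np.array([1.0, 0, 0, 0, 0, 1.0])           # fidelity only where labeled

u = np.zeros(6)
for _ in range(5000):                    # explicit Euler time stepping
    u = u + dt * (-eps * (L @ u) - (u**3 - u) / eps - lam * (u - u_orig))

print(np.sign(u))  # nodes 0-2 follow the +1 seed, nodes 3-5 the -1 seed
```

The double well drives each $u_i$ toward $\pm 1$, diffusion spreads the two seeded labels, and the fidelity term pins the labeled nodes, exactly the mechanism the slide describes.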

Application: data inpainting (Bertozzi, Flenner, 2012).
$$(u_i)_t = \varepsilon (\Delta u)_i - \frac{1}{\varepsilon} W'(u_i) - \lambda_i \big(u_i - (u_{\mathrm{orig}})_i\big), \qquad \lambda_i = \begin{cases} 1 & \text{known data}, \\ 0 & \text{otherwise.} \end{cases}$$
Vertices: 435 members of the 1984 US House of Representatives. Feature vector: $n$ votes, each yes ($1$), no ($-1$), or did not vote ($0$), with $n \in \{8, 10, 12, 14, 16\}$. Edge weight: $\omega_{ij} = e^{-|x_i - x_j|^2 / \tau}$, where $\tau$ is a scale parameter and $x_i$ is the voting vector of individual $i$. Constraint: fidelity to 5 individuals of known party affiliation (dem. or rep.).

Application: data clustering (Bertozzi, Flenner, 2012).
$$(u_i)_t = \varepsilon (\Delta u)_i - \frac{1}{\varepsilon} W'(u_i) \quad \text{with mass constraint } \sum_i u_i = 0 \text{ (wells at } \pm 1).$$
Vertices: points forming two 2D moons embedded in $\mathbb{R}^{100}$ (with noise). Edge weight dependent on the distance between points. Constraint: the mass constraint.


What about small $\varepsilon$? In the continuum case it is known that $GL_\varepsilon$ $\Gamma$-converges to the total variation functional on binary functions $u$:
$$GL_\varepsilon(u) \xrightarrow{\Gamma} \sigma_W \int |\nabla u|, \quad \text{as } \varepsilon \to 0.$$
This measures the interface between the two phases $u = 0$ and $u = 1$, with surface tension $\sigma_W$ (Modica, Mortola, 1977). What happens on graphs?
$$GL_\varepsilon(u) = \frac{\varepsilon^?}{4} \sum_{ij} \omega_{ij} (u_i - u_j)^2 + \frac{1}{\varepsilon^?} \sum_i W(u_i).$$
What scaling in $\varepsilon$ to use? By going to a graph we have lost the intrinsic length scale in the gradient. Do we want to keep $\varepsilon$ in the first term?

Why is $\Gamma$-convergence interesting? If $GL_\varepsilon \xrightarrow{\Gamma} GL_0$ and a compactness property holds, then: if $u_\varepsilon$ minimizes $GL_\varepsilon$ and $u_\varepsilon \to u$, then $u$ minimizes $GL_0$. Definitions, if you are interested:
$\Gamma$-limit of a sequence of functionals: a sequence $\{F_n\}$ $\Gamma$-converges to $F$ as $n \to \infty$ if, for all $u \in \mathrm{Dom}(F)$,
1. for every sequence $u_n \to u$: $\liminf_{n \to \infty} F_n(u_n) \ge F(u)$, and
2. there exists a sequence $u_n \to u$ with $\limsup_{n \to \infty} F_n(u_n) \le F(u)$.
Compactness property: $F_n(u_n) < C$ implies $\{u_n\}$ has a convergent subsequence.

From GL to TV. For a fixed general graph with edge weights $\omega_{ij}$ independent of $\varepsilon$:
$$\frac{1}{4} \sum_{ij} \omega_{ij} (u_i - u_j)^2 + \frac{1}{\varepsilon} \sum_i W(u_i) \xrightarrow{\Gamma} C_W \sum_{i,j} \omega_{ij} |u_i - u_j|, \quad \text{as } \varepsilon \to 0,$$
where the limit functional is defined on binary functions $u$ taking values in $\{0, 1\}$, and $C_W$ is a constant depending only on $W$. The total variation $\sum_{i,j} \omega_{ij} |u_i - u_j|$ is related to graph cuts. The Dirichlet energy $\frac{1}{4} \sum_{ij} \omega_{ij} (u_i - u_j)^2$ does not scale with $\varepsilon$.

Graph based functional vs numerical discretization. For a regular square grid (4-regular graph) of $N \times N$ nodes on the torus $T^2$ we compare the graph based functional with uniform edge weights $\omega_{ij} = \frac{1}{N}$:
$$GL^g_\varepsilon(u) = N^{-1} \sum_{i,j=1}^{N} \left[ (u_{i+1,j} - u_{i,j})^2 + (u_{i,j+1} - u_{i,j})^2 \right] + \varepsilon^{-1} \sum_{i,j=1}^{N} W(u_{i,j})$$
(the double subscripts denote horizontal and vertical directions), and the functional given by the discretization of the continuum GL functional:
$$GL^d_\varepsilon(u) = \varepsilon \sum_{i,j=1}^{N} \left[ (u_{i+1,j} - u_{i,j})^2 + (u_{i,j+1} - u_{i,j})^2 \right] + \varepsilon^{-1} N^{-2} \sum_{i,j=1}^{N} W(u_{i,j}).$$

Results: graph based functional. By the earlier result, for fixed $N$ and $\varepsilon \to 0$:
$$GL^g_\varepsilon(u) \xrightarrow{\Gamma} N^{-1} \sum_{i,j=1}^{N} \left( |u_{i+1,j} - u_{i,j}| + |u_{i,j+1} - u_{i,j}| \right), \quad \text{with } u_{i,j} \in \{0, 1\}.$$
Then let $N \to \infty$:
$$\dots \xrightarrow{\Gamma} \int |u_x| + |u_y|, \quad u(x) \in \{0, 1\}.$$
Combine both limits via the scaling $\varepsilon = N^{-\alpha}$, $N \to \infty$:
$$N^{-1} \sum_{i,j=1}^{N} \left[ (u_{i+1,j} - u_{i,j})^2 + (u_{i,j+1} - u_{i,j})^2 \right] + N^{\alpha} \sum_{i,j=1}^{N} W(u_{i,j}) \xrightarrow{\Gamma} \int |u_x| + |u_y|,$$
with $u(x) \in \{0, 1\}$, if $\alpha$ is large enough.

Results: discretized functional. Fix $\varepsilon$ and let $N \to \infty$:
$$GL^d_\varepsilon(u) \xrightarrow{\Gamma} \varepsilon \int |\nabla u|^2 + \frac{1}{\varepsilon} \int W(u).$$
Then, by Modica-Mortola, if $\varepsilon \to 0$:
$$\dots \xrightarrow{\Gamma} \hat\sigma_W \int |\nabla u|, \quad \text{with } u(x) \in \{0, 1\}.$$
Combine both limits via $\varepsilon = N^{-\alpha}$, $N \to \infty$:
$$N^{-\alpha} \sum_{i,j=1}^{N} \left[ (u_{i+1,j} - u_{i,j})^2 + (u_{i,j+1} - u_{i,j})^2 \right] + N^{\alpha - 2} \sum_{i,j=1}^{N} W(u_{i,j}) \xrightarrow{\Gamma} \hat\sigma_W \int |\nabla u|,$$
with $u(x) \in \{0, 1\}$, if $\alpha$ is small enough.

Nonlocal means type functional. Still consider the square grid on $T^2$. Given $f \in C(T^2)$, create a completely connected graph with weights $\omega = e^{-d^2/\sigma^2}$, where $\sigma > 0$ and $d$ compares patches of $L$ nodes by $L$ nodes sampled from $f$. Then, for binary valued $u$,
$$N^{-4} \sum_{i,j,k,l} \omega_{i,j,k,l} |u_{i,j} - u_{k,l}| \xrightarrow{\Gamma} \int_{T^2} \int_{T^2} \omega(x, y) |u(x) - u(y)| \, dx \, dy,$$
where
$$\omega(x, y) = e^{-\frac{4L^2}{\sigma^2} \left( f(x) - f(y) \right)^2} \quad \text{if } \sigma, L \text{ are fixed and } N \to \infty,$$
$$\omega(x, y) = e^{-c^2 \int_{S_l} \left( f(x+z) - f(y+z) \right)^2 dz} \quad \text{if } \sigma = N/c \text{ and } L/N \to l \text{ as } N \to \infty,$$
where $S_l = \{z \in T^2 : |z_1| + |z_2| \le l\}$.


Some future work. Generalize the asymptotic (graph to continuum) results, e.g. on graphs sampled from a manifold. Apply Ginzburg-Landau to other data clustering or image segmentation problems. See if and how other clustering/community detection methods fit into this framework, e.g. the multiplex method of Mucha, Richardson, Macon, Porter, Onnela (2010). Look at other PDEs and PDE questions translated onto graphs. Thank you for your connectivity.
