Consistency of Modularity Clustering on Random Geometric Graphs

Size: px

Start display at page:

Download "Consistency of Modularity Clustering on Random Geometric Graphs"

Ernest McCarthy
5 years ago
Views:

1 Consistency of Modularity Clustering on Random Geometric Graphs Erik Davis The University of Arizona May 10, 2016

2 Outline Introduction to Modularity Clustering Pointwise Convergence Convergence of Optimal Partitions

3 Graph Clustering G = (X, W )

4 Graph Clustering G = (X, W ) clustering = partition = coloring

5 Graph Clustering G = (X, W ) clustering = partition = coloring 1 2m W ij δ(c i, c j ) where m = 1 2 W ij. i,j i,j

6 Modularity (Newman & Girvan 04) Q(U) = 1 2m where d i = j W ij, U a clustering. i,j W ij δ(c i, c j ) 1 d i d j 2m 2m δ(c i, c j ) i,j

7 Modularity Clustering Modularity Clustering: U = arg max Q(U), U K

8 Modularity Clustering Modularity Clustering: More generally (α-modularity): Q(U) = 1 2m where S = i d α i. U = arg max Q(U), U K W ij δ(c i, c j ) 1 S 2 i,j i,j d α i d α j δ(c i, c j )

9 Random Geometric Graphs Fix open D R d with Lipschitz boundary, ν = ρ(x) dx probability measure on D.

10 Random Geometric Graphs Fix open D R d with Lipschitz boundary, ν = ρ(x) dx probability measure on D. Take {X i } i N i.i.d., and X n = {X i } n i=1.

11 Random Geometric Graphs Fix open D R d with Lipschitz boundary, ν = ρ(x) dx probability measure on D. Take {X i } i N i.i.d., and X n = {X i } n i=1. To assign weights, we pick a kernel η : R d R, length scale ɛ n.

12 Random Geometric Graphs Fix open D R d with Lipschitz boundary, ν = ρ(x) dx probability measure on D. Take {X i } i N i.i.d., and X n = {X i } n i=1. To assign weights, we pick a kernel η : R d R, length scale ɛ n. W ij = { ( Xi X η j ɛ n ), if i j, 0, otherwise.

13 Random Geometric Graphs Fix open D R d with Lipschitz boundary, ν = ρ(x) dx probability measure on D. Take {X i } i N i.i.d., and X n = {X i } n i=1. To assign weights, we pick a kernel η : R d R, length scale ɛ n. ( Xi X j { 1 η W ij = ɛ d n ɛ n )=: η ɛn (X i X j ), if i j, 0, otherwise. Graph G n = (X n, W ).

14 Questions G n gives (random) modularity functional Q n.

15 Questions G n gives (random) modularity functional Q n. 1. What is the behavior of Q n as n?

16 Questions G n gives (random) modularity functional Q n. 1. What is the behavior of Q n as n? 2. What do optimal modularity clusterings look like? Un arg max Q n (U n ) U n K

17 Questions G n gives (random) modularity functional Q n. 1. What is the behavior of Q n as n? 2. What do optimal modularity clusterings look like? Un arg max Q n (U n ) U n K Consistency: Subject to certain technical assumptions, U n U where U is a partition of D characterized as the solution to a (deterministic) continuum optimization problem.

18 Consistency of Clustering Methods K-Means (Pollard 1981) Spectral Clustering (von Luxburg, Belkin, & Bosquet 2008) Modularity (Bickel & Chen 2009, Zhao, Levina, & Zhu 2012) Cheeger Cut (García-Trillos, Slepčev, von Brecht, Laurent, & Bresson 2014)

19 Outline Introduction to Modularity Clustering Pointwise Convergence Convergence of Optimal Partitions

20 Key Identity Let U n = {U n,k } K k=1 be a partition of X n, and let u n,k = 1 Un,k.

21 Key Identity Let U n = {U n,k } K k=1 be a partition of X n, and let u n,k = 1 Un,k. Then 1 1/K Q n (U n ) = 1 S 2 K ( di α [ un,k (X i ) 1/K ]) 2 k=1 + ɛ n n 2 4m i K GTV n (u n,k ), k=1 where GTV n (u) := 1 1 ɛ n n 2 η ɛn (X i X j ) u(x i ) u(x j ). i,j

22 Continuum Partitioning Domain D R d, fixed K 1, and partition U = {U k } K k=1 of D.

23 Continuum Partitioning Domain D R d, fixed K 1, and partition U = {U k } K k=1 of D. U is balanced with respect to µ if µ(u k ) = 1/K for k = 1,..., K.

24 Continuum Partitioning Domain D R d, fixed K 1, and partition U = {U k } K k=1 of D. U is balanced with respect to µ if µ(u k ) = 1/K for k = 1,..., K. The perimeter of U k in D, with respect to a weight ρ 2, is Per(U k ; ρ 2 ) = ρ 2 (x) dh d 1 (x). U k

25 Continuum Partitioning Domain D R d, fixed K 1, and partition U = {U k } K k=1 of D. U is balanced with respect to µ if µ(u k ) = 1/K for k = 1,..., K. The perimeter of U k in D, with respect to a weight ρ 2, is Per(U k ; ρ 2 ) = ρ 2 (x) dh d 1 (x). U k More generally, Per(U k ; ρ 2 ) = TV (1 Uk ; ρ 2 ) := sup 1 Uk (x)div Φ(x) dx. Φ C 1 c (D;Rd ) D Φ(x) ρ 2 (x) Remark: When f smooth, TV (f ; ρ 2 ) = D f ρ2 (x) dx.

26 Continuum Partitioning U = arg min U =K µ(u k )=1/K K Per(U k ; ρ 2 ). k=1

27 Continuum Partitioning U = arg min U =K µ(u k )=1/K K Per(U k ; ρ 2 ). k=1 (a) K = 4 (b) K = 9. (c) K = 25. Figure: Local minimizers on D = (0, 1) 2, with ρ(x) = 1, dµ = dx, produced using The Surface Evolver.

28 Pointwise Convergence A (finite perimeter) partition U of D induces a partition U n of X n D.

29 Pointwise Convergence A (finite perimeter) partition U of D induces a partition U n of X n D.

30 Pointwise Convergence A (finite perimeter) partition U of D induces a partition U n of X n D.

31 Pointwise Convergence What is the behavior of Q n as n?

32 Pointwise Convergence What is the behavior of Q n as n? Theorem (Asymptotics) Let U be a finite perimeter partition. Suppose {ɛ n} satisfies n=1 exp( nɛ(d+1)/2 n ) < when α = 0, 1 and n=1 exp( nɛd n) < otherwise.

33 Pointwise Convergence What is the behavior of Q n as n? Theorem (Asymptotics) Let U be a finite perimeter partition. Suppose {ɛ n} satisfies n=1 exp( nɛ(d+1)/2 n ) < when α = 0, 1 and n=1 exp( nɛd n) < otherwise. Then, as n, 1 1/K Q n(u n) a.s. C K η,ρ k=1 Per(U k; ρ 2 ) if ( 2 K k=1 µ(u k ) 1/K) = 0, ɛ n otherwise, where dµ(x) = ρ1+α (x) dx D ρ1+α (x) dx and Cη,ρ = R n η(x) x 1 dx 2 D ρ2 (x) dx.

34 Sketch of Proof: Convergence of Graph Total Variation Recall GTV n (u) = 1 1 ɛ n n 2 η ɛn (X i X j ) u(x i ) u(x j ). i,j

35 Sketch of Proof: Convergence of Graph Total Variation Recall GTV n (u) = 1 1 ɛ n n 2 η ɛn (X i X j ) u(x i ) u(x j ). i,j Proposition Fix u = 1 U, and let {ɛ n } n N be a sequence converging to zero such that n=1 exp( nɛ n (d+1)/2 ) < +. Then GTV n (u) a.s. σ η TV (u; ρ 2 ).

36 Sketch of Proof: Convergence of Graph Total Variation Recall GTV n (u) = 1 1 ɛ n n 2 η ɛn (X i X j ) u(x i ) u(x j ). i,j Proposition Fix u = 1 U, and let {ɛ n } n N be a sequence converging to zero such that n=1 exp( nɛ n (d+1)/2 ) < +. Then GTV n (u) a.s. σ η TV (u; ρ 2 ). Ingredients:

37 Sketch of Proof: Convergence of Graph Total Variation Recall GTV n (u) = 1 1 ɛ n n 2 η ɛn (X i X j ) u(x i ) u(x j ). i,j Proposition Fix u = 1 U, and let {ɛ n } n N be a sequence converging to zero such that n=1 exp( nɛ n (d+1)/2 ) < +. Then GTV n (u) a.s. σ η TV (u; ρ 2 ). Ingredients: Nonlocal TV (Ponce 04)

38 Sketch of Proof: Convergence of Graph Total Variation Recall GTV n (u) = 1 1 ɛ n n 2 η ɛn (X i X j ) u(x i ) u(x j ). i,j Proposition Fix u = 1 U, and let {ɛ n } n N be a sequence converging to zero such that n=1 exp( nɛ n (d+1)/2 ) < +. Then GTV n (u) a.s. σ η TV (u; ρ 2 ). Ingredients: Nonlocal TV (Ponce 04) Exponential bounds for U-statistics (Giné, Lata la, & Zinn 00)

39 Outline Introduction to Modularity Clustering Pointwise Convergence Convergence of Optimal Partitions

40 Convergence of Partitions

41 Convergence of Partitions U 3,1 U 3,2

42 Convergence of Partitions γ 3,1 P(D {0, 1})

43 Convergence of Partitions γ 3,1 P(D {0, 1}) U 1 U 2

44 Convergence of Partitions γ 3,1 P(D {0, 1}) γ 1 P(D {0, 1})

45 Convergence of Partitions Given partition U n = {U n,k } K k=1 of X n, we associate the measures γ n,k P(D {0, 1}), for k = 1,..., K, by γ n,k = 1 n n δ (Xi,1 Un,k (X i )) = (Id 1 Un,k ) ν n. i=1

46 Convergence of Partitions Given partition U n = {U n,k } K k=1 of X n, we associate the measures γ n,k P(D {0, 1}), for k = 1,..., K, by γ n,k = 1 n n δ (Xi,1 Un,k (X i )) = (Id 1 Un,k ) ν n. i=1 Given partition U = {U k } K k=1 of D, we similarly define γ k = (Id 1 Uk ) ν.

47 Convergence of Partitions Given partition U n = {U n,k } K k=1 of X n, we associate the measures γ n,k P(D {0, 1}), for k = 1,..., K, by γ n,k = 1 n n δ (Xi,1 Un,k (X i )) = (Id 1 Un,k ) ν n. i=1 Given partition U = {U k } K k=1 of D, we similarly define γ k = (Id 1 Uk ) ν. w We say that U n U if there exists a sequence {πn } n N of permutations of {1,..., K} such that γ w n,πnk γ k, for k = 1,..., K.

48 Convergence of Optimal Partitions Theorem (Convergence of Optimal Partitions) Suppose {ɛ n } n N satisfies suitable conditions. For n 1, let Un arg max U K Q n (U) be an optimal partition. If U is the unique solution (up to relabeling of its constituent sets) to the problem minimize U =K µ(u k )=1/K K Per(U k ; ρ 2 ) k=1 with dµ(x) = ρ 1+α (x) dx/ D ρ1+α (x) dx, then U n a.s. U. (P)

49 Convergence of Optimal Partitions Theorem (Convergence of Optimal Partitions) Suppose {ɛ n } n N satisfies suitable conditions. For n 1, let Un arg max U K Q n (U) be an optimal partition. If U is the unique solution (up to relabeling of its constituent sets) to the problem minimize U =K µ(u k )=1/K K Per(U k ; ρ 2 ) k=1 with dµ(x) = ρ 1+α (x) dx/ D ρ1+α (x) dx, then Un a.s. U. If there is more than one solution to (P), then almost surely i) {Un } n N has at least one cluster point, and ii) every cluster point is a solution to (P). (P)

50 Example

51 Remarks on Theorem Weak convergence of measures via Wasserstein metric.

52 Remarks on Theorem Weak convergence of measures via Wasserstein metric. Useful tool: transport maps relating empirical measures ν n to ν, with bound on -transport cost (García-Trillos, Slepčev).

53 Remarks on Theorem Weak convergence of measures via Wasserstein metric. Useful tool: transport maps relating empirical measures ν n to ν, with bound on -transport cost (García-Trillos, Slepčev). Because modularity clusterings are optimizers of discrete energies, we use Γ-convergence to prove that their limit is the optimizer of a continuum energy.

54 Remarks on Theorem Weak convergence of measures via Wasserstein metric. Useful tool: transport maps relating empirical measures ν n to ν, with bound on -transport cost (García-Trillos, Slepčev). Because modularity clusterings are optimizers of discrete energies, we use Γ-convergence to prove that their limit is the optimizer of a continuum energy. Balance constraint in the continuum problem presents a technical difficulty.

55 Remarks on Theorem Weak convergence of measures via Wasserstein metric. Useful tool: transport maps relating empirical measures ν n to ν, with bound on -transport cost (García-Trillos, Slepčev). Because modularity clusterings are optimizers of discrete energies, we use Γ-convergence to prove that their limit is the optimizer of a continuum energy. Balance constraint in the continuum problem presents a technical difficulty. We modified the notion of Γ-convergence for random functionals to allow the use of our pointwise convergence result.

56 Existence of Transport Maps Let D R d be open, connected with Lipschitz boundary. Assume ν = ρ(x) dx with ρ continuous and bounded above/below by positive constants. Proposition (García-Trillos, Slepčev 14) There is a constant C > 0 such that, with probability one, there exists a sequence of transportation maps {T n} n N, T n ν = ν n with lim sup n lim sup n lim sup n n 1/2 Id T n (2 log log n) 1/2 C (d = 1), n 1/d Id T n (log n) 3/4 C (d = 2), n 1/d Id T n (log n) 1/d C (d 3).

57 Assumption on ɛ n Assume {ɛ n } n N is such that ɛ n > 0 and ɛ n 0. For α = 0, 1, assume that 2 log log n 1 lim = 0, if d = 1 n n ɛ n For α 0, 1, assume that (log n) 3/4 1 lim = 0 n n 1/2 ɛ n if d = 2 (log n) 1/d 1 lim = 0 n n 1/d ɛ n if d 3. n=1 n exp( nɛ d+1 n ) <.

58 The role of α (a) Level sets (b) α = 1 (c) α = 0 (d) α = 1 dµ(x) ρ(x) 1+α, ρ(x) min(2 exp( 4 x x 0 2 ), 1/2).

59 Thanks for coming!

CONSISTENCY OF CHEEGER AND RATIO GRAPH CUTS

CONSISTENCY OF CHEEGER AND RATIO GRAPH CUTS CONSISTENCY OF CHEEGER AN RATIO GRAPH CUTS NICOLÁS GARCÍA TRILLOS 1, EJAN SLEPČEV 1, JAMES VON BRECHT 2, THOMAS LAURENT 3, XAVIER BRESSON 4. ABSTRACT. This paper establishes the consistency of a family