Community detection in stochastic block models via spectral methods

Size: px

Start display at page:

Download "Community detection in stochastic block models via spectral methods"

Joanna Porter
5 years ago
Views:

1 Community detection in stochastic block models via spectral methods Laurent Massoulié (MSR-Inria Joint Centre, Inria) based on joint works with: Dan Tomozei (EPFL), Marc Lelarge (Inria), Jiaming Xu (UIUC), Charles Bordenave (IMT)

2 Community Detection Clustering of items based on their observed interactions Typical approach: embedding (eg in Euclidean space), then K-means Embedding space 2

3 Application 1: contact recommendation in online social networks Supporting data: e.g. OSN s friendship graph recommend members of user s implicit community Variation: NSA s co-traveler programme spot groups of suspect persons meeting regularly in unusual places

4 Application 2: item recommendation to users Supporting data: {user-item} matrix Example: Netflix prize {user-movie} ratings matrix User / Movie f 1 f 2 f m u 1? ** *** u 2?? u n ***** ** ** Use item communities to support recommendation users who liked this also liked

5 Application 3: categorizing chemical reactives in biology Basic data: sets of chemicals jointly involved in specific reaction More generally Knowledge graph as generic representation of data A1 has with B1 interaction of type C1 A2 has with B2 interaction of type C2

6 Outline The Stochastic Block Model With labels With general types Performance of Spectral Methods rich signal case The weak signal case: sparse observations Phase transition on detectability Non-backtracking matrix and the Spectral Redemption

7 The Stochastic Block Model, aka planted partition model [Holland-Laskey-Leinhardt 83] n nodes partitioned into r categories Category σ: α σ n nodes Edge between nodes u,v present with probability s b σ u σ v /n (s : average degree, or signal strength) Observation: adjacency matrix A= + noise Community Detection: provide clustering σ based on A that is correlated as much as possible with true planted clustering σ, i.e. maximizing n 1 Overlap:=Max π 1 n π( σ u )=σ u Max σε r α σ u=1

8 The Labeled Stochastic Block Model Edges (u-v) labeled by L uv L (finite set) Drawn from distribution μ σ u σ(v) Netflix case: labels 1-5 stars

9 The SBM with general types [Aldous 81; Lovász 12] User type σ(u) i.i.d. ~P in general set (e.g. uniform on [0,1]) Edge (u-v) present w.p. b σ u σ v e.g. b x, y F x y s/n for kernel b F(x) Edges (u-v) labeled by L uv L (finite set) x Drawn from distribution μ σ u σ(v) Technical assumptions: compact type set and continuity of symmetric functions b and μ

10 Outline The Stochastic Block Model With labels With general types Performance of Spectral Methods rich signal case The weak signal case: sparse observations Phase transition on detectability Non-backtracking matrix and the Spectral Redemption

Spectral Clustering From Matrix A extract R normed eigenvectors x i corresponding to R largest eigenvalues 1 R Form R-dimensional node representatives y u = n(x i

11 Spectral Clustering From Matrix A extract R normed eigenvectors x i corresponding to R largest eigenvalues 1 R Form R-dimensional node representatives y u = n(x i (u)) i=1 R Group nodes u according to proximity of spectral representatives y u Karl Pearson s PCA: On Lines and Planes of Closest Fit to Systems of Points in Space", 1901

12 Illustration for R= Clustering from SVD Principal component Principal component 1 Netflix dataset SBM with K=4

13 Result for logarithmic signal strength s Assume s= (log(n)) and clusters are distinguishable, i.e. σ σ τ such that b στ b σ τ Then spectrum of A consists of R eigenvalues i of order (s) (R K) and n-r eigenvalues i of order O s Where R = rank b uv α v 1 u,v K Node representatives y u based on top R eigenvectors x i : Cluster according to underlying blocks except for negligible fraction of nodes

14 Proof arguments Control spectral radius of noise matrix + perturbation of matrix eigen-elements A = + random noise matrix Block matrix non-zero eigenvalues: (s)

15 Controlling perturbation of eigen-elements of Hermitian matrices A = A + W Where A : observed; A : signal; W : perturbation Perturbation of eigenvalues: order 1 n, 1 n Then i i ρ W, i = 1, n (Weyl) (a simple consequence of Courant-Fisher s minimax theorem: For Hermitian n n matrix H : i H = i H = max min dim V =i v V, v =1 v Hv min max dim V =n i+1 v V, v =1 v Hv

16 Controlling perturbation of eigen-elements of Hermitian matrices A = A + W Where A : observed; A : signal; W : perturbation Perturbation of eigenvectors: order 1 n, 1 n Let min i j. If ρ W < /2 i,j: i j Then for all i and for any normed eigenvector x i i, there exists i s.t. x i x i x i 1 ρ W ρ W ( A simple version of [Chandler Davis-Kahane 70] ) 2

17 Application to SBM Spectrum of A : spectrum of b uv α v 1 u,v K scaled by s, hence = s Announced result follows if ρ W = o s

18 Ramanujan graphs [Lubotzky-Phillips-Sarnak 88] s-regular graphs such that eigenvalues of adjacency matrix verify :=max k: k < s k 2 s 1 The best possible spectral gap [Alon-Boppana 91] 2 s 1 O s diam G

19 spectral separation properties à la Ramanujan [Friedman 08] random s-regular graph with s 3 verifies whp =2 s 1 + o(1) [Feige-Ofek 05]: for Erdős-Rényi graph G n, s/n and s = log n, then whp = O s Corollary: for SBM, ρ A A = O s when s = log n

20 spectral separation properties à la Ramanujan Consequence: in SBM with s = log n, whp ρ A A = O s A s leading eigen-elements close to those of A For s = 1, ρ A A ~C spectral separation is lost log n log log n standard spectral method needs fixing to work in low signal regime

21 Discrepancy between SBM with small K and Netflix Eigenvalue distributions 4 outstanding eigenvalues SBM with K=4 Netflix (subset) motivates consideration of SBM with general types

22 SBM with general types User types (u) i.i.d. ~P from general set (e.g. uniform on [0,1]) Edge (u-v) present w.p. b σ u σ v e.g. b x, y F x y s/n for kernel b F(x) x Edges (u-v) labeled by L uv L (finite set) Drawn from distribution μ σ u σ(v) Form matrix {A ij W(L ij )} from random projections W l of labels

23 SBM with general types: Spectral properties for logarithmic s Define kernel K(x, y) l W l b xy μ xy l and integral operator Tf x K x, y f(y)p dy spectrum of s 1 {A ij W(L ij )} spectrum of T Eigenvalue convergence: s 1 i n i Eigenvector convergence: x i (u) φ i σ u Associated eigenfunctionn Type of node u

24 SBM with general types: Spectral properties for logarithmic s Flexible model -power-law spectra (convolution operator + Fourier analysis) b x, y F x y F(x) better matches to Netflix data x

25 SBM with general types: estimation for logarithmic s For fixed R form R-dimensional node representatives y u = Prob(label(i,j)=5)? Node j n k 1 x k (u) k=1 R Node i Use empirical distribution of labels L(i,k) for k in -neighborhood of j Embedding allows consistent estimation of label distributions Accuracy controlled by ε and ε R k>r k 2

26 Outline The Stochastic Block Model With labels With general types Performance of Spectral Methods rich signal case The weak signal case: sparse observations Phase transition on detectability Non-backtracking matrix and the Spectral Redemption

27 Weak signal strength: s = (1) Correct classification of all but negligible fraction of nodes impossible (isolated nodes ) Performance of clustering σ assessed by overlap metric: n 1 Overlap:=Max π 1 n π( σ u )=σ u Max σε r α σ u=1 Overlap Overlap s 0 : threshold for Giant component Signal strength s s 0 s c Signal strength s

28 Positively correlated community detection Symmetric two-communities scenario: E A = a/n b/n n/2 b/n a/n n/2 Conjecture ([Decelle-Krzakala-Moore-Zdeborova 2011]: For τ: = a b 2 < 1, correlation tends to zero for any σ 2 a+b Proven by [Mossel-Neeman-Sly 2012] For τ > 1, positive correlation can be achieved Proven by [LM 2013] and [Mossel-Neeman-Sly 2013] Spectral redemption conjecture [Krzakala-Moore-Mossel-Neeman-Sly- Zdeborova-Zhang 2013]): Spectral methods based on non-backtracking matrix can achieve positively correlated detection

29 Non-backtracking matrix B of graph G = (V, E) Defined on oriented edges uv for u, v εe : B uv,xy = 1 v=x 1 u y e f u v y u e f v Asymmetric, such that B k ef = number of non-backtracking paths on G of length k+1 starting at e and ending at f

30 Non-backtracking matrix B of graph G = (V, E) Defined on oriented edges uv for u, v εe : B uv,xy = 1 v=x 1 u y Ihara-Bass formula [Ihara 66] for A: adjacency matrix, D: diagonal matrix of node degrees, 1 u 2 n m Det I ub = Det I ua + u 2 D I Hence s-regular graph Ramanujan iff eigenvalues k of B verify max k: k < 1 k 1 A definition that extends to non-regular graphs ([Stark-Terras 96] s graph theory Riemann hypothesis)

31 Notations and assumptions Let μ 1,, μ r, μ 1 μ r be the leading eigenvalues of E A and x i corresponding eigenvectors in R n Let y i : y i e = x i e 1, y i R n n 1 : lift of eigenvector x i to Rn n 1 In symmetric two-community case, μ s: eigenvalues of μ 1 = a+b, μ 2 2= a b (hence τ = μ μ 1 ) a/2 b/2 b/2 a/2 r Assume constant per-community average degree: j r, i=1 α i b ij μ 1

32 Notations and assumptions Call eigenvalue μ i visible if μ i > μ 1 Let r 0 : μ r0 > μ 1 μ r0 +1 : index of smallest visible eigenvalue Kesten-Stigum condition: r 0 > 1, i.e. there is some i > 1 such that μ i is visible In symmetric two-community case, Kesten-Stigum condition: μ 2 2 > μ 1 or equivalently, τ = μ 2 2 μ 1 > 1

33 Main result: spectrum of non-backtracking matrix for stochastic block model With high probability as n, eigenvalues i of B satisfy i r 0 i μ i = o 1 i > r 0 i μ 1 + o 1 For i r 0, if μ i has multiplicity 1, corresponding eigenvector: asymptotically parallel to B k B k y i for k~ log n

34 Illustration for 2-community symmetric Stochastic block model i

35 Corollary 1: proof of Spectral Redemption Conjecture Assume r 0 > 1, equal community sizes α i 1 r and μ i has multiplicity 1 for some i : 1 < i r 0 Then positive correlation obtained by following procedure 1) Extract eigenvector z i of B 2) Form signs s e = sgn z i e τ 3) To each node u assign s u = s(e) for edge e picked uniformly among graph edges with e 1 = u 4) Assign community σ u at random according to distribution π s u e u e e

36 Illustration for Erdős-Rényi graph with n = 500, α = 4

37 Corollary 2: Erdős-Rényi graphs are nearly Ramanujan For Erdős-Rényi graph G α, n, spectrum n i of associated nonbacktracking matrix B satisfies with high probability as n max k: k < 1 k 1 + o(1) An approximate version of max k: k < 1 k 1 Property generalizes notion of Ramanujan graph to non-regular case [Ihara 66, Stark-Terras 96] Hence: Corollary 2 a «non-regular» version of Friedman s result for regular graphs: most random graphs (with given number of edges) approximately Ramanujan according to extended definition

38 Proof strategy Key step: for k~ log n prove matrix expansion B k r = i=1 k i v i w i + R k where i ~μ i (near-eigenvalue), v i B k B k y i (near- right eigenvector), w i B k y i (near- left eigenvector), R k a perturbation and with high probability Low-rank i j v i v j, v i w j, w i w j = o 1, (near-orthonormal system) v i w i > c > 0 (low condition number) ρ R k = O μ 1 k (negligible perturbation) Yields the result, after leveraging Bauer-Fike theorem (a tool to control impact of additive perturbation on eigenvalues of matrix not necessarily symmetric)

39 Proof elements (perturbations negligible) To show ρ R k = O μ 1 k, establish «small norm in complement»: sup x =1,x w i =0,i=1,,r Bk x = O μ 1 k Challenge: directions w i random and correlated with B k

40 Remaining mysteries about SBM s (1) Conjectured phase diagram for more than 2 blocks (assuming fixed inter-community parameter b) Intra-community parameter a Above Kesten-Stigum threshold Detection easy (spectral methods or BP) Detection hard but feasible (how? In polynomial time?) Detection infeasible r=4 r=5 Number of communities r

41 Remaining mysteries about SBM s (2) Clique detection problem: add a size-k clique to random graph with edgeprobability ½ ½ i.e. a 2-block SBM with unbalanced block sizes: ½ ½ K n-k for K = n clique easily detectable (e.g. inspection of node degrees) are there polynomial-time algorithms for smaller yet large K? (e.g. K = 3 n ) A notoriously hard problem ( planted clique detection recently proposed as a new benchmark of algorithmic hardness)

42 Conclusions and Outlook Variations of basic spectral methods still to be invented: interesting mathematics and practical relevance Detection in SBM = rich playground for analysis of computational complexity with methods of statistical physics Computationally efficient methods for hard cases (planted clique, intermediate phase for multiple communities)? Non-regular Ramanujan graphs: theory still in its infancy (proper analogue of Alon-Boppana s theorem missing)

43 Thanks!

Community Detection. fundamental limits & efficient algorithms. Laurent Massoulié, Inria

Community Detection. fundamental limits & efficient algorithms. Laurent Massoulié, Inria Community Detection fundamental limits & efficient algorithms Laurent Massoulié, Inria Community Detection From graph of node-to-node interactions, identify groups of similar nodes Example: Graph of US