
Detection in the stochastic block model with multiple clusters: proof of the achievability conjectures, acyclic BP, and the information-computation gap

Emmanuel Abbe, Colin Sandon

Abstract

In a paper that initiated the modern study of the stochastic block model, Decelle et al., backed by Mossel et al., made a fascinating conjecture: Denote by $k$ the number of balanced communities, $a/n$ the probability of connecting inside communities and $b/n$ across, and set $\mathrm{SNR} = (a-b)^2/(k(a+(k-1)b))$; for any $k$, it is possible to detect communities efficiently whenever $\mathrm{SNR} > 1$ (the KS threshold), whereas for $k \ge 5$, it is possible to detect communities information-theoretically for some $\mathrm{SNR} < 1$. Massoulié, Mossel et al. and Bordenave et al. succeeded in proving that the KS threshold is efficiently achievable for $k = 2$, while Mossel et al. proved that it cannot be crossed information-theoretically for $k = 2$. The above conjecture remained open for $k \ge 3$. This paper proves this conjecture. For the efficient part, a linearized acyclic belief propagation (ABP) algorithm is developed and proved to detect communities for any $k$ down to the KS threshold in time $O(n \log n)$. Achieving this requires showing optimality of ABP in the presence of cycles, a challenge in the realm of graphical models. The paper further connects ABP to a power iteration method on a nonbacktracking operator of generalized order, and shows that the model parameters can be learned efficiently down to the KS threshold. For the information-theoretic (IT) part, a non-efficient algorithm sampling a typical clustering is shown to break down the KS threshold at $k = 5$. The emerging gap is shown to be large in some cases; if $a = 0$, the KS threshold reads $b \gtrsim k^2$ whereas the IT bound reads $b \gtrsim k \ln(k)$. Finally, the efficient results are extended to non-symmetrical SBMs with a generalized notion of detection that applies to general SBMs; this improves prior methods both in terms of complexity and universality in achieving the KS threshold.
Program in Applied and Computational Mathematics, and EE Department, Princeton University, USA, eabbe@princeton.edu. This research was partly supported by the NSF CAREER Award CCF-155131, the ARO grant W911NF-16-1-0051, and the Google Faculty Research Award.
Department of Mathematics, Princeton University, USA, sandon@princeton.edu.

Contents
1 Introduction
1.1 Our results
1.2 Related literature
1.3 Related models
2 Results
2.1 Achieving the KS threshold efficiently
2.1.1 Acyclic Belief Propagation (ABP) Algorithm
2.2 Crossing the KS threshold information-theoretically
2.2.1 Typicality Sampling Algorithm
2.3 Learning the model
3 Achieving the KS threshold: proof technique
3.1 Random assignment
3.2 Amplification
3.3 Nonbacktracking walks
3.4 Compensation for the average value
3.5 The final algorithm for the symmetric SBM
3.6 The spectral view
3.7 The general SBM
3.8 Alternatives
4 Crossing the KS threshold: proof technique
5 Open problems
6 Proofs
6.1 Achieving the KS threshold
6.1.1 Preliminaries
6.1.2 The shard decomposition
6.1.3 Estimating the expectation of $W_{m/s}$
6.1.4 Bounding the variance of $W_{m/s}$
6.2 Crossing the KS threshold
6.2.1 Atypicality of a bad clustering
6.2.2 Size of the typical set
6.2.3 Sampling estimates
6.3 Learning the model

1 Introduction

The stochastic block model (SBM) is a canonical model of networks with communities, and a natural model to study various central questions in machine learning, algorithms and statistics. The model serves in particular as a test bed for clustering and community detection algorithms, commonly used in social networks [NWS], protein-to-protein interaction networks [CY06], gene expression [CSC+07], recommendation systems [LSY03], medical prognosis [SPT+01], DNA folding [CAT15], image segmentation [SM97], natural language processing [BKN11] and more.

Interestingly, the SBM emerged independently in multiple scientific communities. The block model terminology, which seems to have dominated in recent years, comes from the machine learning and statistics literature [HLL83, WBB76, FMW85, WW87, BC09, KN11, SN97, RCY11, CWA1], while the model is typically called the planted partition model in theoretical computer science [BCLS87, DF89, Bop87, JS98, CK99, CI01, McS01], and the inhomogeneous random graph model in the mathematics literature [BJR07]. Although the model was defined as far back as the 80s, it resurged in recent years due in part to the following fascinating conjecture, established first in [DKMZ11] and reiterated in [MNS1], from deep but non-rigorous statistical physics arguments:

Conjecture 1. Let $(X, G)$ be drawn from SBM$(n, k, a, b)$, i.e., $X$ is uniformly drawn among partitions of $[n]$ into $k$ balanced clusters, and $G$ is a random graph on the vertex set $[n]$ where edges are placed independently with probability $a/n$ inside the clusters and $b/n$ across. Define $\mathrm{SNR} = \frac{(a-b)^2}{k(a+(k-1)b)}$ and say that an algorithm detects communities if it takes as an input the graph $G$ and outputs a clustering $\hat{X}$ that is positively correlated with $X$ with high probability. Then,
(i) Irrespective of $k$, if $\mathrm{SNR} > 1$, it is possible to detect communities in polynomial time, i.e., the Kesten-Stigum (KS) threshold can be achieved efficiently;
(ii) If $k \ge 5$, it is possible to detect communities information-theoretically for some SNR strictly below 1.
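As a quick numerical companion to Conjecture 1, the sketch below evaluates the SNR of SBM$(n, k, a, b)$ and the KS test $\mathrm{SNR} > 1$. The function names `snr` and `above_ks_threshold` are illustrative choices, not notation from the paper. With $a = 0$ the SNR reduces to $b/(k(k-1))$, the form that reappears when the KS threshold is compared with the information-theoretic bound.

```python
def snr(a: float, b: float, k: int) -> float:
    """SNR = (a - b)^2 / (k * (a + (k - 1) * b)) for SBM(n, k, a, b)."""
    return (a - b) ** 2 / (k * (a + (k - 1) * b))


def above_ks_threshold(a: float, b: float, k: int) -> bool:
    """True when SNR > 1, the regime of part (i) of Conjecture 1."""
    return snr(a, b, k) > 1


# With a = 0 and k = 5 the SNR is b / 20, so the KS threshold sits at b = 20.
for b in (15.0, 20.0, 25.0):
    print(f"b = {b}: SNR = {snr(0.0, b, 5):.3f}, above KS: {above_ks_threshold(0.0, b, 5)}")
```

Note that the threshold is strict: at exactly $\mathrm{SNR} = 1$ the check returns `False`.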
We prove this conjecture in this paper. Major results were already obtained for the case $k = 2$. It was proved in [Mas14, MNS14b] that the KS threshold can be achieved efficiently for $k = 2$, with an alternative proof later given in [BLM15]. However, for $k = 2$, no information-computation gap takes place, as shown with a tight converse in [MNS1]. It was also shown in [BLM15] that for SBMs with multiple slightly asymmetric communities, a challenging problem for detection, the KS threshold can be achieved. Yet, [BLM15] does not resolve Conjecture 1 for $k \ge 3$.

An interesting challenge raised by part (i) of the conjecture is that standard clustering methods, commonly used in applications, fail to achieve the KS threshold. This includes spectral methods based on the adjacency matrix or standard Laplacians, as well as SDPs. For standard spectral methods, a first issue is that the fluctuations in the node degrees produce high-degree nodes that prevent the eigenvectors from concentrating on the clusters.¹

¹ This issue is further enhanced on real networks where degree variations are large.

A classical trick is to trim such high-degree nodes [Co10, Vu14, GV14, CRV15], throwing away some information, but this does not suffice to achieve the KS threshold. SDPs are a natural alternative, but they also stumble before the KS threshold [GV14, MS15], focusing on the most likely rather than typical clusterings.² As we shall show in this paper, and as already investigated in [KMM+13, BLM15], BP algorithms and non-backtracking operators provide instead a proper solution.

In their original paper [DKMZ11], Decelle et al. mention that belief propagation (BP) is likely to achieve the KS threshold, and in fact to give the optimal accuracy in the reconstruction of the communities. However, the main issue when applying BP to the SBM is the classical one: while it is not hard to prove that BP converges to the desired solution on trees, the presence of cycles in the graph makes the behavior of the algorithm much more difficult to understand, and BP is susceptible to settling down in the wrong fixed points in such scenarios.³ This is a long-standing challenge in the realm of message passing algorithms for graphical channels. Moreover, achieving the KS threshold requires precisely running BP to an extent where the graph is not even tree-like, thus precluding us from using standard tools. Interestingly, numerical simulations suggest that starting with a purely random initialization, i.e., letting each vertex in the graph guess its community membership at random, and running BP does the job. However, no prior method was known to control random initialization, as discussed in [MNS14b]. We develop this approach here, with an acyclic linearized belief propagation algorithm.

Further, the paper proves part (ii) of the conjecture, crossing the KS threshold at $k = 5$ using a non-efficient algorithm that samples a typical clustering (i.e., a clustering having the right proportions of edges inside and across clusters).
Note that the information-computation gap concerns the gap between the KS threshold and what is achieved information-theoretically; this coincides with the gap between the information-theoretic and computational thresholds only under non-formal evidence [DKMZ11]. However, the IT bound that results from our analysis gives a gap to the KS threshold which is large in some cases, making the SBM a good case study for information-computation gaps. Bounds on the information-theoretic threshold were also obtained independently in [BM16].

1.1 Our results

The following is obtained:

1. A linearized acyclic belief propagation (ABP) algorithm is developed and shown to detect communities down to the KS threshold with complexity $O(n \log n)$, proving part (i) of Conjecture 1. A more general result applying to general (asymmetrical) SBMs with a generalized notion of detection is also obtained. ABP thus improves on prior algorithms for $k = 2$ [Mas14, MNS14b, BLM15] both in terms of complexity and universality of achieving the KS threshold (see Theorem 1);
2. An algorithm that samples a clustering with typical volumes and cuts is shown to break down the KS threshold at $k = 5$, proving part (ii) of Conjecture 1;
3. A connection between ABP and a power iteration method on a non-backtracking operator is developed, extending the operator of [Has89] to higher order nonbacktracking, and formalizing the interplay pointed out in [KMM+13] between linearized BP and nonbacktracking operators;
4. The information-theoretic (IT) bound is characterized at the extremal regimes of $a$ and $b$. For $a = 0$, it is shown that detection is information-theoretically solvable if $b > ck \ln k + o_k(1)$, $c \in [1, 2]$. Thus the information-computation gap, defined as the gap between the KS threshold and the IT bound, is large, since the KS threshold reads $b > k(k-1)$. The behaviour of the IT bound is also described for $b$ close to 0;
5. An efficient algorithm is shown to learn the parameters $a, b, k$ down to the KS threshold.

To achieve the KS threshold, we rely on a linearized version of BP that can handle cycles. The simplest linearized⁴ version of BP repeatedly updates beliefs about a vertex's community based on its neighbors' suspected communities, while ignoring the part of that belief that results from the beliefs about that vertex's own community, so as to prevent a feedback loop. However, this only works ideally if the graph is a tree. The correct response to a cycle would be to discount information reaching the vertex along either branch of the cycle to compensate for the redundancy of the two branches. However, due to computational issues we simply prevent information from cycling around small cycles in order to limit feedback. We also add steps where a multiple of the beliefs in the previous step is subtracted from the beliefs in the current step, to prevent the beliefs from settling into an equilibrium where vertices' communities are systematically misrepresented in ways that add credibility to each other.

² The recent results of [MPW15] on robustness to monotone adversaries imply that SDPs can in fact not achieve the KS threshold.
³ Empirical studies of BP on loopy graphs show that convergence still takes place in some cases [MWJ99].
We refer to Section 2.1.1 for a complete description of the algorithm and Section 3 for further intuition on how it performs. The fact that ABP is equivalent to a power iteration method on a non-backtracking operator results from its linearized form, as first pointed out informally in [KMM+13]. This provides an intriguing synergy between message passing algorithms and spectral methods. It further allows us to interpret the obstructions of spectral methods through the lens of BP. The risk of obtaining eigenvectors that concentrate on singular structures (e.g., high-degree nodes for Laplacians) is related to the risk that BP settles down in wrong fixed points (e.g., due to cycling around high-degree nodes). Rather than removing such obstructions, ABP mitigates the feedback coming from the loops, giving rise to a non-backtracking operator which extends the operator of [Has89] by considering higher order nonbacktracking. In addition to simplifying the proofs, considering higher order nonbacktracking operators may help mitigate short loops in more general models, which is of independent interest. Further details are provided in Section 3.

⁴ Different forms of approximate message passing algorithms have been studied, such as in [?] for compressed sensing.
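To make the link between linearized BP and a nonbacktracking power iteration concrete, here is a minimal sketch of the simplest case only: immediate backtracking is removed, but there is no handling of longer cycles and no eigenvalue compensation. It assumes the message convention of step 2 of ABP in Section 2.1.1 (the message on the directed pair $(v, w)$ becomes the sum of the messages on $(w, u)$ over neighbours $u \neq v$ of $w$); all helper names are hypothetical. The second function builds the 0/1 matrix of that same update, so $m-1$ message-passing rounds coincide with $m-1$ multiplications by this matrix, i.e., a power iteration started from the random vector.

```python
import random
from collections import defaultdict


def adjacency(edges):
    """Undirected adjacency lists built from a list of (u, v) pairs."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj


def nonbacktracking_beliefs(edges, x, m):
    """Linearized BP without immediate backtracking, run for m - 1 rounds.

    Round 1 puts y[(v, w)] = x[v] on every directed pair; each later round
    replaces y[(v, w)] by the sum of y[(w, u)] over neighbours u != v of w.
    Returns the vertex beliefs y_v = sum of y[(v, w)] over neighbours w.
    """
    adj = adjacency(edges)
    y = {(v, w): x[v] for v in adj for w in adj[v]}
    for _ in range(m - 1):
        # The comprehension reads the previous round's y before it is rebound.
        y = {(v, w): sum(y[(w, u)] for u in adj[w] if u != v) for (v, w) in y}
    return {v: sum(y[(v, w)] for w in adj[v]) for v in adj}


def nonbacktracking_matrix(edges):
    """0/1 matrix on directed pairs: row (v, w) has a 1 in column (w, u) iff u != v."""
    adj = adjacency(edges)
    pairs = [(v, w) for v in sorted(adj) for w in sorted(adj[v])]
    index = {pair: i for i, pair in enumerate(pairs)}
    B = [[0] * len(pairs) for _ in pairs]
    for v, w in pairs:
        for u in adj[w]:
            if u != v:
                B[index[(v, w)]][index[(w, u)]] = 1
    return pairs, B


# Usage: a 4-cycle with a pendant vertex and a Gaussian random start.
rng = random.Random(0)
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (3, 4)]
x = {v: rng.gauss(0.0, 1.0) for v in range(5)}
beliefs = nonbacktracking_beliefs(edges, x, m=4)
```

The message-passing form never materializes the $2|E| \times 2|E|$ matrix, which is why it is the one used in practice; the explicit matrix is only built here to exhibit the equivalence.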

To cross the KS threshold information-theoretically, we rely on a non-efficient algorithm that samples a typical clustering. Upon observing a graph drawn from the SBM, the algorithm builds the set of all partitions of the $n$ nodes that have a typical fraction of edges inside and across clusters, and then samples a partition uniformly at random from that set. The analysis of the algorithm reveals three different regimes, which reflect three layers of refinement in the bounds on the typical set's size. In a first regime, bad clusterings (i.e., partitions of the nodes that agree on no more than close to a $1/k$ fraction of the vertices) are shown by a union bound to be not typical with high probability, and the algorithm samples only good clusterings with high probability. This allows us to cross the KS threshold at $a = 0$ but does not give the right bound at $b = 0$. In a second regime, the large number of tree-like components in the graph is exploited, finding some bad clusterings to be typical but unlikely to be sampled. This gives a regime where the algorithm succeeds with the right bound at $b = 0$, but not the right approximation at small $b$. To address the latter, a finer estimate on the typical set's size is obtained by also exploiting parts of the giant component that are tree-like. The bound is shown to also scale correctly in $k$ when $a = 0$ (i.e., planted $k$-coloring), using a converse result for the reconstruction on trees problem. Further details are in Section 4.

The learning of the parameters $a, b, k$ is done similarly as for the case $k = 2$ [MNS1]. Note that learning the parameters when $k$ is unknown was previously settled only for diverging degrees [AS15b, BCS15].

1.2 Related literature

Several methods have been devised and proved to succeed down to the KS threshold for two communities.
The first is based⁵ on a spectral method from the matrix of self-avoiding walks (entry $(i, j)$ counts the number of self-avoiding walks of moderate size between vertices $i$ and $j$) [Mas14], the second on counting weighted non-backtracking walks between vertices [MNS14b], and the third on a spectral method with the matrix of non-backtracking walks between directed edges (each edge is replaced with two directed edges and entry $(e, f)$ is one if and only if edge $e$ follows edge $f$) [BLM15]. The first method has a complexity of $O(n^{1+\varepsilon})$, $\varepsilon > 0$, while the second method affords a lesser complexity of $O(n \log n)$ but with a large constant (see the discussion in [MNS14b]). These two methods were the first to achieve the KS threshold for two communities, settling a major conjecture. The third method is based on a detailed analysis of the spectrum of the non-backtracking operator and allows going beyond the SBM with 2 communities, requiring however a certain asymmetry in the SBM parameters to obtain a result on detection (the precise condition is the requirement that $\mu_k$ be a simple eigenvalue of $M$ in Theorem 5 of [BLM15]), thus falling short of proving Conjecture 1(i) for $k \ge 3$, since the second eigenvalue in this case has multiplicity at least 2. Note that a certain amount of symmetry is needed to make the detection problem interesting. For example, if the communities have different average degrees, detection becomes trivial. Thus the symmetric model SBM$(n, k, a, b)$ is in a sense the most challenging model for detection. The non-backtracking operator was first proposed for the SBM in [KMM+13], where it is also described as a linearization of BP. A precise spectral analysis of this operator is developed in [BLM15] and applied to the SBM. This gives a fascinating approach to community detection, providing the first rigorous understanding of why nonbacktracking operators are relevant in this context. Besides the previously mentioned shortcomings in the fully symmetric case, the approach also suffers from an increase in dimension, as the derived matrix scales with the number of edges rather than vertices (specifically $2|E| \times 2|E|$, where $|E|$ is the number of edges), which matters for the task of extracting the eigenvectors.⁶

Our proof technique bears similarities with the above papers in a few places, but diverges in several key parts. A few expansions in the paper are similar to others carried out in [MNS14b], such as the weighted sums in Definition 11 of [MNS14b] and the SAW decomposition in [MNS14b]. As discussed in Section 3.6, our ABP algorithm can also be viewed as a power iteration method on the $r$-nonbacktracking operator, which generalizes the classical nonbacktracking operator to higher order backtracks. Instead of extracting the eigenvectors as in [BLM15], the power iteration takes a random vector and applies the matrix iteratively. This typically yields the eigenvector of the first eigenvalue, whereas we are interested in the second one. A classical fix to achieve the latter is to use a deflation method, subtracting the first eigenvector before proceeding to iterations. This approach is likely to work in the symmetric SBM, but in the general SBM we rely on a different approach that subtracts the first eigenvalue times the identity matrix. In addition, the implementation is done in a message passing fashion (ABP), rather than building the actual matrix, whose dimension grows with the number of edges, and iterating it. From a practical standpoint, the spectral version would be less efficient than the actual ABP algorithm, while its proof of correctness would be nearly identical. Further, while $r = 2$ is arguably the simplest implementation, a larger $r$ may be beneficial in practice.

⁵ Related ideas relying on shortest paths were also considered in [BB14].
For example, an adversary may add triangles for which ABP with $r = 2$ would fail while a larger $r$ would succeed. Finally, the approach of ABP (in contrast to the spectral one) can be extended beyond the linearized setting to improve the algorithm's accuracy.

For the information-theoretic part, a few papers have studied information-theoretic bounds and information-computation tradeoffs for SBMs with a growing number of communities [YC14], two unbalanced communities [NN14], and a single community [Mon15]. No results seemed known for the symmetric SBM and Conjecture 1(ii). Shortly after the posting of this paper, [BM16] obtained, in an independent effort, bounds on the information-theoretic threshold that cross the KS threshold at $k = 5$, using moment methods.

1.3 Related models

Exact recovery is a stronger recovery requirement than detection, which has long been studied for the SBM [BCLS87, DF89, Bop87, SN97, JS98, CK99, CI01, McS01, BC09, RCY11, CWA1, CSX1, Vu14, YC14, AL14, ABBS14a], and more recently through the lens of sharp thresholds [ABH16, MNS14a, YP14, BH14, Ban15, CG15, ?, YP15]. The notion of exact recovery requires a reconstruction of the complete communities with high probability. It was proved in [ABH16, MNS14a] that exact recovery has a sharp threshold for SBM$(n, 2, a\ln(n), b\ln(n))$ at $\frac{(\sqrt{a}-\sqrt{b})^2}{2} = 1$, which can be achieved efficiently. As opposed to detection, which can exploit variations in degrees, exact recovery becomes harder when considering general SBMs, where communities have different relative sizes and different connectivity parameters. In [AS15a], it was proved that for the general SBM with linear size communities, exact recovery has a sharp threshold given by the CH-divergence, and the threshold is proved to be efficiently achievable without knowing the parameters in [AS15c]. This further improves on the result of [Vu14], which applies to the logarithmic degree regime in full generality. Thus, for exact recovery with linear size communities, there is no information-computation gap. When considering sub-linear communities and coarser regimes of the parameters, [YC14] gives evidence that exact recovery can again have information-computation gaps. We also conjecture that a similar phenomenon can take place in the setting of [AS15a] for exact recovery when $k$ is larger than $\log(n)$.

Finally, many variants of the SBM can be studied, such as the labelled block model [GZFA10, HLM1, XLM14], the censored block model [AM15, ABBS14a, CG14, ABBS14b, GRSY14, CRV15, SKLZ15], the degree-corrected block model [KN11], overlapping block models [For10] and more. While most of the fundamental challenges seem to be captured by the SBM already, these represent important extensions for applications.

⁶ The non-backtracking matrix is also not normal and thus has a complex spectrum; an interesting heuristic based on the Bethe Hessian operator was proposed in [SKZ14] to address such issues.

2 Results

The SBM can be defined with a uniform or a Binomial model for the communities. This means that for a probability vector $p = (p_1, \ldots, p_k)$, the communities may be drawn uniformly at random among all partitions of $[n]$ having $np_i$ vertices in community $i$ (with an arbitrary rounding rule on $np_i$ to obtain integers adding up to $n$), or each vertex may be assigned a label in $[k]$ independently with probability $p$. These are equivalent for the purpose of this paper, due to standard concentration arguments [AS9] and the fact that the graph contains a constant fraction of isolated nodes with high probability. We may thus switch freely between the models to ease the presentation.
In the case when $p = (1/k, \ldots, 1/k)$, we simply say that the communities are balanced. We denote the general sparse SBM by SBM$(n, p, Q/n)$, where $n$ is the number of vertices in the graph, $p$ is a probability distribution on $[k]$ governing the relative sizes of the communities,⁷ and $Q$ is a $k \times k$ symmetric matrix with nonnegative entries such that $Q/n$ gives the connectivity matrix (for sufficiently large $n$), i.e., a pair of nodes in communities $i$ and $j$ connect independently with probability $Q_{i,j}/n$. The symmetric SBM is defined as follows.

Definition 1. $(\sigma, G)$ is drawn under SBM$(n, k, a, b)$ if $\sigma$ is a balanced $n$-dimensional vector on $[k]$ and $G$ is a random graph on the vertex set $[n]$ where each edge $(i, j) \in \binom{[n]}{2}$ is drawn with probability $\mathbb{1}(\sigma_i = \sigma_j)\, a/n + \mathbb{1}(\sigma_i \neq \sigma_j)\, b/n$, independently of the other edges.

Note that we often talk about $G$ being drawn under the SBM without specifying the planted community partition $\sigma$. As is standard, we also let $\Omega_i = \{v : \sigma_v = i\}$ for each $i$.

⁷ Note that $p$ does not scale with $n$.
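A direct way to draw from SBM$(n, p, Q/n)$ is sketched below, using the Binomial (i.i.d.) label model, which the text above treats as interchangeable with the uniform one. The helpers `sample_sbm` and `symmetric_sbm` are hypothetical and meant for small experiments only: they loop over all $n(n-1)/2$ pairs, so they cost $O(n^2)$ time rather than exploiting sparsity. The symmetric model of Definition 1 is obtained with $p = (1/k, \ldots, 1/k)$ and $Q$ equal to $a$ on the diagonal and $b$ elsewhere; note that with i.i.d. labels the communities are only approximately balanced.

```python
import random


def sample_sbm(n, p, Q, rng=None):
    """Draw (sigma, edges) from SBM(n, p, Q/n) with i.i.d. community labels.

    sigma[v] = i with probability p[i]; each pair {u, v} is then an edge
    independently with probability Q[sigma[u]][sigma[v]] / n.
    Quadratic-time pair loop, intended for small illustrative n only.
    """
    rng = rng or random.Random()
    sigma = rng.choices(range(len(p)), weights=p, k=n)
    edges = []
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < Q[sigma[u]][sigma[v]] / n:
                edges.append((u, v))
    return sigma, edges


def symmetric_sbm(n, k, a, b, rng=None):
    """SBM(n, k, a, b) via p = (1/k, ..., 1/k) and Q[i][j] = a if i == j else b."""
    Q = [[a if i == j else b for j in range(k)] for i in range(k)]
    return sample_sbm(n, [1.0 / k] * k, Q, rng)


# Usage: two communities with a = 5, b = 1; expected degree (a + b) / 2 = 3.
sigma, edges = symmetric_sbm(1500, 2, 5.0, 1.0, random.Random(1))
print(2 * len(edges) / 1500)
```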

Definition 2. Let $\sigma \in [k]^n$ and $\varepsilon > 0$. We define the set of bad clusterings with respect to $\sigma$ as
$$B_\varepsilon(\sigma) = \left\{ y \in [k]^n : \frac{1}{n}\, d(\sigma, y) > 1 - \frac{1}{k} - \varepsilon \right\}, \tag{1}$$
where $d(\sigma, y)$ is the minimum Hamming distance between $\sigma$ and any relabeling of $y$ (i.e., any mapping of the components of $y$ with a fixed permutation of $[k]$). Relabelings need to be considered since only the partition needs to be detected and not the actual labels; it is simply convenient to work with labels.

Definition 3. An algorithm $\hat\sigma : 2^{\binom{[n]}{2}} \to [k]^n$ solves detection (or weak recovery) in SBM$(n, k, a, b)$ if for some $\varepsilon > 0$,
$$\mathbb{P}_{\sigma, G}\{\hat\sigma(G) \in B_\varepsilon(\sigma)\} = o_n(1), \tag{2}$$
where $(\sigma, G) \sim$ SBM$(n, k, a, b)$. Detection is solvable efficiently if the algorithm runs in polynomial time in $n$, and information-theoretically otherwise. Note that if $\hat\sigma$ is a randomized algorithm (i.e., it takes the graph as an input and outputs various clusterings with different probabilities), and if for some $\varepsilon > 0$,
$$\mathbb{P}_{\sigma, G, \hat\sigma}\{\hat\sigma(G) \in B_\varepsilon(\sigma)\} = o_n(1), \tag{3}$$
then detection is (information-theoretically) solvable.

Note that the model SBM$(n, k, a, b)$ corresponds to the general model SBM$(n, p, Q/n)$ where $p = (1/k, \ldots, 1/k)$ and $Q_{i,j}$ is $a$ if $i = j$ and $b$ otherwise. We also consider weaker notions of symmetric SBMs, where $p$ and $Q$ are such that the row sums of $PQ$ are constant. This means that every node in the graph has the same expected degree. A more general definition of detection is then as follows.

Definition 4. An algorithm $\hat\sigma : 2^{\binom{[n]}{2}} \to \{1, 2\}^n$ solves detection (or weak recovery) in SBM$(n, p, Q/n)$ if for some $\varepsilon > 0$, when $(\sigma, G) \sim$ SBM$(n, p, Q/n)$ and a labeling of the vertices is generated by $\hat\sigma(G)$, the following holds with probability $1 - o_n(1)$: there exist communities $i$ and $j$ such that the fraction of vertices from community $i$ that are labelled 1 differs from the fraction of vertices from community $j$ that are labelled 1 by at least $\varepsilon$. Detection is solvable efficiently if the algorithm runs in polynomial time in $n$, and information-theoretically otherwise.
In other words, an algorithm succeeds at detection if it divides the graph's vertices into two sets such that the fractions of vertices from different communities that are assigned to one of the sets differ significantly. Note that in the symmetric case, an algorithm that solves detection according to the latter definition can be converted to one that solves detection according to the former definition by splitting as many disjoint subsets of size $n/k$ as possible off of each set, and then grouping all leftover vertices together. The original sets contain more vertices from some communities than others, so the subsets they are split into also contain more vertices from some communities than others. Thus, there is an identification of these subsets with the communities under which nontrivially more than a $1/k$ fraction of the vertices are in the subsets corresponding to their communities.
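The conversion just described, together with the bad-clustering test of Definition 2, can be sketched as follows. All helper names are hypothetical, labels are 0-based (so $[k]$ becomes $\{0, \ldots, k-1\}$), `relabel_distance` computes $d(\sigma, y)$ by brute force over all $k!$ relabelings and is therefore only meant for small $k$, and `two_sets_to_groups` assumes that $k$ divides $n$ so that the block size $n/k$ is an integer.

```python
from itertools import permutations


def relabel_distance(sigma, y, k):
    """d(sigma, y): minimum Hamming distance between sigma and any relabeling of y."""
    best = len(sigma)
    for perm in permutations(range(k)):
        mismatches = sum(1 for s, t in zip(sigma, y) if s != perm[t])
        best = min(best, mismatches)
    return best


def is_bad_clustering(sigma, y, k, eps):
    """Membership test for B_eps(sigma) in (1): d(sigma, y) / n > 1 - 1/k - eps."""
    return relabel_distance(sigma, y, k) / len(sigma) > 1 - 1 / k - eps


def two_sets_to_groups(s1, s2, n, k):
    """Split blocks of size n/k off S1 and S2; leftovers form one last group.

    Assuming k divides n, at most k groups result.
    """
    size = n // k
    groups, leftovers = [], []
    for part in (s1, s2):
        full = len(part) // size
        groups.extend(part[i * size:(i + 1) * size] for i in range(full))
        leftovers.extend(part[full * size:])
    if leftovers:
        groups.append(leftovers)
    return groups


def groups_to_labels(groups, n):
    """Label vertex v by the index of the group containing it."""
    labels = [0] * n
    for g, members in enumerate(groups):
        for v in members:
            labels[v] = g
    return labels


# Usage: n = 6, k = 3, planted sigma = (0, 0, 1, 1, 2, 2).
sigma = [0, 0, 1, 1, 2, 2]
groups = two_sets_to_groups([0, 1, 2], [3, 4, 5], n=6, k=3)
y = groups_to_labels(groups, 6)
print(groups, relabel_distance(sigma, y, 3), is_bad_clustering(sigma, y, 3, eps=0.1))
```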

2.1 Achieving the KS threshold efficiently

We first present a general result that applies to the general SBM, with a notion of detection that is meaningful even in a model that is not symmetric (or is only weakly symmetric, by having a constant expected degree for each node). We next specialize the result to symmetric SBMs, and provide the ABP algorithm in the next section.

Theorem 1. Let $p \in (0,1)^k$ with $\sum_i p_i = 1$, let $Q$ be a symmetric matrix with nonnegative entries, let $P$ be the diagonal matrix such that $P_{i,i} = p_i$, and let $\lambda_1, \ldots, \lambda_h$ be the distinct eigenvalues of $PQ$ in order of nonincreasing magnitude. Also, let $s = 3$ if $\lambda_2 = -\lambda_3$ and $s = 2$ otherwise. If $\lambda_2^2 > \lambda_1$, then there exist constants $\epsilon, r, c, \gamma > 0$ and $m = \Theta(\log n)$ such that the acyclic belief propagation algorithm with these parameters solves detection in SBM$(n, p, Q/n)$. The algorithm can be run in $O(n \log n)$ time.

The proof is in Section 6.1.

Corollary 1. ABP solves detection in SBM$(n, k, a, b)$ if
$$\frac{(a-b)^2}{k(a+(k-1)b)} > 1 \tag{4}$$
and can be run in $O(n \log n)$ time.

2.1.1 Acyclic Belief Propagation (ABP) Algorithm

We present here the general version of the ABP algorithm, applying to Theorem 1. Simplified versions are provided in Sections 3.5 and 3.6. In particular, Section 3.6 shows that the algorithm can be viewed as applying a power iteration method from a random vector to a higher order nonbacktracking operator $W^{(r)}$.

ABP$(G, m, r, c, \gamma, (\lambda_1, \ldots, \lambda_h), s)$:

1. Initialize:
(a) Assign each edge of $G$ independently with probability $\gamma$ to a set $\Gamma$. Then, remove these edges from $G$.
(b) Find all cycles of length $r$ or less in $G$.
(c) For every vertex $v \in G$, randomly draw $x_v$ from a Gaussian distribution with mean 0 and variance 1.
(d) For each adjacent $v$ and $v'$ in $G$, set $y^{(1)}_{v,v'} = x_v$, and set $y^{(t)}_{v,v'} = 0$ for all $t < 1$.

2. Propagate:
(a) For each $1 < t \le m$ and each adjacent pair $(v, v') \in E(G)$, set
$$y^{(t)}_{v,v'} = \sum_{v'' :\, (v', v'') \in E(G),\ v'' \neq v} y^{(t-1)}_{v',v''}$$
unless $v, v'$ is part of a cycle of length $r$ or less. If it is, then let the other vertex in the cycle that is adjacent to $v$ be $v'''$, and the length of the cycle be $r'$.⁸ Set
$$y^{(t)}_{v,v'} = \sum_{v'' :\, (v', v'') \in E(G),\ v'' \neq v} y^{(t-1)}_{v',v''} \; - \sum_{v'' :\, (v, v'') \in E(G),\ v'' \neq v',\ v'' \neq v'''} y^{(t-r')}_{v,v''}$$
unless $t = r'$. In that case, set
$$y^{(t)}_{v,v'} = \sum_{v'' :\, (v', v'') \in E(G),\ v'' \neq v} y^{(t-1)}_{v',v''} \; - \; x_{v'''}.$$
(b) For each $1 \le t \le m$ and $v \in G$, set $y^{(t)}_v = \sum_{v' :\, (v', v) \in E(G)} y^{(t)}_{v,v'}$.
(c) For each $s' < s$, repeat the following m r r+1)s times: simultaneously set $y^{(t)}_v$ to $y^{(t)}_v - \lambda_{s'}\, y^{(t-1)}_v$ for each $v$ and $t$.⁹
(d) For each $v$, set $y'_v = \sum_{v' :\, (v, v') \in \Gamma} y^{(m)}_{v'}$, and set $y''_v$ to the sum of $y'_{v'}$ over all $v'$ that have a shortest path to $v$ of length $\log\log n$.

3. Assign:
(a) Set $c' = c\sqrt{\sum_{v \in G} (y''_v)^2 / n}$. Create sets of vertices $S_1$ and $S_2$ as follows. For each vertex $v$, if $y''_v < -c'$, assign $v$ to $S_1$. If $y''_v > c'$, assign $v$ to $S_2$. Otherwise, assign $v$ to $S_2$ with probability $1/2 + y''_v/(2c')$ and to $S_1$ otherwise.
(b) Return $(S_1, S_2)$.

⁸ What the algorithm does if $(v, v')$ is in multiple cycles of length $r$ or less is unspecified. However, there is no such edge with probability $1 - o(1)$, so we can assume that this does not come up.
⁹ This is an inefficient way of doing this. The only part of the result that matters is that $y^{(m)}$ has the correct value, and that can be computed more efficiently by keeping track of which linear combination of the original $y^{(t)}$ each of the $y^{(t)}$ would currently be if these steps were being carried out, and then setting $y^{(m)}$ to the appropriate linear combination at the end.

2.2 Crossing the KS threshold information-theoretically

Theorem 2. Let $d := \frac{a + (k-1)b}{k}$, assume $d > 1$, and let $\tau = \tau_d$ be the unique solution in $(0, 1)$ of $\tau e^{-\tau} = d e^{-d}$, i.e., $\tau = \sum_{j=1}^{+\infty} \frac{j^{j-1}}{j!} (d e^{-d})^j$. The Typicality Sampling Algorithm detects¹⁰ communities in SBM$(n, k, a, b)$ if
$$\frac{1}{2\ln k}\left( \frac{a \ln a + (k-1) b \ln b}{k} - \frac{a + (k-1)b}{k} \ln \frac{a + (k-1)b}{k} \right) > \frac{1 - \tau}{1 - \tau k/(a + (k-1)b)}. \tag{5}$$

¹⁰ Setting $\delta > 0$ small enough gives the existence of $\varepsilon > 0$ for detection.
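The quantities in Theorem 2 are easy to evaluate numerically. The sketch below (hypothetical helper names) computes $\tau_d$ from the series in the theorem, summed term by term in log space, and checks condition (5). It assumes the convention $0 \ln 0 = 0$ so that the case $a = 0$ is covered. For instance, at $a = 0$ and $k = 5$ it reports that $b = 19$ satisfies (5) while $b = 17$ does not, even though $\mathrm{SNR} = 19/20 < 1$ at $b = 19$, in line with the claim that the bound crosses the KS threshold at $k = 5$.

```python
import math


def tau_of_d(d, terms=400):
    """tau_d in (0, 1) with tau e^{-tau} = d e^{-d}, for d > 1.

    Sums tau = sum_{j>=1} j^(j-1) / j! * (d e^{-d})^j in log space
    (math.lgamma(j + 1) = ln j!) to avoid overflow.
    """
    if d <= 1:
        raise ValueError("Theorem 2 assumes d > 1")
    log_z = math.log(d) - d
    return sum(math.exp((j - 1) * math.log(j) - math.lgamma(j + 1) + j * log_z)
               for j in range(1, terms + 1))


def xlogx(t):
    """t ln t with the convention 0 ln 0 = 0."""
    return 0.0 if t == 0 else t * math.log(t)


def condition_5(a, b, k):
    """True when the Typicality Sampling bound (5) holds for SBM(n, k, a, b)."""
    d = (a + (k - 1) * b) / k
    tau = tau_of_d(d)
    lhs = ((xlogx(a) + (k - 1) * xlogx(b)) / k - xlogx(d)) / (2 * math.log(k))
    rhs = (1 - tau) / (1 - tau / d)  # tau k / (a + (k - 1) b) equals tau / d
    return lhs > rhs


print(condition_5(0.0, 19.0, 5), condition_5(0.0, 17.0, 5))
```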

This bound strictly improves on the KS threshold for $k = 5$; see Figure 3.

Figure 1: The horizontal axis has $b$ varying, while $a = 1/10$ and $k = 5$. The red (dashed) curve is the KS threshold, i.e., the values of $b$ for which $(a-b)^2/(k(a+(k-1)b)) = 1$, and the blue (plain) curve is our IT threshold. When the curves are positive, detection is solvable. In particular, detection is solvable below the KS threshold.

Remark 1. Note that in terms of $d$, the previous expression reads
$$\frac{1}{2\ln k}\left( \frac{a \ln a + (k-1) b \ln b}{k} - d \ln d \right) > \frac{1 - \tau}{1 - \tau/d} =: f(\tau, d), \tag{6}$$
and since $f(\tau, d) < 1$ when $d > 1$ (which is needed for the presence of the giant), detection is already solvable in SBM$(n, k, a, b)$ if
$$\frac{1}{2\ln k}\left( \frac{a \ln a + (k-1) b \ln b}{k} - d \ln d \right) > 1. \tag{7}$$
As we shall see in Lemma 0, the above corresponds to the regime where there is no bad clustering that is typical with high probability. However, the above bound is not tight in the extreme regime $b = 0$, since it reads $a > 2k$ as opposed to $a > k$. An intermediate bound is also obtained by dropping the term (56) in the proof of the theorem, corresponding to ignoring planted trees in the giant (see Section 4), which gives
1 n k ) a n a + k 1)b n b d n d > 1 τ k d 1 a n a+k 1)b n b k 1 τ ). (8)

Defining $a_k(b)$ as the unique solution of
$$\frac{1}{2\ln k}\left( \frac{a \ln a + (k-1) b \ln b}{k} - d \ln d \right) = f(\tau, d)$$
and expanding the bound in Theorem 2 gives the following.

Corollary 2. Detection is solvable
in SBM$(n, k, 0, b)$ if $b > \dfrac{2k \ln k}{(k-1)\ln\frac{k}{k-1}}\, f\!\left(\tau, \dfrac{b(k-1)}{k}\right)$, (9)
in SBM$(n, k, a, b)$ if $a > a_k(b)$, where $a_k(0) = k$. (10)

Remark 2. Note that (10) approaches the optimal bound given by the presence of the giant at $b = 0$, and we further conjecture that $a_k(b)$ gives the correct first order approximation of the information-theoretic bound for small $b$. Note also that (9) improves significantly on the KS threshold, given by $b > k(k-1)$ at $a = 0$. We believe that this gives the correct scaling in $k$ for $a = 0$, i.e., that for $b < (1-\varepsilon) k \ln(k) + o_k(1)$, $\varepsilon > 0$, detection is information-theoretically impossible.

2.2.1 Typicality Sampling Algorithm

Given an $n$-vertex graph $G$ and $\delta > 0$, the algorithm draws $\hat\sigma_{\mathrm{typ}}(G)$ uniformly at random in
$$T_\delta(G) = \Big\{ \sigma \in \mathrm{Balanced}(n, k) : \sum_{i=1}^{k} \sum_{\substack{(u,v) \in \binom{[n]}{2} \\ \sigma_u = \sigma_v = i}} G_{u,v} \ge \frac{an}{2k}(1 - \delta),\ \ \sum_{\substack{i, j \in [k] \\ i < j}} \sum_{\substack{(u,v) \in \binom{[n]}{2} \\ \sigma_u = i,\ \sigma_v = j}} G_{u,v} \le \frac{bn(k-1)}{2k}(1 + \delta) \Big\},$$
where the above assumes that $a > b$; flip the two inequalities in the case $a < b$.

2.3 Learning the model

To learn the parameters, we count cycles of slowly growing length, as already done in [MNS1] for $k = 2$, using non-backtracking walks to approximate the count.

Lemma 1. If $\mathrm{SNR} > 1$, there exists a consistent and efficient estimator for the parameters $a, b, k$ in SBM$(n, k, a, b)$.

3 Achieving the KS threshold: proof technique

Recall the parameters: $k$ and $n$ are positive integers, $p \in (0,1)^k$ with $\sum_i p_i = 1$, and $Q$ is a $k \times k$ symmetric matrix with nonnegative entries. Then SBM$(n, p, Q/n)$ generates $n$-vertex graphs by the following procedure. First, each vertex $v$ is randomly and independently assigned a community $\sigma_v$ such that the probability that $\sigma_v = i$ is $p_i$ for each $i$. Then, each pair of vertices $v$ and $v'$ has an edge put between them with probability $Q_{\sigma_v, \sigma_{v'}}/n$. Also, let $\Omega_1, \ldots, \Omega_k$ be the communities and $P$ be the $k \times k$ diagonal matrix such that $P_{i,i} = p_i$ for each $i$.

Now, consider the 2-community symmetric stochastic block model. In this case, $k = 2$, $p = [1/2, 1/2]$, and $Q_{i,j}$ is $a$ if $i = j$ and $b$ otherwise, for some $a, b$.
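For the symmetric model, the eigenvalues of $PQ$ used in Theorem 1 can be checked directly. The sketch below does so with exact rational arithmetic for an illustrative choice $k = 4$, $a = 9$, $b = 1$ (integer parameters are assumed, since `Fraction` is built from them). The all-ones vector gives $\lambda_1 = (a + (k-1)b)/k$, the average degree, and each vector $e_i - e_j$ gives $\lambda_2 = (a-b)/k$, with multiplicity $k - 1$. Hence $\lambda_2^2/\lambda_1 = (a-b)^2/(k(a+(k-1)b))$, so the condition $\lambda_2^2 > \lambda_1$ of Theorem 1 is exactly $\mathrm{SNR} > 1$ as in Corollary 1; for $k = 2$ this recovers $\lambda_1 = (a+b)/2$ and $\lambda_2 = (a-b)/2$.

```python
from fractions import Fraction


def symmetric_pq(k, a, b):
    """P Q for SBM(n, k, a, b): P = diag(1/k, ..., 1/k), Q has a on the diagonal, b off it."""
    return [[Fraction(a if i == j else b, k) for j in range(k)] for i in range(k)]


def mat_vec(M, v):
    """Plain matrix-vector product."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]


k, a, b = 4, 9, 1
PQ = symmetric_pq(k, a, b)
lam1 = Fraction(a + (k - 1) * b, k)  # eigenvector: all-ones
lam2 = Fraction(a - b, k)            # eigenvectors: e_i - e_j
ones = [1] * k
diff = [1, -1] + [0] * (k - 2)
assert mat_vec(PQ, ones) == [lam1 * c for c in ones]
assert mat_vec(PQ, diff) == [lam2 * c for c in diff]

snr = Fraction((a - b) ** 2, k * (a + (k - 1) * b))
print(lam1, lam2, lam2 ** 2 / lam1 == snr, lam2 ** 2 > lam1)
```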
Now, let $\lambda_1=\frac{a+b}{2}$ be the average degree of a vertex in a graph drawn from this model, and $\lambda_2=\frac{a-b}{2}$ be the other eigenvalue of PQ. Throughout this section we say f is approximately g, or f ≈ g, when $|f-g|=o(|f|+|g|)$ with probability 1 − o(1). Our goal is to determine which of G's vertices are in each community with an accuracy that is nontrivially better than that attained by random guessing. Obviously, the symmetry between communities ensures that we can never tell whether a given vertex is in community 1 or community 2, so the best we can hope for is to divide the vertices into two sets such that there is a nontrivial difference between the fraction of vertices from community 1 that are assigned to the first set and the fraction of vertices from community 2 that are assigned to the first set.

3.1 Random assignment

Now, let x and n − x be the numbers of vertices in community 1 and community 2 respectively. If the vertices are assigned to sets at random, then the expected numbers of vertices from each community in the first set are x/2 and (n − x)/2. However, by the central limit theorem, the actual number of vertices from a given community in the first set is approximately Gaussian, with the mean stated previously and a variance of x/4 or (n − x)/4 as appropriate. That means that the difference between the fraction of the vertices from community 1 assigned to the first set and the fraction of the vertices from community 2 assigned to the first set is also approximately a bell curve. This one has a mean of 1/2 − 1/2 = 0 and a variance of

$$\frac{x/4}{x^2}+\frac{(n-x)/4}{(n-x)^2}=\frac{1}{4x}+\frac{1}{4(n-x)}\approx\frac{1}{2n}+\frac{1}{2n}=\frac{1}{n},$$

using x ≈ n/2. That means that it has a standard deviation of $1/\sqrt{n}$, and the difference between these two fractions will typically have a magnitude on the order of $1/\sqrt{n}$.
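The $1/\sqrt{n}$ scale of this lucky advantage can be checked by simulation. The sketch below (hypothetical helper names, fixed seed for reproducibility) assigns each vertex to the first set with probability 1/2, measures the gap between the two communities' fractions, and compares its empirical standard deviation with $\sqrt{1/(4x)+1/(4(n-x))}$.

```python
import random
import statistics


def assignment_gap(x, n, rng):
    """(fraction of community 1 in the first set) - (fraction of community 2 in it)."""
    hits_1 = sum(rng.random() < 0.5 for _ in range(x))
    hits_2 = sum(rng.random() < 0.5 for _ in range(n - x))
    return hits_1 / x - hits_2 / (n - x)


def empirical_std(x, n, trials, seed=0):
    rng = random.Random(seed)
    return statistics.pstdev([assignment_gap(x, n, rng) for _ in range(trials)])


def predicted_std(x, n):
    """sqrt(1/(4x) + 1/(4(n-x))), which is close to 1/sqrt(n) when x is near n/2."""
    return (1 / (4 * x) + 1 / (4 * (n - x))) ** 0.5
```

With x = 200 and n = 400 the prediction is exactly 0.05 = $1/\sqrt{400}$, and a few thousand trials land within a few percent of it.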
Label the sets $S_1$ and $S_2$ such that the fraction of the vertices from community 1 that were assigned to $S_1$ is at least as large as the fraction of the vertices from community 2 that were assigned to $S_1$. For the rest of this section, we will consider the difference between these fractions to be fixed.

3.2 Amplification

Once we have even such weak information on which vertex is in which community, we can try to improve our classification of a given vertex by factoring in our knowledge of what communities the nearby vertices are in. For a vertex v and integer t, let $N_t(v)$ be the number of vertices t edges away from v, $\Delta_t(v)$ be the difference between the number of vertices t edges away from v that are in community 1 and the number that are in community 2, and $\Delta'_t(v)$ be the difference between the number of vertices t edges away from v that are in $S_2$ and the number that are in $S_1$. For small t,

$$E[N_t(v)]\approx\left(\frac{a+b}{2}\right)^t \quad\text{and}\quad E[\Delta_t(v)]\approx(-1)^{\sigma_v+1}\left(\frac{a-b}{2}\right)^t.$$

For any fixed values of $N_t(v)$ and $\Delta_t(v)$, the distribution of $\Delta'_t(v)$ is essentially Gaussian with a mean of $\Theta(\Delta_t(v)/\sqrt{n})$ and a variance of $N_t(v)$, because it is the sum of $N_t(v)$ nearly independent variables that are approximately equally likely to be 1 or −1. So, $\Delta'_t(v)$ is positive with a probability of $\frac12+\Theta\big(\Delta_t(v)/\sqrt{N_t(v)\,n}\big)$. In other words, if v is in community 1 then $\Delta'_t(v)$ is positive with a probability of

$$\frac12-\Theta\!\left(\frac{((a-b)/2)^t}{((a+b)/2)^{t/2}\sqrt{n}}\right),$$

and if v is in community 2 then $\Delta'_t(v)$ is positive with a probability of

$$\frac12+\Theta\!\left(\frac{((a-b)/2)^t}{((a+b)/2)^{t/2}\sqrt{n}}\right).$$

If $(a-b)^2\le 2(a+b)$, then this does not improve the accuracy of the classification, so this technique is useless. On the other hand, if $(a-b)^2>2(a+b)$, the classification becomes more accurate as t increases. However, this formula says that to classify vertices with an accuracy of 1/2 + Ω(1), we would need to have t such that

$$\frac{((a-b)/2)^t}{((a+b)/2)^{t/2}}=\Omega(\sqrt{n}).$$

However, unless a or b is 0 (see footnote 11), that would imply that

$$\left(\frac{a+b}{2}\right)^{t/2}=\frac{((a+b)/2)^t}{((a+b)/2)^{t/2}}=\omega\!\left(\frac{((a-b)/2)^t}{((a+b)/2)^{t/2}}\right)=\omega(\sqrt{n}),$$

which means that $((a+b)/2)^t=\omega(n)$. It is obviously impossible for $N_t(v)$ to be greater than n, so this t is too large for the approximation to hold. The problem is that the approximation assumes that each vertex at a distance of t − 1 from v has one edge leading back towards v, and that the rest of its edges lead towards new vertices. Once a significant fraction of the vertices are less than t edges away from v, a significant fraction of the edges incident to vertices t − 1 edges away from v are part of loops and thus do not lead to new vertices.

Footnote 11: If a = 0 and k = 2, or if b = 0, then classifying vertices based on the sign of $\Delta'_t(v)$ for suitable t is likely to work, but this is pointlessly complicated, because every component of the graph then consists either entirely of vertices of one community or of vertices of alternating communities.
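The amplification criterion can be read off numerically: the signal-to-noise ratio after t steps is $((a-b)/2)^t/((a+b)/2)^{t/2}=\mathrm{SNR}^{t/2}$ with $\mathrm{SNR}=\lambda_2^2/\lambda_1=(a-b)^2/(2(a+b))$, so it grows with t exactly when $(a-b)^2>2(a+b)$. A small sketch (hypothetical names) that also covers the k-community formula of Remark 5:

```python
def amplification_ratio(a, b, t):
    """Mean shift over standard deviation after t steps, 2-community case:
    ((a-b)/2)^t / ((a+b)/2)^(t/2)."""
    return abs((a - b) / 2) ** t / ((a + b) / 2) ** (t / 2)


def snr(a, b, k=2):
    """lambda_2^2 / lambda_1 = (a-b)^2 / (k (a + (k-1) b)); KS threshold at snr = 1."""
    lam1 = (a + (k - 1) * b) / k
    lam2 = (a - b) / k
    return lam2 ** 2 / lam1
```

For instance, a = 5, b = 1 gives SNR = 4/3, so the ratio grows like $(4/3)^{t/2}$, while a = 3, b = 1 gives SNR = 1/2 and the ratio shrinks.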

Figure 2: The left figure shows the neighborhood of vertex v pulled from the SBM graph at depth $c\log_d n$, c < 1/2, which is a tree with high probability. If one had an educated guess about each vertex's label, of good enough accuracy, then it would be possible to amplify that guess by considering only such small neighborhoods (deciding with the majority at the leaves). However, we do not have such an educated guess. We thus initialize our labels purely at random, obtaining a small advantage of roughly $\sqrt{n}$ vertices by luck (i.e., the central limit theorem), in either an agreement or disagreement form. This is illustrated in agreement form in the right figure. We next attempt to amplify that lucky guess by exploiting the information of the SBM graph. Unfortunately, the graph is too sparse to let us amplify that guess by considering tree-like or even loopy neighborhoods; the vertices would have to be exhausted. This takes us to considering walks.

3.3 Nonbacktracking walks

An obvious way to solve the problem caused by running out of vertices would be to simply count the walks of length t from v to vertices in $S_1$ or $S_2$. Recall that a walk is a series of vertices such that each vertex in the walk is adjacent to the next, and a path is a walk with no repeated vertices. The last vertex of such a walk will be adjacent to an average of approximately a/2 vertices in its community outside the walk and b/2 vertices in the other community outside the walk. However, it will also be adjacent to the second-to-last vertex of the walk, and maybe some of the other vertices in the walk as well. As a result, the number of walks of length t from v to vertices in $S_1$ or $S_2$ cannot be easily predicted in terms of v's community, so these counts are not useful for classifying vertices. We could deal with this issue by counting paths of length t from v to vertices in $S_1$ and $S_2$.
Given a path of length t − 1, the expected number of vertices outside the path in the same community as its last vertex that are adjacent to it is approximately a/2, and the expected number of vertices outside the path in the opposite community that are adjacent to it is approximately b/2. So, the expected number of paths of length t from v is approximately $((a+b)/2)^t$, and the expected difference between the number that end in vertices in the same community as v and the number that end in the other community is approximately $((a-b)/2)^t$. The problem with this is that counting all of these paths is inefficient.

The compromise we use is to count nonbacktracking walks ending at v, i.e., walks that never repeat the same edge twice in a row. We can efficiently determine how many nonbacktracking walks of length t there are from vertices in $S_i$ to v by using the fact that the number of nonbacktracking walks of length t starting at a vertex in $S_i$ and having v′ and v as their last two vertices is equal to the sum, over all v″ ≠ v adjacent to v′, of the number of nonbacktracking walks of length t − 1 starting at a vertex in $S_i$ and having v″ and v′ as their last two vertices. Furthermore, most nonbacktracking walks of a given length that is logarithmic in n are paths, so it seems reasonable to expect that counting nonbacktracking walks instead of paths in our algorithm will have a negligible effect on the accuracy.

Figure 3: This figure extends Figure 2 to a larger neighborhood. The ABP algorithm amplifies the belief of vertex v by considering all the walks of a given length that end at it. To avoid being disrupted by backtracking or cycling the beliefs on short loops, the algorithm considers only walks that do not repeat the same vertex within r steps, i.e., r-nonbacktracking walks. For example, when r = 3 and the walks have length 7, the green walk starting at vertex $v_1$ is discarded, whereas the orange walk starting at vertex $v_2$ is counted. Note also that the same vertex can lead to multiple walks, as illustrated with the two magenta walks from $v_3$.

Since there are approximately equally many such walks between any two vertices, if the majority of the vertices were initially classified as blue, this is likely to classify all of the vertices as blue. We hence need a compensation step to prevent the classification from becoming biased towards one community.

More precisely, this suggests the following approach. Define $y^{(t)}_{v,v'}$ to be the number of nonbacktracking walks of length t that start at vertices in $S_2$ and end in the directed edge (v, v′) minus the number of nonbacktracking walks of length t that start at vertices in $S_1$ and end in (v, v′). Also, define $y^{(t)}_v$ to be the overall difference between the number of nonbacktracking walks of length t from vertices in $S_2$ to v and the number of nonbacktracking walks of length t from vertices in $S_1$ to v. Their values can be efficiently computed by means of the following procedure:

1. For every (v, v′) ∈ E(G): if v ∈ $S_2$, set $y^{(1)}_{v,v'}=1$; otherwise, set $y^{(1)}_{v,v'}=-1$.
2. For every 1 < t ≤ m and (v, v′) ∈ E(G): set $y^{(t)}_{v,v'}=\sum_{v'':(v'',v)\in E(G),\,v''\neq v'}y^{(t-1)}_{v'',v}$.
3. For every v ∈ G: set $y^{(m)}_v=\sum_{v':(v',v)\in E(G)}y^{(m)}_{v',v}$.

One way of viewing this algorithm is that $y^{(t)}_{v,v'}$ represents our current belief about what community v is in, disregarding any information derived from the fact that it is next to v′. We start with fairly unconfident beliefs about the vertices' communities, and then derive more and more confident beliefs by taking our beliefs about their neighbors' communities into account.

3.4 Compensation for the average value

$y^{(1)}_{v,v'}$ has an average value of $\Theta(1/\sqrt{n})$ for v in community 2 and $-\Theta(1/\sqrt{n})$ for v in community 1. Also, $y^{(1)}_{v,v'}$ has a variance of order 1. For a random (v, v′) ∈ E(G), v will have an average of approximately a/2 neighbors other than v′ in its community and b/2 neighbors other than v′ in the other community. So, by induction on t, we would expect $y^{(t)}_{v,v'}$ to have an average value of $\Theta(((a-b)/2)^t/\sqrt{n})$ for v in community 2 and $-\Theta(((a-b)/2)^t/\sqrt{n})$ for v in community 1. Since $y^{(t-1)}_{v'',v}$ should be approximately independent for different v″ adjacent to v, we would also expect $y^{(t)}_{v,v'}$ to have an empirical variance of approximately $((a+b)/2)^t$, and thus a standard deviation of approximately $((a+b)/2)^{t/2}$. So, for t such that $((a-b)/2)^t/\sqrt{n}>((a+b)/2)^{t/2}$, we would expect that we could determine the community of v from $y^{(t)}_{v,v'}$ with accuracy 1/2 + Ω(1).
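The three-step procedure of Section 3.3 can be sketched in a few lines. This is a minimal illustration assuming a simple undirected graph given as a dict of neighbour sets (names are hypothetical). One implementation choice is ours: step 2 is evaluated by first summing all incoming values at v and then subtracting the single backtracking term $y^{(t-1)}_{v',v}$, which costs O(|E|) per step.

```python
def nb_walk_counts(adj, s2, m):
    """y[v] = (# NB walks of length m from S_2 to v) - (# NB walks of length m from S_1 to v).

    adj: dict mapping each vertex to the set of its neighbours.
    s2:  set of vertices placed in S_2; every other vertex is in S_1.
    """
    # Step 1: y^(1)_{v,v'} = +1 if v is in S_2, else -1, for every directed edge (v, v').
    y = {(v, w): (1 if v in s2 else -1) for v in adj for w in adj[v]}
    for _ in range(2, m + 1):
        # inflow[v] = sum over all neighbours v'' of v of y^(t-1)_{v'',v}
        inflow = {v: 0 for v in adj}
        for (u, v), val in y.items():
            inflow[v] += val
        # Step 2: remove the single term v'' = v', which would be a backtrack.
        y = {(v, w): inflow[v] - y[(w, v)] for (v, w) in y}
    # Step 3: y^(m)_v = sum over neighbours v' of v of y^(m)_{v',v}.
    out = {v: 0 for v in adj}
    for (u, v), val in y.items():
        out[v] += val
    return out
```

On the path 0-1-2 with $S_2=\{0\}$ and m = 2, the only nonbacktracking walks of length 2 are 0→1→2 (counted +1 at vertex 2) and 2→1→0 (counted −1 at vertex 0).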
The problem with this reasoning is that the average value over all (v, v′) ∈ E(G) of $y^{(1)}_{v,v'}$ will not be exactly 0. It will also tend to have an absolute value on the order of $1/\sqrt{n}$. That means that the average value over all (v, v′) ∈ E(G) of $y^{(t)}_{v,v'}$ will have an absolute value of $\Theta(((a+b)/2)^t/\sqrt{n})$. If we hold the average value of $y^{(t-1)}_{v'',v}$ fixed, then $E[y^{(t)}_{v,v'}\mid N_1(v)]$ will have an empirical variance of $\Theta(((a+b)/2)^{2t}/n)$, and thus $y^{(t)}_{v,v'}$ will also have an empirical variance of at least $\Theta(((a+b)/2)^{2t}/n)$. This implies that the standard deviation of $y^{(t)}_{v,v'}$ will always be much greater than the difference between its average value for v in community 1 and its average value for v in community 2, which would render attempts to classify v based on $y^{(t)}_{v,v'}$ ineffective.

Remark 3. The simple way to fix this would be to add a step where we subtract the average value of $y^{(t)}$ from every element of $y^{(t)}$, so that its sum is 0 for every t. However, this does not extend easily to the general stochastic block model, and we want a solution that does.

In order to prevent this, we need to stop the average value of $y^{(t)}_{v,v'}$ from getting too large. It will tend to multiply by roughly (a+b)/2 each time t increases by 1, so the average value of $y^{(t)}_{v,v'}-\frac{a+b}{2}y^{(t-1)}_{v,v'}$ will probably be much smaller than the average value of $y^{(t)}_{v,v'}$. So, if we pick some 1 < i ≤ m and redefine $y^{(i)}_{v,v'}$ so that

$$y^{(i)}_{v,v'}=-\frac{a+b}{2}\,y^{(i-1)}_{v,v'}+\sum_{v'':(v'',v)\in E(G),\,v''\neq v'}y^{(i-1)}_{v'',v} \qquad (11)$$

for all (v, v′) ∈ E(G), the average value of $y^{(i)}_{v,v'}$ will be much smaller than it would have been. Since the difference between the average values of $y^{(t)}_{v,v'}$ over v in different communities grows as $\Theta(((a-b)/2)^t/\sqrt{n})$, this redefinition will merely change that difference for $y^{(i)}_{v,v'}$ from $\Theta(((a-b)/2)^i/\sqrt{n})$ to

$$\Theta\!\left(\left(\frac{a-b}{2}-\frac{a+b}{2}\right)\left(\frac{a-b}{2}\right)^{i-1}\Big/\sqrt{n}\right)=\Theta\!\left(-b\left(\frac{a-b}{2}\right)^{i-1}\Big/\sqrt{n}\right).$$

However, the average value of $y^{(i)}_{v,v'}$ will still be nonzero, and if we continue to set $y^{(t)}_{v,v'}=\sum_{v'':(v'',v)\in E(G),\,v''\neq v'}y^{(t-1)}_{v'',v}$ for all t > i, the average value of $y^{(t)}_{v,v'}$ would resume increasing in magnitude faster than the average difference between $y^{(t)}_{v,v'}$ for different communities of v. This creates the risk that it would still eventually get too large. So, in order to actually fix the problem, it may be necessary to repeat the step where its magnitude is reduced.
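Equation (11) replaces one plain update by the operator "nonbacktracking step minus $\lambda$ times identity". The sketch below illustrates why this removes a constant (average-value) component, under an assumption of ours: we use a d-regular graph, where every directed edge has exactly d − 1 nonbacktracking continuations, so the constant vector is an exact eigenvector with eigenvalue d − 1 and the value d − 1 plays the role that (a+b)/2 plays in the SBM. Graph and function names are hypothetical.

```python
def nb_step(adj, y):
    """Plain update: y_new_{v,v'} = sum over v'' ~ v with v'' != v' of y_{v'',v}."""
    inflow = {v: 0.0 for v in adj}
    for (u, v), val in y.items():
        inflow[v] += val
    return {(v, w): inflow[v] - y[(w, v)] for (v, w) in y}


def compensated_step(adj, y, lam):
    """Equation (11): y_new_{v,v'} = -lam * y_{v,v'} + sum_{v'' ~ v, v'' != v'} y_{v'',v}."""
    plain = nb_step(adj, y)
    return {e: plain[e] - lam * y[e] for e in y}


# Hypothetical example: K4 is 3-regular, so each directed edge has 3 - 1 = 2 continuations.
K4 = {i: {j for j in range(4) if j != i} for i in range(4)}
ONES = {(v, w): 1.0 for v in K4 for w in K4[v]}
```

Applying `compensated_step(K4, ONES, 2.0)` gives 0.0 on every directed edge, whereas a non-constant input survives; in the SBM the cancellation is only approximate, which is why the text repeats the step.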
More precisely, we may have to choose several indices $t_0,t_1,\dots,t_{m'}$ and redefine $y^{(t_i)}_{v,v'}$ for each i so that

$$y^{(t_i)}_{v,v'}=-\frac{a+b}{2}\,y^{(t_i-1)}_{v,v'}+\sum_{v'':(v'',v)\in E(G),\,v''\neq v'}y^{(t_i-1)}_{v'',v} \qquad (12)$$

for every (v, v′) ∈ E(G). Once we have made these modifications, it will be the case that for sufficiently large m, the average value of $y^{(m)}_{v,v'}$ for v in community 1 will differ from its average value for v in community 2 by a constant multiple of the standard deviation of $y^{(m)}_{v,v'}$. Then, in order to simplify it to a variable of one vertex, we define

$$y^{(m)}_v=\sum_{v':(v',v)\in E(G)}y^{(m)}_{v',v},$$

which is still correlated with the vertex's community. However, this does not guarantee that simply dividing the vertices into those with a positive value of $y^{(m)}_v$ and those with a negative value will give a useful partition. It could be the case that the fraction of v for which $y^{(m)}_v$ is positive is the same for both communities, but $y^{(m)}_v$ is typically more strongly positive or less strongly negative for v in one community than for v in the other. So, we randomly assign each vertex to a set with a probability that scales linearly with $y^{(m)}_v$, in order to ensure that a higher average value of $y^{(m)}_v$ actually leads to greater representation in one of the proposed communities.

Implementation details. The above sections cover all of the key parts of how our algorithm works. However, the actual algorithm differs from what we have described in a few ways, either to make it easier to prove that it works or to simplify it. First of all, we initialize the $y^{(1)}_{v,v'}$ using a random value assigned to each v and drawn from a normal distribution, because an n-dimensional normal distribution is easier to analyse than a distribution spread evenly over the vertices of an n-dimensional hypercube. Secondly, we require that our walks never repeat the same vertex within r steps for some r, rather than merely requiring that they not backtrack. This allows us to use the expected number of walks between vertices in our analysis without worrying about the tiny probability that there is a dense tangle in the graph with a huge number of nonbacktracking walks between its vertices. Making this modification requires adding an extra part to the recursion step, where walks that just repeated a vertex are cancelled out: specifically, the second half of step 4 of the algorithm below. The resulting algorithm is called the acyclic belief propagation algorithm because it counts walks that do not contain any small cycles.
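The randomized assignment described above (made precise in steps 7 and 8 of the algorithm in Section 3.5) can be sketched as follows. Assumptions of ours: the scores are given as a dict from vertex to $y_v$, the threshold is $c'=c\sqrt{\sum_v y_v^2/n}$, and the probability $1/2+y_v/(2c')$ is clipped to [0, 1], which reproduces the rules "below −c′ goes to $S_1$, above c′ goes to $S_2$". The function name is hypothetical.

```python
import math
import random


def randomized_rounding(y, c, rng):
    """Return (S_1, S_2), putting v in S_2 with probability 1/2 + y_v / (2c'), clipped to [0, 1]."""
    c_prime = c * math.sqrt(sum(val * val for val in y.values()) / len(y))
    s1, s2 = set(), set()
    for v, val in y.items():
        if c_prime == 0:
            prob = 0.5  # degenerate all-zero scores: fall back to a fair coin
        else:
            prob = min(1.0, max(0.0, 0.5 + val / (2 * c_prime)))
        (s2 if rng.random() < prob else s1).add(v)
    return s1, s2
```

Because the probability is linear in $y_v$ inside $[-c',c']$, a community whose scores are shifted upwards on average gets proportionally more representation in $S_2$, even if the signs of the scores are equally distributed.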
Thirdly, we move all of the recursion steps that compensate for the average value to the end of the algorithm. This is possible because the operation that takes $y^{(t-1)}$ as input and outputs a list with value $-\frac{a+b}{2}y^{(t-1)}_{v,v'}+\sum_{v'':(v'',v)\in E(G),\,v''\neq v'}y^{(t-1)}_{v'',v}$ for each (v, v′) ∈ E commutes with the one that simply outputs a list with value $\sum_{v'':(v'',v)\in E(G),\,v''\neq v'}y^{(t-1)}_{v'',v}$ for each (v, v′) ∈ E. The algorithm generates $y^{(m)}$ by applying these two operations in some sequence to $y^{(1)}$, so we can calculate it by applying the latter operation m − m′ times and then applying the former operation m′ times. Actually, the algorithm takes this one step further by applying the latter operation m times and then calculating how the result would have changed if it had applied the former the appropriate number of times, but the result is the same.

Finally, we randomly select a small fraction of the graph's edges at the beginning of the algorithm. Then, we require one specific step of each nonbacktracking walk to use one of the selected edges, and all of its other steps to use edges that have not been selected. Since the selected edges are nearly independent of the rest of the graph, this allows us to more easily prove that the values of $y^{(t-1)}_{v'',v}$ for v″ adjacent to v will not become dependent in a way that disrupts the algorithm.
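Because the compensation operator commutes with the plain update, all compensation can be applied after the fact to the stored sequence $y^{(1)},\dots,y^{(m)}$, and repeated passes collapse into a binomial sum (the closed form given in the footnote to step 6 of the algorithm in Section 3.5). The sketch below checks this identity on a scalar sequence standing in for one vertex; exact `Fraction` arithmetic is our choice so that the two forms can be compared with `==`. Names are hypothetical.

```python
from fractions import Fraction
from math import comb


def compensate_by_passes(ys, lam, passes):
    """Slow form: `passes` times, replace y^(t) by y^(t) - lam * y^(t-1) for all t at once.

    ys[t] holds y^(t) for t = 0..m; values before index 0 count as 0 (y^(t) = 0 for t <= 0).
    Returns the final entry, i.e. the compensated y^(m).
    """
    ys = list(ys)
    for _ in range(passes):
        ys = [ys[0]] + [ys[t] - lam * ys[t - 1] for t in range(1, len(ys))]
    return ys[-1]


def compensate_closed_form(ys, lam, passes):
    """Fast form: sum over i of (-lam)^i * C(passes, i) * y^(m-i), dropping terms with m - i < 0."""
    m = len(ys) - 1
    return sum((-lam) ** i * comb(passes, i) * ys[m - i] for i in range(min(passes, m) + 1))


# Usage: one pass with lam = 2 removes a purely geometric component 2^t exactly.
print(compensate_closed_form([2 ** t for t in range(7)], 2, 1))
```

The closed form costs O(passes) per vertex instead of O(passes · m), which is where the O(n log n) bound for the cancellation step comes from.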

3.5 The final algorithm for the symmetric SBM

Theorem 3. Let a and b be positive real numbers such that $(a-b)^2>2(a+b)$, and let S be the 2-community symmetric stochastic block model with these parameters. There exist constants ε, r, c > 0 and m = Θ(log n) such that when the basic 2-community symmetric acyclic belief propagation algorithm is run on these parameters and a random G ∼ S, the expected difference between the fraction of vertices from community 1 that are in $S_1$ and the fraction of vertices from community 2 that are in $S_1$ is at least ε.

The basic 2-community symmetric acyclic belief propagation algorithm is as follows.

2CS-ABP(G, m, r, c, λ_1):

1. Find all cycles of length r or less in G.
2. For every vertex v ∈ G, randomly assign $x_v$ according to a normal distribution with mean 0 and variance 1.
3. For each adjacent v and v′, let $y^{(1)}_{v,v'}=x_v$, and $y^{(t)}_{v,v'}=0$ for all t ≤ 0.
4. For each 1 < t ≤ m and each adjacent (v, v′) ∈ E(G), set $y^{(t)}_{v,v'}=\sum_{v'':(v'',v)\in E(G),\,v''\neq v'}y^{(t-1)}_{v'',v}$, unless v, v′ is part of a cycle of length r or less. If it is, then let the other vertex in the cycle that is adjacent to v′ be v‴, and let the length of the cycle be r′. Set
$$y^{(t)}_{v,v'}=\sum_{v'':(v'',v)\in E(G),\,v''\neq v'}y^{(t-1)}_{v'',v}-\sum_{v'':(v'',v')\in E(G),\,v''\neq v,\,v''\neq v'''}y^{(t-r')}_{v'',v'}$$
unless t = r′. In that case, set
$$y^{(t)}_{v,v'}=\sum_{v'':(v'',v)\in E(G),\,v''\neq v'}y^{(t-1)}_{v'',v}-x_{v'}.$$
5. For each 1 ≤ t ≤ m and v ∈ G, set $y^{(t)}_v=\sum_{v':(v',v)\in E(G)}y^{(t)}_{v',v}$.
6. Repeat the following m − 3r − 1 times: simultaneously set $y^{(t)}_v$ to $y^{(t)}_v-\lambda_1y^{(t-1)}_v$ for each v and t (see footnote 12).
7. Set $c'=c\sqrt{\sum_{v\in G}(y^{(m)}_v)^2/n}$.

Footnote 12: This is an inefficient way of doing this. The only part of the result that matters is that $y^{(m)}$ have the correct value, and that can be computed more efficiently by setting
$$y^{(m)}=\sum_{i=0}^{m-3r-1}(-\lambda_1)^i\binom{m-3r-1}{i}\,y^{(m-i)}.$$

8. Create sets of vertices $S_1$ and $S_2$ as follows. For each vertex v, if $y^{(m)}_v<-c'$, assign v to $S_1$. If $y^{(m)}_v>c'$, assign v to $S_2$. Otherwise, assign v to $S_2$ with probability $\frac12+\frac{y^{(m)}_v}{2c'}$ and to $S_1$ otherwise.
9. Return $(S_1,S_2)$.

We believe that the difference between the fraction of vertices from community 1 that this algorithm puts in $S_1$ and the fraction of vertices from community 2 that it puts in $S_1$ will be at least ε with probability 1 − o(1). However, in order to prove that we can detect communities reliably, we use the following slightly modified form of the algorithm.

Theorem 4. Let a and b be positive real numbers such that $(a-b)^2>2(a+b)$, and let S be the 2-community symmetric stochastic block model with these parameters. There exist constants ε, r, c, γ > 0 and m = Θ(log n) such that when the 2-community symmetric acyclic belief propagation algorithm is run on these parameters and a random G ∼ S, the difference between the fraction of vertices from community 1 that are in $S_1$ and the fraction of vertices from community 2 that are in $S_1$ is at least ε with probability 1 − o(1).

The 2-community symmetric acyclic belief propagation algorithm is as follows.

2CS-ABP(G, m, r, c, λ_1, γ):

1. Randomly and independently select each edge in G with probability γ. Put all of the selected edges in a set Γ, and remove them from the graph.
2. Find all cycles of length r or less in G.
3. For every vertex v ∈ G, randomly assign $x_v$ according to a normal distribution with mean 0 and variance 1.
4. For each adjacent v and v′, let $y^{(1)}_{v,v'}=x_v$, and $y^{(t)}_{v,v'}=0$ for all t ≤ 0.
5. For each 1 < t ≤ m and each adjacent (v, v′) ∈ E(G), set $y^{(t)}_{v,v'}=\sum_{v'':(v'',v)\in E(G),\,v''\neq v'}y^{(t-1)}_{v'',v}$, unless v, v′ is part of a cycle of length r or less. If it is, then let the other vertex in the cycle that is adjacent to v′ be v‴, and let the length of the cycle be r′. Set
$$y^{(t)}_{v,v'}=\sum_{v'':(v'',v)\in E(G),\,v''\neq v'}y^{(t-1)}_{v'',v}-\sum_{v'':(v'',v')\in E(G),\,v''\neq v,\,v''\neq v'''}y^{(t-r')}_{v'',v'}$$
unless t = r′. In that case, set
$$y^{(r')}_{v,v'}=\sum_{v'':(v'',v)\in E(G),\,v''\neq v'}y^{(r'-1)}_{v'',v}-x_{v'}.$$

6. For each 1 ≤ t ≤ m and v ∈ G, set $y^{(t)}_v=\sum_{v':(v',v)\in E(G)}y^{(t)}_{v',v}$.
7. Repeat the following m − 3r − 1 times: simultaneously set $y^{(t)}_v$ to $y^{(t)}_v-\lambda_1y^{(t-1)}_v$ for each v and t (see footnote 13).
8. For each v ∈ G, set $y'_v=\sum_{v':(v,v')\in\Gamma}y^{(m)}_{v'}$.
9. For each v ∈ G, set $y''_v$ to be the sum of $y'_{v'}$ for all vertices v′ that are exactly $\ln\ln n$ edges away from v.
10. Set $c'=c\sqrt{\sum_{v\in G}(y''_v)^2/n}$.
11. Create sets of vertices $S_1$ and $S_2$ as follows. For each vertex v, if $y''_v<-c'$, assign v to $S_1$. If $y''_v>c'$, assign v to $S_2$. Otherwise, assign v to $S_2$ with probability $\frac12+\frac{y''_v}{2c'}$ and to $S_1$ otherwise.
12. Return $(S_1,S_2)$.

Footnote 13: This is an inefficient way of doing this. The only part of the result that matters is that $y^{(m)}$ have the correct value, and that can be computed more efficiently by setting
$$y^{(m)}=\sum_{i=0}^{m-3r-1}(-\lambda_1)^i\binom{m-3r-1}{i}\,y^{(m-i)}.$$

Remark 4. Note that the last theorem is simply Theorem 1 in the 2-community symmetric case, while the one before it follows from a slightly simplified version of the same proof.

Remark 5. For the k-community symmetric stochastic block model, the above is largely unchanged. The key differences are that $\lambda_1=\frac{a+(k-1)b}{k}$ and $\lambda_2=\frac{a-b}{k}$, that the requirement for the algorithm to work is $(a-b)^2>k(a+(k-1)b)$, and that, if the requirements are met, the kCS-ABP algorithm distinguishes between every pair of communities with expected accuracy at least ε.

Remark 6. In this algorithm, the initialization step and computing $y^{(t)}$ for a given t both run in O(n) time. The cancellation step runs in O(n log n) time if it is done the efficient way, y′ takes O(n) time to compute, and computing y″ takes O(n log n) time. Determining $S_1$ and $S_2$ from y″ can also be done in O(n) time, so this whole algorithm can be run in O(n log n) time.

3.6 The spectral view

An alternative perspective on this algorithm is the following. Assume for the moment that there are exactly n/2 vertices in each community, and let $\bar M$ be the expected adjacency matrix, i.e., the matrix such that $\bar M_{v,v'}$ is a/n if v and v′ are in the same community and b/n if they are not. This matrix has an eigenvector whose entries are all 1, with eigenvalue (a+b)/2; an eigenvector whose entries are ±1, with the sign determined by the relevant vertex's community, with eigenvalue (a−b)/2; and all of its other eigenvalues are 0.

Now, let M be the graph's actual adjacency matrix. The result above suggests that the second eigenvector of M may have entries that are correlated with the vertices' communities. The problem with this reasoning is that while M has an expected value of $\bar M$, $M^2$ has an expected value of roughly $\bar M^2+\frac{a+b}{2}I$, because for every i, j, $M_{i,j}=1\Leftrightarrow M_{j,i}=1$, with the result that $E[\sum_jM_{i,j}M_{j,i}]$ is very different from $\sum_jE[M_{i,j}]E[M_{j,i}]$. In other words, the square of the adjacency matrix counts walks of length 2 and has an expected value that is significantly different from the square of the expected adjacency matrix, due to backtracking.

In order to avoid this issue, we define the graph's nonbacktracking walk matrix W as a matrix over the vector space with an orthonormal basis consisting of a vector for each directed edge in the graph. $W_{(v_1,v_2),(v'_1,v'_2)}$ is defined to be 1 if $v'_2=v_1$ and $v_2\neq v'_1$, and 0 otherwise. In other words, it has a 1 for every case where one directed edge leads to another that is not the same edge in the other direction.

Now, let $w\in\mathbb{R}^{E(G)}$ be the vector whose entries are all 1, and $w'\in\mathbb{R}^{E(G)}$ be the vector such that $w'_{(v_0,v_1)}$ is 1 if $v_0$ is in community 1 and −1 if $v_0$ is in community 2. As mentioned before, for small t and a random (v, v′) ∈ E(G), there will be an average of approximately $((a+b)/2)^t$ directed edges t edges in front of (v, v′), and on average approximately $((a-b)/2)^t$ more of these edges will have ending vertices in the same community as v than in the other community. So,

$$w^\top W^tw\approx|E(G)|\left(\frac{a+b}{2}\right)^t \quad\text{and}\quad w'^\top W^tw'\approx|E(G)|\left(\frac{a-b}{2}\right)^t.$$

That strongly suggests that W has eigenvectors correlated with w and w′ whose eigenvalues are approximately (a+b)/2 and (a−b)/2 respectively. It also seems plausible that W's other eigenvalues have relatively small magnitude.
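The contrast between $M^2$ and W is easy to see on a small graph. In the sketch below (hypothetical names; sparse dict representation of our choosing), `nb_successors` lists the nonzero entries of W: mass on (u, v) moves to every (v, w) with w ≠ u, matching $W_{(v_1,v_2),(v'_1,v'_2)}=1$ iff $v'_2=v_1$ and $v_2\neq v'_1$. On the 3-regular graph K4, the diagonal of $M^2$ is the degree 3 (all backtracking), while W maps the all-ones vector to 2 times itself, and power iteration recovers 2 as its leading eigenvalue.

```python
import math
import random


def nb_successors(adj):
    """Nonzero entries of W: for each directed edge (u, v), the edges (v, w) with w != u."""
    return {(u, v): [(v, w) for w in adj[v] if w != u] for u in adj for v in adj[u]}


def apply_w(succ, x):
    """(W x)_{(v,w)} = sum of x_{(u,v)} over u ~ v with u != w."""
    out = {e: 0.0 for e in succ}
    for e, targets in succ.items():
        for f in targets:
            out[f] += x[e]
    return out


def leading_eigenvalue(succ, iters=200, seed=0):
    """Power iteration on W; returns ||W x|| / ||x|| at the last step."""
    rng = random.Random(seed)
    x = {e: 0.1 + rng.random() for e in succ}  # positive start overlaps the Perron vector
    ratio = 0.0
    for _ in range(iters):
        y = apply_w(succ, x)
        norm_x = math.sqrt(sum(v * v for v in x.values()))
        norm_y = math.sqrt(sum(v * v for v in y.values()))
        ratio = norm_y / norm_x
        x = {e: v / norm_y for e, v in y.items()}
    return ratio


def adjacency_square_diagonal(adj):
    """(M^2)_{v,v} = number of length-2 walks v -> u -> v; every one of them backtracks."""
    return {v: sum(1 for u in adj[v] if v in adj[u]) for v in adj}


# Hypothetical example graph: the complete graph on 4 vertices (3-regular).
K4 = {i: {j for j in range(4) if j != i} for i in range(4)}
```

Power iteration is only a stand-in here for computing $W^m w''$; it ignores the compensation needed to expose the second eigenvector.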
If this is true, then one can gain information on which vertices of G are in each community from the second eigenvector of W. One could simply calculate the second eigenvector of W directly. However, it is significantly faster to pick a random vector w″ and then compute $W^mw''$ for some m = Θ(log n). The resulting vector will be approximately a linear combination of W's main eigenvectors. Unfortunately, it will be much closer to being a multiple of the first eigenvector than of the second. If we multiply the resulting vector by $W-\frac{a+b}{2}I$, as in (11), the component of the vector that is proportional to the first eigenvector will be mostly cancelled out. However, since its eigenvalue is not exactly (a+b)/2, it will not be cancelled out completely, and might still be too large. Luckily, if we instead multiply the resulting vector by $(W-\frac{a+b}{2}I)^{m'}$ for suitable m′, as in (12), the component proportional to the first eigenvector will be essentially cancelled out, leaving a vector that is approximately a multiple of W's second eigenvector, and thus correlated with G's communities.

We believe that this would succeed in detecting communities in the SBM, but in order to make it easier to prove that our algorithm works, we actually use r-nonbacktracking walks. This corresponds to using the graph's r-nonbacktracking walk matrix, which is defined as follows.

Definition 5. For any r, the graph's r-nonbacktracking walk matrix, $W^{(r)}$, is a matrix over the vector space with an orthonormal basis consisting of a vector for each directed path of length r − 1 on the graph. $W^{(r)}_{(v_1,v_2,\dots,v_r),(v'_1,v'_2,\dots,v'_r)}$ is 1 if $v'_{i+1}=v_i$ for each 1 ≤ i < r and $v'_1\neq v_r$. Otherwise it is 0. In other words, $W^{(r)}$ maps a path of length r − 1 to the sum of all paths resulting from adding another element to the end of the path and deleting its first element.

Figure 4: The blue and red walks lead to an entry of 1 in the $W^{(4)}$ matrix.

Performing these calculations is essentially what the acyclic belief propagation algorithm does. From this perspective, the algorithm roughly translates to:

1. Choose $y^{(1)}$ randomly such that each element is independently drawn from a normal distribution.
2. For each 1 < t ≤ m, let $y^{(t)}=W^{(r)}y^{(t-1)}$.
3. Change $y^{(m)}$ to $(W^{(r)}-\frac{a+b}{2}I)^{m'}y^{(m-m')}$, where m′ = m − 3r − 1.
4. For each v ∈ G, assign v to $S_2$ with a probability that scales linearly with the sum over all paths $(v_1,v_2,\dots,v_r=v)$ of $y^{(m)}_{(v_1,v_2,\dots,v_r)}$, and assign it to $S_1$ otherwise (see footnote 14).
5. Return $(S_1,S_2)$.

Footnote 14: More precisely, assign it to $S_2$ with a probability of 1/2 plus the aforementioned sum divided by the root-mean-square of all such sums and then divided by a constant c.

Even though r = 2 might suffice to achieve the KS threshold in the SBM, the use of larger r might help for other graph models, e.g., those having more short cycles.

3.7 The general SBM

Now, consider a graph G drawn from SBM(n, p, Q/n) with arbitrary p and Q. Also, let $\lambda_1,\lambda_2,\dots,\lambda_h$ be the distinct eigenvalues of PQ in order of nonincreasing magnitude. If the parameters are such that vertices from different communities have different expected degrees, then one can detect communities by simply dividing the vertices into those with above-average degree and those with below-average degree. So, assume that the expected degree of a vertex is independent of its community. Detecting communities in the general case runs into some obstacles that do not apply in the 2-community symmetric case. First of all, it is much less clear that assigning vertices to sets randomly is a useful start. Also, even if we did have reasonable preliminary guesses of which vertex is in which community, it is not obvious how to determine a vertex's community based on the alleged communities of the vertices a fixed distance from it.

For the moment, assume that for each vertex v, we have a vector $x_v$ such that we believe v is in community i with probability $p_i+x_v\cdot e_i$ for each i, where all elements of $x_v$ are small. Furthermore, assume that $x_v$ is generated independently of v's neighbors. The correct belief about the probability that v is in each community, once its neighbors are taken into account, is

$$p+x_v+\frac{1}{\lambda_1}\sum_{v':(v,v')\in E(G)}PQ\,x_{v'}$$

up to nonlinear terms in the x's. So, given m small enough that the set of vertices within m edges of v is a tree, the correct belief about what community v is in once all of the vertices within m edges of v are taken into account is

$$p+\sum_{0\le m'\le m}\lambda_1^{-m'}\sum_{v':d(v,v')=m'}(PQ)^{m'}x_{v'}$$

up to nonlinear terms in the x's. So, the logical belief about the probability that v is in each community, based only on the preliminary guesses concerning the vertices m edges away from v, is

$$p+\lambda_1^{-m}\sum_{v':d(v,v')=m}(PQ)^mx_{v'}.$$

Conveniently, this expression is linear, so if w is an eigenvector of PQ with eigenvalue $\lambda_i$ for i ≠ 1, then

$$E[w\cdot P^{-1}e_{\sigma_v}]\approx w\cdot P^{-1}\Big(p+\lambda_1^{-m}\sum_{v':d(v,v')=m}(PQ)^mx_{v'}\Big)=\lambda_1^{-m}\lambda_i^m\sum_{v':d(v,v')=m}w\cdot P^{-1}x_{v'}$$

(the term $w\cdot P^{-1}p$ vanishes because p is the eigenvector of PQ for $\lambda_1$ and PQ is self-adjoint with respect to the inner product $x\cdot P^{-1}y$). In particular, this means that we only need an initial estimate of $w\cdot P^{-1}e_{\sigma_v}$ for every vertex in the graph, rather than a full set of beliefs about the vertices' communities. Any random guesses we make will probably have correlation $\Omega(1/\sqrt{n})$ with $w\cdot P^{-1}e_{\sigma_v}$, so we can use them as a starting point. Unfortunately, just like in the two-community symmetric case, the graph will run out of vertices before m becomes large enough to amplify our beliefs enough. However, switching from a sum over all vertices v′ that are m edges away from v to a sum over all nonbacktracking walks of length m ending in v fixes this problem the same way it does in the two-community symmetric case.
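The linear identity above can be verified exactly on a small example. The parameters below are hypothetical, chosen by us so that the equal-expected-degree assumption holds: p = (1/3, 2/3) and Q = [[19, 1], [1, 10]] give Qp = (7, 7), so $\lambda_1=7$, and PQ has the second eigenvalue $\lambda_2=6$ with eigenvector w = (1, −1). Exact `Fraction` arithmetic lets the sketch check $w\cdot P^{-1}(PQ)^mx=\lambda_2^m\,w\cdot P^{-1}x$ with `==`.

```python
from fractions import Fraction as F

# Hypothetical 2-community example with equal expected degrees (Q p = 7 * (1, 1)).
P_DIAG = [F(1, 3), F(2, 3)]
Q = [[F(19), F(1)], [F(1), F(10)]]
PQ = [[P_DIAG[i] * Q[i][j] for j in range(2)] for i in range(2)]
LAMBDA_1, LAMBDA_2 = F(7), F(6)
W2 = [F(1), F(-1)]  # eigenvector of PQ for LAMBDA_2


def matvec(M, x):
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]


def w_stat(w, x):
    """w . P^{-1} x"""
    return sum(w[i] * x[i] / P_DIAG[i] for i in range(len(w)))


def propagated_stat(w, x, steps):
    """w . P^{-1} (PQ)^steps x: the statistic after `steps` rounds of the linear update."""
    for _ in range(steps):
        x = matvec(PQ, x)
    return w_stat(w, x)
```

Here $\lambda_2^2=36>\lambda_1=7$, so a guess with correlation of order $1/\sqrt{n}$ is amplified by $(\lambda_2/\sqrt{\lambda_1})^m$ relative to the noise, once the walk-based sum replaces the vertex-based one.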
Likewise, we can still compute this sum by randomly dividing G's vertices between two sets $S_1$ and $S_2$ and then using the following algorithm:

1. For every (v, v′) ∈ E(G): if v ∈ $S_2$, set $y^{(1)}_{v,v'}=1$; otherwise, set $y^{(1)}_{v,v'}=-1$.
2. For every 1 < t ≤ m and (v, v′) ∈ E(G): set $y^{(t)}_{v,v'}=\sum_{v'':(v'',v)\in E(G),\,v''\neq v'}y^{(t-1)}_{v'',v}$.
3. For every v ∈ G: set $y^{(m)}_v=\sum_{v':(v',v)\in E(G)}y^{(m)}_{v',v}$.

However, needing to compensate for the average value is a special case of a considerably more complicated phenomenon. The average value over all (v, v′) ∈ E(G) of $y^{(1)}_{v,v'}w_{\sigma_v}$ will typically have a magnitude of $\Theta(1/\sqrt{n})$, and for general t the average value over all (v, v′) ∈ E(G) of $y^{(t)}_{v,v'}w_{\sigma_v}$ will typically have a magnitude of $\Theta(\lambda_i^t/\sqrt{n})$. Now, let w′ be an eigenvector of PQ with an eigenvalue $\lambda_{i'}$ which has greater magnitude than $\lambda_i$. Then the average value over all (v, v′) ∈ E(G) of $y^{(t)}_{v,v'}w'_{\sigma_v}$ will typically have a magnitude of $\Theta(\lambda_{i'}^t/\sqrt{n})$. Since this grows faster than the corresponding average for w does, it will eventually become large enough to disrupt efforts to estimate $w_{\sigma_v}$ using $y^{(t)}_{v,v'}$, in the same way the average value of $y^{(t)}$ did in the 2-community symmetric case. In fact, the issue with the average value is just the subcase of this when i′ = 1. So, in order to deal with this, we need to compensate for each eigenvalue of PQ with magnitude greater than $\lambda_i$ by choosing several indices $t_{0,i'},t_{1,i'},\dots,t_{m',i'}$ and redefining $y^{(t_{j,i'})}_{v,v'}$ for each j so that

$$y^{(t_{j,i'})}_{v,v'}=-\lambda_{i'}\,y^{(t_{j,i'}-1)}_{v,v'}+\sum_{v'':(v'',v)\in E(G),\,v''\neq v'}y^{(t_{j,i'}-1)}_{v'',v}$$

for every (v, v′) ∈ E(G).

Assuming that this is done, $y^{(1)}_{v,v'}$ has a variance of approximately 1, and $y^{(t)}_{v,v'}$ has a variance of roughly $\lambda_1^t$. Based on $y^{(t)}_{v,v'}$, it becomes possible to determine which community v is in with accuracy nontrivially greater than random guessing when the expected difference between its values for v in different communities is within a constant factor of its standard deviation. In other words, t needs to be large enough that $\lambda_i^t/\sqrt{n}$ is significant relative to $\sqrt{\lambda_1^t}$. If $\lambda_i^2\le\lambda_1$, then this will never happen, so the algorithm requires that $\lambda_i^2>\lambda_1$.

The general acyclic belief propagation algorithm is almost the same as the 2-community symmetric version. However, instead of just $\lambda_1$, it takes as input a list of the eigenvalues whose magnitudes are at least as large as that of the one being focused on.
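A toy model shows why one compensation pass per larger eigenvalue is needed. Assume, purely for illustration, that the stored sequence is an exact mix of geometric components, $y^{(t)}=5\cdot7^t+3\cdot(-4)^t+3^t$, where 3 is the eigenvalue being focused on and 7 and −4 have larger magnitude (all numbers hypothetical). One pass of $y^{(t)}\leftarrow y^{(t)}-\lambda y^{(t-1)}$ for each larger λ removes those components exactly and leaves $3^{t-2}(3-7)(3+4)$, in either order since the passes commute. In the real algorithm the eigenvalues and components are only approximate, which is why the text repeats the passes.

```python
def shift_compensate(ys, lam):
    """One pass of y^(t) <- y^(t) - lam * y^(t-1), applied to every t >= 1 at once."""
    return [ys[0]] + [ys[t] - lam * ys[t - 1] for t in range(1, len(ys))]


def geometric_mix(coeffs, eigs, m):
    """y^(t) = sum_j coeffs[j] * eigs[j]**t for t = 0..m (a toy stand-in for the components
    of y^(t) along different eigenvectors of PQ)."""
    return [sum(c * lam ** t for c, lam in zip(coeffs, eigs)) for t in range(m + 1)]


# Toy data: focus eigenvalue 3, larger-magnitude eigenvalues 7 and -4 to be removed.
TOY = geometric_mix([5, 3, 1], [7, -4, 3], 6)
```

After both passes, the last entry equals $-28\cdot3^4=-2268$; entries with t below the number of passes are affected by the boundary condition and are not used.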
The step compensating for larger eigenvalues also changes. In the 2-community version it reads: "Repeat the following $m - 3r - 1$ times: Simultaneously set $y^{(t)}_v$ to $y^{(t)}_v - \lambda_1 y^{(t-1)}_v$ for each $v$ and $t$." In the general version it becomes: "For each $i' < i$, repeat the following $m - 2r - (r+1)i'$ times: Simultaneously set $y^{(t)}_v$ to $y^{(t)}_v - \lambda_{i'} y^{(t-1)}_v$ for each $v$ and $t$." Its effectiveness is described by the following theorem.

Theorem 5. Let $p \in (0,1)^k$ with $\sum_i p_i = 1$. Let $Q$ be a symmetric matrix with nonnegative entries, let $P$ be the diagonal matrix such that $P_{i,i} = p_i$, and let $\lambda_1, \dots, \lambda_h$ be the eigenvalues of $PQ$ in order of nonincreasing magnitude. Also, let $s = 3$ if $\lambda_2 = -\lambda_3$ and $s = 2$ otherwise. If $\lambda_2^2 > \lambda_1$, then there exist constants $\epsilon, \dots, r, c, \gamma > 0$ and $m = \Theta(\log n)$ with the following property. Suppose the acyclic belief propagation algorithm is run on these parameters and on a random $G \sim \mathrm{SBM}(n, p, Q/n)$. Then, with probability $1 - o(1)$, there exist $\sigma$ and $\sigma'$ such that the fraction of vertices from community $\sigma$ that are in $S_1$ and the fraction of vertices from community $\sigma'$ that are in $S_1$ differ by at least $\epsilon$. The algorithm can be run in $O(n \log n)$ time.

Remark 7. This theorem could alternatively have been stated as follows. There exists $\epsilon$ such that, for any two communities $\sigma$ and $\sigma'$ for which there is an eigenvector $w$ of $PQ$ with eigenvalue $\lambda_s$ and $w_\sigma \neq w_{\sigma'}$, the expected difference between the fraction of vertices from community $\sigma$ that are in $S_1$ and the fraction of vertices from community $\sigma'$ that are in $S_1$ is at least $\epsilon$.

3.8 Alternatives

There are also a couple of other variants of these ideas that may be useful for community detection. For instance, pick a suitable $r$ and define $\Sigma$ to be the $n \times n$ symmetric matrix such that $\Sigma_{v,v'}$ is the number of nonbacktracking walks of length $r$ between $v$ and $v'$. We suspect that the eigenvector of $\Sigma$ whose eigenvalue has the second largest magnitude will have entries that are correlated with the corresponding vertices' communities. As in the standard case, we expect to approximate this eigenvector as follows: take a random vector $v$ and compute $(\Sigma - \lambda_1 I)^{m} \Sigma^{m' - m} v$ for suitable $m$ and $m'$, where $\lambda_1$ is an estimate of $\Sigma$'s largest eigenvalue. We can compute $\Sigma$ as follows. First, let $\Sigma^{(t)}$ be the $n \times n$ matrix such that $\Sigma^{(t)}_{v,v'}$ is the number of nonbacktracking walks of length $t$ between $v$ and $v'$.
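Checking the hypothesis of Theorem 5 for a given $(p, Q)$ requires the eigenvalues of $PQ$. The following is a minimal standard-library sketch for small $k$.

- Because $PQ$ is similar to the symmetric matrix $P^{1/2} Q P^{1/2}$, its eigenvalues are real.
- They are computed here with a textbook cyclic Jacobi rotation method. This is our choice of eigensolver, not something the paper prescribes.
- The eigenvalues are then sorted by magnitude and tested against $\lambda_2^2 > \lambda_1$.
- The names `symmetric_eigenvalues` and `ks_check` are hypothetical.

```python
import math


def symmetric_eigenvalues(mat, max_sweeps=100, tol=1e-20):
    """Eigenvalues of a small real symmetric matrix by cyclic Jacobi rotations."""
    a = [list(map(float, row)) for row in mat]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for r in range(n):  # columns p, q of A J
                    arp, arq = a[r][p], a[r][q]
                    a[r][p] = c * arp - s * arq
                    a[r][q] = s * arp + c * arq
                for r in range(n):  # rows p, q of J^T (A J)
                    apr, aqr = a[p][r], a[q][r]
                    a[p][r] = c * apr - s * aqr
                    a[q][r] = s * apr + c * aqr
    return [a[i][i] for i in range(n)]


def ks_check(p, Q):
    """Eigenvalues of PQ by nonincreasing magnitude, and whether lambda_2^2 > lambda_1."""
    k = len(p)
    root = [math.sqrt(x) for x in p]
    # P^{1/2} Q P^{1/2} is symmetric and has the same eigenvalues as PQ.
    sym = [[root[i] * Q[i][j] * root[j] for j in range(k)] for i in range(k)]
    lams = sorted(symmetric_eigenvalues(sym), key=abs, reverse=True)
    return lams, lams[1] ** 2 > lams[0]
```

For the symmetric two-community model with $a = 5$ and $b = 1$, the eigenvalues of $PQ$ are $3$ and $2$, and the condition holds since $4 > 3$. In the symmetric case, $\lambda_2^2/\lambda_1 = (a-b)^2/(k(a+(k-1)b))$, which is the SNR of the KS threshold.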
Then $\Sigma^{(0)} = I$ and $\Sigma^{(1)}$ is the graph's adjacency matrix. For all $v \neq v'$, $\Sigma^{(2)}_{v,v'}$ is equal to the number of shared neighbors of $v$ and $v'$. On the diagonal, $\Sigma^{(2)}_{v,v} = 0$, since a walk of length 2 from $v$ back to $v$ must backtrack. For every $t > 2$, we have $\Sigma^{(t)} = \Sigma^{(1)} \Sigma^{(t-1)} - D \Sigma^{(t-2)}$, where $D$ is the diagonal matrix such that $D_{v,v}$ is one less than the degree of $v$ for all $v$. This can be used to efficiently compute $\Sigma = \Sigma^{(r)}$.

Also, instead of prohibiting repeating a vertex within $r$ steps, we could address the issue of tangles by dividing $G$'s edges between sets $E_0, \dots, E_m$ for suitable $m$. Most of the edges are assigned to $E_0$, and each of the rest is assigned to one of the other sets at random. We then count nonbacktracking walks subject to a restriction, for suitable $r$: edge $r$ of the walk must be from $E_1$, edge $2r$ must be from $E_2$, and so on, while all other edges must be from $E_0$. The periodic prohibitions on using edges from $E_0$ would force the walk to leave any tangle it had been in. Meanwhile, because most of the edges are assigned to $E_0$, the restriction does not reduce the number of walks too severely.
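The recursion for $\Sigma^{(t)}$ can be checked on a small graph against a direct enumeration of nonbacktracking walks. The sketch below is a minimal illustration:

- It uses dense Python lists for readability. For large sparse graphs one would instead use sparse matrix products.
- The function names are hypothetical.
- The brute-force counter is exponential in $r$ and is meant only as a reference on tiny examples.

```python
def adjacency(n, edges):
    """Dense 0/1 adjacency matrix of a simple undirected graph on vertices 0..n-1."""
    adj = [[0] * n for _ in range(n)]
    for u, v in edges:
        adj[u][v] = adj[v][u] = 1
    return adj


def matmul(x, y):
    """Plain matrix product of list-of-lists matrices."""
    return [[sum(x[i][m] * y[m][j] for m in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def nb_walk_matrix(adj, r):
    """Sigma^{(r)}: counts of nonbacktracking walks of length r, via the recursion."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    if r == 0:
        return [[int(i == j) for j in range(n)] for i in range(n)]  # Sigma^{(0)} = I
    if r == 1:
        return [row[:] for row in adj]  # Sigma^{(1)} = A
    a2 = matmul(adj, adj)
    # Sigma^{(2)} = A^2 - diag(deg): shared neighbours off the diagonal, 0 on it.
    prev2 = adj
    prev1 = [[a2[i][j] - (deg[i] if i == j else 0) for j in range(n)] for i in range(n)]
    for _ in range(3, r + 1):
        a_prev = matmul(adj, prev1)
        # Sigma^{(t)} = A Sigma^{(t-1)} - D Sigma^{(t-2)}, with D_{v,v} = deg(v) - 1.
        cur = [[a_prev[i][j] - (deg[i] - 1) * prev2[i][j] for j in range(n)]
               for i in range(n)]
        prev2, prev1 = prev1, cur
    return prev1


def nb_walk_matrix_brute(adj, r):
    """Reference count by explicit enumeration (exponential; tiny graphs only)."""
    n = len(adj)
    out = [[0] * n for _ in range(n)]

    def extend(walk):
        if len(walk) == r + 1:
            out[walk[0]][walk[-1]] += 1
            return
        for w in range(n):
            # Step to a neighbour, never immediately back to the previous vertex.
            if adj[walk[-1]][w] and (len(walk) < 2 or w != walk[-2]):
                extend(walk + [w])

    for v in range(n):
        extend([v])
    return out
```

On a triangle with a pendant path attached, the two functions agree for every small $r$. The diagonal of $\Sigma^{(2)}$ is zero, as noted above.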

4 Crossing the KS threshold: proof technique

Recall that the algorithm samples a typical clustering uniformly at random in the typical set
$$T_\delta(G) = \Big\{ \sigma \in \mathrm{Balance}(n,k) : \sum_{i=1}^{k} \big|\{G_{u,v} : (u,v) \in \tbinom{[n]}{2} \text{ s.t. } \sigma_u = i,\ \sigma_v = i\}\big| \ge \frac{an}{2k}(1-\delta),\ \sum_{i,j \in [k],\, i<j} \big|\{G_{u,v} : (u,v) \in \tbinom{[n]}{2} \text{ s.t. } \sigma_u = i,\ \sigma_v = j\}\big| \le \frac{bn(k-1)}{2k}(1+\delta) \Big\},$$
where the two inequalities apply to the case $a > b$ and are flipped if $a < b$.

A first question is to estimate the likelihood that a bad clustering, i.e., one whose overlap is close to $1/k$, belongs to the typical set. Such a clustering splits each true community into $k$ roughly equal groups, one in each cluster. We need the probability that it still manages to keep the right proportions of edges inside and across the clusters. This is unlikely to take place, but we care about the exponent of this rare-event probability.

Figure 5: A bad clustering roughly splits each community equally among the $k$ clusters. Each pair of nodes connects with probability $a/n$ among vertices of the same community (i.e., same color groups, plain line connections), and with probability $b/n$ across communities (i.e., different color groups, dashed line connections). Only some connections are displayed in the figure to ease visualization.

As illustrated in Figure 5, the number of edges contained in the clusters of a bad clustering is roughly distributed as the sum of two binomial random variables,
$$E_{\mathrm{in}} \,\dot\sim\, \mathrm{Bin}\Big(\frac{n^2}{2k^2}, \frac{a}{n}\Big) + \mathrm{Bin}\Big(\frac{(k-1)n^2}{2k^2}, \frac{b}{n}\Big), \qquad (13)$$
where we use $\dot\sim$ to emphasize that this is an approximation. The expectation of this distribution is $\frac{n}{2k}\cdot\frac{a+(k-1)b}{k}$. In contrast, for the true clustering the count would be distributed as $\mathrm{Bin}\big(\frac{n^2}{2k}, \frac{a}{n}\big)$, with expectation $\frac{an}{2k}$.

In turn, the number of edges crossing the clusters of a bad clustering is roughly distributed as
$$E_{\mathrm{out}} \,\dot\sim\, \mathrm{Bin}\Big(\frac{n^2(k-1)}{2k^2}, \frac{a}{n}\Big) + \mathrm{Bin}\Big(\frac{n^2(k-1)^2}{2k^2}, \frac{b}{n}\Big), \qquad (14)$$
which has an expectation of $\frac{n(k-1)}{2k}\cdot\frac{a+(k-1)b}{k}$. In contrast, for the true clustering the above would be replaced by $\mathrm{Bin}\big(\frac{n^2(k-1)}{2k}, \frac{b}{n}\big)$, with expectation $\frac{bn(k-1)}{2k}$.

Thus, we need to estimate the probability of the rare event that these binomial sums deviate from their expectations. There is a large list of bounds on binomial tail events. However, here the number of trials is quadratic in $n$ and the success probability decays as $1/n$, which requires particular care to ensure tight bounds. We derive these by hand in Lemma 20, which gives, for a bad clustering $\sigma$,
$$\mathbb{P}\{\sigma \text{ is typical}\} \le \exp\Big(-\frac{n}{k} A + o(n)\Big), \qquad (15)$$
where
$$A = \frac{a + b(k-1)}{2} \ln \frac{k}{a + (k-1)b} + \frac{a}{2}\ln a + \frac{b(k-1)}{2} \ln b. \qquad (16)$$

There are at most $k^n$ bad clusterings. A union bound therefore gives a first regime where, with high probability, no bad clustering is typical. This already allows us to cross the KS threshold in some regime of the parameters when $k \ge 5$. However, this does not capture the correct behavior of the information-theoretic bound in the extreme regime $b = 0$. For $b = 0$, the union bound requires $a > 2k$ to guarantee that, with high probability, no bad clustering is typical. Yet as soon as $a > k$, detection is solved by an algorithm that simply separates the two giants in $\mathrm{SBM}(n, k, a, 0)$ and assigns communities uniformly at random to the other vertices. Thus, when $a \in (k, 2k]$, the union bound is loose. To remedy this, we next take into account the topology of the SBM graph.
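The exponent $A$ in (16) and the resulting union-bound condition can be evaluated numerically. Since there are at most $k^n$ bad clusterings, each typical with probability at most $\exp(-\frac{n}{k}A + o(n))$, the expected number of typical bad clusterings vanishes when $A > k \ln k$.

The sketch below makes three assumptions:

- Logarithms are natural.
- The convention $0 \ln 0 = 0$ is used when $b = 0$.
- The $o(n)$ term is ignored, so the sketch only indicates the leading-order regime.

The function names are hypothetical.

```python
import math


def exponent_A(a, b, k):
    """A from (16); assumes natural logs and the convention 0 * ln 0 = 0."""
    s = a + (k - 1) * b

    def coef_log(coef, x):
        # coef * ln(x), with the term dropped when x == 0 (0 ln 0 = 0).
        return 0.0 if x == 0 else coef * math.log(x)

    return (s / 2) * math.log(k / s) + coef_log(a / 2, a) + coef_log(b * (k - 1) / 2, b)


def union_bound_rules_out_bad_clusterings(a, b, k):
    """Leading-order test: k^n * exp(-(n/k) A) -> 0 iff A > k ln k."""
    return exponent_A(a, b, k) > k * math.log(k)
```

For $b = 0$, the exponent reduces to $A = \frac{a}{2}\ln k$, so the condition becomes $a > 2k$, matching the text. For $a = b$, the exponent is $A = 0$, as expected when there is no community signal.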
Since the algorithm samples a typical clustering, we only need the expected number of clusterings that are both bad and typical to be small compared to the total number of typical clusterings. Thus, we seek a better estimate of the total number of typical clusterings. The first topological property of the SBM graph that we exploit is that a large fraction of nodes lie in tree-like components outside of the giant. Conditioned on being on a tree, the SBM labels are distributed as in a broadcasting problem on a (Galton–Watson) tree. Specifically, the root's label $X$ is drawn uniformly at random, and each edge in the tree acts as a symmetric channel producing the output
$$Y = X + Z \mod k, \qquad (17)$$
where
$$Z \sim \nu := \Big(\frac{a}{a+(k-1)b}, \frac{b}{a+(k-1)b}, \dots, \frac{b}{a+(k-1)b}\Big), \qquad (18)$$
and this propagates down the tree. Thus, labeling the nodes in the trees according to this distribution and freezing the giant to the correct labels leads to a typical clustering with high probability.

Figure 6: Illustration of the topology of $\mathrm{SBM}(n, k, a, b)$ for $k = 2$. A giant component covering the two communities appears when $d = \frac{a+(k-1)b}{k} > 1$. A linear fraction of vertices belong to isolated trees (including isolated vertices), and a linear fraction of vertices in the giant are on planted trees. The following is used to estimate the size of the typical set in Section 6.2.2. For isolated trees, sample a bit uniformly at random for a vertex (green vertices) and propagate the bit according to the symmetric channel with flip probability $b/(a+(k-1)b)$ (plain edges do not flip, whereas dashed edges flip). For planted trees, do the same but freeze the root bit to its true value.

We hence need to count the number of nodes $T$ and edges $M$ that belong to such trees in the SBM graph. This is done in a series of lemmas in Section 6.2.2, and requires combinatorial estimates similar to those carried out for the Erdős–Rényi case [ER60]. The main part is to show that the fractions of such nodes and edges concentrate around
$$\frac{T}{n} \approx \frac{\tau}{d}, \qquad (19)$$
$$\frac{M}{n} \approx \frac{\tau}{d}\Big(1 - \frac{\tau}{2}\Big), \qquad (20)$$
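The labeling procedure of (17)–(18) and Figure 6 can be sketched as follows. This is a minimal illustration that assumes:

- the tree is given as a parent-to-children dictionary;
- labels take values in $\{0, \dots, k-1\}$.

The root label is uniform for an isolated tree and frozen (passed in) for a planted tree. The function name and its arguments are hypothetical.

```python
import random


def broadcast_labels(children, root, k, a, b, rng=random, root_label=None):
    """Sample labels on a rooted tree via Y = X + Z mod k, with Z ~ nu as in (18).

    children: dict mapping each node to the list of its children.
    root_label: None for an isolated tree (uniform root label); an int to
        freeze the root label, as for planted trees.
    """
    s = a + (k - 1) * b
    nu = [a / s] + [b / s] * (k - 1)  # P(Z = 0) = a/s, P(Z = z) = b/s for z != 0
    labels = {root: rng.randrange(k) if root_label is None else root_label}
    stack = [root]
    while stack:
        parent = stack.pop()
        for child in children.get(parent, []):
            # Each edge acts as the symmetric channel: add independent noise mod k.
            z = rng.choices(range(k), weights=nu)[0]
            labels[child] = (labels[parent] + z) % k
            stack.append(child)
    return labels
```

With $b = 0$, every node inherits the root label. With $a = 0$ and $k = 2$, labels alternate from one depth to the next. On a large star, the fraction of leaves sharing the root's label is close to $a/(a + (k-1)b)$.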