Spectral thresholds in the bipartite stochastic block model


Laura Florescu and Will Perkins
NYU and U of Birmingham
September 27, 2016

Stochastic Block Model
Figure: Red edges added with probability $p$ and blue edges with probability $q$.

Community detection
Goal: Detect communities in networks.

Stochastic Block Model
Entries are not colored, nor ordered.

Stochastic Block Model
First introduced by Holland, Laskey, Leinhardt in 1983.
Motivation: discover communities in large networks.
Theorem (Boppana, Dyer/Frieze, Snijders/Nowicki, Condon/Karp, McSherry, Bickel/Chen, etc.): There are efficient algorithms for exactly recovering the true colors, provided that $p - q$ is large enough as $n \to \infty$.

Bipartite Stochastic Block Model
Figure: Bipartite stochastic model on $V_1$ and $V_2$. Red edges added with probability $\delta p(n_1, n_2)$ and blue edges with probability $(2 - \delta) p(n_1, n_2)$.
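The model in the figure can be simulated directly. The sketch below is an illustration, not code from the talk: the name `sample_bsbm`, the balanced split of each side into two halves, and the convention that pairs with equal labels get probability $\delta p$ (and all other pairs $(2 - \delta) p$) are assumptions made for the example.

```python
import random


def sample_bsbm(n1, n2, p, delta, seed=0):
    """Sample a bipartite stochastic block model (illustrative sketch).

    Returns (sigma, tau, edges), where sigma and tau are +/-1 labels on
    V1 and V2 and edges is a list of pairs (u, v) with u in V1, v in V2.
    Assumption: both sides are split into two (nearly) equal halves.
    """
    if not 0.0 <= delta <= 2.0 or p < 0.0 or max(delta, 2.0 - delta) * p > 1.0:
        raise ValueError("need 0 <= delta <= 2 and edge probabilities in [0, 1]")
    rng = random.Random(seed)
    sigma = [1 if u < n1 // 2 else -1 for u in range(n1)]
    tau = [1 if v < n2 // 2 else -1 for v in range(n2)]
    edges = []
    for u in range(n1):
        for v in range(n2):
            # Assumed convention: equal labels -> delta * p, else (2 - delta) * p.
            prob = delta * p if sigma[u] == tau[v] else (2.0 - delta) * p
            if rng.random() < prob:
                edges.append((u, v))
    return sigma, tau, edges
```

With balanced sides the average edge probability is $p$, so `sample_bsbm(50, 200, 0.1, 0.2)` has about $n_1 n_2 p = 1000$ edges in expectation. The double loop is quadratic and only meant for small instances.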

SBM
Goal: get the planted assignment $\sigma$ (on $V_1$ for the bipartite stochastic model).
Detection: compute $v$ that agrees with $\sigma$ on a $1/2 + \epsilon$ fraction of vertices.
Recovery: compute $v$ that agrees with $\sigma$ on a $1 - o(1)$ fraction of vertices.
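Both success criteria are statements about the fraction of vertices on which $v$ and $\sigma$ agree. A small helper makes this concrete. It is a sketch under one assumption the slide does not spell out: since the two communities can only be identified up to swapping their names, agreement is taken as the better of $v$ and $-v$. The name `agreement` is illustrative.

```python
def agreement(v, sigma):
    """Fraction of positions where v matches sigma, up to a global sign flip.

    v and sigma are equal-length sequences of +1/-1 labels.
    Assumption: labels are identifiable only up to swapping the communities.
    """
    if len(v) != len(sigma) or len(sigma) == 0:
        raise ValueError("v and sigma must be non-empty and of equal length")
    same = sum(1 for a, b in zip(v, sigma) if a == b)
    frac = same / len(sigma)
    return max(frac, 1.0 - frac)
```

In these terms, detection asks for `agreement(v, sigma) >= 0.5 + eps` for some fixed `eps > 0`, and recovery asks for agreement tending to 1.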

Background
Intermediate step in recovering solutions in planted problems [Feldman, Perkins, Vempala 14], e.g. planted constraint satisfaction problems (CSP).
Reducing planted problems on $n$ variables gives vertex sets of size $n_1 = n$, $n_2 = n^{k-1}$ (so $n_2 = n^2$ for $k = 3$).

Unified Planted k-CSP model
Definition (Feldman-Perkins-Vempala 14): Given a planting distribution $Q : \{\pm 1\}^k \to [0, 1]$ and an assignment $\sigma \in \{\pm 1\}^n$, define the random constraint satisfaction problem $F_{Q,\sigma}(n, m)$ by drawing $m$ $k$-clauses from $C_k$ (the set of all $k$-tuples of literals) independently according to
$$Q_\sigma(C) = \frac{Q(\sigma(C))}{\sum_{C' \in C_k} Q(\sigma(C'))},$$
where $\sigma(C)$ is the vector of values that $\sigma$ assigns to the $k$-tuple of literals comprising $C$.
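For very small $n$ the definition can be run literally by enumerating $C_k$ and sampling with weights $Q(\sigma(C))$. The sketch below is illustrative only: the encoding of a literal as a pair `(variable index, sign)`, the choice to allow repeated variables inside a clause, and the name `sample_planted_csp` are assumptions, not part of the talk.

```python
import itertools
import random


def sample_planted_csp(Q, sigma, k, m, seed=0):
    """Draw m clauses i.i.d. with probability proportional to Q(sigma(C)).

    Q     : dict mapping k-tuples of +/-1 to non-negative weights.
    sigma : list of +/-1 values, the planted assignment.
    A literal is encoded as (i, s): variable i, sign s (assumed encoding);
    under sigma it takes the value s * sigma[i].
    """
    rng = random.Random(seed)
    n = len(sigma)
    literals = [(i, s) for i in range(n) for s in (1, -1)]
    # C_k: all ordered k-tuples of literals (assumption: repeats allowed).
    clauses = list(itertools.product(literals, repeat=k))
    weights = [Q.get(tuple(s * sigma[i] for i, s in c), 0.0) for c in clauses]
    if sum(weights) <= 0:
        raise ValueError("Q assigns zero weight to every clause")
    # random.choices normalises the weights, which is exactly Q_sigma(C).
    return rng.choices(clauses, weights=weights, k=m)
```

The enumeration has $(2n)^k$ entries, so this only serves to make the definition concrete. For example, taking `Q` uniform on all patterns except `(-1, -1, -1)` yields clauses that are all satisfied by `sigma`.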

Planted random k-SAT and Goldreich PRG
Planted random k-SAT: Form a truth assignment $\varphi$ of literals, then select each clause independently from the $k$-tuples of literals where at least one literal is set to 1 by $\varphi$.
Goldreich PRG: each clause also gets a 0/1 value, determined by a predicate evaluated on its literals (cryptography).
Feldman, Perkins, Vempala 14 gave a reduction of the above and other problems to the BSBM.

Information theory threshold
When $p = a/n$ and $q = b/n$:
Theorem (Mossel, Neeman, Sly, 2012): There is a test to distinguish the partition that succeeds with high probability if and only if $a + b > 2$ and $(a - b)^2 > 2(a + b)$.
Proves a conjecture of [Decelle, Krzakala, Moore, Zdeborová 13].
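The condition in the theorem is easy to evaluate for concrete parameters. The helper below is a minimal sketch; the name `detectable` is an assumption for illustration.

```python
def detectable(a, b):
    """Check the threshold condition a + b > 2 and (a - b)^2 > 2(a + b).

    a and b are the rescaled edge densities, p = a/n and q = b/n.
    """
    return a + b > 2 and (a - b) ** 2 > 2 * (a + b)
```

For instance, `detectable(5, 1)` holds since $16 > 12$, while `detectable(4, 1)` fails since $9 < 10$. For non-negative $a, b$ the second inequality already forces the first, because $(a - b)^2 \le (a + b)^2$.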

Computational threshold
Dyer, Frieze 1989: $p = a/n > q = b/n$ fixed
Condon, Karp 2001: $a - b \gtrsim n^{1/2}$
McSherry 2001: $a - b \gtrsim \sqrt{b \log n}$
Coja-Oghlan 2010: $a - b \gtrsim \sqrt{b}$
Massoulié 2013 and Mossel, Neeman, Sly 2013: detection possible and efficient if $(a - b)^2 > 2(a + b)$.
Ingenious spectral methods.

Previous work
MNS idea: the neighborhood of a vertex in $G(n, a/n, b/n)$ looks like a random labelled tree, where each vertex gives birth to $\mathrm{Pois}(a/2)$ children of the same type and $\mathrm{Pois}(b/2)$ children of the other type.
Show that, conditioned on the labels of the boundary of the tree, the label of the root is asymptotically independent of the rest of the graph.

Binary symmetric broadcast model
$T$: Galton-Watson tree whose offspring distribution has mean $b$.
Root $R$ labeled uniformly $+1/-1$; each child takes its parent's label with probability $1 - \eta$ and the opposite label with probability $\eta$.
Goal: reconstruct the value of $R$ from the labels at level $n$.
Theorem (Evans, Kenyon, Peres, Schulman 00): The probability of correct reconstruction of the value of $R$ tends to $1/2$ as $n \to \infty$ if $(1 - 2\eta)^2 \le p_c(T)$, where $p_c(T)$ is the critical probability for percolation on $T$.
Can think of $p_c(T)$ as the edge density at which the tree is connected.
For the block model, take trees with offspring distribution $\mathrm{Pois}\left(\frac{a+b}{2}\right)$ and $1 - \eta = \frac{a}{a+b}$. Then the threshold reduces to $(a - b)^2 \le 2(a + b)$.
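The last reduction is a short calculation. The worked step below assumes the standard fact that a Galton-Watson tree with mean offspring $d > 1$ has percolation threshold $p_c = 1/d$ (conditioned on survival), so with offspring mean $(a+b)/2$ one gets $p_c(T) = 2/(a+b)$. The flip parameter is taken from $1 - \eta = a/(a+b)$.

```latex
\[
p_c(T) = \frac{2}{a+b},
\qquad
1 - 2\eta = \frac{2a}{a+b} - 1 = \frac{a-b}{a+b},
\]
\[
(1 - 2\eta)^2 \le p_c(T)
\;\iff\;
\frac{(a-b)^2}{(a+b)^2} \le \frac{2}{a+b}
\;\iff\;
(a-b)^2 \le 2(a+b).
\]
```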

Previous work - Spectral methods
Applying some classical results to the bipartite model: using the spectrum with $p = O(1/n_1)$ recovers the partition.
Typical analysis of spectral algorithms: 2nd singular value > spectral norm of the noise matrix $M - \mathbb{E}M$; here $\lambda_2(\mathbb{E}M) = \Theta(p \sqrt{n_1 n_2})$ and the norm of the noise is $\|M - \mathbb{E}M\| = \Theta(\sqrt{p n_2})$.
Feldman, Perkins, Vempala 14: subsampled power iteration recovers the partition whp with $p = \tilde{O}((n_1 n_2)^{-1/2})$.
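Comparing the two orders of magnitude shows where the classical argument stops working. The following is a sketch that ignores constants and logarithmic factors and uses only the orders $\lambda_2(\mathbb{E}M) = \Theta(p\sqrt{n_1 n_2})$ and $\|M - \mathbb{E}M\| = \Theta(\sqrt{p n_2})$ quoted on the slide.

```latex
\[
\lambda_2(\mathbb{E}M) \gtrsim \|M - \mathbb{E}M\|
\;\iff\;
p\sqrt{n_1 n_2} \gtrsim \sqrt{p\, n_2}
\;\iff\;
p^2 n_1 n_2 \gtrsim p\, n_2
\;\iff\;
p \gtrsim \frac{1}{n_1}.
\]
\[
\text{For } n_2 \ge n_1: \qquad
(n_1 n_2)^{-1/2} \le \frac{1}{n_1},
\]
```

so the density $(n_1 n_2)^{-1/2}$ reached by subsampled power iteration lies below the point where the second singular value dominates the noise.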

Questions
1. Here $\lambda_2 < \|M - \mathbb{E}M\|$. Is SVD doomed for $p \ll 1/n_1$?
2. What is the optimal threshold for detection in the BSBM?

Our results - sharp reconstruction/impossibility
Theorem: If $n_2 \ge n_1$ and $p \le \frac{1}{(\delta - 1)^2 \sqrt{n_1 n_2}}$, then no algorithm can detect the partition.
Idea: Couple to a broadcast model on a multi-type Galton-Watson tree. Show that, conditioned on the labels of a $\log n$ boundary of the tree, the label of the root is asymptotically independent of the rest of the graph.

Our results - sharp reconstruction/impossibility
Theorem: Let $n_2 \ge n_1$. Then there is a polynomial-time algorithm that detects the partition $V_1 = A_1 \cup B_1$ if $p > \frac{1 + \epsilon}{(\delta - 1)^2 \sqrt{n_1 n_2}}$ for any fixed $\epsilon > 0$.
Idea: reduce to the SBM on the graph on $V_1$ induced by paths of length 2 in the bipartite graph.

Proof sketch
Reduce to a graph $G$ by replacing each path of length 2 from $V_1$ to $V_2$ back to $V_1$ with a single edge between the endpoints in $V_1$.
$E = \frac{(1 + \epsilon)^2 n_1}{(\delta - 1)^4} (1 + o(1))$
Now we can compute $p_a = P[e = (u, v) \mid \sigma(u) = \sigma(v)]$ and $p_b = P[e = (u, v) \mid \sigma(u) \ne \sigma(v)]$.
Now compute $a$ and $b$ accordingly:
$a = \frac{(1 + \epsilon)(2 - 2\delta + \delta^2)}{(\delta - 1)^4} (1 + o(1))$, $b = \frac{(1 + \epsilon)(2\delta - \delta^2)}{(\delta - 1)^4} (1 + o(1))$
Apply the criterion $(a - b)^2 \ge (1 + \epsilon) 2(a + b)$.
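The reduction and the final check can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: parallel 2-paths between the same pair are collapsed into one edge, the $(1 + o(1))$ factors are dropped, and the names `two_path_graph` and `sbm_parameters` are made up for the example.

```python
from itertools import combinations


def two_path_graph(edges):
    """Project a bipartite edge list onto V1.

    edges: iterable of (u, v) with u in V1 and v in V2.
    Returns the set of pairs (u, w), u < w, joined by a path u - v - w.
    Assumption: several 2-paths between the same pair give one edge.
    """
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(v, set()).add(u)
    projected = set()
    for us in neighbours.values():
        for u, w in combinations(sorted(us), 2):
            projected.add((u, w))
    return projected


def sbm_parameters(delta, eps):
    """a and b from the proof sketch, with the (1 + o(1)) factors dropped."""
    scale = (1 + eps) / (delta - 1) ** 4
    a = scale * (2 - 2 * delta + delta ** 2)
    b = scale * (2 * delta - delta ** 2)
    return a, b
```

With these formulas $a - b = 2(1 + \epsilon)/(\delta - 1)^2$ and $a + b = 2(1 + \epsilon)/(\delta - 1)^4$, so $(a - b)^2 = (1 + \epsilon) \cdot 2(a + b)$ and the criterion holds with equality up to the dropped lower-order terms.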

Implications for planted k-SAT
Detection in the block model exhibits a sharp threshold at $m = \Theta(n^{r/2})$ hyperedges/clauses.
Definition: The distribution complexity $r$ of the planting distribution $Q$ is the smallest $r > 0$ such that $Q$ is an $(r - 1)$-wise independent distribution on $\{\pm 1\}^k$ but not $r$-wise independent.
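For small $k$ the distribution complexity can be computed by brute force. The sketch below assumes that "$t$-wise independent" means every $t$ coordinates are jointly uniform, which is equivalent to $\mathbb{E}_Q[\prod_{i \in S} x_i] = 0$ for all non-empty $S$ with $|S| \le t$. Under that reading, $r$ is the size of the smallest non-empty $S$ with a non-zero correlation. The function name and the dictionary encoding of $Q$ are illustrative.

```python
from itertools import combinations
from math import prod


def distribution_complexity(Q, k, tol=1e-12):
    """Smallest r > 0 such that some set S of r coordinates has
    E_Q[prod_{i in S} x_i] != 0 (assumed reading of the definition).

    Q: dict mapping k-tuples of +/-1 to non-negative weights
       (normalised here, so unnormalised weights are fine).
    Returns None when Q is uniform on {+1, -1}^k.
    """
    total = sum(Q.values())
    for r in range(1, k + 1):
        for S in combinations(range(k), r):
            corr = sum(w * prod(x[i] for i in S) for x, w in Q.items()) / total
            if abs(corr) > tol:
                return r
    return None
```

As a usage example, a distribution supported uniformly on the even-parity patterns of $\{\pm 1\}^3$ (those with $x_1 x_2 x_3 = 1$) has all single and pairwise correlations equal to zero, so the function returns 3 for it.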

Spectral algorithms
Standard SVD: Compute the left singular vector of $M$ (adjacency matrix) corresponding to the 2nd singular value, round signs to get $v$; compare $\sigma$ and $v$.
Diagonal deletion SVD: Set the diagonal entries of $MM^T$ to 0, compute the second eigenvector, round signs to get $v$; compare $\sigma$ and $v$.
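Both procedures fit in a few lines of NumPy. The following is a minimal sketch rather than the authors' code: the sampler, the balanced communities, the convention that equal labels get probability $\delta p$, and all function names are assumptions. It uses dense `numpy.linalg.svd` and `numpy.linalg.eigh`, which is only sensible for small instances.

```python
import numpy as np


def sample_bsbm_matrix(n1, n2, p, delta, rng):
    """Return (M, sigma): an n1 x n2 0/1 adjacency matrix and the V1 labels.

    Assumption: balanced +/-1 labels on both sides; equal labels are joined
    with probability delta * p, unequal labels with (2 - delta) * p.
    """
    sigma = np.where(np.arange(n1) < n1 // 2, 1, -1)
    tau = np.where(np.arange(n2) < n2 // 2, 1, -1)
    same = np.equal.outer(sigma, tau)
    probs = np.where(same, delta * p, (2 - delta) * p)
    M = (rng.random((n1, n2)) < probs).astype(float)
    return M, sigma


def round_signs(x):
    """Round a real vector to +/-1 entries."""
    return np.where(x >= 0, 1, -1)


def standard_svd(M):
    """Left singular vector of M for the 2nd largest singular value, rounded."""
    U, _, _ = np.linalg.svd(M, full_matrices=False)  # singular values descending
    return round_signs(U[:, 1])


def diagonal_deletion_svd(M):
    """Second eigenvector of M M^T with its diagonal set to 0, rounded."""
    B = M @ M.T
    np.fill_diagonal(B, 0.0)  # for a 0/1 matrix the diagonal holds the degrees
    _, vecs = np.linalg.eigh(B)  # eigenvalues in ascending order
    return round_signs(vecs[:, -2])


def agreement(v, sigma):
    """Fraction of matching labels, up to a global sign flip."""
    frac = np.mean(v == sigma)
    return max(frac, 1 - frac)
```

A possible usage: draw `M, sigma = sample_bsbm_matrix(100, 400, 0.5, 0.5, np.random.default_rng(0))` and compare `agreement(standard_svd(M), sigma)` with `agreement(diagonal_deletion_svd(M), sigma)`. At this density both should be close to 1; the difference between the two methods only shows up for much sparser graphs.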

Our results - spectral
Theorem: Let $n_2 \ge n_1$, with $n_1 \to \infty$. Then
1. If $p_D > (n_1 n_2)^{-1/2}$, then whp the diagonal deletion SVD algorithm recovers the partition $V_1 = A_1 \cup B_1$.
2. If $p_V > n_1^{-2/3} n_2^{-1/3}$, then whp the standard SVD algorithm recovers the partition.
When $n_1 = n$ and $n_2 = n^2$: $p_D \sim n^{-3/2}$, $p_V \sim n^{-4/3}$.
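The three density scales in the talk can be compared numerically. The helper below is a sketch that ignores logarithmic factors and constants; the name `bsbm_thresholds` and the returned tuple layout are assumptions for illustration.

```python
import math


def bsbm_thresholds(n1, n2, delta):
    """Return (detection, p_D, p_V), ignoring log factors.

    detection : 1 / ((delta - 1)^2 * sqrt(n1 * n2)), the sharp detection scale
    p_D       : (n1 * n2)^(-1/2), diagonal deletion SVD
    p_V       : n1^(-2/3) * n2^(-1/3), standard SVD
    """
    if delta == 1:
        raise ValueError("delta = 1 carries no community signal")
    detection = 1.0 / ((delta - 1) ** 2 * math.sqrt(n1 * n2))
    p_d = (n1 * n2) ** -0.5
    p_v = n1 ** (-2 / 3) * n2 ** (-1 / 3)
    return detection, p_d, p_v
```

For $n_1 = n$ and $n_2 = n^2$ this reproduces $p_D = n^{-3/2}$ and $p_V = n^{-4/3}$. Since $p_V / p_D = (n_2 / n_1)^{1/6}$, the standard SVD needs a polynomially denser graph whenever $n_2 \gg n_1$.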

Timeline

Our results
Figure: Correlations of computed vectors with planted vector. Plot of correlation as a function of p(n1,n2), for n1 = 1000, n2 = 100000, delta = 0.2.
Figure: Eigenvalue separation. Plot of the top 10 normalized eigenvalues of $MM^T$ for delta = 0.5 and delta = 1 in the regimes p1 = (n1*n2)^(-1/2)*log(n1), p2 = n1^(-2/3)*n2^(-2/3), p3 = n1^(-2/3)*n2^(-1/3)*log(n1).

Thresholds origins
DiagD: $B = MM^T - D_V$; SVD: $B' = B + D_V$.
$\sigma$: partition; $e_2(B)$: eigenvector of the second largest eigenvalue of $B$; $D_V$: degrees.
By the Sin Theta Theorem (sine of the angle between eigenvector spaces $\le$ norm / eigenvalue gap):
DiagD: $\sin(B, \mathbb{E}B) \le C \frac{\|B - \mathbb{E}B\|}{\lambda_2} \le \frac{C n_1^{1/2} n_2^{1/2} p}{(\delta - 1)^2 n_1 n_2 p^2} = O\left(\frac{1}{\log n_1}\right)$
SVD: $\sin(B', \mathbb{E}B') \le C \frac{\|B - \mathbb{E}B\| + \|D_V - \mathbb{E}D_V\|}{\lambda_2} \le \frac{C n_1^{1/2} n_2^{1/2} p + c \sqrt{n_2 p \log n_1}}{(\delta - 1)^2 n_1 n_2 p^2} = O\left(\frac{1}{\log n_1}\right)$
(The denominators use the asymptotics of the second eigenvalue.)
$\|e_2(B) - \sigma / \sqrt{n_1}\| = O(\log^{-1} n_1)$ (by a special case of the Sin Theta Theorem).
Conclude by rounding the signs of $e_2(B)$.
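The two $O(1/\log n_1)$ bounds follow by substituting the edge densities. The worked step below is a sketch under assumptions: it takes $p = (n_1 n_2)^{-1/2} \log n_1$ for diagonal deletion and $p = n_1^{-2/3} n_2^{-1/3} \log n_1$ for the standard SVD (the values labelled p1 and p3 in the plots), uses $\lambda_2 \approx (\delta - 1)^2 n_1 n_2 p^2$ and $\|B - \mathbb{E}B\| \le C \sqrt{n_1 n_2}\, p$, and reads the degree fluctuation as $\|D_V - \mathbb{E}D_V\| \le c \sqrt{n_2 p \log n_1}$.

```latex
\[
\text{DiagD:}\quad
\frac{C \sqrt{n_1 n_2}\, p}{(\delta-1)^2 n_1 n_2 p^2}
= \frac{C}{(\delta-1)^2 \sqrt{n_1 n_2}\, p}
= \frac{C}{(\delta-1)^2 \log n_1}
\quad\text{for } p = \frac{\log n_1}{\sqrt{n_1 n_2}} .
\]
\[
\text{SVD (degree term):}\quad
\frac{c \sqrt{n_2\, p \log n_1}}{(\delta-1)^2 n_1 n_2 p^2}
= \frac{c \sqrt{\log n_1}}{(\delta-1)^2\, n_1 n_2^{1/2} p^{3/2}}
= \frac{c}{(\delta-1)^2 \log n_1}
\quad\text{for } p = \frac{\log n_1}{n_1^{2/3} n_2^{1/3}},
\]
\[
\text{since } n_1 n_2^{1/2} p^{3/2} = \log^{3/2} n_1 \text{ for this } p .
\]
```

The extra degree term is what pushes the standard SVD threshold from $(n_1 n_2)^{-1/2}$ up to $n_1^{-2/3} n_2^{-1/3}$.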

Conclusions
Theorem: Can efficiently detect the partition in the BSBM if $p > \frac{1 + \epsilon}{(\delta - 1)^2 \sqrt{n_1 n_2}}$. Cannot detect if $p \le \frac{1}{(\delta - 1)^2 \sqrt{n_1 n_2}}$.
The spectral method still works if $\lambda_2 <$ norm of the noise matrix.
Modifying the adjacency matrix improves recovery significantly.

Open problems
Apply a Diagonal Deletion type of algorithm for improvement over SVD in other problems?
Sharper detection thresholds for planted k-SAT?

Thank you!