The non-backtracking operator


The non-backtracking operator. Florent Krzakala, LPS, École Normale Supérieure. In collaboration with: Paris: L. Zdeborová, A. Saade; Rome: A. Decelle; Würzburg: J. Reichardt; Santa Fe: C. Moore, P. Zhang; Berkeley: E. Mossel, A. Sly, J. Neeman. http://krzakala.org

Community detection in networks. Also known as: stochastic block modeling (statistics), data clustering (machine learning), graph-based machine learning (inference), planted constraint satisfaction models. Given the graph, find the labeling! Examples: online communities, word adjacency networks, food webs, metabolic networks, protein-protein interaction networks, financial markets (sectors)...

This is a hard problem. What you have: the data (the adjacency matrix). What you want: the structure.

This is a hard problem. [Examples: the karate club, the internet, a protein network.] And it is getting harder!

Algorithms for clustering. Hundreds of different methods - for a review see e.g. S. Fortunato, Physics Reports, 2010. Spectral clustering is the state-of-the-art clustering method in machine learning: associate a matrix to the network (adjacency, random walk, Laplacian, etc.) and compute its top eigenvalues; the corresponding eigenvectors encode the clusters in a visible way.

This talk Inference using the stochastic block model Spectral algorithm(s)

Stochastic block model. Generate a random network as follows: q groups, N nodes.
n_a = proportion of nodes in group a, for a = 1, ..., q.
p_ab = c_ab / N = probability that an edge is present between a node from group a and a node from group b.
Example: n_1 = 7/12, n_2 = 5/12, p_11 = p_22 = 0.39, p_12 = p_21 = 0.14.
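
As a concrete illustration, a minimal sampler for this generative model (function and variable names are ours, not from the talk):

```python
import numpy as np

def sample_sbm(N, fractions, c, seed=0):
    """Sample an undirected graph from the stochastic block model.

    fractions[a] is the proportion n_a of nodes in group a, and c[a][b]
    is the affinity c_ab, so an edge between a node of group a and one
    of group b is present with probability p_ab = c_ab / N.
    Returns the adjacency matrix A and the planted group labels.
    """
    rng = np.random.default_rng(seed)
    labels = rng.choice(len(fractions), size=N, p=fractions)
    p = np.asarray(c, dtype=float) / N               # p_ab = c_ab / N
    coins = rng.random((N, N)) < p[labels[:, None], labels[None, :]]
    A = np.triu(coins, k=1)                          # keep i < j, no self-loops
    return (A | A.T).astype(int), labels

# Two groups with n_1 = 7/12, n_2 = 5/12, as in the slide's example.
A, labels = sample_sbm(1200, [7/12, 5/12], c=[[39, 14], [14, 39]])
```

With affinities c_ab held fixed as N grows, this produces the sparse regime (constant average degree) discussed in the rest of the talk.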

Optimal inference in the SBM (when the parameters of the model are known).

P({q_i}) = \prod_{i=1}^N n_{q_i}
P({A_{ij}} | {q_i}) = \prod_{i \neq j} p_{q_i q_j}^{A_{ij}} (1 - p_{q_i q_j})^{1 - A_{ij}}
P({q_i} | {A_{ij}}) = (1/Z) P({q_i}) P({A_{ij}} | {q_i})

where n_a is the proportion of nodes in group a, p_ab the probability of an edge between groups a and b, and q_i \in {1, ..., q} an assignment of colors. This posterior includes ALL available information about the signal.

Optimal algorithm in the SBM (for a statistician).

Posterior probability distribution: P({s_i} | {A_{ij}}) = (1/Z) P({s_i}) P({A_{ij}} | {s_i})
Marginal probabilities: \mu_i(s_i) = \sum_{{s_j}_{j \neq i}} P({s_j}_{j \neq i}, s_i | {A_{ij}})
Maximize the number of correctly assigned nodes: \hat{s}_i = argmax_{s_i} \mu_i(s_i)

Optimal algorithm in the SBM (for a statistical physicist).

Potts spin variables s_i \in {1, ..., q}; Boltzmann measure P({s_i}) = (1/Z) e^{-H({s_i})}.
Hamiltonian (a Potts glass in a field, on the Nishimori line):

H({s_i}) = - \sum_{i=1}^N \log n_{s_i} (magnetic field) - \sum_{i \neq j} [ A_{ij} \log p_{s_i s_j} + (1 - A_{ij}) \log(1 - p_{s_i s_j}) ] (pair-wise interactions)

Local magnetization: \hat{s}_i = argmax_{s_i} \mu_i(s_i)

How to compute the marginals and Z? A #P-hard problem in general. MCMC (Markov chain Monte Carlo, Gibbs sampling): generic, but with a potentially (very) large equilibration time. Variational mean field: convergence problems and (sometimes) a crude approximation. Belief propagation / cavity method: fast, hard to control, generally better than mean field, exact for large networks generated by the model.
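
As a sketch of the third option, here is a compact (and deliberately inefficient, dense-matrix) belief-propagation iteration for the SBM posterior above. This is our own illustrative rewrite of the cavity update, with an external-field term approximating the non-edges; it is not the authors' code.

```python
import numpy as np

def sbm_bp(A, n, C, iters=50, seed=0):
    """Belief propagation for SBM marginals (illustrative sketch).

    A: (N, N) 0/1 adjacency matrix; n: group fractions n_a;
    C: affinity matrix with edge probability c_ab / N.
    Returns approximate marginals mu[i, a] and the estimate argmax_a mu[i, a].
    """
    rng = np.random.default_rng(seed)
    N, q = len(A), len(n)
    n, C = np.asarray(n, float), np.asarray(C, float)
    nbrs = [np.flatnonzero(A[i]) for i in range(N)]
    # psi[(k, i)] is the cavity message from node k to its neighbor i
    psi = {(k, i): rng.dirichlet(np.ones(q))
           for i in range(N) for k in nbrs[i]}
    mu = np.tile(n, (N, 1))
    for _ in range(iters):
        h = C @ mu.mean(axis=0)            # external field from absent edges
        for i in range(N):
            # likelihood carried by each incoming message: sum_s c_{ts} psi_s
            L = {k: C @ psi[(k, i)] for k in nbrs[i]}
            logmu = np.log(n) - h + sum(np.log(L[k]) for k in nbrs[i])
            mu[i] = np.exp(logmu - logmu.max())
            mu[i] /= mu[i].sum()
            for j in nbrs[i]:              # cavity: remove j's own contribution
                m = logmu - np.log(L[j])
                m = np.exp(m - m.max())
                psi[(i, j)] = m / m.sum()
    return mu, mu.argmax(axis=1)
```

The update leaves out the target neighbor's own contribution (the cavity), which is what distinguishes BP from naive mean field.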

Results (in a nutshell). Overlap of the optimal estimate with the true labeling:

Q = ( max_\pi (1/N) \sum_{i=1}^N \delta(t_i, \pi(\hat{s}_i)) - 1/q ) / (1 - 1/q)

[Figure: overlap Q vs \epsilon = c_out/c_in for q = 4, c = 16 (BP, N = 100k; Monte Carlo, N = 70k). A "ferromagnetic", detectable phase at small \epsilon gives Q > 0; beyond a transition the "paramagnetic", undetectable phase gives Q = 0.]
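
The overlap Q above maximizes over relabelings of the groups; a direct implementation (factorial in q, fine for small q):

```python
import numpy as np
from itertools import permutations

def overlap(truth, estimate, q):
    """Overlap from the slide: fraction of correctly assigned nodes under
    the best permutation of group labels, rescaled so that random guessing
    gives Q = 0 and perfect recovery gives Q = 1."""
    truth, estimate = np.asarray(truth), np.asarray(estimate)
    best = max(np.mean(truth == np.take(perm, estimate))
               for perm in permutations(range(q)))
    return (best - 1.0 / q) / (1.0 - 1.0 / q)
```

For instance, an estimate that is perfect up to a global swap of the two labels still gets Q = 1.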

Results (in a nutshell).

[Figure: overlap vs average degree c (range 12-18) for a dis-assortative network with q = 5 colors, c_in = 0, N = 100k, comparing planted and random initializations. The curves mark a dynamic threshold c_d, a spinodal region, and a threshold c_s separating the paramagnetic phase from the "ferromagnetic" (detectable) one.]

Algorithmic consequences.

[Figure: same model (q = 5, c_in = 0, N = 100k): overlap and log Z_planted - log Z_random vs c, with ordered and random initializations, locating the thresholds c_c and c_s.]

At the algorithmic threshold c_s: c_in - c_out = q \sqrt{c}. Three regimes:
c < c_c : impossible
c_c < c < c_s : possible but hard
c_s < c : easy

Rigorous results for q = 2. For q = 2 we find that the information-theoretic threshold equals the algorithmic one, c_in - c_out = q \sqrt{c}. Proven! Communities are undetectable below c_s and easy to find beyond c_s. The undetectable regime (Mossel, Neeman, Sly '12): planted graphs are essentially Erdős-Rényi random graphs in the undetectable regime. The detectable regime (Massoulié '13).

This talk Inference using the stochastic block model Spectral algorithm(s)

Spectral clustering. Compute the k largest eigenvalues and their eigenvectors for a matrix associated to the graph. Cluster these eigenvectors, e.g. using k-means; for 2 groups, use the signs of the 2nd eigenvector. Matrices:

Adjacency: A_{ij}
Laplacian: L_{ij} = d_i \delta_{ij} - A_{ij}
Random walk: M_{ij} = A_{ij} / d_i
Modularity: Q_{ij} = A_{ij} - d_i d_j / (2M)
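
A minimal version of the procedure for two groups, with these matrices built explicitly (dense numpy, assuming no isolated nodes; names are ours):

```python
import numpy as np

def clustering_matrices(A):
    """The matrices from the slide for a 0/1 adjacency matrix A:
    Laplacian L = D - A, random-walk matrix M = D^{-1} A, and
    modularity Q_ij = A_ij - d_i d_j / (2M)."""
    d = A.sum(axis=1).astype(float)
    L = np.diag(d) - A
    RW = A / d[:, None]                    # assumes every node has d_i > 0
    Mod = A - np.outer(d, d) / d.sum()     # d.sum() = 2M (twice the edge count)
    return L, RW, Mod

def two_group_split(A):
    """Spectral clustering for 2 groups: signs of the eigenvector of the
    second-largest adjacency eigenvalue."""
    w, v = np.linalg.eigh(A.astype(float))  # eigenvalues in ascending order
    return (v[:, -2] > 0).astype(int)
```

On two cliques joined by a single bridge edge, the second adjacency eigenvector takes opposite signs on the two cliques, which is exactly the split this sketch recovers.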

Adjacency matrix for dense graphs. The largest eigenvalue is the average degree c = (c_in + c_out)/2, and its eigenvector is trivial. There exists an eigenvector correlated with the communities, with eigenvalue \lambda = (c_in - c_out)/2 + (c_in + c_out)/(c_in - c_out). The bulk follows Wigner's semicircle law: P(\lambda) = \sqrt{4c - \lambda^2} / (2\pi c). If (c_in - c_out)/2 > \sqrt{c}, the informative eigenvalue is out of the bulk. Adjacency is optimal for dense graphs! If it fails, anything else will. (Nadakuditi & Newman '12)

Performance of spectral clustering.

[Figure: overlap vs c_in - c_out for q = 2, n_a = 1/2, c = 3, N = 10^5 (c_in = c_aa, c_out = c_{a \neq b}), comparing B, modularity, random walk, A, L, and BP. The transition at c_in - c_out = q \sqrt{c} is marked.]

Why the suboptimality? Wigner is not good for sparse graphs: the semicircle law P(\lambda) = \sqrt{4c - \lambda^2} / (2\pi c) fails. The edge of the spectrum of sparse graphs is spoiled by nodes of large degree (for Erdős-Rényi graphs, the largest degree grows as log(N)/log(log(N))). How to correct this? Remove the largest degrees? Not good enough - we would be losing information.

Spectral Redemption Can the optimal performance (phase transition) be matched by a spectral method?

The non-backtracking matrix. With M edges, define a matrix B on directed edges, i.e. a 2M x 2M matrix, as follows:

B_{i \to j, k \to l} = 1 if j = k and i \neq l
B_{i \to j, k \to l} = 0 otherwise

This is the Hashimoto ('89) edge adjacency operator used to evaluate the Ihara zeta function. A directed walk is similar to a message-passing algorithm (i.e. belief propagation).
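
The definition translates directly into code; a naive O(M^2) construction for small graphs (our own sketch):

```python
import numpy as np

def nonbacktracking_matrix(A):
    """Build the 2M x 2M non-backtracking matrix of an undirected graph
    given by its 0/1 adjacency matrix A: rows and columns are directed
    edges, and B[(i->j), (k->l)] = 1 iff j == k and i != l."""
    N = len(A)
    edges = [(i, j) for i in range(N) for j in range(N) if A[i, j]]
    B = np.zeros((len(edges), len(edges)))
    for a, (i, j) in enumerate(edges):
        for b, (k, l) in enumerate(edges):
            if j == k and i != l:
                B[a, b] = 1.0
    return B, edges
```

On a d-regular graph every directed edge has exactly d - 1 non-backtracking continuations, so the leading eigenvalue is d - 1 (e.g. 1 on a triangle).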

The non-backtracking matrix.

[Figure: spectrum of B in the complex plane, compared with the adjacency matrix spectrum.]

Not a random sparse real non-symmetric matrix - the elements are correlated!

Performance of spectral clustering.

[Figure: same comparison as before (overlap vs c_in - c_out for B, modularity, random walk, A, L, and BP): the non-backtracking matrix B follows BP down to the transition.]

Properties of the non-backtracking matrix. The eigenvalues of B, and the sums of eigenvector elements v_j^{in} = \sum_{i \in \partial j} v_{i \to j}, are given by a 2N x 2N matrix:

B' = [ 0 , D - 1 ; -1 , A ]

Characteristic polynomial (Bass '92 formula): det( \mu^2 1 - \mu A + (D - 1) ) = 0.

Largest eigenvalue: c. If c_in - c_out > q \sqrt{c}, the 2nd eigenvalue is informative and > \sqrt{c}; all others are inside the circle of radius \sqrt{c} (we believed this, but at the time had proved it only for the spectral density). The number of eigenvalues outside the circle = the number of communities! A complete proof is now available (Massoulié; Lelarge and Bordenave).
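
Instead of building the 2M x 2M matrix, one can work with this 2N x 2N form; the sketch below builds B' and checks the Bass identity numerically (our own illustration):

```python
import numpy as np

def bprime(A):
    """The 2N x 2N matrix B' = [[0, D - 1], [-1, A]] whose eigenvalues mu
    satisfy the Bass formula det(mu^2 1 - mu A + (D - 1)) = 0."""
    N = len(A)
    D = np.diag(A.sum(axis=1).astype(float))
    I = np.eye(N)
    Z = np.zeros((N, N))
    return np.block([[Z, D - I], [-I, A.astype(float)]])

def bass_residual(A, mu):
    """|det(mu^2 1 - mu A + (D - 1))|, which vanishes at eigenvalues of B'."""
    N = len(A)
    D = np.diag(A.sum(axis=1).astype(float))
    return abs(np.linalg.det(mu**2 * np.eye(N) - mu * A + D - np.eye(N)))
```

The real eigenvalues of B' outside the circle of radius sqrt(c) are the informative ones; the last N components of the corresponding eigenvectors collect the per-node sums v_j^in used for clustering.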

Spectra of some real networks

Spectrum of the non-backtracking matrix

Spectrum of the non-backtracking matrix (with some physics hocus-pocus). The 2N eigenvalues z_i of B solve det[ D - zA - (1 - z^2) 1 ] = 0, so the spectral density

\nu(z) = (1/2N) \sum_{i=1}^{2N} \delta(z - z_i)

can be obtained from \partial_z \partial_{\bar z} \log of the modulus of this determinant, regularized through the Hermitian matrix

M(z, \epsilon) = [ \epsilon 1 , i (D - zA - (1 - z^2) 1) ; -i (D - zA - (1 - z^2) 1)^\dagger , \epsilon 1 ],

as \nu(z) = (1/2\pi N) \lim_{\epsilon \to 0} \partial_z \partial_{\bar z} \log det M(z, \epsilon) = -(1/2\pi N) \lim_{\epsilon \to 0} \partial_z \partial_{\bar z} \log Z, i.e. the log of a Gaussian partition function Z that can be evaluated with the cavity method.

Spectrum of the non-backtracking matrix (with some physics hocus-pocus )

Spectrum of the non-backtracking matrix (with some physics hocus-pocus).

[Figure: the computed spectrum separates non-localized from localized eigenvectors.]

The non-backtracking operator. It allows spectral clustering to match the information-theoretic performance. Trivial to implement: the spectrum of a 2N x 2N matrix,

B' = [ 0 , D - 1 ; -1 , A ]

Parameter free: no need for learning. Avoids the usual BP convergence problems on non-tree-like graphs.

The non-backtracking operator. Perspectives: Can we compute the full spectrum rigorously? I believe we have only scratched the surface of the potential of this approach. Spectral methods for matrix factorization? The censored block model (Abbe et al.)? Percolation?

Percolation. Dense networks: the critical occupation probability for percolation on a dense network equals the reciprocal of the leading eigenvalue of the adjacency matrix (Bollobás et al. '10). ANY network: the critical occupation probability for percolation is bounded below by the reciprocal of the leading eigenvalue of the non-backtracking matrix (Zdeborová et al. '14) - and the bound is exact for sparse tree-like graphs, and for dense graphs as well.
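
The bound is easy to evaluate; a sketch using the 2N x 2N form of the operator (our own code, not from the talk):

```python
import numpy as np

def percolation_lower_bound(A):
    """Lower bound on the bond-percolation threshold: 1 / lambda_1(B),
    with lambda_1(B) the leading eigenvalue of the non-backtracking
    matrix, computed here via B' = [[0, D - 1], [-1, A]]."""
    N = len(A)
    D = np.diag(A.sum(axis=1).astype(float))
    I = np.eye(N)
    Bp = np.block([[np.zeros((N, N)), D - I], [-I, A.astype(float)]])
    lam = np.linalg.eigvals(Bp).real.max()
    return 1.0 / lam
```

On a d-regular graph lambda_1(B) = d - 1, so the bound is 1/(d - 1): for example 1/2 on the complete graph K4, and 1 on a ring (degree 2).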

This talk is based on:

A. Decelle, FK, C. Moore, L. Zdeborová, Phase transition in the detection of modules in sparse networks, Phys. Rev. Lett. 107, 065701 (2011).
A. Decelle, FK, C. Moore, L. Zdeborová, Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications, Phys. Rev. E 84, 066106 (2011).
P. Zhang, FK, J. Reichardt, L. Zdeborová, Comparative study for inference of hidden classes in stochastic block models, J. Stat. Mech. (2012) P12021.
FK, C. Moore, E. Mossel, J. Neeman, A. Sly, L. Zdeborová, P. Zhang, Spectral redemption in clustering sparse networks, Proc. Natl. Acad. Sci. (2013).
A. Saade, FK, L. Zdeborová, Spectral density of the non-backtracking operator on random graphs, EPL (2014).
A. Saade, FK, L. Zdeborová, Spectral clustering of graphs with the Bethe Hessian, NIPS 2014.

Implementations available at: http://mode_net.krzakala.org/

Thank you for your attention!