Community detection in stochastic block models via spectral methods

Size: px
Start display at page:

Download "Community detection in stochastic block models via spectral methods"

Transcription

1 Community detection in stochastic block models via spectral methods Laurent Massoulié (MSR-Inria Joint Centre, Inria) based on joint works with: Dan Tomozei (EPFL), Marc Lelarge (Inria), Jiaming Xu (UIUC), Charles Bordenave (IMT)

2 Community Detection Clustering of items based on their observed interactions Typical approach: embedding (eg in Euclidean space), then K-means Embedding space 2

3 Application 1: contact recommendation in online social networks Supporting data: e.g. OSN s friendship graph recommend members of user s implicit community Variation: NSA s co-traveler programme spot groups of suspect persons meeting regularly in unusual places

4 Application 2: item recommendation to users Supporting data: {user-item} matrix Example: Netflix prize {user-movie} ratings matrix User / Movie f 1 f 2 f m u 1? ** *** u 2?? u n ***** ** ** Use item communities to support recommendation users who liked this also liked

5 Application 3: categorizing chemical reactives in biology Basic data: sets of chemicals jointly involved in specific reaction More generally Knowledge graph as generic representation of data A1 has with B1 interaction of type C1 A2 has with B2 interaction of type C2

6 Outline The Stochastic Block Model With labels With general types Performance of Spectral Methods rich signal case The weak signal case: sparse observations Phase transition on detectability Non-backtracking matrix and the Spectral Redemption

7 The Stochastic Block Model, aka planted partition model [Holland-Laskey-Leinhardt 83] n nodes partitioned into r categories Category σ: α σ n nodes Edge between nodes u,v present with probability s b σ u σ v /n (s : average degree, or signal strength) Observation: adjacency matrix A= + noise Community Detection: provide clustering σ based on A that is correlated as much as possible with true planted clustering σ, i.e. maximizing n 1 Overlap:=Max π 1 n π( σ u )=σ u Max σε r α σ u=1

8 The Labeled Stochastic Block Model Edges (u-v) labeled by L uv L (finite set) Drawn from distribution μ σ u σ(v) Netflix case: labels 1-5 stars

9 The SBM with general types [Aldous 81; Lovász 12] User type σ(u) i.i.d. ~P in general set (e.g. uniform on [0,1]) Edge (u-v) present w.p. b σ u σ v e.g. b x, y F x y s/n for kernel b F(x) Edges (u-v) labeled by L uv L (finite set) x Drawn from distribution μ σ u σ(v) Technical assumptions: compact type set and continuity of symmetric functions b and μ

10 Outline The Stochastic Block Model With labels With general types Performance of Spectral Methods rich signal case The weak signal case: sparse observations Phase transition on detectability Non-backtracking matrix and the Spectral Redemption

11 Spectral Clustering From Matrix A extract R normed eigenvectors x i corresponding to R largest eigenvalues 1 R Form R-dimensional node representatives y u = n(x i (u)) i=1 R Group nodes u according to proximity of spectral representatives y u Karl Pearson s PCA: On Lines and Planes of Closest Fit to Systems of Points in Space", 1901

12 Illustration for R= Clustering from SVD Principal component Principal component 1 Netflix dataset SBM with K=4

13 Result for logarithmic signal strength s Assume s= (log(n)) and clusters are distinguishable, i.e. σ σ τ such that b στ b σ τ Then spectrum of A consists of R eigenvalues i of order (s) (R K) and n-r eigenvalues i of order O s Where R = rank b uv α v 1 u,v K Node representatives y u based on top R eigenvectors x i : Cluster according to underlying blocks except for negligible fraction of nodes

14 Proof arguments Control spectral radius of noise matrix + perturbation of matrix eigen-elements A = + random noise matrix Block matrix non-zero eigenvalues: (s)

15 Controlling perturbation of eigen-elements of Hermitian matrices A = A + W Where A : observed; A : signal; W : perturbation Perturbation of eigenvalues: order 1 n, 1 n Then i i ρ W, i = 1, n (Weyl) (a simple consequence of Courant-Fisher s minimax theorem: For Hermitian n n matrix H : i H = i H = max min dim V =i v V, v =1 v Hv min max dim V =n i+1 v V, v =1 v Hv

16 Controlling perturbation of eigen-elements of Hermitian matrices A = A + W Where A : observed; A : signal; W : perturbation Perturbation of eigenvectors: order 1 n, 1 n Let min i j. If ρ W < /2 i,j: i j Then for all i and for any normed eigenvector x i i, there exists i s.t. x i x i x i 1 ρ W ρ W ( A simple version of [Chandler Davis-Kahane 70] ) 2

17 Application to SBM Spectrum of A : spectrum of b uv α v 1 u,v K scaled by s, hence = s Announced result follows if ρ W = o s

18 Ramanujan graphs [Lubotzky-Phillips-Sarnak 88] s-regular graphs such that eigenvalues of adjacency matrix verify :=max k: k < s k 2 s 1 The best possible spectral gap [Alon-Boppana 91] 2 s 1 O s diam G

19 spectral separation properties à la Ramanujan [Friedman 08] random s-regular graph with s 3 verifies whp =2 s 1 + o(1) [Feige-Ofek 05]: for Erdős-Rényi graph G n, s/n and s = log n, then whp = O s Corollary: for SBM, ρ A A = O s when s = log n

20 spectral separation properties à la Ramanujan Consequence: in SBM with s = log n, whp ρ A A = O s A s leading eigen-elements close to those of A For s = 1, ρ A A ~C spectral separation is lost log n log log n standard spectral method needs fixing to work in low signal regime

21 Discrepancy between SBM with small K and Netflix Eigenvalue distributions 4 outstanding eigenvalues SBM with K=4 Netflix (subset) motivates consideration of SBM with general types

22 SBM with general types User types (u) i.i.d. ~P from general set (e.g. uniform on [0,1]) Edge (u-v) present w.p. b σ u σ v e.g. b x, y F x y s/n for kernel b F(x) x Edges (u-v) labeled by L uv L (finite set) Drawn from distribution μ σ u σ(v) Form matrix {A ij W(L ij )} from random projections W l of labels

23 SBM with general types: Spectral properties for logarithmic s Define kernel K(x, y) l W l b xy μ xy l and integral operator Tf x K x, y f(y)p dy spectrum of s 1 {A ij W(L ij )} spectrum of T Eigenvalue convergence: s 1 i n i Eigenvector convergence: x i (u) φ i σ u Associated eigenfunctionn Type of node u

24 SBM with general types: Spectral properties for logarithmic s Flexible model -power-law spectra (convolution operator + Fourier analysis) b x, y F x y F(x) better matches to Netflix data x

25 SBM with general types: estimation for logarithmic s For fixed R form R-dimensional node representatives y u = Prob(label(i,j)=5)? Node j n k 1 x k (u) k=1 R Node i Use empirical distribution of labels L(i,k) for k in -neighborhood of j Embedding allows consistent estimation of label distributions Accuracy controlled by ε and ε R k>r k 2

26 Outline The Stochastic Block Model With labels With general types Performance of Spectral Methods rich signal case The weak signal case: sparse observations Phase transition on detectability Non-backtracking matrix and the Spectral Redemption

27 Weak signal strength: s = (1) Correct classification of all but negligible fraction of nodes impossible (isolated nodes ) Performance of clustering σ assessed by overlap metric: n 1 Overlap:=Max π 1 n π( σ u )=σ u Max σε r α σ u=1 Overlap Overlap s 0 : threshold for Giant component Signal strength s s 0 s c Signal strength s

28 Positively correlated community detection Symmetric two-communities scenario: E A = a/n b/n n/2 b/n a/n n/2 Conjecture ([Decelle-Krzakala-Moore-Zdeborova 2011]: For τ: = a b 2 < 1, correlation tends to zero for any σ 2 a+b Proven by [Mossel-Neeman-Sly 2012] For τ > 1, positive correlation can be achieved Proven by [LM 2013] and [Mossel-Neeman-Sly 2013] Spectral redemption conjecture [Krzakala-Moore-Mossel-Neeman-Sly- Zdeborova-Zhang 2013]): Spectral methods based on non-backtracking matrix can achieve positively correlated detection

29 Non-backtracking matrix B of graph G = (V, E) Defined on oriented edges uv for u, v εe : B uv,xy = 1 v=x 1 u y e f u v y u e f v Asymmetric, such that B k ef = number of non-backtracking paths on G of length k+1 starting at e and ending at f

30 Non-backtracking matrix B of graph G = (V, E) Defined on oriented edges uv for u, v εe : B uv,xy = 1 v=x 1 u y Ihara-Bass formula [Ihara 66] for A: adjacency matrix, D: diagonal matrix of node degrees, 1 u 2 n m Det I ub = Det I ua + u 2 D I Hence s-regular graph Ramanujan iff eigenvalues k of B verify max k: k < 1 k 1 A definition that extends to non-regular graphs ([Stark-Terras 96] s graph theory Riemann hypothesis)

31 Notations and assumptions Let μ 1,, μ r, μ 1 μ r be the leading eigenvalues of E A and x i corresponding eigenvectors in R n Let y i : y i e = x i e 1, y i R n n 1 : lift of eigenvector x i to Rn n 1 In symmetric two-community case, μ s: eigenvalues of μ 1 = a+b, μ 2 2= a b (hence τ = μ μ 1 ) a/2 b/2 b/2 a/2 r Assume constant per-community average degree: j r, i=1 α i b ij μ 1

32 Notations and assumptions Call eigenvalue μ i visible if μ i > μ 1 Let r 0 : μ r0 > μ 1 μ r0 +1 : index of smallest visible eigenvalue Kesten-Stigum condition: r 0 > 1, i.e. there is some i > 1 such that μ i is visible In symmetric two-community case, Kesten-Stigum condition: μ 2 2 > μ 1 or equivalently, τ = μ 2 2 μ 1 > 1

33 Main result: spectrum of non-backtracking matrix for stochastic block model With high probability as n, eigenvalues i of B satisfy i r 0 i μ i = o 1 i > r 0 i μ 1 + o 1 For i r 0, if μ i has multiplicity 1, corresponding eigenvector: asymptotically parallel to B k B k y i for k~ log n

34 Illustration for 2-community symmetric Stochastic block model i

35 Corollary 1: proof of Spectral Redemption Conjecture Assume r 0 > 1, equal community sizes α i 1 r and μ i has multiplicity 1 for some i : 1 < i r 0 Then positive correlation obtained by following procedure 1) Extract eigenvector z i of B 2) Form signs s e = sgn z i e τ 3) To each node u assign s u = s(e) for edge e picked uniformly among graph edges with e 1 = u 4) Assign community σ u at random according to distribution π s u e u e e

36 Illustration for Erdős-Rényi graph with n = 500, α = 4

37 Corollary 2: Erdős-Rényi graphs are nearly Ramanujan For Erdős-Rényi graph G α, n, spectrum n i of associated nonbacktracking matrix B satisfies with high probability as n max k: k < 1 k 1 + o(1) An approximate version of max k: k < 1 k 1 Property generalizes notion of Ramanujan graph to non-regular case [Ihara 66, Stark-Terras 96] Hence: Corollary 2 a «non-regular» version of Friedman s result for regular graphs: most random graphs (with given number of edges) approximately Ramanujan according to extended definition

38 Proof strategy Key step: for k~ log n prove matrix expansion B k r = i=1 k i v i w i + R k where i ~μ i (near-eigenvalue), v i B k B k y i (near- right eigenvector), w i B k y i (near- left eigenvector), R k a perturbation and with high probability Low-rank i j v i v j, v i w j, w i w j = o 1, (near-orthonormal system) v i w i > c > 0 (low condition number) ρ R k = O μ 1 k (negligible perturbation) Yields the result, after leveraging Bauer-Fike theorem (a tool to control impact of additive perturbation on eigenvalues of matrix not necessarily symmetric)

39 Proof elements (perturbations negligible) To show ρ R k = O μ 1 k, establish «small norm in complement»: sup x =1,x w i =0,i=1,,r Bk x = O μ 1 k Challenge: directions w i random and correlated with B k

40 Remaining mysteries about SBM s (1) Conjectured phase diagram for more than 2 blocks (assuming fixed inter-community parameter b) Intra-community parameter a Above Kesten-Stigum threshold Detection easy (spectral methods or BP) Detection hard but feasible (how? In polynomial time?) Detection infeasible r=4 r=5 Number of communities r

41 Remaining mysteries about SBM s (2) Clique detection problem: add a size-k clique to random graph with edgeprobability ½ ½ i.e. a 2-block SBM with unbalanced block sizes: ½ ½ K n-k for K = n clique easily detectable (e.g. inspection of node degrees) are there polynomial-time algorithms for smaller yet large K? (e.g. K = 3 n ) A notoriously hard problem ( planted clique detection recently proposed as a new benchmark of algorithmic hardness)

42 Conclusions and Outlook Variations of basic spectral methods still to be invented: interesting mathematics and practical relevance Detection in SBM = rich playground for analysis of computational complexity with methods of statistical physics Computationally efficient methods for hard cases (planted clique, intermediate phase for multiple communities)? Non-regular Ramanujan graphs: theory still in its infancy (proper analogue of Alon-Boppana s theorem missing)

43 Thanks!

Community Detection. fundamental limits & efficient algorithms. Laurent Massoulié, Inria

Community Detection. fundamental limits & efficient algorithms. Laurent Massoulié, Inria Community Detection fundamental limits & efficient algorithms Laurent Massoulié, Inria Community Detection From graph of node-to-node interactions, identify groups of similar nodes Example: Graph of US

More information

Reconstruction in the Generalized Stochastic Block Model

Reconstruction in the Generalized Stochastic Block Model Reconstruction in the Generalized Stochastic Block Model Marc Lelarge 1 Laurent Massoulié 2 Jiaming Xu 3 1 INRIA-ENS 2 INRIA-Microsoft Research Joint Centre 3 University of Illinois, Urbana-Champaign GDR

More information

Statistical and Computational Phase Transitions in Planted Models

Statistical and Computational Phase Transitions in Planted Models Statistical and Computational Phase Transitions in Planted Models Jiaming Xu Joint work with Yudong Chen (UC Berkeley) Acknowledgement: Prof. Bruce Hajek November 4, 203 Cluster/Community structure in

More information

Reconstruction in the Sparse Labeled Stochastic Block Model

Reconstruction in the Sparse Labeled Stochastic Block Model Reconstruction in the Sparse Labeled Stochastic Block Model Marc Lelarge 1 Laurent Massoulié 2 Jiaming Xu 3 1 INRIA-ENS 2 INRIA-Microsoft Research Joint Centre 3 University of Illinois, Urbana-Champaign

More information

The non-backtracking operator

The non-backtracking operator The non-backtracking operator Florent Krzakala LPS, Ecole Normale Supérieure in collaboration with Paris: L. Zdeborova, A. Saade Rome: A. Decelle Würzburg: J. Reichardt Santa Fe: C. Moore, P. Zhang Berkeley:

More information

Extreme eigenvalues of Erdős-Rényi random graphs

Extreme eigenvalues of Erdős-Rényi random graphs Extreme eigenvalues of Erdős-Rényi random graphs Florent Benaych-Georges j.w.w. Charles Bordenave and Antti Knowles MAP5, Université Paris Descartes May 18, 2018 IPAM UCLA Inhomogeneous Erdős-Rényi random

More information

Spectral thresholds in the bipartite stochastic block model

Spectral thresholds in the bipartite stochastic block model Spectral thresholds in the bipartite stochastic block model Laura Florescu and Will Perkins NYU and U of Birmingham September 27, 2016 Laura Florescu and Will Perkins Spectral thresholds in the bipartite

More information

Spectral Partitiong in a Stochastic Block Model

Spectral Partitiong in a Stochastic Block Model Spectral Graph Theory Lecture 21 Spectral Partitiong in a Stochastic Block Model Daniel A. Spielman November 16, 2015 Disclaimer These notes are not necessarily an accurate representation of what happened

More information

How Robust are Thresholds for Community Detection?

How Robust are Thresholds for Community Detection? How Robust are Thresholds for Community Detection? Ankur Moitra (MIT) joint work with Amelia Perry (MIT) and Alex Wein (MIT) Let me tell you a story about the success of belief propagation and statistical

More information

ELE 538B: Mathematics of High-Dimensional Data. Spectral methods. Yuxin Chen Princeton University, Fall 2018

ELE 538B: Mathematics of High-Dimensional Data. Spectral methods. Yuxin Chen Princeton University, Fall 2018 ELE 538B: Mathematics of High-Dimensional Data Spectral methods Yuxin Chen Princeton University, Fall 2018 Outline A motivating application: graph clustering Distance and angles between two subspaces Eigen-space

More information

COMMUNITY DETECTION IN SPARSE NETWORKS VIA GROTHENDIECK S INEQUALITY

COMMUNITY DETECTION IN SPARSE NETWORKS VIA GROTHENDIECK S INEQUALITY COMMUNITY DETECTION IN SPARSE NETWORKS VIA GROTHENDIECK S INEQUALITY OLIVIER GUÉDON AND ROMAN VERSHYNIN Abstract. We present a simple and flexible method to prove consistency of semidefinite optimization

More information

Clustering from Sparse Pairwise Measurements

Clustering from Sparse Pairwise Measurements Clustering from Sparse Pairwise Measurements arxiv:6006683v2 [cssi] 9 May 206 Alaa Saade Laboratoire de Physique Statistique École Normale Supérieure, 24 Rue Lhomond Paris 75005 Marc Lelarge INRIA and

More information

A Modified Method Using the Bethe Hessian Matrix to Estimate the Number of Communities

A Modified Method Using the Bethe Hessian Matrix to Estimate the Number of Communities Journal of Advanced Statistics, Vol. 3, No. 2, June 2018 https://dx.doi.org/10.22606/jas.2018.32001 15 A Modified Method Using the Bethe Hessian Matrix to Estimate the Number of Communities Laala Zeyneb

More information

Spectral gap in bipartite biregular graphs and applications

Spectral gap in bipartite biregular graphs and applications Spectral gap in bipartite biregular graphs and applications Ioana Dumitriu Department of Mathematics University of Washington (Seattle) Joint work with Gerandy Brito and Kameron Harris ICERM workshop on

More information

Information-theoretic bounds and phase transitions in clustering, sparse PCA, and submatrix localization

Information-theoretic bounds and phase transitions in clustering, sparse PCA, and submatrix localization Information-theoretic bounds and phase transitions in clustering, sparse PCA, and submatrix localization Jess Banks Cristopher Moore Roman Vershynin Nicolas Verzelen Jiaming Xu Abstract We study the problem

More information

PCA with random noise. Van Ha Vu. Department of Mathematics Yale University

PCA with random noise. Van Ha Vu. Department of Mathematics Yale University PCA with random noise Van Ha Vu Department of Mathematics Yale University An important problem that appears in various areas of applied mathematics (in particular statistics, computer science and numerical

More information

Certifying the Global Optimality of Graph Cuts via Semidefinite Programming: A Theoretic Guarantee for Spectral Clustering

Certifying the Global Optimality of Graph Cuts via Semidefinite Programming: A Theoretic Guarantee for Spectral Clustering Certifying the Global Optimality of Graph Cuts via Semidefinite Programming: A Theoretic Guarantee for Spectral Clustering Shuyang Ling Courant Institute of Mathematical Sciences, NYU Aug 13, 2018 Joint

More information

Computational Lower Bounds for Community Detection on Random Graphs

Computational Lower Bounds for Community Detection on Random Graphs Computational Lower Bounds for Community Detection on Random Graphs Bruce Hajek, Yihong Wu, Jiaming Xu Department of Electrical and Computer Engineering Coordinated Science Laboratory University of Illinois

More information

Statistical Inference on Large Contingency Tables: Convergence, Testability, Stability. COMPSTAT 2010 Paris, August 23, 2010

Statistical Inference on Large Contingency Tables: Convergence, Testability, Stability. COMPSTAT 2010 Paris, August 23, 2010 Statistical Inference on Large Contingency Tables: Convergence, Testability, Stability Marianna Bolla Institute of Mathematics Budapest University of Technology and Economics marib@math.bme.hu COMPSTAT

More information

arxiv: v1 [math.st] 26 Jan 2018

arxiv: v1 [math.st] 26 Jan 2018 CONCENTRATION OF RANDOM GRAPHS AND APPLICATION TO COMMUNITY DETECTION arxiv:1801.08724v1 [math.st] 26 Jan 2018 CAN M. LE, ELIZAVETA LEVINA AND ROMAN VERSHYNIN Abstract. Random matrix theory has played

More information

Robust Principal Component Analysis

Robust Principal Component Analysis ELE 538B: Mathematics of High-Dimensional Data Robust Principal Component Analysis Yuxin Chen Princeton University, Fall 2018 Disentangling sparse and low-rank matrices Suppose we are given a matrix M

More information

Optimality and Sub-optimality of Principal Component Analysis for Spiked Random Matrices

Optimality and Sub-optimality of Principal Component Analysis for Spiked Random Matrices Optimality and Sub-optimality of Principal Component Analysis for Spiked Random Matrices Alex Wein MIT Joint work with: Amelia Perry (MIT), Afonso Bandeira (Courant NYU), Ankur Moitra (MIT) 1 / 22 Random

More information

SPARSE RANDOM GRAPHS: REGULARIZATION AND CONCENTRATION OF THE LAPLACIAN

SPARSE RANDOM GRAPHS: REGULARIZATION AND CONCENTRATION OF THE LAPLACIAN SPARSE RANDOM GRAPHS: REGULARIZATION AND CONCENTRATION OF THE LAPLACIAN CAN M. LE, ELIZAVETA LEVINA, AND ROMAN VERSHYNIN Abstract. We study random graphs with possibly different edge probabilities in the

More information

SVD, Power method, and Planted Graph problems (+ eigenvalues of random matrices)

SVD, Power method, and Planted Graph problems (+ eigenvalues of random matrices) Chapter 14 SVD, Power method, and Planted Graph problems (+ eigenvalues of random matrices) Today we continue the topic of low-dimensional approximation to datasets and matrices. Last time we saw the singular

More information

Graph Partitioning Using Random Walks

Graph Partitioning Using Random Walks Graph Partitioning Using Random Walks A Convex Optimization Perspective Lorenzo Orecchia Computer Science Why Spectral Algorithms for Graph Problems in practice? Simple to implement Can exploit very efficient

More information

x #{ p=prime p x }, as x logx

x #{ p=prime p x }, as x logx 1 The Riemann zeta function for Re(s) > 1 ζ s -s ( ) -1 2 duality between primes & complex zeros of zeta using Hadamard product over zeros prime number theorem x #{ p=prime p x }, as x logx statistics

More information

Random Lifts of Graphs

Random Lifts of Graphs 27th Brazilian Math Colloquium, July 09 Plan of this talk A brief introduction to the probabilistic method. A quick review of expander graphs and their spectrum. Lifts, random lifts and their properties.

More information

Spectral Properties of Random Matrices for Stochastic Block Model

Spectral Properties of Random Matrices for Stochastic Block Model Spectral Properties of Random Matrices for Stochastic Block Model Konstantin Avrachenkov INRIA Sophia Antipolis France Email:konstantin.avratchenkov@inria.fr Laura Cottatellucci Eurecom Sophia Antipolis

More information

Belief Propagation, Robust Reconstruction and Optimal Recovery of Block Models

Belief Propagation, Robust Reconstruction and Optimal Recovery of Block Models JMLR: Workshop and Conference Proceedings vol 35:1 15, 2014 Belief Propagation, Robust Reconstruction and Optimal Recovery of Block Models Elchanan Mossel mossel@stat.berkeley.edu Department of Statistics

More information

Linear algebra and applications to graphs Part 1

Linear algebra and applications to graphs Part 1 Linear algebra and applications to graphs Part 1 Written up by Mikhail Belkin and Moon Duchin Instructor: Laszlo Babai June 17, 2001 1 Basic Linear Algebra Exercise 1.1 Let V and W be linear subspaces

More information

arxiv: v5 [math.na] 16 Nov 2017

arxiv: v5 [math.na] 16 Nov 2017 RANDOM PERTURBATION OF LOW RANK MATRICES: IMPROVING CLASSICAL BOUNDS arxiv:3.657v5 [math.na] 6 Nov 07 SEAN O ROURKE, VAN VU, AND KE WANG Abstract. Matrix perturbation inequalities, such as Weyl s theorem

More information

Community Detection. Data Analytics - Community Detection Module

Community Detection. Data Analytics - Community Detection Module Community Detection Data Analytics - Community Detection Module Zachary s karate club Members of a karate club (observed for 3 years). Edges represent interactions outside the activities of the club. Community

More information

Spectral Methods for Subgraph Detection

Spectral Methods for Subgraph Detection Spectral Methods for Subgraph Detection Nadya T. Bliss & Benjamin A. Miller Embedded and High Performance Computing Patrick J. Wolfe Statistics and Information Laboratory Harvard University 12 July 2010

More information

Spectra of adjacency matrices of random geometric graphs

Spectra of adjacency matrices of random geometric graphs Spectra of adjacency matrices of random geometric graphs Paul Blackwell, Mark Edmondson-Jones and Jonathan Jordan University of Sheffield 22nd December 2006 Abstract We investigate the spectral properties

More information

Lecture 1: Riemann, Dedekind, Selberg, and Ihara Zetas

Lecture 1: Riemann, Dedekind, Selberg, and Ihara Zetas Lecture 1: Riemann, Dedekind, Selberg, and Ihara Zetas Audrey Terras U.C.S.D. 2008 more details can be found in my webpage: www.math.ucsd.edu /~aterras/ newbook.pdf First the Riemann Zeta 1 The Riemann

More information

What is the Riemann Hypothesis for Zeta Functions of Irregular Graphs?

What is the Riemann Hypothesis for Zeta Functions of Irregular Graphs? What is the Riemann Hypothesis for Zeta Functions of Irregular Graphs? UCLA, IPAM February, 2008 Joint work with H. M. Stark, M. D. Horton, etc. What is an expander graph X? X finite connected (irregular)

More information

Machine Learning - MT & 14. PCA and MDS

Machine Learning - MT & 14. PCA and MDS Machine Learning - MT 2016 13 & 14. PCA and MDS Varun Kanade University of Oxford November 21 & 23, 2016 Announcements Sheet 4 due this Friday by noon Practical 3 this week (continue next week if necessary)

More information

Spectral Redemption: Clustering Sparse Networks

Spectral Redemption: Clustering Sparse Networks Spectral Redemption: Clustering Sparse Networks Florent Krzakala Cristopher Moore Elchanan Mossel Joe Neeman Allan Sly, et al. SFI WORKING PAPER: 03-07-05 SFI Working Papers contain accounts of scienti5ic

More information

Random Sampling of Bandlimited Signals on Graphs

Random Sampling of Bandlimited Signals on Graphs Random Sampling of Bandlimited Signals on Graphs Pierre Vandergheynst École Polytechnique Fédérale de Lausanne (EPFL) School of Engineering & School of Computer and Communication Sciences Joint work with

More information

Hyperuniformity on the Sphere

Hyperuniformity on the Sphere Hyperuniformity on the Sphere Johann S. Brauchart j.brauchart@tugraz.at 14. June 2018 Week 2 Workshop MATRIX Research Program "On the Frontiers of High Dimensional Computation" MATRIX, Creswick [ 1 / 56

More information

Spectral thresholds in the bipartite stochastic block model

Spectral thresholds in the bipartite stochastic block model JMLR: Workshop and Conference Proceedings vol 49:1 17, 2016 Spectral thresholds in the bipartite stochastic block model Laura Florescu New York University Will Perkins University of Birmingham FLORESCU@CIMS.NYU.EDU

More information

Spectral Graph Theory and its Applications. Daniel A. Spielman Dept. of Computer Science Program in Applied Mathematics Yale Unviersity

Spectral Graph Theory and its Applications. Daniel A. Spielman Dept. of Computer Science Program in Applied Mathematics Yale Unviersity Spectral Graph Theory and its Applications Daniel A. Spielman Dept. of Computer Science Program in Applied Mathematics Yale Unviersity Outline Adjacency matrix and Laplacian Intuition, spectral graph drawing

More information

Matrix completion: Fundamental limits and efficient algorithms. Sewoong Oh Stanford University

Matrix completion: Fundamental limits and efficient algorithms. Sewoong Oh Stanford University Matrix completion: Fundamental limits and efficient algorithms Sewoong Oh Stanford University 1 / 35 Low-rank matrix completion Low-rank Data Matrix Sparse Sampled Matrix Complete the matrix from small

More information

Data Mining Techniques

Data Mining Techniques Data Mining Techniques CS 6220 - Section 3 - Fall 2016 Lecture 12 Jan-Willem van de Meent (credit: Yijun Zhao, Percy Liang) DIMENSIONALITY REDUCTION Borrowing from: Percy Liang (Stanford) Linear Dimensionality

More information

Statistical Machine Learning

Statistical Machine Learning Statistical Machine Learning Christoph Lampert Spring Semester 2015/2016 // Lecture 12 1 / 36 Unsupervised Learning Dimensionality Reduction 2 / 36 Dimensionality Reduction Given: data X = {x 1,..., x

More information

Information Recovery from Pairwise Measurements

Information Recovery from Pairwise Measurements Information Recovery from Pairwise Measurements A Shannon-Theoretic Approach Yuxin Chen, Changho Suh, Andrea Goldsmith Stanford University KAIST Page 1 Recovering data from correlation measurements A large

More information

Unsupervised Machine Learning and Data Mining. DS 5230 / DS Fall Lecture 7. Jan-Willem van de Meent

Unsupervised Machine Learning and Data Mining. DS 5230 / DS Fall Lecture 7. Jan-Willem van de Meent Unsupervised Machine Learning and Data Mining DS 5230 / DS 4420 - Fall 2018 Lecture 7 Jan-Willem van de Meent DIMENSIONALITY REDUCTION Borrowing from: Percy Liang (Stanford) Dimensionality Reduction Goal:

More information

Matrix estimation by Universal Singular Value Thresholding

Matrix estimation by Universal Singular Value Thresholding Matrix estimation by Universal Singular Value Thresholding Courant Institute, NYU Let us begin with an example: Suppose that we have an undirected random graph G on n vertices. Model: There is a real symmetric

More information

Independence and chromatic number (and random k-sat): Sparse Case. Dimitris Achlioptas Microsoft

Independence and chromatic number (and random k-sat): Sparse Case. Dimitris Achlioptas Microsoft Independence and chromatic number (and random k-sat): Sparse Case Dimitris Achlioptas Microsoft Random graphs W.h.p.: with probability that tends to 1 as n. Hamiltonian cycle Let τ 2 be the moment all

More information

U.C. Berkeley Better-than-Worst-Case Analysis Handout 3 Luca Trevisan May 24, 2018

U.C. Berkeley Better-than-Worst-Case Analysis Handout 3 Luca Trevisan May 24, 2018 U.C. Berkeley Better-than-Worst-Case Analysis Handout 3 Luca Trevisan May 24, 2018 Lecture 3 In which we show how to find a planted clique in a random graph. 1 Finding a Planted Clique We will analyze

More information

TEST CODE: PMB SYLLABUS

TEST CODE: PMB SYLLABUS TEST CODE: PMB SYLLABUS Convergence and divergence of sequence and series; Cauchy sequence and completeness; Bolzano-Weierstrass theorem; continuity, uniform continuity, differentiability; directional

More information

What is the Riemann Hypothesis for Zeta Functions of Irregular Graphs?

What is the Riemann Hypothesis for Zeta Functions of Irregular Graphs? What is the Riemann Hypothesis for Zeta Functions of Irregular Graphs? Audrey Terras Banff February, 2008 Joint work with H. M. Stark, M. D. Horton, etc. Introduction The Riemann zeta function for Re(s)>1

More information

Sampling, Inference and Clustering for Data on Graphs

Sampling, Inference and Clustering for Data on Graphs Sampling, Inference and Clustering for Data on Graphs Pierre Vandergheynst École Polytechnique Fédérale de Lausanne (EPFL) School of Engineering & School of Computer and Communication Sciences Joint work

More information

The Tightness of the Kesten-Stigum Reconstruction Bound for a Symmetric Model With Multiple Mutations

The Tightness of the Kesten-Stigum Reconstruction Bound for a Symmetric Model With Multiple Mutations The Tightness of the Kesten-Stigum Reconstruction Bound for a Symmetric Model With Multiple Mutations City University of New York Frontier Probability Days 2018 Joint work with Dr. Sreenivasa Rao Jammalamadaka

More information

Spectral Method and Regularized MLE Are Both Optimal for Top-K Ranking

Spectral Method and Regularized MLE Are Both Optimal for Top-K Ranking Spectral Method and Regularized MLE Are Both Optimal for Top-K Ranking Yuxin Chen Electrical Engineering, Princeton University Joint work with Jianqing Fan, Cong Ma and Kaizheng Wang Ranking A fundamental

More information

Community Detection and Stochastic Block Models: Recent Developments

Community Detection and Stochastic Block Models: Recent Developments Journal of Machine Learning Research 18 (2018) 1-86 Submitted 9/16; Revised 3/17; Published 4/18 Community Detection and Stochastic Block Models: Recent Developments Emmanuel Abbe Program in Applied and

More information

Ma/CS 6b Class 23: Eigenvalues in Regular Graphs

Ma/CS 6b Class 23: Eigenvalues in Regular Graphs Ma/CS 6b Class 3: Eigenvalues in Regular Graphs By Adam Sheffer Recall: The Spectrum of a Graph Consider a graph G = V, E and let A be the adjacency matrix of G. The eigenvalues of G are the eigenvalues

More information

Data dependent operators for the spatial-spectral fusion problem

Data dependent operators for the spatial-spectral fusion problem Data dependent operators for the spatial-spectral fusion problem Wien, December 3, 2012 Joint work with: University of Maryland: J. J. Benedetto, J. A. Dobrosotskaya, T. Doster, K. W. Duke, M. Ehler, A.

More information

Lecture 14: SVD, Power method, and Planted Graph problems (+ eigenvalues of random matrices) Lecturer: Sanjeev Arora

Lecture 14: SVD, Power method, and Planted Graph problems (+ eigenvalues of random matrices) Lecturer: Sanjeev Arora princeton univ. F 13 cos 521: Advanced Algorithm Design Lecture 14: SVD, Power method, and Planted Graph problems (+ eigenvalues of random matrices) Lecturer: Sanjeev Arora Scribe: Today we continue the

More information

Eigenvalues, random walks and Ramanujan graphs

Eigenvalues, random walks and Ramanujan graphs Eigenvalues, random walks and Ramanujan graphs David Ellis 1 The Expander Mixing lemma We have seen that a bounded-degree graph is a good edge-expander if and only if if has large spectral gap If G = (V,

More information

Data Mining Techniques

Data Mining Techniques Data Mining Techniques CS 622 - Section 2 - Spring 27 Pre-final Review Jan-Willem van de Meent Feedback Feedback https://goo.gl/er7eo8 (also posted on Piazza) Also, please fill out your TRACE evaluations!

More information

Lecture 1: From Data to Graphs, Weighted Graphs and Graph Laplacian

Lecture 1: From Data to Graphs, Weighted Graphs and Graph Laplacian Lecture 1: From Data to Graphs, Weighted Graphs and Graph Laplacian Radu Balan February 5, 2018 Datasets diversity: Social Networks: Set of individuals ( agents, actors ) interacting with each other (e.g.,

More information

Jointly Clustering Rows and Columns of Binary Matrices: Algorithms and Trade-offs

Jointly Clustering Rows and Columns of Binary Matrices: Algorithms and Trade-offs Jointly Clustering Rows and Columns of Binary Matrices: Algorithms and Trade-offs Jiaming Xu Joint work with Rui Wu, Kai Zhu, Bruce Hajek, R. Srikant, and Lei Ying University of Illinois, Urbana-Champaign

More information

1 Math 241A-B Homework Problem List for F2015 and W2016

1 Math 241A-B Homework Problem List for F2015 and W2016 1 Math 241A-B Homework Problem List for F2015 W2016 1.1 Homework 1. Due Wednesday, October 7, 2015 Notation 1.1 Let U be any set, g be a positive function on U, Y be a normed space. For any f : U Y let

More information

8.1 Concentration inequality for Gaussian random matrix (cont d)

8.1 Concentration inequality for Gaussian random matrix (cont d) MGMT 69: Topics in High-dimensional Data Analysis Falll 26 Lecture 8: Spectral clustering and Laplacian matrices Lecturer: Jiaming Xu Scribe: Hyun-Ju Oh and Taotao He, October 4, 26 Outline Concentration

More information

Machine Learning. Principal Components Analysis. Le Song. CSE6740/CS7641/ISYE6740, Fall 2012

Machine Learning. Principal Components Analysis. Le Song. CSE6740/CS7641/ISYE6740, Fall 2012 Machine Learning CSE6740/CS7641/ISYE6740, Fall 2012 Principal Components Analysis Le Song Lecture 22, Nov 13, 2012 Based on slides from Eric Xing, CMU Reading: Chap 12.1, CB book 1 2 Factor or Component

More information

GAUSSIAN PROCESS TRANSFORMS

GAUSSIAN PROCESS TRANSFORMS GAUSSIAN PROCESS TRANSFORMS Philip A. Chou Ricardo L. de Queiroz Microsoft Research, Redmond, WA, USA pachou@microsoft.com) Computer Science Department, Universidade de Brasilia, Brasilia, Brazil queiroz@ieee.org)

More information

Andriy Mnih and Ruslan Salakhutdinov

Andriy Mnih and Ruslan Salakhutdinov MATRIX FACTORIZATION METHODS FOR COLLABORATIVE FILTERING Andriy Mnih and Ruslan Salakhutdinov University of Toronto, Machine Learning Group 1 What is collaborative filtering? The goal of collaborative

More information

Structure in Data. A major objective in data analysis is to identify interesting features or structure in the data.

Structure in Data. A major objective in data analysis is to identify interesting features or structure in the data. Structure in Data A major objective in data analysis is to identify interesting features or structure in the data. The graphical methods are very useful in discovering structure. There are basically two

More information

Szemerédi s regularity lemma revisited. Lewis Memorial Lecture March 14, Terence Tao (UCLA)

Szemerédi s regularity lemma revisited. Lewis Memorial Lecture March 14, Terence Tao (UCLA) Szemerédi s regularity lemma revisited Lewis Memorial Lecture March 14, 2008 Terence Tao (UCLA) 1 Finding models of large dense graphs Suppose we are given a large dense graph G = (V, E), where V is a

More information

Vector spaces. DS-GA 1013 / MATH-GA 2824 Optimization-based Data Analysis.

Vector spaces. DS-GA 1013 / MATH-GA 2824 Optimization-based Data Analysis. Vector spaces DS-GA 1013 / MATH-GA 2824 Optimization-based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_fall17/index.html Carlos Fernandez-Granda Vector space Consists of: A set V A scalar

More information

STAT 200C: High-dimensional Statistics

STAT 200C: High-dimensional Statistics STAT 200C: High-dimensional Statistics Arash A. Amini May 30, 2018 1 / 59 Classical case: n d. Asymptotic assumption: d is fixed and n. Basic tools: LLN and CLT. High-dimensional setting: n d, e.g. n/d

More information

Eight theorems in extremal spectral graph theory

Eight theorems in extremal spectral graph theory Eight theorems in extremal spectral graph theory Michael Tait Carnegie Mellon University mtait@cmu.edu ICOMAS 2018 May 11, 2018 Michael Tait (CMU) May 11, 2018 1 / 1 Questions in extremal graph theory

More information

Link Analysis Ranking

Link Analysis Ranking Link Analysis Ranking How do search engines decide how to rank your query results? Guess why Google ranks the query results the way it does How would you do it? Naïve ranking of query results Given query

More information

Fourier PCA. Navin Goyal (MSR India), Santosh Vempala (Georgia Tech) and Ying Xiao (Georgia Tech)

Fourier PCA. Navin Goyal (MSR India), Santosh Vempala (Georgia Tech) and Ying Xiao (Georgia Tech) Fourier PCA Navin Goyal (MSR India), Santosh Vempala (Georgia Tech) and Ying Xiao (Georgia Tech) Introduction 1. Describe a learning problem. 2. Develop an efficient tensor decomposition. Independent component

More information

Spectral Analysis of k-balanced Signed Graphs

Spectral Analysis of k-balanced Signed Graphs Spectral Analysis of k-balanced Signed Graphs Leting Wu 1, Xiaowei Ying 1, Xintao Wu 1, Aidong Lu 1 and Zhi-Hua Zhou 2 1 University of North Carolina at Charlotte, USA, {lwu8,xying, xwu,alu1}@uncc.edu

More information

Linear Algebra: Matrix Eigenvalue Problems

Linear Algebra: Matrix Eigenvalue Problems CHAPTER8 Linear Algebra: Matrix Eigenvalue Problems Chapter 8 p1 A matrix eigenvalue problem considers the vector equation (1) Ax = λx. 8.0 Linear Algebra: Matrix Eigenvalue Problems Here A is a given

More information

PCA, Kernel PCA, ICA

PCA, Kernel PCA, ICA PCA, Kernel PCA, ICA Learning Representations. Dimensionality Reduction. Maria-Florina Balcan 04/08/2015 Big & High-Dimensional Data High-Dimensions = Lot of Features Document classification Features per

More information

Graph Metrics and Dimension Reduction

Graph Metrics and Dimension Reduction Graph Metrics and Dimension Reduction Minh Tang 1 Michael Trosset 2 1 Applied Mathematics and Statistics The Johns Hopkins University 2 Department of Statistics Indiana University, Bloomington November

More information

Planted Cliques, Iterative Thresholding and Message Passing Algorithms

Planted Cliques, Iterative Thresholding and Message Passing Algorithms Planted Cliques, Iterative Thresholding and Message Passing Algorithms Yash Deshpande and Andrea Montanari Stanford University November 5, 2013 Deshpande, Montanari Planted Cliques November 5, 2013 1 /

More information

A Random Dot Product Model for Weighted Networks arxiv: v1 [stat.ap] 8 Nov 2016

A Random Dot Product Model for Weighted Networks arxiv: v1 [stat.ap] 8 Nov 2016 A Random Dot Product Model for Weighted Networks arxiv:1611.02530v1 [stat.ap] 8 Nov 2016 Daryl R. DeFord 1 Daniel N. Rockmore 1,2,3 1 Department of Mathematics, Dartmouth College, Hanover, NH, USA 03755

More information

Invertibility and Largest Eigenvalue of Symmetric Matrix Signings

Invertibility and Largest Eigenvalue of Symmetric Matrix Signings Invertibility and Largest Eigenvalue of Symmetric Matrix Signings Charles Carlson Karthekeyan Chandrasekaran Hsien-Chih Chang Alexandra Kolla Abstract The spectra of signed matrices have played a fundamental

More information

DATA MINING LECTURE 8. Dimensionality Reduction PCA -- SVD

DATA MINING LECTURE 8. Dimensionality Reduction PCA -- SVD DATA MINING LECTURE 8 Dimensionality Reduction PCA -- SVD The curse of dimensionality Real data usually have thousands, or millions of dimensions E.g., web documents, where the dimensionality is the vocabulary

More information

arxiv: v2 [math.pr] 13 Dec 2017

arxiv: v2 [math.pr] 13 Dec 2017 LIMITING EMPIRICAL SPECTRAL DISTRIBUTION FOR THE NON-BACKTRACKING MATRIX OF AN ERDŐS-RÉNYI RANDOM GRAPH KE WANG AND PHILIP MATCHETT WOOD arxiv:70.05v [math.pr] 3 Dec 07 Abstract. In this note, we give

More information

Singular value decomposition (SVD) of large random matrices. India, 2010

Singular value decomposition (SVD) of large random matrices. India, 2010 Singular value decomposition (SVD) of large random matrices Marianna Bolla Budapest University of Technology and Economics marib@math.bme.hu India, 2010 Motivation New challenge of multivariate statistics:

More information

ECE 8201: Low-dimensional Signal Models for High-dimensional Data Analysis

ECE 8201: Low-dimensional Signal Models for High-dimensional Data Analysis ECE 8201: Low-dimensional Signal Models for High-dimensional Data Analysis Lecture 7: Matrix completion Yuejie Chi The Ohio State University Page 1 Reference Guaranteed Minimum-Rank Solutions of Linear

More information

APPENDIX A. Background Mathematics. A.1 Linear Algebra. Vector algebra. Let x denote the n-dimensional column vector with components x 1 x 2.

APPENDIX A. Background Mathematics. A.1 Linear Algebra. Vector algebra. Let x denote the n-dimensional column vector with components x 1 x 2. APPENDIX A Background Mathematics A. Linear Algebra A.. Vector algebra Let x denote the n-dimensional column vector with components 0 x x 2 B C @. A x n Definition 6 (scalar product). The scalar product

More information

DISCUSSION OF INFLUENTIAL FEATURE PCA FOR HIGH DIMENSIONAL CLUSTERING. By T. Tony Cai and Linjun Zhang University of Pennsylvania

DISCUSSION OF INFLUENTIAL FEATURE PCA FOR HIGH DIMENSIONAL CLUSTERING. By T. Tony Cai and Linjun Zhang University of Pennsylvania Submitted to the Annals of Statistics DISCUSSION OF INFLUENTIAL FEATURE PCA FOR HIGH DIMENSIONAL CLUSTERING By T. Tony Cai and Linjun Zhang University of Pennsylvania We would like to congratulate the

More information

Spectral Clustering. Zitao Liu

Spectral Clustering. Zitao Liu Spectral Clustering Zitao Liu Agenda Brief Clustering Review Similarity Graph Graph Laplacian Spectral Clustering Algorithm Graph Cut Point of View Random Walk Point of View Perturbation Theory Point of

More information

CONCEPT OF DENSITY FOR FUNCTIONAL DATA

CONCEPT OF DENSITY FOR FUNCTIONAL DATA CONCEPT OF DENSITY FOR FUNCTIONAL DATA AURORE DELAIGLE U MELBOURNE & U BRISTOL PETER HALL U MELBOURNE & UC DAVIS 1 CONCEPT OF DENSITY IN FUNCTIONAL DATA ANALYSIS The notion of probability density for a

More information

ECE521 week 3: 23/26 January 2017

ECE521 week 3: 23/26 January 2017 ECE521 week 3: 23/26 January 2017 Outline Probabilistic interpretation of linear regression - Maximum likelihood estimation (MLE) - Maximum a posteriori (MAP) estimation Bias-variance trade-off Linear

More information

Two-sample hypothesis testing for random dot product graphs

Two-sample hypothesis testing for random dot product graphs Two-sample hypothesis testing for random dot product graphs Minh Tang Department of Applied Mathematics and Statistics Johns Hopkins University JSM 2014 Joint work with Avanti Athreya, Vince Lyzinski,

More information

Analysis Preliminary Exam Workshop: Hilbert Spaces

Analysis Preliminary Exam Workshop: Hilbert Spaces Analysis Preliminary Exam Workshop: Hilbert Spaces 1. Hilbert spaces A Hilbert space H is a complete real or complex inner product space. Consider complex Hilbert spaces for definiteness. If (, ) : H H

More information

Analysis of Spectral Kernel Design based Semi-supervised Learning

Analysis of Spectral Kernel Design based Semi-supervised Learning Analysis of Spectral Kernel Design based Semi-supervised Learning Tong Zhang IBM T. J. Watson Research Center Yorktown Heights, NY 10598 Rie Kubota Ando IBM T. J. Watson Research Center Yorktown Heights,

More information

Introduction to Machine Learning. Introduction to ML - TAU 2016/7 1

Introduction to Machine Learning. Introduction to ML - TAU 2016/7 1 Introduction to Machine Learning Introduction to ML - TAU 2016/7 1 Course Administration Lecturers: Amir Globerson (gamir@post.tau.ac.il) Yishay Mansour (Mansour@tau.ac.il) Teaching Assistance: Regev Schweiger

More information

Correlation Preserving Unsupervised Discretization. Outline

Correlation Preserving Unsupervised Discretization. Outline Correlation Preserving Unsupervised Discretization Jee Vang Outline Paper References What is discretization? Motivation Principal Component Analysis (PCA) Association Mining Correlation Preserving Discretization

More information

Machine Learning - MT Clustering

Machine Learning - MT Clustering Machine Learning - MT 2016 15. Clustering Varun Kanade University of Oxford November 28, 2016 Announcements No new practical this week All practicals must be signed off in sessions this week Firm Deadline:

More information

Convergence of Eigenspaces in Kernel Principal Component Analysis

Convergence of Eigenspaces in Kernel Principal Component Analysis Convergence of Eigenspaces in Kernel Principal Component Analysis Shixin Wang Advanced machine learning April 19, 2016 Shixin Wang Convergence of Eigenspaces April 19, 2016 1 / 18 Outline 1 Motivation

More information

Laplacian and Random Walks on Graphs

Laplacian and Random Walks on Graphs Laplacian and Random Walks on Graphs Linyuan Lu University of South Carolina Selected Topics on Spectral Graph Theory (II) Nankai University, Tianjin, May 22, 2014 Five talks Selected Topics on Spectral

More information

Nonlinear Dimensionality Reduction

Nonlinear Dimensionality Reduction Nonlinear Dimensionality Reduction Piyush Rai CS5350/6350: Machine Learning October 25, 2011 Recap: Linear Dimensionality Reduction Linear Dimensionality Reduction: Based on a linear projection of the

More information