Line Orthogonality in Adjacency Eigenspace with Application to Community Partition

Similar documents
Spectral Analysis of k-balanced Signed Graphs

Lecture Notes: Finite Element Analysis, J.E. Akin, Rice University

The Real Stabilizability Radius of the Multi-Link Inverted Pendulum

Classify by number of ports and examine the possible structures that result. Using only one-port elements, no more than two elements can be assembled.

Xihe Li, Ligong Wang and Shangyuan Zhang

Analysis of Spectral Space Properties of Directed Graphs using Matrix Perturbation Theory with Application in Graph Partition

A Survey of the Implementation of Numerical Schemes for Linear Advection Equation

Technical Note. ODiSI-B Sensor Strain Gage Factor Uncertainty

A Model-Free Adaptive Control of Pulsed GTAW

Simplified Identification Scheme for Structures on a Flexible Base

Graphs and Their. Applications (6) K.M. Koh* F.M. Dong and E.G. Tay. 17 The Number of Spanning Trees

TrustSVD: Collaborative Filtering with Both the Explicit and Implicit Influence of User Trust and of Item Ratings

OPTIMUM EXPRESSION FOR COMPUTATION OF THE GRAVITY FIELD OF A POLYHEDRAL BODY WITH LINEARLY INCREASING DENSITY 1

FEA Solution Procedure

Sources of Non Stationarity in the Semivariogram

Vectors in Rn un. This definition of norm is an extension of the Pythagorean Theorem. Consider the vector u = (5, 8) in R 2

A Characterization of the Domain of Beta-Divergence and Its Connection to Bregman Variational Model

), σ is a parameter, is the Euclidean norm in R d.

FREQUENCY DOMAIN FLUTTER SOLUTION TECHNIQUE USING COMPLEX MU-ANALYSIS

Nonlinear parametric optimization using cylindrical algebraic decomposition

Setting The K Value And Polarization Mode Of The Delta Undulator

When are Two Numerical Polynomials Relatively Prime?

FOUNTAIN codes [3], [4] provide an efficient solution

Step-Size Bounds Analysis of the Generalized Multidelay Adaptive Filter

Graphs and Networks Lecture 5. PageRank. Lecturer: Daniel A. Spielman September 20, 2007

System identification of buildings equipped with closed-loop control devices

Decision Making in Complex Environments. Lecture 2 Ratings and Introduction to Analytic Network Process

Lecture 3. (2) Last time: 3D space. The dot product. Dan Nichols January 30, 2018

Introduction to Spectral Graph Theory and Graph Clustering

Chapter 4 Supervised learning:

arxiv:quant-ph/ v4 14 May 2003

1 The space of linear transformations from R n to R m :

Spectral Clustering. Zitao Liu

The spreading residue harmonic balance method for nonlinear vibration of an electrostatically actuated microbeam

EVALUATION OF GROUND STRAIN FROM IN SITU DYNAMIC RESPONSE

1 Undiscounted Problem (Deterministic)

Prandl established a universal velocity profile for flow parallel to the bed given by

Discontinuous Fluctuation Distribution for Time-Dependent Problems

On the tree cover number of a graph

Prediction of Transmission Distortion for Wireless Video Communication: Analysis

Homework 5 Solutions

Markov Chains and Spectral Clustering

Control Performance Monitoring of State-Dependent Nonlinear Processes

We automate the bivariate change-of-variables technique for bivariate continuous random variables with

arxiv: v1 [cs.sy] 22 Nov 2018

On Spectral Analysis of Directed Graphs with Graph Perturbation Theory

Sareban: Evaluation of Three Common Algorithms for Structure Active Control

Effects Of Symmetry On The Structural Controllability Of Neural Networks: A Perspective

Chapter 2 Difficulties associated with corners

Subcritical bifurcation to innitely many rotating waves. Arnd Scheel. Freie Universitat Berlin. Arnimallee Berlin, Germany

Frequency Estimation, Multiple Stationary Nonsinusoidal Resonances With Trend 1

Effects of Soil Spatial Variability on Bearing Capacity of Shallow Foundations

Hybrid modelling and model reduction for control & optimisation

Study on the impulsive pressure of tank oscillating by force towards multiple degrees of freedom

Stair Matrix and its Applications to Massive MIMO Uplink Data Detection

Development of Second Order Plus Time Delay (SOPTD) Model from Orthonormal Basis Filter (OBF) Model

Discussion of The Forward Search: Theory and Data Analysis by Anthony C. Atkinson, Marco Riani, and Andrea Ceroli

The Dual of the Maximum Likelihood Method

Substructure Finite Element Model Updating of a Space Frame Structure by Minimization of Modal Dynamic Residual

DEFINITION OF A NEW UO 2 F 2 DENSITY LAW FOR LOW- MODERATED SOLUTIONS (H/U < 20) AND CONSEQUENCES ON CRITICALITY SAFETY

Change Detection Using Directional Statistics

Discussion Papers Department of Economics University of Copenhagen

Information Source Detection in the SIR Model: A Sample Path Based Approach

APPENDIX B MATRIX NOTATION. The Definition of Matrix Notation is the Definition of Matrix Multiplication B.1 INTRODUCTION

When Closed Graph Manifolds are Finitely Covered by Surface Bundles Over S 1

Change Detection Using Directional Statistics

UNCERTAINTY FOCUSED STRENGTH ANALYSIS MODEL

BLOOM S TAXONOMY. Following Bloom s Taxonomy to Assess Students

RESGen: Renewable Energy Scenario Generation Platform

Incorporating Diversity in a Learning to Rank Recommender System

Evaluation of the Fiberglass-Reinforced Plastics Interfacial Behavior by using Ultrasonic Wave Propagation Method

Computational Fluid Dynamics Simulation and Wind Tunnel Testing on Microlight Model

The SISTEM method. LOS ascending

A Single Species in One Spatial Dimension

PHASE STEERING AND FOCUSING BEHAVIOR OF ULTRASOUND IN CEMENTITIOUS MATERIALS

1. State-Space Linear Systems 2. Block Diagrams 3. Exercises

PC1 PC4 PC2 PC3 PC5 PC6 PC7 PC8 PC

Applying Fuzzy Set Approach into Achieving Quality Improvement for Qualitative Quality Response

PREDICTABILITY OF SOLID STATE ZENER REFERENCES

Elements of Coordinate System Transformations

Study on the Mathematic Model of Product Modular System Orienting the Modular Design

Math 116 First Midterm October 14, 2009

Worst-case analysis of the LPT algorithm for single processor scheduling with time restrictions

Prediction of Effective Asphalt Layer Temperature

Design and Analyses of some 1-D Chaotic Generators for Secure Data

Restricted Three-Body Problem in Different Coordinate Systems

Analytical Value-at-Risk and Expected Shortfall under Regime Switching *

CRITERIA FOR TOEPLITZ OPERATORS ON THE SPHERE. Jingbo Xia

Mechanisms and topology determination of complex chemical and biological network systems from first-passage theoretical approach

Temporal Social Network: Group Query Processing

Decoder Error Probability of MRD Codes

Determining of temperature field in a L-shaped domain

Decoder Error Probability of MRD Codes

Nonparametric Identification and Robust H Controller Synthesis for a Rotational/Translational Actuator

CSE 291. Assignment Spectral clustering versus k-means. Out: Wed May 23 Due: Wed Jun 13

Optimal Control of a Heterogeneous Two Server System with Consideration for Power and Performance

PIPELINE MECHANICAL DAMAGE CHARACTERIZATION BY MULTIPLE MAGNETIZATION LEVEL DECOUPLING

STEP Support Programme. STEP III Hyperbolic Functions: Solutions

MEASUREMENT OF TURBULENCE STATISTICS USING HOT WIRE ANEMOMETRY

Introdction Finite elds play an increasingly important role in modern digital commnication systems. Typical areas of applications are cryptographic sc

Transcription:

Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence Line Orthogonality in Adjacency Eigenspace with Application to Commnity Partition Leting W 1,XiaoweiYing 1,XintaoW 1, Zhi-Ha Zho 2 1 Software and Information Systems Dept, University of North Carolina at Charlotte, USA 2 National Key Laboratory for Novel Software Technology, Nanjing University, China 1 {lw8,xying,xw}@ncced, 2 zhozh@lamdanjedcn Abstract Different from Laplacian or normal matrix, the properties of the adjacency eigenspace received mch less attention Recent work showed that nodes projected into the adjacency eigenspace exhibit an orthogonal line pattern and nodes from the same commnity locate along the same line In this paper, we condct theoretical stdies based on graph pertrbation to demonstrate why this line orthogonality property holds in the adjacency eigenspace and why it generally disappears in the Laplacian and normal eigenspaces Using the orthogonality property in the adjacency eigenspace, we present a graph partition algorithm, AdjClster, which first projects node coordinates to the nit sphere and then applies the classic k-means to find clsters Empirical evalations on synthetic data and real-world social networks validate or theoretical findings and show the effectiveness of or graph partition algorithm 1 Introdction Different from the Laplacian matrix or normal matrix, the properties of the adjacency eigenspace received mch less attention except some recent work [Prakash et al, 21; Ying and W, 29] It was shown by Prakash et al [21] that the singlar vectors of mobile call graphs exhibit an EigenSpokes pattern wherein, when plotted against each other, they have clear, separate lines that neatly align along specific axes The athors sggested that EigenSpokes are associated with the presence of a large nmber of tightlyknit commnities embedded in very sparse graphs Ying and W [29] showed that node coordinates in the adjacency eigenspace of a graph with well strctred commnities form qasi-orthogonal lines (not necessarily axes aligned) and developed a framework to qantify importance (or non-randomness) of a node or a link by exploiting the line orthogonality property However, no theoretical analysis was presented [Ying and W, 29; Prakash et al, 21] to demonstrate why and when this line orthogonality property holds In this paper, we condct theoretical stdies based on matrix pertrbation theory Or theoretical reslts demonstrate why the line orthogonality pattern exists in the adjacency eigenspace Specifically we show that 1) spectral coordinates of nodes withot direct links to other commnities locate exactly on the orthogonal lines; 2) spectral coordinates of nodes with links to other commnities deviate from lines; and 3) for a network with k commnities there exist k orthogonal lines (and each commnity forms one line) in the spectral sbspace formed by the first k eigenvectors of the adjacency matrix We frther give explicit formla (as well as its conditions) to qantify how mch orthogonal lines rotate from the canonical axes and how far spectral coordinates of nodes with direct links to other commnities deviate from the line of their own commnity We also examine the spectral spaces of the Laplacian matrix and the normal matrix nder the pertrbation framework Or findings show that the line orthogonality pattern in general does not hold in the Laplacian eigenspace or the normal eigenspace We frther provide theoretical explanations The discovered orthogonality property in the adjacency eigenspace has potential for a series of applications In this paper, we present an effective graph partition algorithm, AdjClster, which tilizes the line orthogonality property in the adjacency eigenspace The idea is to project node coordinates (along the orthogonal lines) onto the nit sphere in the spectral space and then apply the classic k-means to find clsters Or empirical evalations on synthetic data and real-world social networks validate or theoretical findings and show the effectiveness of or graph partition algorithm 2 Preliminaries 21 Notation A network or graph G is a set of n nodes connected by a set of m links The network considered here is binary, symmetric, connected, and withot self-loops It can be represented as the symmetric adjacency matrix A n n with a ij = 1 if node i is connected to node j and a ij =otherwise Let be the i-th largest eigenvale of A and x i the corresponding eigenvector x ij denotes the j-th entry of x i Formla (1) illstrates or notation The eigenvector x i is represented as a colmn vector The leading eigenvectors x i (i =1,,k) corresponding to the largest k eigenvales contain most topological information of the original graph in the spectral space The k-dimensional spectral space is spanned by (x 1, x k ) When we project node in the k-dimensional sbspace with x i as the basis, the row vector α = (x 1,x 2,,x k ) 2349

is its coordinate of in this sbspace We call α the spectral coordinate of node The eigenvector x i becomes the canonical basis denoted by ξ i =(,,, 1,,),where the i-th entry of ξ i is 1 x 1 x i x k x n x 11 x i1 x k1 x n1 α x 1 x i x k x n x 1n x in x kn x nn 22 Spectral Pertrbation Spectral pertrbation analysis deals with the change of the graph spectra (eigenvales and eigenvector components) when the graph is pertrbed For a symmetric n n matrix A with a symmetric pertrbation E, the matrix after pertrbation can be written as à = A + E Let be the i-th largest eigenvale of A with its eigenvector x i Similarly, and x i denote the eigenvale and eigenvector of à It has been shown that the pertrbed eigenvector x i can be approximated by a linear fnction involving all original eigenvectors (refer to Theorem V28 in [Stewart and Sn, 199]) We qote it as below Lemma 1 Let U = (x 1,,x i 1, x i+1,,x n ), S = diag(,, 1,+1,,λ n ), and ij = x T i Ex j The eigenvector x i (i =1,,k) can be approximated as: (1) x i x i + U( I S) 1 U T Ex i (2) when the following conditions hold: 1 δ = +1 x T i Ex i 2 U T EU 2 > ; 2 γ = U T Ex i 2 < 1 2 δ In this paper, we simplify its approximation by only sing the leading k eigenvectors when the first k eigenvales are significantly greater than the rest ones Based on the simplified approximation shown in Theorem 1, we are able to prove the line orthogonality pattern in the adjacency eigenspace Theorem 1 Assme that the conditions in Lemma 1 hold Frther assme that λ j, for any i =1,,k and j = k +1,,n Then, the eigenvector x i (i =1,,k) can be approximated as: k x i x i + x j + 1 Ex i (3) λ j Proof With Lemma 1, we have x i x i + U( I S) 1 U T Ex i n = x i + x j λ j = x i + k λ j x j + n j=k+1 λ j x j Since λ j for all i =1,,k and j = k +1,,n, λ j ji, and we frther have k n x i x i + x j + x j (4) λ j Note that n j=k+1 j=1,j =i x j n j=1 x j = j=k+1 n x T j Ex i j=1 x j = 1 n Ex i, x j x j = 1 Ex i (5) j=1 The last eqality of (5) is becase x j (j =1,,n)formsan orthogonal basis of the n-dimensional space, and x T j Ex i is jst the projection of vector Ex i onto one of the basis vector x j Combining (4) and (5), we get Eqation (3) 3 Spectral Analysis of Graph Topology We treat the observed graph as a k-block diagonal network (with k disconnected commnities) pertrbed by a matrix consisting all cross-commnity edges and examine pertrbation effects on the eigenvectors and spectral coordinates in the adjacency eigenspace 31 Graph with k Disconnected Commnities For a graph with k disconnected commnities C 1,,C k of size n 1,,n k respectively ( i n i = n), its adjacency matrix A can be written as a block-wise diagonal matrix: A = A 1 A k, (6) where A i is the n i n i adjacency matrix of C i Let λ Ci be the largest eigenvale of A i in magnitde with eigenvector x Ci R ni Withot loss of generality, we assme λ C1 > > λ Ck Since the entries of A i are all nonnegative, with Perron-Frobenis theorem [Stewart and Sn, 199], λ Ci is positive and all the entries x Ci are non-negative When C i contains one dominant component or does not have a clear inner-commnity strctre, the magnitde of λ Ci is significantly larger than the rest eigenvales of A i [Chng et al, 23] Hence when the k disconnected commnities are comparable, = λ Ci, i = 1,,k (the eigenvales and eigenvectors of A i are natrally the eigenvales of A) Here we call two commnities C i and C j are comparable if both of the second largest eigenvales of A i and A j are smaller than λ Ci and λ Cj Two commnities are not comparable when one of them contains either too few edges or nodes and hence does not contribte mch to the graph topology Lemma 2 For a graph with k disconnected comparable commnities as shown in (6), for all i = 1,,k and j = k +1,,n, λ j The first k eigenvectors of A have the following form: x C1 x C2 (x 1, x 2,, x k )=, x Ck 235

6 9 15 11 16 19 12 2 7 4 18 1 17 13 8 14 3 5 27 39 36 26 21 45 31 25 37 41 3 42 35 33 34 4 28 32 23 29 43 38 24 and all the entries of x i are non-negative When we project each node in the sbspace spanned by x 1, x 2,, x k, we have the following reslt Proposition 1 For a graph with k disconnected comparable commnities as shown in (6), spectral coordinates of all nodes locate on the k axes ξ 1,,ξ k where ξ i = (,,, 1,,) is the canonical basis and the i-th entry of ξ i is 1 Specifically, for any node C i, its spectral coordinate has the form α =(,,,x i,,, ) (7) The position of non-zero x i in (7) indicates the commnity that node belongs to; and the vale of x i indicates the weight or importance of node within the commnity C i and hence captres the magnitde of belongings x 2 4 35 3 25 2 15 1 5 5 v t t 1 1 1 2 3 4 x 1 v (a) Topology snapshot s (b) Disconnected graph A y 2 4 35 3 25 2 15 1 5 5 t r 2 v 1 1 1 2 3 4 y 1 s s r 1 (c) Pertrbed graph à Figre 1: Ble circles are the 25 nodes in one commnity and red sqares are the 2 nodes in the other commnity Edges are added across two commnities 2-D case For a graph with two disconnected commnities C 1 and C 2 All the nodes from C 1 lie on the line that passes throgh the origin and the point (1, ) and nodes from C 2 lie on the line that passes throgh the origin and the point (, 1) We show a synthetic graph with two disconnected commnities in Figre 1(a) The solid lines are links within each commnity Figre 1(b) shows the spectral coordinates in the 2-D scatter plot when the two commnities are disconnected We can see that all nodes lie along the two axes 32 Spectral Properties of Observed Graphs Based on Theorem 1, we derive the approximation of the pertrbed spectral coordinate α, which is determined by the original spectral coordinate of itself and that of its neighbors in other commnities Theorem 2 Denote an observed graph as à = A+E where A is as shown in (6) and E denotes the edges across commnities For a node C i,letγ j denote its neighbors in C j for j i, and Γ i = The spectral coordinate of can be approximated as α x i r i + v Γ 1 e v x 1v,, v Γ k e v x kv (8) where scalar x i is the only non-zero entry in its original spectral coordinate shown in (7), e v is the (, v) entry of E, and r i is the i-th row of the following matrix 1 12 λ 2 1k 21 R = λ 2 1 2k λ 2 (9) k1 k2 λ 2 1 Proof With Theorem 1, the leading k eigenvectors of à can be approximated as x i x i + k λ j x j + 1 Ex i Ptting the k colmns together, we have ( x 1,, x k ) (x 1,, x k )R + E( x 1,, x k ) (1) Note that when A can be partitioned as in (6), and the original coordinate α has only one non-zero entry x i as shown in (7), the -th row of ( x i,, x k ) in (1) can be simplified as: α x i ( i1,, i,i 1 1, 1, i,i+1 +1 λ, ik ( ) 1 1 + e v x 1v,, e v x kv, v C 1 v C k = x i r i + e v x 1v,, e v x kv v Γ 1 v Γ k ) Note that e v in the right hand side (RHS) of (8) can be frther removed since e v =1inor setting We inclde e v there for extension to general pertrbations Or next reslt shows that spectral coordinates also locate along k qasiorthogonal lines r i (the i-th row of R), instead of exactly on the axes ξ i when the graph is disconnected Proposition 2 For a graph à = A+E, spectral coordinates form k approximately orthogonal lines Specifically, for any node C i, if it is not directly connected with other commnities, α lies on the line r i ;otherwise,α deviates from lines r i (i =1,,k), where r i is the i-th row of matrix R shown in Eqation (9) Proof First we prove that node C i locates on the line r i When node has no connections to other commnities, 2351

the second part of the RHS of (8) is Hence α x i r i When node has some connections otside C i, the second part of its spectral coordinate in (8) is not eqal to, andit ths deviates from line r i Next we prove that lines r i are approximate orthogonal Let W = R I, thenw T + W = since ij = Hence R T R = (I + W T )(I + W ) = I W T W The it λ t tj λ j λ t Note (i, j) entry of matrix W T W is t i,j that the conditions of Theorem 1 imply that it = x T i Ex t is mch smaller than λ t, and hence W T W Then, R T R I, and we prove the orthogonality property 2-D case Nodes from C 1 lie along line r 1, while nodes from C 2 lie along line r 2,where 12 21 r 1 =(1, ), r 2 =(, 1) λ 2 λ 2 Note that r 1 and r 2 are orthogonal since r 1 r2 T = For nodes that have connections to the other commnity, eg, nodes and v shown in Figre 1(a), their spectral coordinates scatter between two lines For node, its spectral coordinate can be approximated as ) ( 12 v Γ α x 1 (1, +, x ) 2 2v (11) λ 2 λ 2 Its spectral coordinate jmps away from line r 1 The magnitde of jmp is determined by spectral coordinates of its connected nodes in the commnity C 2, as shown by the second parts of RHS of (11) Since the jmp vector is nonnegative, node gets closer to line r 2 Similarly, we can see for node v jmps towards line r 1 In Figre 1(c), we can also see that both r 1 and r 2 rotate clockwisely from the original axes This is becase 12 = x T 1 Ex 2 = i,j e ijx 1i x 2j > There is a negative angle θ between line r 1 and x-axis since tan θ = 12 λ 2 < 33 Laplacian and Normal Eigenspaces Or pertrbation framework based on the adjacency eigenspace tilizes the eigenvectors of the largest k eigenvales, which are generally stable (de to large eigen-gaps) nder pertrbation The line orthogonality property shown in Theorem 2 and Proposition 2 is based on the approximation shown in Theorem 1 Recall that Theorem 1 is derived from Lemma 1 that involves two conditions These two conditions are natrally satisfied if the eigen-gap of any k leading eigenvales is greater than 3 E 2 ( E 2 is the largest eigenvale of E), which garantees the relative smaller change and the order of the eigenvectors preserved after pertrbation For condition 1, it is easy to verify that x T i Ex i 2 = Since U T EU 2 E 2 for graph A with k disconnected comparable commnities, the condition holds when the eigengap +1 is greater than E 2 For condition 2, we can see U T Ex i 2 is also mch smaller than E 2 Hence, condition 2 is satisfied when the eigengap +1 is greater than 3 E 2 Note that E 2 is bonded by the maximm row sm of E and tends to be small when the pertrbation edges are randomly added Next we examine the spectral spaces of the Laplacian matrix and the normal matrix and explain why the line orthogonality does not hold in their eigenspaces The Laplacian matrix L is defined as L = D A,whereD = diag{d 1,,d n } and d i is the degree of node i The normal matrix N is defined as N = D 1 2 AD 1 2 We can easily derive that, for the blockwise diagonal graph, the spectral coordinate of node C i in the Laplacian eigenspace is (,,1,,) where the i- th entry is 1, indicating the node s commnity whereas the coordinate in the normal eigenspace is (,, d,,) Note that the k eigenvectors corresponding to the smallest eigenvales of L captre the commnity strctre However, Lemma 1 is not applicable to L in general nder pertrbation, becase the gap between the k smallest eigenvales and the rest ones is too small and the two conditions in Lemma 1 are violated For the normal matrix, all the eigenvales of N are between 1 and 1 The conditions in Lemma 1 do not hold either becase the eigen-gaps is generally smaller than ΔN 2 Hence it is impossible to explicitly express the pertrbed spectral coordinates sing the original ones and the pertrbation matrix in the Laplacian or normal eigenspace As a reslt, the line orthogonality disappears in the Laplacian or the normal eigenspace 4 Adjacency Eigenspace based Clstering In this section, we present a commnity partition algorithm, AdjClster, which tilizes the line orthogonality pattern in the spectral space of the adjacency matrix When a graph contains k clear commnities, there exist k qasi-orthogonal lines in the k-dimensional spectral space and each line corresponds to a commnity in the graph The spectral coordinate α shold be close to the line corresponding to the commnity that the node belongs to In general, the idea of fitting k orthogonal lines directly in the k-dimensional space is complex As shown in Algorithm 1, we project each spectral coordinate α to the nit sphere in the k-dimensional sbspace by normalizing α to its nit length (line 3) We expect to observe that nodes from one commnity form a clster on the nit sphere Hence there will be k well separated clsters on the nit sphere We apply the clstering k-means algorithm on the nit sphere to prodce a partition of the graph (line 4) Algorithm 1 AdjClster: Adjacency Eigenspace based Clstering Inpt: A,K Otpt: Clstering reslts 1: Compte x 1,,x K by the eigen-decomposition of A 2: for k =2,,K do 3: α =(x 1,,x k ) and ᾱ = α α ; 4: Apply k-means algorithm on {ᾱ } =1,,n ; 5: Compte fitting statistics from k-means algorithm ; 6: end for 7: Otpt partitions nder k with the best fitting statistics To evalate the qality of the partition and determine the k, we se the classic Davies-Boldin Index (DBI ) [Davies and Boldin, 1979] The low DBI indicates otpt cls- 2352

ters with low intra-clster distances and high inter-clster distances When the graph contains k clear commnities, we expect to have the minimm DBI after applying k-means in the k-dimensional spectral space We also expect all the angles between centroids of the otpt clsters are close to 9 since spectral coordinates form qasi-orthogonal lines in the determined k-dimensional spectral space However, in the sbspace spanned by fewer or more eigenvectors, the coordinates scatter in the spaces and do not form clear orthogonal lines, hence we will not obtain a very good fit after applying the k-means on the nit sphere Calclation of the eigenvectors of an n n matrix takes in general a nmber of operations O(n 3 ), which is almost inapplicable for large networks However, in or framework, we only need to calclate the first K eigen-pairs We can determine the appropriate K as examining the eigen-gaps [Stewart and Sn, 199] Frthermore, adjacency matrices in or context are sally sparse The Arnoldi/Lanczos algorithm [Golb and Van Loan, 1996] generally needs O(n) rather than O(n 2 ) floating point operations at each iteration 5 Evalation We se several real network data sets in or evalation: Political books and Political blogs 1, Enron 2, and Facebook dataset [Viswanath et al, 29] We also generate two synthetic graphs: Syn-1 andsyn-2 TheSyn-1 has5 commnities with the nmber of nodes 2, 18, 17, 15, and 14 respectively, and each commnity is generated separately with a power law degree distribtion with the parameter 23 We add cross commnity edges randomly and keep the ratio between inter-commnity edges and inner-commnity edges as 2% in Syn-1 Syn-2 is the same as the Syn-1 exceptthatwe increase the nmber of links between commnity C 4 and C 5 to 8% As a reslt, the Syn-2 has for commnities 51 Line Orthogonality Property x 5 5 4 3 2 1 1 2 2 2 x 4 4 6 4 2 (a) Synthetic-1: x 3, x 4, x 5 x 3 2 C 3 C 4 C 5 4 x 5 4 3 2 1 1 2 3 4 2 x3 2 4 2 (b) Syntetic-2: x 3, x 4, x 5 Figre 2: The plots of spectral coordinates for synthetic networks We se spectral scatter plots to check the line orthogonality property in varios networks We learn that for Syn-1 there exist five orthogonal lines in the spectral space De to the 1 http://www-personalmiched/ mejn/ netdata/ 2 http://wwwcscmed/ enron/ x 4 C 3 C 4 C 5 2 Table 1: Statistics of the spectra for some networks δ vales for both Laplacian and normal (shown in bold) and ΔL 2 for Lapalcian and ΔN 2 for normal (shown in italic) violate conditions in Lemma 1 Polbooks Polblogs Syn-1 Syn-2 Adjacency matrix γ 59 695 387 316 δ 38 38 244 323 +1 582 396 765 826 E 2 278 1361 699 661 Laplacian matrix γ 154 121 41 411 δ -117-735 -237-2537 μ k μ k+1 24 16 3 3 ΔL 2 112 693 158 1564 Normal matrix γ 144 15 24 27 δ -526-29 -14-17 ν k ν k+1 139 7 2 2 ΔN 2 65 35 76 78 space limit, we only show the three orthogonal lines (corresponding to commnities C3, C4, and C5 denoted by different colors) in the space spanned by x 3, x 4, x 5 in Figre 2(a) For Syn-2, we can observe in Figre 2(b) that there is no clear line orthogonality pattern in the space spanned by x 3, x 4, x 5 since there are actally for commnities in Syn-2 Or theoretical analysis in Section 33 showed that the orthogonality pattern does not held in either Laplacian or normal eigenspace becase their small eigen-gap vales affect the stability of the spectral space (Recall the conditions in Lemma 1 and Theorem 1) Table 1 shows the calclated vales of γ, δ, eigen-gap, and the magnitde of pertrbations in adjacency, Laplacian, and normal eigenspaces for varios networks We can see that for adjacency matrices, all the networks generally satisfy conditions, which explains line orthogonality patterns in their adjacency eigenspaces However, for Laplacian or normal matrices, none of networks satisfies the conditions For example, all δ vales for Laplacian or normal matrix (shown in bold) are less than zero, violating Condition 1 in Lemma 1; all vales of ΔL 2 or ΔN 2 (shown in italic) are less than their corresponding eigengaps, incrring the violation of Condition 2 in Lemma 1; and the eigengaps ( μ k μ k+1, ν k ν k+1 ) are relatively small, violating the condition in Theorem 1 Hence, the orthogonality pattern does not held in Laplacian or normal eigenspaces (we skip their scatter plots de to space limitations) 52 Qality of Commnity Partition Table 2 shows the qality of or graph partition algorithm AdjClster The algorithm chooses the vale of k that incrs the minimm DBI for each network data set For a network with a clear commnity strctre, we expect that the DBI is small, the modlarity is away from zero, and the average angle is close to 9 since there exist k qasi-orthogonal lines in the spectral space We can see from Table 2 that all networks show relatively clear commnity strctres The original data descriptions of Polbooks and Polblogs (and Syn-1/Syn-2) provide node-commnity relations So we 2353

Table 2: Statistics of networks and partition qality of Adj- Clster ( k is the nmber of commnities, DBI isthe Davies-Boldin Index, Angle is the average angle between centroids, and Q is the modlarity) Dataset n m k DBI Angle Q Syn-1 84 4917 5 45 87 37 Syn-2 84 5389 4 49 765 34 Polbooks 15 441 2 15 838 45 Polblogs 1222 16714 2 17 94 42 Enron 148 869 6 59 889 48 Facebook 63392 816886 9 83 836 51 Table 3: Accracy (%) of clstering reslts ( Lap denotes the geometric Laplacian clstering, NCt denotes the normalized ct, HE denotes the modlarity based clstering, and SpokEn denotes EigenSpoke) Dataset AdjClster Lap NCt HE SpokEn Syn-1 98 575 844 491 42 Syn-2 851 628 81 459 447 Polbooks 967 935 967 88 935 Polblogs 947 588 953 924 919 are able to compare different algorithms in terms of accracy k i=1 The accracy is defined as Ci Ĉi n where Ĉi denotes the i-th commnity prodced by different algorithms In or experiment, we compare or AdjClster with for graph partition algorithms: one Laplacian based algorithm (the geometric spectral clstering) [Miller and Teng, 1998], one normal based algorithm (the normalized ct [Shi and Malik, 2]), one modlarity based agglomerative clstering algorithm (HE [Wakita and Tsrmi, 27]), and the EigenSpoke algorithm (SpokEn [Prakash et al, 21]) Table 3 shows the accracy vales on the above for networks Note that we cannot report accracy vales for Enron or Facebook since we do not know abot their exact tre commnity partitions We can see that the qality of the partitioning prodced by or algorithm AdjClster is better than (or comparable with) that prodced by the normalized ct in terms of accracy On the contrary, the Laplacian spectrm based algorithm, the modlarity based agglomerative clstering algorithm, and the EigenSpoke algorithm prodce significant low accracy vales, which matches or theoretical analysis 6 Conclsion and Ftre Work In this paper we have demonstrated the line orthogonality in the adjacency eigenspace Using this orthogonality property, we presented or graph partition algorithm AdjClster and showed its effectiveness for commnity partition Althogh we mainly focsed on theoretical stdies of the line orthogonality property in this paper, we believe many applications based on the line orthogonality property cold be developed For example, we cold develop adaptive clstering methods sing adjacency matrix pertrbation for tracking changes in clsters over time We cold also identify bridging nodes by examining otliers from k-means otpt Bridging nodes are the nodes connecting to mltiple commnities In the spectral space, they are neither close to the origin, nor close to any fitted orthogonal line corresponding to a certain commnity Therefore, we cold mark a node as a bridging one if its projection in the nit sphere is not close to any centriod In or ftre work, we will explore the line orthogonality property in more (and larger) social networks and condct complete comparisons with other recently developed spectral clstering algorithms (eg, [Hang et al, 28]) We also plan to extend or stdies to signed graphs which contain both positive and negative edges Acknowledgment This work was spported in part by US National Science Fondation (CCF-147621, CNS-83124) for X Ying, L W, and X W and by NSFC (617397, 612162) and JiangsSF (BK2818) for Z-H Zho References [Chng et al, 23] Fan Chng, Linyan L, and Van V Eigenvales of random power law graphs Annals of Combinatorics, 7:21 33, 23 [Davies and Boldin, 1979] David Davies and Donald Boldin A clster separation measre IEEE Tran on Pattern Analysis and Machine Intelligence, 1:224 227, 1979 [Golb and Van Loan, 1996] Gene H Golb and Charles F Van Loan Matrix Comptations The Johns Hopkins University Press, 1996 [Hang et al, 28] Ling Hang, Donghi Yan, Michael I Jordan, and Nina Taft Spectral clstering with pertrbed data In NIPS, pages 75 712, 28 [Miller and Teng, 1998] John R Gilbert Gary L Miller and Shang-Ha Teng Geometric mesh partitioning: Implementation and experiments SIAM Jornal on Scientific Compting, 19:291 211, 1998 [Prakash et al, 21] B Aditya Prakash, Ashwin Sridharan, Mknd Seshadri, Sridhar Machiraj, and Christos Falotsos Eigenspokes: Srprising patterns and scalable commnity chipping in large graphs In PAKDD, 21 [Shi and Malik, 2] Jianbo Shi and Jitendra Malik Normalized cts and image segmentation IEEE Trans on Pattern Analysis and Machine Intelligence, 22(8):888 95, 2 [Stewart and Sn, 199] G W Stewart and Ji-gang Sn Matrix Pertrbation Theory Academic Press, 199 [Viswanath et al, 29] Bimal Viswanath, Alan Mislove, Meeyong Cha, and Krishna P Gmmadi On the Evoltion of User Interaction in Facebook In WOSN, 29 [Wakita and Tsrmi, 27] Ken Wakita and Toshiyki Tsrmi Finding commnity strctre in mega-scale social networks: [extended abstract] In WWW, pages 1275 1276, 27 [Ying and W, 29] Xiaowei Ying and Xintao W On randomness measres for social networks In SDM, 29 2354