Maximum Likelihood Latent Space Embedding of Logistic Random Dot Product Graphs

Luke O'Connor, Muriel Médard and Soheil Feizi

arXiv:5.85v3 [stat.ML] 3 Aug 27

Abstract— A latent space model for a family of random graphs assigns real-valued vectors to the nodes of the graph such that edge probabilities are determined by latent positions. Latent space models provide a natural statistical framework for graph visualization and clustering. A latent space model of particular interest is the Random Dot Product Graph (RDPG), which can be fit using an efficient spectral method; however, this method is based on a heuristic that can fail, even in simple cases. Here, we consider a closely related latent space model, the Logistic RDPG, which uses a logistic link function to map from latent positions to edge probabilities. Over this model, we show that asymptotically exact maximum likelihood inference of latent position vectors can be achieved using an efficient spectral method. Our method involves computing the top eigenvectors of a normalized adjacency matrix and scaling the eigenvectors using a regression step. This regression scaling step is an essential part of the proposed method. In simulations, we show that our proposed method is more accurate and more robust than commonly used methods. We also show the effectiveness of our approach on the standard real networks of the karate club and political blogs.

Index Terms— Latent space models, stochastic block models, maximum likelihood

1 INTRODUCTION

CLUSTERING over graphs is a classical problem with applications in systems biology, social sciences, and other fields [1], [2], [3], [4]. Although most formulations of the clustering problem are NP-hard [5], several approaches have yielded useful approximate algorithms. The most well-studied approach is spectral clustering. Most spectral methods are not based on a particular generative network model; alternative, model-based approaches have also been proposed, using loopy belief propagation [14], variational Bayes [15], Gibbs sampling [16], and semidefinite programming [22], [23]. Many spectral clustering methods are derived by proposing a discrete optimization problem and relaxing it to obtain a continuous, convex optimization whose solution is given by the eigenvectors of a normalized adjacency matrix or Laplacian. A post-processing step, typically k-means, is used to extract clusters from these eigenvectors [6], [7]. Different matrix normalizations/transformations include the Modularity matrix [], the Laplacian [9], the normalized Laplacian [10], [11], [26], and the Bethe Hessian [12]. These methods are often characterized theoretically in the context of the Stochastic Block Model (SBM) [30], a simple and canonical model of community structure. Theoretical bounds on the detectability of network structure for large networks have been established for stochastic block models [14], [19], [20], [21]. Several spectral methods have been shown to achieve this recovery threshold [18], [20], [17]. Strong theoretical and empirical results have also been obtained using SDP-based [22], [23] and belief-propagation-based [14] methods, which often have higher computational complexity than spectral methods. A threshold has also been discovered for perfect clustering, i.e., when community structure can be recovered with zero errors [23], [24].

S. Feizi is the corresponding author. L. O'Connor and S. Feizi contributed equally to this work. L. O'Connor is with Harvard University. S. Feizi is with Stanford University. M. Médard is with the Massachusetts Institute of Technology (MIT).
An alternative approach to the clustering problem is to invoke a latent space model. Each node is assigned a latent position v_i, and the edges of the graph are drawn independently with probability p_ij = g(v_i, v_j) for some function g(·, ·). Hoff et al. [31] considered two latent space models, the first of which was a distance model,

P_ij = l(−‖v_i − v_j‖ − µ + β x_ij),   l(x) = 1/(1 + exp(−x)),

in which edge probabilities depend on the Euclidean distance between two nodes; x_ij is a fixed covariate term, which is not learned. Their second model is a projection model:

P_ij = l(v_i · v_j / ‖v_j‖ + β x_ij − µ).   (1)

Hoff et al. suggest performing inference with an MCMC approach for both models. Focusing on the distance model, they have also extended their approach to allow the latent positions themselves to be drawn from a mixture distribution containing clusters [32], [33]. Efforts to improve computational efficiency have been made in [34], [35], and with a related mixed membership blockmodel in [15]. Young et al. [28] introduced the Random Dot Product Graph (RDPG), in which the probability of observing an edge between two nodes is the dot product of their respective latent position vectors. The RDPG model can be written as P_ij = v_i · v_j. This model is related to the projection model of Hoff et al., eq. (1). The RDPG provides a useful perspective on spectral clustering in two ways. First, it has led to theoretical advances, including a central limit theorem for eigenvectors of an adjacency matrix [29] and a justification for k-means

clustering as a post-processing step for spectral clustering [25]. Second, as a natural extension of the SBM, the RDPG describes more comprehensively the types of network structure, such as variable degrees and mixed community memberships, that can be inferred using spectral methods.

Let V be the n × d matrix of latent positions, where n is the number of nodes and d is the dimension of the latent vectors (d ≪ n). Sussman et al. [25] proposed an inference method over the RDPG based on the heuristic that the first eigenvectors of A approximate the singular vectors of V, since E(A) = V V^T. They also characterized the distribution of the eigenvectors of A given V. This heuristic can fail, however, because the first eigenvector of the adjacency matrix often separates high-degree nodes from low-degree nodes rather than separating the communities. This problem occurs even in the simplest clustering setup: a symmetric SBM with two clusters of equal size and density. This motivated us to develop a more robust inference approach.

In this paper, we consider a closely related latent space model, the Logistic RDPG, which uses a logistic link function mapping from latent positions to edge probabilities. Like the previously studied RDPG, the Logistic RDPG includes most SBMs as well as other types of network structure, including a variant of the degree-corrected SBM. The Logistic RDPG is also similar to the projection model, which uses a logistic link function but models a directed graph (with p_ij ≠ p_ji). Over this model, we show that the maximum likelihood latent-position inference problem admits an asymptotically exact spectral solution. Our method is to take the top eigenvectors of the mean-centered adjacency matrix and to scale them using a logistic regression step. This result is possible because, for the logistic model specifically, the likelihood function separates into a linear term that depends on the observed network and a nonlinear penalty term that does not. Because of its simplicity, the penalty term admits a Frobenius-norm approximation, leading to our spectral algorithm. A similar approximation is not obtained using link functions other than the logistic link. We show that the likelihood of the approximate solution approaches the maximum of the likelihood function when the graph is large and the latent-position magnitudes go to zero. This asymptotic regime is not overly restrictive, as it encompasses many large SBMs at or above the detectability threshold [14]. We compare the performance of our method on the graph clustering problem with spectral methods including the Modularity method [], the Normalized Laplacian [9] and the Bethe Hessian [12], and with the SDP-based methods [22], [23]. We show that our method outperforms these methods over a broad range of clustering models. We also show the effectiveness of our approach on the real karate club and political blogs networks.

2 LOGISTIC RANDOM DOT PRODUCT GRAPHS

In this section, we introduce the Logistic RDPG, describe our inference method, and show that it is asymptotically equivalent to maximum likelihood inference.

2.1 Definitions

Let A denote the set of adjacency matrices corresponding to undirected, unweighted graphs of size n.

Definition 1 (Stochastic Block Model). The Stochastic Block Model is a family of distributions on A parameterized by (k, c, Q), where k is the number of communities, c ∈ [k]^n is the community membership vector, and Q ∈ [0, 1]^{k×k} is the matrix of community relationships. For each pair of nodes, an edge is drawn independently with probability

P_ij := Pr(A_ij = 1 | c_i, c_j, Q) = Q_{c_i, c_j}.

Another network model that characterizes low-dimensional structure is the Random Dot Product Graph (RDPG). This class of models includes many SBMs.

Definition 2 (Random Dot Product Graph). The Random Dot Product Graph with link function g(·) is a family of distributions on A parameterized by an n × d matrix of latent positions V ∈ R^{n×d}. For each pair of nodes, an edge is drawn independently with probability

P_ij := Pr(A_ij = 1 | V) = g(v_i · v_j),   (2)

where v_i, the i-th row of the matrix V, is the latent position vector assigned to node i.

The RDPG has been formulated with a general link function g(·) [28]. The linear RDPG, which uses the identity link, has been analyzed in the literature because it leads to a spectral inference method. We will refer to this model as either the linear RDPG or simply the RDPG. In this paper, we consider the Logistic RDPG:

Definition 3 (Logistic RDPG). The Logistic RDPG is the RDPG with link function

g(x) = l(x − µ),   l(x) := 1/(1 + e^{−x}),   (3)

where µ is the offset parameter of the logistic link function.

Note that this model is similar to the projection model of Hoff et al. [31], eq. (1). The projection model is for a directed graph, with P_ij ≠ P_ji owing to the division by ‖v_j‖.

Remark 1. The parameter µ in the Logistic RDPG controls the sparsity of the network. If the latent position vector lengths are small, the density of the graph is

(1/(n(n−1))) Σ_{i≠j} E(A_ij) ≈ l(−µ).   (4)

A Logistic RDPG with this property is called centered. In Section 2.2, we show that asymptotically exact maximum likelihood inference of latent positions over the centered Logistic RDPG can be performed using an efficient spectral algorithm.

For a general RDPG, the ML inference problem is:

Definition 4 (ML inference problem for the RDPG). Let A ∈ A. The ML inference problem over the RDPG is:

max_X  Σ_{i,j} A_ij log g(X_ij) + (1 − A_ij) log(1 − g(X_ij)),
s.t.  X = V V^T,  V ∈ R^{n×d}.   (5)

Note that in the undirected case, this objective function is twice the log-likelihood.
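For concreteness, the following minimal Python sketch (our own illustration, not the authors' released code; numpy is assumed, and the function names are ours) samples a graph from the Logistic RDPG of Definition 3 and evaluates the objective of Optimization (5):

import numpy as np

def sample_logistic_rdpg(V, mu, rng=None):
    # Draw an undirected, unweighted graph with P_ij = l(v_i . v_j - mu).
    rng = np.random.default_rng() if rng is None else rng
    P = 1.0 / (1.0 + np.exp(-(V @ V.T - mu)))        # logistic link applied to X = V V^T
    A = np.triu((rng.random(P.shape) < P).astype(float), k=1)
    return A + A.T                                    # symmetric, zero diagonal

def rdpg_objective(A, V, mu):
    # Objective of eq. (5): twice the log-likelihood in the undirected case.
    P = 1.0 / (1.0 + np.exp(-(V @ V.T - mu)))
    off = ~np.eye(A.shape[0], dtype=bool)             # exclude diagonal terms
    return np.sum(A[off] * np.log(P[off]) + (1 - A[off]) * np.log(1 - P[off]))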

Remark 2. A convex semidefinite relaxation of Optimization (5) can be obtained for some link functions as follows:

max_X  Σ_{i,j} A_ij log g(X_ij) + (1 − A_ij) log(1 − g(X_ij)),
s.t.  X ⪰ 0,   (6)

where X ⪰ 0 means that X is positive semidefinite. For example, this optimization is convex for the logistic link function (i.e., g(x) = l(x − µ)) and for the linear link function (i.e., g(x) = x). However, this optimization can be slow in practice, and it often leads to excessively high-rank solutions.

2.2 Maximum-Likelihood Inference of Latent Position Vectors

Here, we present an efficient, asymptotically exact spectral algorithm for the maximum-likelihood (ML) inference problem over the Logistic RDPG, subject to mild constraints. We assume that the number of dimensions d is given. In practice, this parameter is often set manually, but approaches have been proposed to automatically detect the number of dimensions [40]. The proposed maximum-likelihood inference of latent position vectors for the Logistic RDPG is described in Algorithm 1. In the following, we sketch the derivation of this algorithm in a series of lemmas; proofs are presented in Section 7.

First, we simplify the likelihood function of Optimization (5) using the logistic link function. Let F(X) be the log-likelihood, and let the link function be g(x) = l(x − µ). Then:

F(X) := Σ_{i,j} A_ij log l(X_ij − µ) + (1 − A_ij) log(1 − l(X_ij − µ))
      = Σ_{i,j} A_ij log [ l(X_ij − µ) / (1 − l(X_ij − µ)) ] + log(1 − l(X_ij − µ))
      = Σ_{i,j} A_ij (X_ij − µ) + log(1 − l(X_ij − µ)).   (7)

We have used that log(l(x)/(1 − l(x))) = x. The maximum likelihood problem takes the following form (for given µ):

max_X  Tr(AX) − Σ_{i,j} log(1 + e^{X_ij − µ}),
s.t.  X = V V^T,  V ∈ R^{n×d}.   (8)

The objective function has been split into a linear term that depends on the adjacency matrix A and a penalty term that does not depend on A. This simplification is what leads to a tractable optimization, and it is the reason that the logistic link is needed; with, e.g., a linear link, an optimization of this form is not obtained.

We define a penalty function f(x) that keeps only the quadratic and higher-order terms of the penalty term in (8). Let

h(x) := −log(1 − l(x − µ)),   f(x) := h(x) − h(0) − h'(0) x.

Now h'(0) = l(−µ). Let B := A − l(−µ) 1_{n×n} in order to rewrite Optimization (8) as:

max_X  Tr(BX) − Σ_{i,j} f(X_ij),
s.t.  X = V V^T,  V ∈ R^{n×d}.   (9)

Note that for a centered RDPG with average density Ā, µ = −l^{−1}(Ā), and B is the mean-centered adjacency matrix A − Ā 1_{n×n}. In the next step, we convert the penalty term of the objective function into a constraint:

Lemma 1. Suppose that X* is the optimal solution to Optimization (5). Let h := (1/(n(n−1))) Σ_{i,j} f(X*_ij). Then X* is also the solution to the following optimization:

max_X  Tr(BX)   (10)
s.t.  X = V V^T,  V ∈ R^{n×d},
      (1/(n(n−1))) Σ_{i,j} f(X_ij) ≤ h.

In the following key lemma, we show that the inequality constraint of Optimization (10) can be replaced by its second-order Taylor approximation.

Lemma 2. For any ε > 0 and γ ≥ 1, there exists δ > 0 such that the following bound is satisfied for any graph whose ML solution X* satisfies

h ≤ δ  and  X*_ii ≤ (γ/n) Σ_i X*_ii  for all i.   (11)

Let B be the mean-centered adjacency matrix of the chosen graph. Let s ∈ R be the optimal value of the following optimization, obtained at X = X*:

max_X  Tr(BX)   (12)
s.t.  X = V V^T,  V ∈ R^{n×d},
      (1/(n(n−1))) Σ_{i,j} f(X_ij) ≤ h.

Let s' be the optimal value of the following optimization:

max_X  Tr(BX)   (13)
s.t.  X = V V^T,  V ∈ R^{n×d},
      (a_2/(n(n−1))) Σ_{i,j} X_ij² ≤ h,

where a_2 := f''(0)/2 is the quadratic Taylor coefficient of f. Then s' ≥ (1 − ε) s.

The parameter h is related to the average length of the latent-position vectors (X_ii = ‖v_i‖²). If these lengths approach zero, h approaches zero, for a fixed γ. An implication of this constraint is that the Logistic RDPG must be approximately centered. Thus, there is a natural choice for the parameter µ for the purpose of inference:

µ̂ = −l^{−1}( (1/(n(n−1))) Σ_{i≠j} A_ij ).   (14)
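A short sketch of this centering step (again our own illustration in Python; the helper name is ours):

import numpy as np

def center_adjacency(A):
    # Estimate mu from the observed density, eq. (14), and form the mean-centered
    # matrix B = A - l(-mu_hat) 1_{n x n}; note that l(-mu_hat) equals the density.
    n = A.shape[0]
    density = A.sum() / (n * (n - 1))
    mu_hat = -np.log(density / (1.0 - density))
    B = A - density * np.ones((n, n))
    return B, mu_hat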

Algorithm 1 ML Inference for the Logistic RDPG
Require: Adjacency matrix A, number of dimensions d, (optional) number of clusters k.
Form the mean-centered adjacency matrix B := A − (1/(n(n−1))) (Σ_{i≠j} A_ij) 1_{n×n}.
Compute the d eigenvectors of B with the largest eigenvalues: e_1, ..., e_d.
Let X_i = e_i e_i^T for 1 ≤ i ≤ d.
Perform logistic regression of the entries of A lying above the diagonal on the corresponding entries of X_1, ..., X_d, estimating coefficients λ_1, ..., λ_d subject to the constraint that λ_i ≥ 0.
Let V be the matrix formed by concatenating the columns √λ_1 e_1, ..., √λ_d e_d.
Return V.
(optional) Perform k-means on the rows of V, and return the inferred clusters.

This estimator of µ can be viewed as the maximum-likelihood estimator specifically over the centered Logistic RDPG. With this choice of µ, the mean-centered adjacency matrix can be written as B = A − (1/(n(n−1))) (Σ_{i≠j} A_ij) 1_{n×n}.

Note that changing the constant in the inequality constraint of Optimization (13) only changes the scale of the solution, since the shape of the feasible set does not change. Thus, in this optimization we avoid needing to know h a priori (as long as the conditions of Lemma 2 are satisfied). Next we show that the solution to Optimization (13) can be recovered up to a linear transformation using a spectral decomposition:

Lemma 3. Let X* be the optimal solution to Optimization (13). Let e_1, ..., e_d be the first d eigenvectors of B, corresponding to the largest eigenvalues. Then e_1, ..., e_d are identical to the non-null eigenvectors of X*, up to rotation.

Once the eigenvectors of X* are known, it remains only to recover the corresponding eigenvalues. Instead of recovering the eigenvalues of X*, we find the eigenvalues that maximize the likelihood, given the eigenvectors of X*. Let X_i = e_i e_i^T. Then the maximum-likelihood estimate of λ_1, ..., λ_d conditional on X_1, ..., X_d can be written as follows:

λ* := argmax_{λ = (λ_1, ..., λ_d)} Σ_{i<j} log P(A_ij | X_1, ..., X_d, λ, µ).   (15)

Lemma 4. Optimization (15) can be solved by logistic regression of the entries of A on the entries of X_1, ..., X_d, with the constraint that the coefficients are nonnegative, and with intercept −µ.

These lemmas can be used to show the asymptotic optimality of Algorithm 1:

Theorem 1. For all ε > 0 and γ > 0, there exists δ > 0 that satisfies the following. For any graph of size n with adjacency matrix A, suppose that X* is the solution to

max_X  P(A | X),
s.t.  X = V V^T,  V ∈ R^{n×d}.

Let h := (1/(n(n−1))) Σ_{i,j} f(X*_ij). If

h < δ  and  X*_ii ≤ (γ/n) Σ_i X*_ii,   (16)

then P(A | X = X̂) / P(A | X = X*) > 1 − ε, where X̂ is the solution obtained by Algorithm 1.

Our algorithm is asymptotically exact in the sense that the likelihood ratio between our solution and the true maximum converges uniformly to one as the average latent-position length shrinks. Importantly, the convergence is uniform over arbitrarily large graphs; therefore, this regime contains most interesting large network models, such as an SBM with large communities that cannot be perfectly recovered. Coupling this algorithm with a k-means post-processing step leads to a clustering method with robust performance under different network clustering setups.

This result is stronger than the statement that an approximate objective function approximates the likelihood function at the optimum of the likelihood function. Such a result, which can be obtained for many link functions (such as an affine-linear link), is not useful, because it does not follow that the optimum of the approximate function lies near the optimum of the true likelihood function. Indeed, a linear approximation has no optimum at all, since its objective function is unbounded. In order to obtain the stronger statement that the likelihood at the optimum of the approximation is large, it is necessary to use a quadratic approximation. For link functions other than the logistic link, the quadratic term of the likelihood function depends on A, and a spectral optimization method cannot be obtained.

The condition in Theorem 1 that the lengths of the optimal latent vectors are sufficiently small is not restrictive for large networks. Consider a sequence of increasingly large SBMs with two clusters of fixed relative sizes, and a convergent sequence of admissible connectivity matrices whose average density is fixed. There are three asymptotic regimes for the community structure: (1) the structure of the network is too weak to detect any clusters at all; (2) the communities can be partially recovered, but some misassignments will be made; and (3) the communities can be recovered perfectly. The true latent-position lengths go to zero in regimes (1) and (2), as well as in part of regime (3) [23]. Theorem 1 requires the maximum-likelihood latent-position lengths, rather than the true position lengths, to go to zero. If this is the case, and if maximum likelihood achieves the optimal thresholds for partial and perfect recovery, then our method will as well.
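The following condensed Python sketch of Algorithm 1 (our own rendering, not the authors' released implementation; it assumes numpy, scipy, and scikit-learn, and the helper name is ours) makes the three steps explicit: mean-centering, truncated eigendecomposition, and the nonnegative logistic regression that scales the eigenvectors:

import numpy as np
from scipy.optimize import minimize
from sklearn.cluster import KMeans

def logistic_rdpg_embed(A, d, k=None):
    n = A.shape[0]
    density = A.sum() / (n * (n - 1))
    mu_hat = -np.log(density / (1.0 - density))          # eq. (14)
    B = A - density * np.ones((n, n))                     # mean-centered adjacency

    evals, evecs = np.linalg.eigh(B)                      # eigenvalues in ascending order
    E = evecs[:, -d:][:, ::-1]                            # top-d eigenvectors e_1, ..., e_d

    iu, ju = np.triu_indices(n, k=1)
    y = A[iu, ju]                                         # entries of A above the diagonal
    F = E[iu, :] * E[ju, :]                               # column i holds the entries of X_i = e_i e_i^T

    def nll(lam):                                         # logistic negative log-likelihood, intercept -mu_hat
        z = F @ lam - mu_hat
        return np.sum(np.logaddexp(0.0, z) - y * z)

    lam = minimize(nll, x0=np.ones(d), method="L-BFGS-B",
                   bounds=[(0.0, None)] * d).x            # nonnegative coefficients (Lemma 4)
    V = E * np.sqrt(lam)                                  # row i of V is the latent position v_i

    labels = KMeans(n_clusters=k, n_init=10).fit_predict(V) if k else None
    return V, labels

As a usage illustration (assuming the networkx package for the data), V, labels = logistic_rdpg_embed(nx.to_numpy_array(nx.karate_club_graph()), d=1, k=2) embeds the karate club network discussed in Section 4 and partitions it into two groups.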

Fig. 1. Normalized mean squared error (one minus squared correlation) of inferred latent positions for two SBMs (a-b) and a non-SBM logistic RDPG (c). The top eigenvectors of the adjacency matrix A and the modularity matrix M do not characterize the community structure in panel (a) and in panel (b), respectively. Note that in practice, a different eigenvector could be selected or multiple eigenvectors could be used. In panel (c), the top eigenvector of A does not recover the latent structure. In contrast, our method successfully recovers the underlying latent position vectors in all cases.

3 PERFORMANCE EVALUATION OVER SYNTHETIC NETWORKS

In this section, we compare the performance of our proposed method (Algorithm 1) with existing methods. First, we assess the performance of our algorithm against existing methods in inferring the latent position vectors of the two standard SBMs depicted in Figure 1. The network in panel (a) has two dense clusters. In this case, the first eigenvector of the modularity matrix M yields a good estimate of the latent position vector, while the first eigenvector of the adjacency matrix A fails to characterize it. This is because the first eigenvector of the adjacency matrix correlates with node degrees; the modularity transformation regresses out the degree component and recovers the community structure. However, the top eigenvector of the modularity matrix fails to identify the underlying latent position vector when there is a single dense cluster in the network and the community structure is correlated with node degrees (Figure 1-b). This discrepancy highlights the sensitivity of existing heuristic inference methods to the network model (the Modularity method has not previously been considered a latent-position inference method, but we believe that it is appropriate to do so). In contrast, our simple normalization allows the underlying latent position vectors to be accurately recovered in both cases. We also verified in panel (c) that our method successfully recovers latent positions for a non-SBM logistic RDPG. In this setup, the first eigenvector of the adjacency matrix again correlates with node degrees, and the modularity normalization yields an improvement. We found it remarkable that such a simple normalization (mean centering) enabled such significant improvements; using more sophisticated normalizations such as the Normalized Laplacian and the Bethe Hessian, no improvements over mean centering were observed (data not shown).

Second, we assessed the ability of our method to detect communities generated from the SBM. We compared against the following existing spectral network clustering methods (their matrix constructions are also sketched in the code snippet after this list):

Modularity (Newman, 2006). We take the first d eigenvectors of the modularity matrix M := A − vv^T / (2|E|), where v is the vector of node degrees and |E| is the number of edges in the network. We then perform k-means clustering on these eigenvectors.

Normalized Laplacian (Chung, 1997). We take the second- through (d+1)st-last eigenvectors of L_sym := D^{−1/2}(D − A)D^{−1/2}, where D is the diagonal matrix of degrees. We then perform k-means clustering on these eigenvectors.

Bethe Hessian (Saade et al., 2014). We take the second- through (d+1)st-last eigenvectors of H(r) := (r² − 1) I_{n×n} − rA + D, where r² is the density of the graph as defined in [12].

Unnormalized spectral clustering (Sussman et al., 2012). We take the first d eigenvectors of the adjacency matrix A, and perform k-means clustering on these eigenvectors.

Spectral clustering on the mean-centered matrix B. We take the first d eigenvectors of the matrix B and perform k-means on them, without a scaling step.

Note that our evaluation includes spectral clustering on the mean-centered adjacency matrix B without the subsequent eigenvalue scaling of Algorithm 1 in order to demonstrate that the scaling step computed by logistic regression is essential to the performance of the proposed algorithm. When d = 1, the two methods are equivalent.
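The baseline matrices listed above follow standard definitions; the snippet below (our own sketch in Python, assuming numpy and no isolated nodes) spells them out:

import numpy as np

def modularity_matrix(A):
    v = A.sum(axis=1)                                  # degree vector
    return A - np.outer(v, v) / A.sum()                # A - v v^T / (2|E|)

def normalized_laplacian(A):
    v = A.sum(axis=1)
    Dinv_sqrt = np.diag(1.0 / np.sqrt(v))
    return Dinv_sqrt @ (np.diag(v) - A) @ Dinv_sqrt    # D^{-1/2} (D - A) D^{-1/2}

def bethe_hessian(A, r):
    # H(r) = (r^2 - 1) I - r A + D, with r chosen as in Saade et al. [12].
    n = A.shape[0]
    return (r**2 - 1.0) * np.eye(n) - r * A + np.diag(A.sum(axis=1))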

Fig. 2. Performance comparison of our method (Logistic RDPG) against spectral clustering methods in different clustering setups. Panels (a)-(e) illustrate networks that can be characterized by an SBM, while panel (f) illustrates a non-SBM network model. The scale of the x axis is different in panel (f) than in the rest of the panels. Our proposed method performs consistently well, while other methods exhibit sensitive and inconsistent performance in different network clustering setups. Note that in some cases, such as for the Laplacian in panel (b), performance is improved by using a different eigenvector or by using a larger number of eigenvectors.

We also compare the performance of our method against two SDP-based approaches: the method proposed by Hajek et al. (2015) and the SDP-1 method proposed by Amini et al. (2014). For all methods, we assume that the number of clusters k is given. In our scoring metric, we distinguish between clusters and communities: for instance, in Figure 2-e there are two clusters and four communities, comprised of nodes belonging only to cluster one, nodes belonging only to cluster two, nodes belonging to both clusters, and nodes belonging to neither. The score that we use is a normalized Jaccard index, defined as

max_{σ ∈ S_k} (1/k) Σ_{l=1}^{k} |C_l ∩ Ĉ_{σ(l)}| / |C_l ∪ Ĉ_{σ(l)}|,   (17)

where C_l is the l-th community, Ĉ_l is the l-th estimated community, and S_k is the group of permutations of k elements. One advantage of this scoring metric is that it weighs differently sized clusters equally (it does not place higher weight on larger communities).

Figure 2 presents a comparison between our proposed method and existing spectral methods over a wide range of clustering setups. Our proposed method performs consistently well, while other methods exhibit sensitive and inconsistent performance. For instance, in the case of two large clusters (b), the second-to-last eigenvector of the Normalized Laplacian fails to correlate with the community structure; in the case of one dense cluster (a), the Modularity normalization performs poorly; and when there are many small clusters (c), the performance of the Bethe Hessian method is poor. In each case, the proposed method performs at least as well as the best alternative method, except in the case of several differently sized clusters (d), where the Normalized Laplacian performs marginally better. In the case of overlapping clusters (e), our method performs significantly better than all competing methods. Spectral clustering on B without the scaling step also performs well in this setup; however, its performance is worse in panels (c-d), where d is larger, highlighting the importance of our logistic regression step. The values of k and d for the different simulations were: k = 2, d = 1; k = 2, d = 1; k = 25, d = 24; k = 7, d = 6; k = 4, d = 2; and k = 2, d = 1 for panels (a)-(f), respectively. The values of d are chosen based on the number of dimensions that would be informative about the community structure if one knew the true latent positions. All networks have the same number of nodes, with background density .5.
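A brute-force sketch of the scoring metric in eq. (17) (our own helper in Python; enumerating all k! permutations is feasible for the small k used here):

from itertools import permutations

def normalized_jaccard(true_comms, est_comms):
    # true_comms, est_comms: lists of k nonempty sets of node indices.
    k = len(true_comms)
    return max(
        sum(len(C & est_comms[s[l]]) / len(C | est_comms[s[l]])
            for l, C in enumerate(true_comms)) / k
        for s in permutations(range(k)))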

Fig. 3. Performance comparison of our method (Logistic RDPG) against semidefinite programming-based clustering methods. Hajek et al.'s method is designed for the case of two equally sized partitions; it is therefore not included in panels (b)-(d).

While spectral methods are the most prominent network clustering methods owing to their accuracy and efficiency, other approaches have been proposed, notably including SDP-based methods, which solve relaxations of the maximum likelihood problem over the SBM. We compare the performance of our method with the SDP-based methods proposed by Hajek et al. (2015) and Amini et al. (2014) (Figure 3). In the symmetric SBM, meaning the SBM with two equally dense, equally large communities, we find that our method performs almost as well as the method of Hajek et al. (2015), which is a simple semidefinite relaxation of the likelihood in that particular case. Our method also performs better than the method of Amini et al., which solves a more complicated relaxation of the SBM maximum-likelihood problem in the more general case (Figure 3).

4 PERFORMANCE EVALUATION OVER REAL NETWORKS

To assess the performance of the Logistic RDPG over well-characterized real networks, we apply it to two well-known examples. First, we consider the karate club network [44]. Originally a single karate club with social ties between various members, the club split into two clubs after a dispute between the instructor and the president. The network contains 34 nodes with an average degree of 4.6, including two high-degree nodes corresponding to the instructor and the president. Applying our method to this network, we find that the first eigenvector separates the two true clusters perfectly (Figure 4-a).

In the second experiment, we consider a network of political blogs, whose edges correspond to links between blogs [45]. This network (restricted to nodes with nonzero degree) is sparse, with an average total degree of 27.4 and a number of very high-degree nodes. The nodes in this network have been labeled as either liberal or conservative. We apply our method to this network; Figure 4-b shows the inferred latent positions of its nodes. As illustrated in the figure, nodes with different labels are separated in the latent space. Note that some nodes are placed near the origin, indicating that they cannot be clustered confidently; this occurs because of their low degrees, as node degrees are correlated with distances from the origin.

5 CODE

We provide code for the proposed method at the following link:

6 DISCUSSION

In this paper, we developed a spectral inference method over logistic Random Dot Product Graphs (RDPGs), and we showed that the proposed method is asymptotically equivalent to maximum likelihood latent-position inference. Previous justifications for spectral clustering have usually been either consistency results [25], [26] or partial-recovery results [17], [18]; to the best of our knowledge, our likelihood-based justification is the first of its kind for a spectral method. This type of justification is satisfying because maximum likelihood inference methods can generally be expected to have optimal asymptotic performance characteristics; for example, it is known that maximum likelihood estimators are consistent over the SBM [38], [39]. It remains an important future direction to characterize the asymptotic performance of the MLE over the Logistic RDPG.

We have focused in this paper on the network clustering problem; however, latent space models such as the Logistic RDPG can be viewed as a more general tool for exploring and analyzing network structure. They can be used for visualization [41], [42] and for inference of partial-membership-type structure, similar to the mixed-membership stochastic blockmodel [15]. Our approach can also be generalized to multi-edge graphs, in which the number of edges between two nodes is binomially distributed. Such data are emerging in areas including systems biology, in the form of cell-type- and tissue-specific networks [43].

Fig. 4. Estimated latent positions for nodes of two real networks. (a) Karate club social network. (b) Political blogs network.

7 PROOFS

Proof of Lemma 1. If not, then the optimal solution to Optimization (10) would be a better solution to Optimization (5) than X*.

Proof of Lemma 2. The origin is in the feasible set of both optimizations. For each optimization, the objective function satisfies Tr(B(rX)) = r Tr(BX); thus the optimum is attained either at the origin (if there is no positive solution) or on the boundary of the feasible set. If the optimum is at the origin, we have s = s' = 0. If not, let X̄ be any solution to (1/n²) Σ_{i,j} f(X̄_ij) = h. Let r := ‖X̄‖_F, and let r' be the radius of the Frobenius ball defined by the quadratic constraint of (13).

Claim: fixing γ, r/r' → 1 uniformly as h → 0. Define

F̄(a) := (1/n²) Σ_{i,j} f(a X̄_ij / ‖X̄‖_F)  for a > 0.

Since r' has been defined such that the quadratic term of F̄(r') is (a_2/n²) Σ_{i,j} (r' X̄_ij / r)² = h, we have

F̄(r') = h + (1/n²) O( Σ_{i,j} (r' X̄_ij / r)³ ).   (18)

Moreover, the Taylor series of f(·) converges in a neighborhood of zero. Because of the constraint max_{i,j} |X̄_ij| = max_i X̄_ii ≤ (γ/n) Σ_i X̄_ii, we can choose δ such that every entry X̄_ij falls within this neighborhood. This constraint also implies

(1/n²) Σ_{i,j} |X̄_ij|³ ≤ (1/n²) ( Σ_{i,j} X̄_ij² )^{3/2} = O( (1/n²) ‖X̄‖_F³ ).

Substituting this into (18), we have

F̄(r') = h + (1/n²) O(r'³).   (19)

Therefore,

( F̄(r) − F̄(r') ) / F̄(r') = ( h − h − O(r'³)/n² ) / h = O(r').   (20)

Note that f(·) is a convex function with f'(x) > 0 for all x > 0 and f'(x) < 0 for all x < 0. Thus F̄ is increasing, convex, and zero-valued at the origin: for any a ≥ b > 0,

(a − b)/b ≤ ( F̄(a) − F̄(b) ) / F̄(b).   (21)

Thus |r − r'| / r' = O(r') and r/r' = 1 + O(r'). Let r_s be the norm of the argmax of Optimization (12); because the objective function is linear, rescaling this argmax to Frobenius norm r' shows that s (r'/r_s) ≤ s'. Let r_t be the distance from the origin to the intersection of the boundary of the feasible set of (12) with the ray through the argmax of Optimization (13); then s' (r_t/r') ≤ s. We have shown that both ratios tend uniformly to one. This completes the proof.

Proof of Lemma 3. First, suppose we had prior knowledge of the eigenvalues of X*; denote its nonzero eigenvalues by λ_1, ..., λ_d. Then we would be able to recover the optimal solution to Optimization (13) by solving the following optimization:

max_X  Tr(BX)
s.t.  λ_i(X) = λ_i,  1 ≤ i ≤ d,
      rank(X) = d.   (22)

Note that the Frobenius norm of a matrix is determined by its eigenvalues:

‖X‖_F² = Tr(X^T X) = Tr(X²) = Σ_i λ_i².   (23)

Thus we can drop the Frobenius-norm constraint of (13). Let X be an n × n PSD matrix whose non-null eigenvectors are the columns of a matrix E ∈ R^{n×d} and whose respective eigenvalues are λ_1, ..., λ_d. Let V := E diag(√λ_1, ..., √λ_d), so that X = V V^T. Rewrite the objective function as

Tr(BX) = Tr(V^T B V) = Σ_{i=1}^{d} λ_i e_i^T B e_i.

For fixed nonnegative eigenvalues, this sum is maximized by taking e_1, ..., e_d to be the eigenvectors of B with the largest eigenvalues; therefore the non-null eigenvectors of X* = E diag(λ_1, ..., λ_d) E^T = V V^T coincide with the top eigenvectors of B, up to rotation.
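As a quick numerical illustration of Lemma 3 (our own check in Python, not part of the paper), one can verify that, for fixed eigenvalues, Tr(BX) is largest when the eigenvectors of X are the top eigenvectors of B:

import numpy as np

rng = np.random.default_rng(0)
n, d = 30, 2
M = rng.standard_normal((n, n))
B = (M + M.T) / 2.0                                   # a symmetric stand-in for B
lam = np.array([2.0, 1.0])                            # prescribed nonzero eigenvalues of X

E = np.linalg.eigh(B)[1][:, -d:][:, ::-1]             # top-d eigenvectors of B
best = np.trace(B @ (E * lam) @ E.T)                  # Tr(B X) with X = E diag(lam) E^T

Q = np.linalg.qr(rng.standard_normal((n, d)))[0]      # an arbitrary orthonormal competitor
assert np.trace(B @ (Q * lam) @ Q.T) <= best + 1e-9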

Proof of Lemma 4. The upper-triangular entries of A are independent Bernoulli random variables conditional on X and µ, with a logistic link function. The coefficients should be nonnegative, as X is constrained to be positive semidefinite.

Proof of Theorem 1. By Lemma 1, the optimal value of Optimization (10) equals the maximum of the log-likelihood, up to an additive constant. By Lemma 2, for fixed γ, as h → 0 the quotient s'/s converges uniformly to one, where s is the optimal value of Optimization (12) and s' is the optimal value of Optimization (13). The convergence is uniform over the choice of B, as needed for Theorem 1. Because s and s' do not diverge to ±∞, this also implies that s − s', and therefore the log-likelihood ratio, converges uniformly to zero. By Lemma 3, the non-null eigenvectors of the argmax of Optimization (13) are equivalent (up to rotation) to the first eigenvectors of B. Finally, by Lemma 4, the eigenvalues that maximize the likelihood can be recovered using a logistic regression step. By Lemma 2, the theorem would hold if we recovered the eigenvalues solving the approximate Optimization (13); by finding the eigenvalues that exactly maximize the likelihood, we achieve a likelihood value at least as large.

REFERENCES

[1] M. Girvan and M. E. J. Newman, "Community structure in social and biological networks," Proc. Natl. Acad. Sci. USA, vol. 99, pp. 7821–7826, 2002.
[2] S. Butenko, Clustering Challenges in Biological Networks. World Scientific, 2009.
[3] N. Mishra, R. Schreiber, I. Stanton, and R. E. Tarjan, "Clustering social networks," in Algorithms and Models for the Web-Graph. Springer, 2007.
[4] C.-H. Lee, M. N. Hoehn-Weiss, and S. Karim, "Grouping interdependent tasks: Using spectral graph partitioning to study system modularity and performance," Available at SSRN, 2014.
[5] S. E. Schaeffer, "On the NP-completeness of some graph cluster measures," arXiv preprint cs/0506100, 2008.
[6] A. Y. Ng, M. I. Jordan, and Y. Weiss, "On spectral clustering: Analysis and an algorithm," in Advances in Neural Information Processing Systems, 2002, pp. 849–856.
[7] U. von Luxburg, "A tutorial on spectral clustering," Statistics and Computing, vol. 17, no. 4, pp. 395–416, 2007.
[8] M. E. J. Newman, "Finding community structure in networks using the eigenvectors of matrices," Physical Review E, vol. 74, no. 3, 2006.
[9] B. Mohar, Y. Alavi, G. Chartrand, and O. Oellermann, "The Laplacian spectrum of graphs," Graph Theory, Combinatorics, and Applications, vol. 2, 1991.
[10] F. R. K. Chung, Spectral Graph Theory. American Mathematical Society, 1997, vol. 92.
[11] J. Shi and J. Malik, "Normalized cuts and image segmentation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 8, pp. 888–905, 2000.
[12] A. Saade, F. Krzakala, and L. Zdeborová, "Spectral clustering of graphs with the Bethe Hessian," in Advances in Neural Information Processing Systems, 2014.
[13] L. Hagen and A. B. Kahng, "New spectral methods for ratio cut partitioning and clustering," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 11, no. 9, 1992.
[14] A. Decelle, F. Krzakala, C. Moore, and L. Zdeborová, "Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications," Physical Review E, vol. 84, no. 6, p. 066106, 2011.
[15] E. M. Airoldi, D. M. Blei, S. E. Fienberg, and E. P. Xing, "Mixed membership stochastic blockmodels," Journal of Machine Learning Research, vol. 9, pp. 1981–2014, 2008.
[16] T. Snijders and K. Nowicki, "Estimation and prediction for stochastic blockmodels for graphs with latent block structure," Journal of Classification, vol. 14, pp. 75–100, 1997.
[17] F. Krzakala, C. Moore, E. Mossel, J. Neeman, A. Sly, L. Zdeborová, and P. Zhang, "Spectral redemption in clustering sparse networks," Proceedings of the National Academy of Sciences, vol. 110, no. 52, pp. 20935–20940, 2013.
[18] R. R. Nadakuditi and M. E. J. Newman, "Graph spectra and the detectability of community structure in networks," Physical Review Letters, vol. 108, no. 18, p. 188701, 2012.
[19] E. Mossel, J. Neeman, and A. Sly, "Stochastic block models and reconstruction," arXiv preprint, 2012.
[20] L. Massoulié, "Community detection thresholds and the weak Ramanujan property," in Proceedings of the 46th Annual ACM Symposium on Theory of Computing. ACM, 2014.
[21] E. Mossel, J. Neeman, and A. Sly, "A proof of the block model threshold conjecture," arXiv preprint arXiv:1311.4115, 2013.
[22] A. A. Amini and E. Levina, "On semidefinite relaxations for the block model," arXiv preprint, 2014.
[23] B. Hajek, Y. Wu, and J. Xu, "Achieving exact cluster recovery threshold via semidefinite programming: Extensions," arXiv preprint, 2015.
[24] E. Abbe, A. S. Bandeira, and G. Hall, "Exact recovery in the stochastic block model," arXiv preprint, 2014.
[25] D. L. Sussman, M. Tang, D. E. Fishkind, and C. E. Priebe, "A consistent adjacency spectral embedding for stochastic blockmodel graphs," Journal of the American Statistical Association, vol. 107, no. 499, pp. 1119–1128, 2012.
[26] K. Rohe, S. Chatterjee, and B. Yu, "Spectral clustering and the high-dimensional stochastic blockmodel," The Annals of Statistics, vol. 39, no. 4, pp. 1878–1915, 2011.
[27] M. Kraetzl, C. Nickel, and E. R. Scheinerman, "Random dot product graphs: A model for social networks," preliminary manuscript, 2005.
[28] S. J. Young and E. R. Scheinerman, "Random dot product graph models for social networks," in Algorithms and Models for the Web-Graph. Springer, 2007.
[29] A. Athreya, V. Lyzinski, D. J. Marchette, C. E. Priebe, D. L. Sussman, and M. Tang, "A central limit theorem for scaled eigenvectors of random dot product graphs," arXiv preprint, 2013.
[30] P. W. Holland, K. B. Laskey, and S. Leinhardt, "Stochastic blockmodels: First steps," Social Networks, vol. 5, no. 2, pp. 109–137, 1983.
[31] P. D. Hoff, A. E. Raftery, and M. S. Handcock, "Latent space approaches to social network analysis," Journal of the American Statistical Association, vol. 97, no. 460, pp. 1090–1098, 2002.
[32] S. Shortreed, M. S. Handcock, and P. Hoff, "Positional estimation within a latent space model for networks," Methodology, vol. 2, no. 1, 2006.
[33] M. S. Handcock, A. E. Raftery, and J. M. Tantrum, "Model-based clustering for social networks," Journal of the Royal Statistical Society: Series A (Statistics in Society), vol. 170, no. 2, pp. 301–354, 2007.
[34] M. Salter-Townshend and T. B. Murphy, "Variational Bayesian inference for the latent position cluster model," in Analyzing Networks and Learning with Graphs Workshop at the 23rd Annual Conference on Neural Information Processing Systems (NIPS 2009), Whistler, December 2009.
[35] N. Friel, C. Ryan, and J. Wyse, "Bayesian model selection for the latent position cluster model for social networks," arXiv preprint, 2013.
[36] T. Qin and K. Rohe, "Regularized spectral clustering under the degree-corrected stochastic blockmodel," in Advances in Neural Information Processing Systems, 2013.
[37] S. Van Dongen and A. J. Enright, "Metric distances derived from cosine similarity and Pearson and Spearman correlations," arXiv preprint arXiv:1208.3145, 2012.
[38] P. Bickel, D. Choi, X. Chang, and H. Zhang, "Asymptotic normality of maximum likelihood and its variational approximation for stochastic blockmodels," The Annals of Statistics, vol. 41, no. 4, pp. 1922–1943, 2013.
[39] A. Celisse, J.-J. Daudin, and L. Pierre, "Consistency of maximum-likelihood and variational estimators in the stochastic block model," Electronic Journal of Statistics, vol. 6, pp. 1847–1899, 2012.

[40] L. Zelnik-Manor and P. Perona, "Self-tuning spectral clustering," in Advances in Neural Information Processing Systems, 2004, pp. 1601–1608.
[41] K. M. Hall, "An r-dimensional quadratic placement algorithm," Management Science, vol. 17, no. 3, pp. 219–229, 1970.
[42] Y. Koren, "Drawing graphs by eigenvectors: theory and practice," Computers & Mathematics with Applications, vol. 49, 2005.
[43] S. Neph et al., "Circuitry and dynamics of human transcription factor regulatory networks," Cell, vol. 150, pp. 1274–1286, 2012.
[44] W. Zachary, "An information flow model for conflict and fission in small groups," Journal of Anthropological Research, vol. 33, no. 4, pp. 452–473, 1977.
[45] L. A. Adamic and N. Glance, "The political blogosphere and the 2004 US election: divided they blog," in Proceedings of the 3rd International Workshop on Link Discovery. ACM, 2005, pp. 36–43.

Part I: Preliminary Results. Pak K. Chan, Martine Schlag and Jason Zien. Computer Engineering Board of Studies. University of California, Santa Cruz Spectral K-Way Ratio-Cut Partitioning Part I: Preliminary Results Pak K. Chan, Martine Schlag and Jason Zien Computer Engineering Board of Studies University of California, Santa Cruz May, 99 Abstract

More information

Statistical Inference on Random Dot Product Graphs: a Survey

Statistical Inference on Random Dot Product Graphs: a Survey Journal of Machine Learning Research 8 (8) -9 Submitted 8/7; Revised 8/7; Published 5/8 Statistical Inference on Random Dot Product Graphs: a Survey Avanti Athreya Donniell E. Fishkind Minh Tang Carey

More information

A spectral clustering algorithm based on Gram operators

A spectral clustering algorithm based on Gram operators A spectral clustering algorithm based on Gram operators Ilaria Giulini De partement de Mathe matiques et Applications ENS, Paris Joint work with Olivier Catoni 1 july 2015 Clustering task of grouping

More information

Distributed Inexact Newton-type Pursuit for Non-convex Sparse Learning

Distributed Inexact Newton-type Pursuit for Non-convex Sparse Learning Distributed Inexact Newton-type Pursuit for Non-convex Sparse Learning Bo Liu Department of Computer Science, Rutgers Univeristy Xiao-Tong Yuan BDAT Lab, Nanjing University of Information Science and Technology

More information

A physical model for efficient rankings in networks

A physical model for efficient rankings in networks A physical model for efficient rankings in networks Daniel Larremore Assistant Professor Dept. of Computer Science & BioFrontiers Institute March 5, 2018 CompleNet danlarremore.com @danlarremore The idea

More information

A Dimensionality Reduction Framework for Detection of Multiscale Structure in Heterogeneous Networks

A Dimensionality Reduction Framework for Detection of Multiscale Structure in Heterogeneous Networks Shen HW, Cheng XQ, Wang YZ et al. A dimensionality reduction framework for detection of multiscale structure in heterogeneous networks. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY 27(2): 341 357 Mar. 2012.

More information

MLCC Clustering. Lorenzo Rosasco UNIGE-MIT-IIT

MLCC Clustering. Lorenzo Rosasco UNIGE-MIT-IIT MLCC 2018 - Clustering Lorenzo Rosasco UNIGE-MIT-IIT About this class We will consider an unsupervised setting, and in particular the problem of clustering unlabeled data into coherent groups. MLCC 2018

More information

1 Matrix notation and preliminaries from spectral graph theory

1 Matrix notation and preliminaries from spectral graph theory Graph clustering (or community detection or graph partitioning) is one of the most studied problems in network analysis. One reason for this is that there are a variety of ways to define a cluster or community.

More information

Graph Detection and Estimation Theory

Graph Detection and Estimation Theory Introduction Detection Estimation Graph Detection and Estimation Theory (and algorithms, and applications) Patrick J. Wolfe Statistics and Information Sciences Laboratory (SISL) School of Engineering and

More information

Spectral Alignment of Networks Soheil Feizi, Gerald Quon, Muriel Medard, Manolis Kellis, and Ali Jadbabaie

Spectral Alignment of Networks Soheil Feizi, Gerald Quon, Muriel Medard, Manolis Kellis, and Ali Jadbabaie Computer Science and Artificial Intelligence Laboratory Technical Report MIT-CSAIL-TR-205-005 February 8, 205 Spectral Alignment of Networks Soheil Feizi, Gerald Quon, Muriel Medard, Manolis Kellis, and

More information

Matrix estimation by Universal Singular Value Thresholding

Matrix estimation by Universal Singular Value Thresholding Matrix estimation by Universal Singular Value Thresholding Courant Institute, NYU Let us begin with an example: Suppose that we have an undirected random graph G on n vertices. Model: There is a real symmetric

More information

CS281 Section 4: Factor Analysis and PCA

CS281 Section 4: Factor Analysis and PCA CS81 Section 4: Factor Analysis and PCA Scott Linderman At this point we have seen a variety of machine learning models, with a particular emphasis on models for supervised learning. In particular, we

More information

Manifold Learning for Subsequent Inference

Manifold Learning for Subsequent Inference Manifold Learning for Subsequent Inference Carey E. Priebe Johns Hopkins University June 20, 2018 DARPA Fundamental Limits of Learning (FunLoL) Los Angeles, California http://arxiv.org/abs/1806.01401 Key

More information

Spectral thresholds in the bipartite stochastic block model

Spectral thresholds in the bipartite stochastic block model JMLR: Workshop and Conference Proceedings vol 49:1 17, 2016 Spectral thresholds in the bipartite stochastic block model Laura Florescu New York University Will Perkins University of Birmingham FLORESCU@CIMS.NYU.EDU

More information

SVD, Power method, and Planted Graph problems (+ eigenvalues of random matrices)

SVD, Power method, and Planted Graph problems (+ eigenvalues of random matrices) Chapter 14 SVD, Power method, and Planted Graph problems (+ eigenvalues of random matrices) Today we continue the topic of low-dimensional approximation to datasets and matrices. Last time we saw the singular

More information

Semidefinite Programming

Semidefinite Programming Semidefinite Programming Notes by Bernd Sturmfels for the lecture on June 26, 208, in the IMPRS Ringvorlesung Introduction to Nonlinear Algebra The transition from linear algebra to nonlinear algebra has

More information

Spectral Clustering. Guokun Lai 2016/10

Spectral Clustering. Guokun Lai 2016/10 Spectral Clustering Guokun Lai 2016/10 1 / 37 Organization Graph Cut Fundamental Limitations of Spectral Clustering Ng 2002 paper (if we have time) 2 / 37 Notation We define a undirected weighted graph

More information

Jure Leskovec Joint work with Jaewon Yang, Julian McAuley

Jure Leskovec Joint work with Jaewon Yang, Julian McAuley Jure Leskovec (@jure) Joint work with Jaewon Yang, Julian McAuley Given a network, find communities! Sets of nodes with common function, role or property 2 3 Q: How and why do communities form? A: Strength

More information

Inference For High Dimensional M-estimates. Fixed Design Results

Inference For High Dimensional M-estimates. Fixed Design Results : Fixed Design Results Lihua Lei Advisors: Peter J. Bickel, Michael I. Jordan joint work with Peter J. Bickel and Noureddine El Karoui Dec. 8, 2016 1/57 Table of Contents 1 Background 2 Main Results and

More information

SPECTRAL CLUSTERING AND KERNEL PRINCIPAL COMPONENT ANALYSIS ARE PURSUING GOOD PROJECTIONS

SPECTRAL CLUSTERING AND KERNEL PRINCIPAL COMPONENT ANALYSIS ARE PURSUING GOOD PROJECTIONS SPECTRAL CLUSTERING AND KERNEL PRINCIPAL COMPONENT ANALYSIS ARE PURSUING GOOD PROJECTIONS VIKAS CHANDRAKANT RAYKAR DECEMBER 5, 24 Abstract. We interpret spectral clustering algorithms in the light of unsupervised

More information

Information-theoretic bounds and phase transitions in clustering, sparse PCA, and submatrix localization

Information-theoretic bounds and phase transitions in clustering, sparse PCA, and submatrix localization Information-theoretic bounds and phase transitions in clustering, sparse PCA, and submatrix localization Jess Banks Cristopher Moore Roman Vershynin Nicolas Verzelen Jiaming Xu Abstract We study the problem

More information

A New Combined Approach for Inference in High-Dimensional Regression Models with Correlated Variables

A New Combined Approach for Inference in High-Dimensional Regression Models with Correlated Variables A New Combined Approach for Inference in High-Dimensional Regression Models with Correlated Variables Niharika Gauraha and Swapan Parui Indian Statistical Institute Abstract. We consider the problem of

More information

Foundations of Adjacency Spectral Embedding. Daniel L. Sussman

Foundations of Adjacency Spectral Embedding. Daniel L. Sussman Foundations of Adjacency Spectral Embedding by Daniel L. Sussman A dissertation submitted to The Johns Hopkins University in conformity with the requirements for the degree of Doctor of Philosophy. Baltimore,

More information

Approximating the Partition Function by Deleting and then Correcting for Model Edges (Extended Abstract)

Approximating the Partition Function by Deleting and then Correcting for Model Edges (Extended Abstract) Approximating the Partition Function by Deleting and then Correcting for Model Edges (Extended Abstract) Arthur Choi and Adnan Darwiche Computer Science Department University of California, Los Angeles

More information

CSE 291. Assignment Spectral clustering versus k-means. Out: Wed May 23 Due: Wed Jun 13

CSE 291. Assignment Spectral clustering versus k-means. Out: Wed May 23 Due: Wed Jun 13 CSE 291. Assignment 3 Out: Wed May 23 Due: Wed Jun 13 3.1 Spectral clustering versus k-means Download the rings data set for this problem from the course web site. The data is stored in MATLAB format as

More information

Graphs in Machine Learning

Graphs in Machine Learning Graphs in Machine Learning Michal Valko INRIA Lille - Nord Europe, France Partially based on material by: Ulrike von Luxburg, Gary Miller, Doyle & Schnell, Daniel Spielman January 27, 2015 MVA 2014/2015

More information

The Forward-Backward Embedding of Directed Graphs

The Forward-Backward Embedding of Directed Graphs The Forward-Backward Embedding of Directed Graphs Anonymous authors Paper under double-blind review Abstract We introduce a novel embedding of directed graphs derived from the singular value decomposition

More information

Sparse Gaussian conditional random fields

Sparse Gaussian conditional random fields Sparse Gaussian conditional random fields Matt Wytock, J. ico Kolter School of Computer Science Carnegie Mellon University Pittsburgh, PA 53 {mwytock, zkolter}@cs.cmu.edu Abstract We propose sparse Gaussian

More information

Lecture Semidefinite Programming and Graph Partitioning

Lecture Semidefinite Programming and Graph Partitioning Approximation Algorithms and Hardness of Approximation April 16, 013 Lecture 14 Lecturer: Alantha Newman Scribes: Marwa El Halabi 1 Semidefinite Programming and Graph Partitioning In previous lectures,

More information

Model-Free Knockoffs: High-Dimensional Variable Selection that Controls the False Discovery Rate

Model-Free Knockoffs: High-Dimensional Variable Selection that Controls the False Discovery Rate Model-Free Knockoffs: High-Dimensional Variable Selection that Controls the False Discovery Rate Lucas Janson, Stanford Department of Statistics WADAPT Workshop, NIPS, December 2016 Collaborators: Emmanuel

More information

Data Mining Techniques

Data Mining Techniques Data Mining Techniques CS 622 - Section 2 - Spring 27 Pre-final Review Jan-Willem van de Meent Feedback Feedback https://goo.gl/er7eo8 (also posted on Piazza) Also, please fill out your TRACE evaluations!

More information

CS6220: DATA MINING TECHNIQUES

CS6220: DATA MINING TECHNIQUES CS6220: DATA MINING TECHNIQUES Mining Graph/Network Data Instructor: Yizhou Sun yzsun@ccs.neu.edu March 16, 2016 Methods to Learn Classification Clustering Frequent Pattern Mining Matrix Data Decision

More information

WHY ARE DEEP NETS REVERSIBLE: A SIMPLE THEORY,

WHY ARE DEEP NETS REVERSIBLE: A SIMPLE THEORY, WHY ARE DEEP NETS REVERSIBLE: A SIMPLE THEORY, WITH IMPLICATIONS FOR TRAINING Sanjeev Arora, Yingyu Liang & Tengyu Ma Department of Computer Science Princeton University Princeton, NJ 08540, USA {arora,yingyul,tengyu}@cs.princeton.edu

More information

CS598 Machine Learning in Computational Biology (Lecture 5: Matrix - part 2) Professor Jian Peng Teaching Assistant: Rongda Zhu

CS598 Machine Learning in Computational Biology (Lecture 5: Matrix - part 2) Professor Jian Peng Teaching Assistant: Rongda Zhu CS598 Machine Learning in Computational Biology (Lecture 5: Matrix - part 2) Professor Jian Peng Teaching Assistant: Rongda Zhu Feature engineering is hard 1. Extract informative features from domain knowledge

More information

Impact of regularization on Spectral Clustering

Impact of regularization on Spectral Clustering Impact of regularization on Spectral Clustering Antony Joseph and Bin Yu December 5, 2013 Abstract The performance of spectral clustering is considerably improved via regularization, as demonstrated empirically

More information

CLOSE-TO-CLEAN REGULARIZATION RELATES

CLOSE-TO-CLEAN REGULARIZATION RELATES Worshop trac - ICLR 016 CLOSE-TO-CLEAN REGULARIZATION RELATES VIRTUAL ADVERSARIAL TRAINING, LADDER NETWORKS AND OTHERS Mudassar Abbas, Jyri Kivinen, Tapani Raio Department of Computer Science, School of

More information

Spectral thresholds in the bipartite stochastic block model

Spectral thresholds in the bipartite stochastic block model Spectral thresholds in the bipartite stochastic block model Laura Florescu and Will Perkins NYU and U of Birmingham September 27, 2016 Laura Florescu and Will Perkins Spectral thresholds in the bipartite

More information

Doubly Stochastic Normalization for Spectral Clustering

Doubly Stochastic Normalization for Spectral Clustering Doubly Stochastic Normalization for Spectral Clustering Ron Zass and Amnon Shashua Abstract In this paper we focus on the issue of normalization of the affinity matrix in spectral clustering. We show that

More information

Spectral clustering. Two ideal clusters, with two points each. Spectral clustering algorithms

Spectral clustering. Two ideal clusters, with two points each. Spectral clustering algorithms A simple example Two ideal clusters, with two points each Spectral clustering Lecture 2 Spectral clustering algorithms 4 2 3 A = Ideally permuted Ideal affinities 2 Indicator vectors Each cluster has an

More information

10-725/36-725: Convex Optimization Prerequisite Topics

10-725/36-725: Convex Optimization Prerequisite Topics 10-725/36-725: Convex Optimization Prerequisite Topics February 3, 2015 This is meant to be a brief, informal refresher of some topics that will form building blocks in this course. The content of the

More information

Communities, Spectral Clustering, and Random Walks

Communities, Spectral Clustering, and Random Walks Communities, Spectral Clustering, and Random Walks David Bindel Department of Computer Science Cornell University 26 Sep 2011 20 21 19 16 22 28 17 18 29 26 27 30 23 1 25 5 8 24 2 4 14 3 9 13 15 11 10 12

More information

Learning to Learn and Collaborative Filtering

Learning to Learn and Collaborative Filtering Appearing in NIPS 2005 workshop Inductive Transfer: Canada, December, 2005. 10 Years Later, Whistler, Learning to Learn and Collaborative Filtering Kai Yu, Volker Tresp Siemens AG, 81739 Munich, Germany

More information

Does Better Inference mean Better Learning?

Does Better Inference mean Better Learning? Does Better Inference mean Better Learning? Andrew E. Gelfand, Rina Dechter & Alexander Ihler Department of Computer Science University of California, Irvine {agelfand,dechter,ihler}@ics.uci.edu Abstract

More information

Robust Principal Component Analysis

Robust Principal Component Analysis ELE 538B: Mathematics of High-Dimensional Data Robust Principal Component Analysis Yuxin Chen Princeton University, Fall 2018 Disentangling sparse and low-rank matrices Suppose we are given a matrix M

More information

Graph Clustering Algorithms

Graph Clustering Algorithms PhD Course on Graph Mining Algorithms, Università di Pisa February, 2018 Clustering: Intuition to Formalization Task Partition a graph into natural groups so that the nodes in the same cluster are more

More information

Variational Principal Components

Variational Principal Components Variational Principal Components Christopher M. Bishop Microsoft Research 7 J. J. Thomson Avenue, Cambridge, CB3 0FB, U.K. cmbishop@microsoft.com http://research.microsoft.com/ cmbishop In Proceedings

More information

Robust Motion Segmentation by Spectral Clustering

Robust Motion Segmentation by Spectral Clustering Robust Motion Segmentation by Spectral Clustering Hongbin Wang and Phil F. Culverhouse Centre for Robotics Intelligent Systems University of Plymouth Plymouth, PL4 8AA, UK {hongbin.wang, P.Culverhouse}@plymouth.ac.uk

More information