Maximum Likelihood Latent Space Embedding of Logistic Random Dot Product Graphs
Luke O'Connor, Muriel Médard and Soheil Feizi
arXiv:1510.00850v3 [stat.ML], August 2017

Abstract: A latent space model for a family of random graphs assigns real-valued vectors to nodes of the graph such that edge probabilities are determined by latent positions. Latent space models provide a natural statistical framework for graph visualization and clustering. A latent space model of particular interest is the Random Dot Product Graph (RDPG), which can be fit using an efficient spectral method; however, this method is based on a heuristic that can fail, even in simple cases. Here, we consider a closely related latent space model, the Logistic RDPG, which uses a logistic link function to map from latent positions to edge likelihoods. Over this model, we show that asymptotically exact maximum likelihood inference of latent position vectors can be achieved using an efficient spectral method. Our method involves computing the top eigenvectors of a normalized adjacency matrix and scaling the eigenvectors using a regression step. The novel regression scaling step is an essential part of the proposed method. In simulations, we show that our proposed method is more accurate and more robust than common practices. We also show the effectiveness of our approach over standard real networks of the karate club and political blogs.

Index Terms: Latent space models, Stochastic block models, Maximum likelihood

1 INTRODUCTION

CLUSTERING over graphs is a classical problem with applications in systems biology, social sciences, and other fields [], [2], [3], [4]. Although most formulations of the clustering problem are NP-hard [5], several approaches have yielded useful approximate algorithms. The most well-studied approach is spectral clustering.
Most spectral methods are not based on a particular generative network model; alternative, model-based approaches have also been proposed, using loopy belief propagation [4], variational Bayes [5], Gibbs sampling [6], and semidefinite programming [22], [23]. Many spectral clustering methods are derived by proposing a discrete optimization problem and relaxing it to obtain a continuous, convex optimization whose solution is given by the eigenvectors of a normalized adjacency matrix or Laplacian. A post-processing step, typically k-means, is used to extract clusters from these eigenvectors [6], [7]. Different matrix normalizations/transformations include Modularity [], the Laplacian [9], the normalized Laplacian [], [], [26], and the Bethe Hessian [2]. These methods are often characterized theoretically in the context of the Stochastic Block Model (SBM) [3], a simple and canonical model of community structure. Theoretical bounds on the detectability of network structure for large networks have been established for stochastic block models [4], [9], [2], [2]. Several spectral methods have been shown to achieve this recovery threshold [8], [2], [7]. Strong theoretical and empirical results have also been obtained using SDP-based [22], [23] and belief-propagation-based [4] methods, which often have higher computational complexity than spectral methods. A threshold has also been discovered for perfect clustering, i.e., when community structure can be recovered with zero errors [23], [24].

[S. Feizi is the corresponding author. L. O'Connor and S. Feizi contributed equally to this work. L. O'Connor is with Harvard University. S. Feizi is with Stanford University. M. Médard is with the Massachusetts Institute of Technology (MIT).]

An alternative approach to the clustering problem is to invoke a latent space model. Each node is assigned a latent position v_i, and the edges of the graph are drawn independently with probability p_ij = g(v_i, v_j) for some function g(·, ·). Hoff et al.
[3] considered two latent space models, the first of which was a distance model,

    P_ij = l(µ + β x_ij − |v_i − v_j|),    l(x) = 1/(1 + exp(−x)),

where edge probabilities depend on the Euclidean distance between two nodes, and x_ij is a fixed covariate term, which is not learned. Their second model is called a projection model:

    P_ij = l(µ + β x_ij + v_i · v_j / |v_j|).    (1)

Hoff et al. suggest performing inference using an MCMC approach over both models. Focusing on the distance model, they have also extended their approach to allow the latent positions to themselves be drawn from a mixture distribution containing clusters [32], [33]. Efforts to improve the computational efficiency have been made in references [34], [35], and with a related mixed membership blockmodel in reference [5]. Young et al. [28] introduced the Random Dot Product Graph (RDPG), in which the probability of observing an edge between two nodes is the dot product between their respective latent position vectors. The RDPG model can be written as P_ij = v_i · v_j. This model is related to the projection model of Hoff et al. (1). The RDPG provides a useful perspective on spectral clustering in two ways. First, it has led to theoretical advances, including a central limit theorem for eigenvectors of an adjacency matrix [29] and a justification for k-means
clustering as a post-processing step for spectral clustering [25]. Second, as a natural extension of the SBM, the RDPG describes more comprehensively the types of network structures, such as variable degrees and mixed community memberships, that can be inferred using spectral methods. Let V be the n × d matrix of latent positions, where n is the number of nodes and d is the dimension of the latent vectors (d ≪ n). Sussman et al. [25] proposed an inference method over the RDPG based on the heuristic that the first eigenvectors of A will approximate the singular vectors of V, as E(A) = V V^T. They also characterized the distribution of the eigenvectors of A given V. This heuristic can fail, however, as the first eigenvector of the adjacency matrix often separates high-degree nodes from low-degree nodes rather than separating the communities. This problem occurs even in the simplest clustering setup: a symmetric SBM with two clusters of equal size and density. Therefore, we were motivated to develop a more robust inference approach. In this paper, we consider a closely related latent space model, the Logistic RDPG, which uses a logistic link function mapping from latent positions to edge probabilities. Like the previously studied RDPG, the logistic RDPG includes most SBMs as well as other types of network structure, including a variant of the degree-corrected SBM. The logistic RDPG is also similar to the projection model, which uses a logistic link function but models a directed graph (with p_ij ≠ p_ji). Over this model, we show that the maximum likelihood latent-position inference problem admits an asymptotically exact spectral solution. Our method is to take the top eigenvectors of the mean-centered adjacency matrix and to scale them using a logistic regression step. This result is possible because over the logistic model specifically, the likelihood function separates into a linear term that depends on the observed network and a nonlinear penalty term that does not.
Because of its simplicity, the penalty term admits a Frobenius-norm approximation, leading to our spectral algorithm. A similar approximation is not obtained using link functions other than the logistic link. We show that the likelihood of the approximate solution approaches the maximum of the likelihood function when the graph is large and the latent-position magnitudes go to zero. The asymptotic regime is not overly restrictive, as it encompasses many large SBMs at or above the detectability threshold [4]. We compare the performance of our method on the graph clustering problem with spectral methods including the Modularity method [], the Normalized Laplacian [9], and the Bethe Hessian [2], as well as the SDP-based methods [22], [23]. We show that our method outperforms these methods over a broad range of clustering models. We also show the effectiveness of our approach over real networks of the karate club and political blogs.

2 LOGISTIC RANDOM DOT PRODUCT GRAPHS

In this section, we introduce the logistic RDPG, describe our inference method, and show that it is asymptotically equivalent to maximum likelihood inference.

2.1 Definitions

Let A be the set of adjacency matrices corresponding to undirected, unweighted graphs of size n.

Definition 1 (Stochastic Block Model). The Stochastic Block Model is a family of distributions on A parameterized by (k, c, Q), where k is the number of communities, c ∈ [k]^n is the community membership vector, and Q ∈ [0, 1]^{k×k} is the matrix of community relationships. For each pair of nodes, an edge is drawn independently with probability

    P_ij := Pr(A_ij = 1 | c_i, c_j, Q) = Q_{c_i, c_j}.

Another network model that characterizes low-dimensional structures is the Random Dot Product Graph (RDPG). This class of models includes many SBMs.

Definition 2 (Random Dot Product Graph). The Random Dot Product Graph with link function g(·) is a family of distributions on A parameterized by an n × d matrix of latent positions V ∈ R^{n×d}.
For each pair of nodes, an edge is drawn independently with probability

    P_ij := Pr(A_ij = 1 | V) = g(v_i · v_j),    (2)

where v_i, the i-th row of the matrix V, is the latent position vector assigned to node i.

The RDPG has been formulated using a general link function g(·) [28]. The linear RDPG, using the identity link, has been analyzed in the literature because it leads to a spectral inference method. We will refer to this model as either the linear RDPG or simply the RDPG. In this paper, we consider the Logistic RDPG:

Definition 3 (Logistic RDPG). The logistic RDPG is the RDPG with link function

    g(x) = l(x − µ),    l(x) := 1/(1 + e^{−x}),    (3)

where µ is the offset parameter of the logistic link function.

Note that this model is similar to the projection model of Hoff et al. [3] (1). The projection model is for a directed graph, with P_ij ≠ P_ji owing to the division by |v_j|.

Remark 1. The parameter µ in the logistic RDPG controls the sparsity of the network. If the latent position vector lengths are small, the density of the graph is

    (2 / (n(n−1))) Σ_{i<j} E(A_ij) ≈ l(−µ).    (4)

A logistic RDPG with this property is called centered. In Section 2.2, we show that asymptotically exact maximum likelihood inference of latent positions over the centered logistic RDPG can be performed using an efficient spectral algorithm.

For a general RDPG, the ML inference problem is:

Definition 4 (ML inference problem for the RDPG). Let A ∈ A. The ML inference problem over the RDPG is:

    max_X  Σ_{i,j} A_ij log g(X_ij) + (1 − A_ij) log(1 − g(X_ij)),    (5)
    s.t.  X = V V^T,  V ∈ R^{n×d}.

Note that in the undirected case, this objective function is twice the log-likelihood.
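As a concrete illustration of the ML objective, the following sketch (our own code, assuming NumPy; the function names are ours, not from the paper's code release) evaluates the RDPG log-likelihood of an observed adjacency matrix for a candidate latent-position matrix V, summing over unordered pairs (half of the ordered-pair objective):

```python
import numpy as np

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

def rdpg_log_likelihood(A, V, mu=0.0):
    """Log-likelihood of adjacency matrix A under the logistic RDPG
    with latent positions V (rows v_i) and offset mu.
    Sums over unordered pairs i < j."""
    X = V @ V.T                       # X_ij = v_i . v_j
    P = logistic(X - mu)              # edge probabilities g(v_i . v_j)
    iu = np.triu_indices_from(A, k=1)
    p, a = P[iu], A[iu]
    return np.sum(a * np.log(p) + (1 - a) * np.log(1 - p))
```

A larger value indicates latent positions that explain the observed edges better; the inference problem is to maximize this quantity over V.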
Remark 2. A convex semidefinite relaxation of Optimization (5) can be obtained for some link functions as follows:

    max_X  Σ_{i,j} A_ij log g(X_ij) + (1 − A_ij) log(1 − g(X_ij)),    (6)
    s.t.  X ⪰ 0,

where X ⪰ 0 means that X is positive semidefinite. For example, this optimization is convex for the logistic link function (i.e., g(x) = l(x − µ)) and for the linear link function (i.e., g(x) = x). However, this optimization can be slow in practice, and it often leads to excessively high-rank solutions.

2.2 Maximum-Likelihood Inference of Latent Position Vectors

Here, we present an efficient, asymptotically exact spectral algorithm for the maximum-likelihood (ML) inference problem over the logistic RDPG, subject to mild constraints. We assume that the number of dimensions d is given. In practice, this parameter is often set manually, but approaches have been proposed to automatically detect the number of dimensions [4]. The proposed maximum-likelihood inference of latent position vectors for the logistic RDPG is described in Algorithm 1. In the following, we sketch the derivation of this algorithm in a series of lemmas. Proofs for these assertions are presented in Section 7.

First, we simplify the likelihood function of Optimization (5) using the logistic link function. Let F(X) be the log-likelihood, and let the link function be g(x) = l(x − µ). Then:

    F(X) := Σ_{i,j} A_ij log l(X_ij − µ) + (1 − A_ij) log(1 − l(X_ij − µ))
          = Σ_{i,j} A_ij log [ l(X_ij − µ) / (1 − l(X_ij − µ)) ] + log(1 − l(X_ij − µ))
          = Σ_{i,j} A_ij (X_ij − µ) + log(1 − l(X_ij − µ)).    (7)

We have used that log(l(x)/(1 − l(x))) = x. The maximum likelihood problem takes the following form (for given µ):

    max_X  Tr(AX) − Σ_{i,j} log(1 + e^{X_ij − µ}),    (8)
    s.t.  X = V V^T,  V ∈ R^{n×d}.

The objective function has been split into a linear term that depends on the adjacency matrix A and a penalty term that does not depend on A. This simplification is what leads to a tractable optimization, and it is the reason that the logistic link is needed; using, e.g., a linear link, an optimization of this form is not obtained.
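The identity behind the split in (7)-(8) can be checked numerically. In this sketch (our own code, not the authors'), the direct Bernoulli log-likelihood and the split form Tr(AX) minus the penalty agree exactly once the constant −µ Σ A_ij dropped in (8) is restored:

```python
import numpy as np

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

def loglik_direct(A, X, mu):
    """Bernoulli log-likelihood summed over all ordered pairs (i, j)."""
    P = logistic(X - mu)
    return np.sum(A * np.log(P) + (1 - A) * np.log(1 - P))

def loglik_split(A, X, mu):
    """Linear term Tr(AX) plus the A-independent penalty, with the
    constant mu * sum(A) (dropped in the optimization) restored."""
    return np.trace(A @ X) - np.sum(np.log(1 + np.exp(X - mu))) - mu * A.sum()
```

The two functions return the same value for any symmetric X, which is what makes the linear/penalty separation exact rather than approximate.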
We define a penalty function f(x) that keeps only the quadratic and higher-order terms in the penalty term of (8). Let

    f(x) := −(h(x) − h(0) − h′(0)x),    h(x) := log(1 − l(x − µ)).

Now, h′(0) = −l(−µ). Let B := A − l(−µ) 1_n 1_n^T in order to rewrite Optimization (8) as:

    max_X  Tr(BX) − Σ_{i,j} f(X_ij),    (9)
    s.t.  X = V V^T,  V ∈ R^{n×d}.

Note that for a centered RDPG with average density Ā, µ = −l^{−1}(Ā), and B is the mean-centered adjacency matrix A − Ā 1_n 1_n^T. In the next step, we convert the penalty term in the objective function into a constraint:

Lemma 1. Suppose that X* is the optimal solution to Optimization (5). Let h̄ := (1/(n(n−1))) Σ_{i,j} f(X*_ij). Then X* is also the solution to the following optimization:

    max_X  Tr(BX),    (10)
    s.t.  X = V V^T,  V ∈ R^{n×d},
          (1/(n(n−1))) Σ_{i,j} f(X_ij) ≤ h̄.

In the following key lemma, we show that the inequality constraint of Optimization (10) can be replaced by its second-order Taylor approximation.

Lemma 2. For any ε > 0 and γ > 0, there exists δ > 0 such that for any graph whose ML solution X* satisfies

    h̄ ≤ δ  and  max_i X*_ii ≤ (γ/n) Σ_i X*_ii,    (11)

the following bound is satisfied. Let B be the mean-centered adjacency matrix of the chosen graph. Let s ∈ R be the optimal value of the following optimization, obtained at X = X̄:

    max_X  Tr(BX),    (12)
    s.t.  X = V V^T,  V ∈ R^{n×d},
          (1/(n(n−1))) Σ_{i,j} f(X_ij) ≤ h̄.

Let s′ be the optimal value of the following optimization:

    max_X  Tr(BX),    (13)
    s.t.  X = V V^T,  V ∈ R^{n×d},
          (1/(n(n−1))) Σ_{i,j} a_2 X_ij² ≤ h̄,

where a_2 x² is the second-order Taylor approximation of f(x), i.e., a_2 := f″(0)/2. Then

    s′ ≥ (1 − ε) s.

The parameter h̄ is related to the average length of the latent-position vectors (X_ii). If these lengths approach zero, h̄ approaches zero, for a fixed γ. An implication of this constraint is that the logistic RDPG must be approximately centered. Thus, there is a natural choice for the parameter µ for the purpose of inference:

    µ̂ = −l^{−1}( ‖A‖_F² / (n(n−1)) ),    (14)

where ‖A‖_F² / (n(n−1)) is the empirical density of the graph.
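The algebra behind Optimization (9), which absorbs the linear part of the penalty into the mean-centered matrix B, can also be checked numerically. In this sketch (our own code, with l the logistic function), the objectives of (8) and (9) differ by a constant independent of X, namely n² h(0):

```python
import numpy as np

def l(x):
    """Logistic function."""
    return 1.0 / (1.0 + np.exp(-x))

def f(x, mu):
    """Penalty f(x) = -(h(x) - h(0) - h'(0) x), h(x) = log(1 - l(x - mu)),
    using h'(0) = -l(-mu)."""
    h = lambda t: np.log(1.0 - l(t - mu))
    return -(h(x) - h(0.0) + l(-mu) * x)

def objective_8(A, X, mu):
    """Objective of Optimization (8)."""
    return np.trace(A @ X) - np.sum(np.log(1.0 + np.exp(X - mu)))

def objective_9(A, X, mu):
    """Objective of Optimization (9) with B = A - l(-mu) * ones."""
    n = A.shape[0]
    B = A - l(-mu) * np.ones((n, n))
    return np.trace(B @ X) - np.sum(f(X, mu))
```

Because the difference is a constant, maximizing (9) is equivalent to maximizing (8).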
Algorithm 1 ML Inference for the logistic RDPG

Require: Adjacency matrix A, number of dimensions d, (optional) number of clusters k.
1: Form the mean-centered adjacency matrix B := A − (1/(n(n−1))) ‖A‖_F² 1_n 1_n^T.
2: Compute the d eigenvectors of B with largest eigenvalues: e_1, ..., e_d.
3: Let X_i = e_i e_i^T for 1 ≤ i ≤ d.
4: Perform logistic regression of the entries of A lying above the diagonal on the corresponding entries of X_1, ..., X_d, estimating coefficients λ_1, ..., λ_d subject to the constraint that λ_i ≥ 0.
5: Let V be the matrix formed by concatenating the columns √λ_1 e_1, ..., √λ_d e_d.
6: Return V.
7: (optional) Perform k-means on the rows of V, and return the inferred clusters.

This estimator of µ can be viewed as the maximum-likelihood estimator specifically over the centered logistic RDPG. With this choice of µ, the mean-centered adjacency matrix B can be written as

    B = A − (1/(n(n−1))) ‖A‖_F² 1_n 1_n^T.

Note that changing the constant in the inequality constraint of Optimization (13) only changes the scale of the solution, since the shape of the feasible set does not change. Thus, in this optimization we avoid needing to know h̄ a priori (as long as the conditions of Lemma 2 are satisfied). Next we show that the solution to Optimization (13) can be recovered up to a linear transformation using a spectral decomposition:

Lemma 3. Let X̄ be the optimal solution to Optimization (13). Let e_1, ..., e_d be the first d eigenvectors of B, corresponding to the largest eigenvalues. Then e_1, ..., e_d are identical to the non-null eigenvectors of X̄, up to rotation.

Once the eigenvectors of X̄ are known, it remains only to recover the corresponding eigenvalues. Instead of recovering the eigenvalues of X̄, we find the eigenvalues that maximize the likelihood, given the eigenvectors of X̄. Let X_i = e_i e_i^T. Then, the maximum-likelihood estimate of λ_1, ..., λ_d conditional on X_1, ..., X_d can be written as follows:

    λ* := argmax_{λ = (λ_1, ..., λ_d)}  Σ_{i<j} log P(A_ij | X_1, ..., X_d, λ, µ).    (15)

Lemma 4.
Optimization (15) can be solved by logistic regression of the entries of A on the entries of X_1, ..., X_d, with the constraint that the coefficients are nonnegative, and with the intercept fixed by µ.

These lemmas can be used to show the asymptotic optimality of Algorithm 1:

Theorem 1. For all ε > 0 and γ > 0, there exists δ > 0 that satisfies the following. For any graph with size n and adjacency matrix A, suppose that X* is the solution to the optimization

    max_X  P(A | X),
    s.t.  X = V V^T,  V ∈ R^{n×d}.

Let h̄ := (1/(n(n−1))) Σ_{i,j} f(X*_ij). If

    h̄ < δ  and  max_i X*_ii ≤ (γ/n) Σ_i X*_ii,

then

    P(A | X = X̂) / P(A | X = X*) > 1 − ε,    (16)

where X̂ is the solution obtained by Algorithm 1.

Our algorithm is asymptotically exact in the sense that the likelihood ratio between our solution and the true maximum converges uniformly to one as the average latent position length shrinks. Importantly, the convergence is uniform over arbitrarily large graphs; therefore, this regime contains most interesting large network models, such as an SBM with large communities that cannot be perfectly recovered. Coupling this algorithm with a k-means post-processing step leads to a clustering method with robust performance under different network clustering setups.

This result is stronger than the statement that an approximate objective function approximates the likelihood function at the optimum of the likelihood function. Such a result, which can be obtained for many link functions (such as an affine-linear link), is not useful because it does not follow that the optimum of the approximate function lies near the optimum of the true likelihood function. Indeed, a linear approximation has no optimum, since the objective function is unbounded. In order to obtain the stronger statement that the likelihood at the optimum of the approximation is large, it is necessary to use a quadratic approximation. For link functions besides the logistic link, the quadratic term in the likelihood function depends on A, and a spectral optimization method cannot be obtained.
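The pipeline of Algorithm 1 can be sketched end to end as follows. This is our own minimal NumPy implementation, not the authors' released code; in particular, the nonnegativity-constrained logistic regression is done here by a simple projected gradient ascent rather than a packaged constrained solver, with a step size chosen for small examples:

```python
import numpy as np

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

def fit_logistic_rdpg(A, d, n_iter=1000, lr=1.0):
    """Sketch of Algorithm 1: spectral ML inference for the logistic RDPG."""
    n = A.shape[0]
    # Step 1: mean-centered adjacency matrix B = A - density * 1 1^T.
    density = A.sum() / (n * (n - 1))
    B = A - density * np.ones((n, n))
    # Step 2: d eigenvectors of B with largest eigenvalues.
    w, U = np.linalg.eigh(B)
    E = U[:, np.argsort(w)[::-1][:d]]
    # Steps 3-4: logistic regression of the upper-triangular entries of A
    # on the entries of X_i = e_i e_i^T, coefficients constrained >= 0.
    iu = np.triu_indices(n, k=1)
    Z = np.stack([np.outer(E[:, i], E[:, i])[iu] for i in range(d)], axis=1)
    y = A[iu].astype(float)
    offset = np.log(density / (1.0 - density))  # intercept l^{-1}(density)
    lam = np.zeros(d)
    for _ in range(n_iter):  # projected gradient ascent on the
        p = logistic(Z @ lam + offset)  # regression log-likelihood
        lam = np.maximum(lam + lr * (Z.T @ (y - p)), 0.0)
    # Step 5: scale eigenvectors by the square roots of the coefficients.
    return E * np.sqrt(lam)
```

On a two-block SBM, the single column returned with d = 1 separates the blocks by sign, which is what the optional k-means step then exploits.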
The condition in Theorem 1 that the lengths of the optimal latent vectors are sufficiently small is not restrictive for large networks. Consider a sequence of increasingly large SBMs with two clusters of fixed relative sizes, and a convergent sequence of admissible connectivity matrices whose average density is fixed. There are three asymptotic regimes for the community structure: (1) the structure of the network is too weak to detect any clusters at all; (2) the communities can be partially recovered, but some misassignments will be made; and (3) the communities can be recovered perfectly. The true latent position lengths go to zero in regimes (1) and (2), as well as in part of regime (3) [23]. Theorem 1 requires the maximum-likelihood latent position lengths, rather than the true position lengths, to go to zero. If this is the case, and if maximum likelihood achieves the optimal thresholds for partial recovery and perfect recovery, then our method will as well.
Fig. 1. Normalized mean squared error (one minus squared correlation) of inferred latent positions for two SBMs (a, b) and a non-SBM logistic RDPG (c). The top eigenvectors of the adjacency matrix A and the modularity matrix M do not characterize the community structure in panel (a) and in panel (b), respectively. Note that in practice, a different eigenvector could be selected or multiple eigenvectors could be used. In panel (c), the top eigenvector of A does not recover the latent structure. In contrast, our method successfully recovers the underlying latent position vectors in all cases.

3 PERFORMANCE EVALUATION OVER SYNTHETIC NETWORKS

In this section, we compare the performance of our proposed method (Algorithm 1) with existing methods. First, we assess the performance of our algorithm against existing methods in inference of latent position vectors for the two standard SBMs depicted in Figure 1. The network in panel (a) has two dense clusters. In this case, the first eigenvector of the modularity matrix M leads to a good estimate of the latent position vector, while the first eigenvector of the adjacency matrix A fails to characterize this vector. This is because the first eigenvector of the adjacency matrix correlates with node degrees. The modularity transformation regresses out the degree component and recovers the community structure. However, the top eigenvector of the modularity matrix fails to identify the underlying latent position vector when there is a single dense cluster in the network and the community structure is correlated with node degrees (Figure 1-b).
This discrepancy highlights the sensitivity of existing heuristic inference methods to different network models (the Modularity method has not previously been considered a latent-position inference method, but we believe that it is appropriate to do so). In contrast, our simple normalization allows the underlying latent position vectors to be accurately recovered in both cases. We also verified in panel (c) that our method successfully recovers latent positions for a non-SBM logistic RDPG. In this setup, the adjacency matrix's first eigenvector again correlates with node degrees, and the modularity normalization yields an improvement. We found it remarkable that such a simple normalization (mean centering) enabled such significant improvements; using more sophisticated normalizations such as the Normalized Laplacian and the Bethe Hessian, no further improvements were observed (data not shown).

Second, we assessed the ability of our method to detect communities generated from the SBM. We compared against the following existing spectral network clustering methods:

- Modularity (Newman, 2006). We take the first d eigenvectors of the modularity matrix M := A − vv^T / (2|E|), where v is the vector of node degrees and |E| is the number of edges in the network. We then perform k-means clustering on these eigenvectors.
- Normalized Laplacian (Chung, 1997). We take the second- through (d+1)st-last eigenvectors of L_sym := D^{−1/2}(D − A)D^{−1/2}, where D is the diagonal matrix of degrees. We then perform k-means clustering on these eigenvectors.
- Bethe Hessian (Saade et al., 2014). We take the second- through (d+1)st-last eigenvectors of H(r) := (r² − 1) I_n − rA + D, where r² is the density of the graph as defined in [2].
- Unnormalized spectral clustering (Sussman et al., 2012). We take the first d eigenvectors of the adjacency matrix A and perform k-means clustering on these eigenvectors.
- Spectral clustering on the mean-centered matrix B.
  We take the first d eigenvectors of the matrix B and perform k-means on them, without a scaling step.

Note that in our evaluation we include spectral clustering on the mean-centered adjacency matrix B without the subsequent eigenvalue scaling of Algorithm 1, to demonstrate that the scaling step computed by logistic regression is essential to
the performance of the proposed algorithm. When d = 1, the two methods are equivalent.

Fig. 2. Performance comparison of our method (logistic RDPG) against spectral clustering methods in different clustering setups. Panels (a)-(e) illustrate networks that can be characterized by an SBM, while panel (f) illustrates a non-SBM network model. The scale of the x axis is different in panel (f) than in the rest of the panels. Our proposed method performs consistently well, while other methods exhibit sensitive and inconsistent performance in different network clustering setups. Note that in some cases, such as for the Laplacian in panel (b), performance is improved by using a different eigenvector or by using a larger number of eigenvectors.

We also compare the performance of our method against two SDP-based approaches, the method proposed by Hajek et al. (2015) and the SDP method proposed by Amini et al. (2014). For all methods we assume that the number of clusters k is given. In our scoring metric, we distinguish between clusters and communities: for instance, in Figure 2-e, there are two clusters and four communities, comprised of nodes belonging only to cluster one, nodes belonging only to cluster two, nodes belonging to both clusters, and nodes belonging to neither. The score that we use is a normalized Jaccard index, defined as:

    max_{σ ∈ S_k}  (1/k) Σ_{l=1}^{k}  |C_l ∩ Ĉ_{σ(l)}| / |C_l ∪ Ĉ_{σ(l)}|,    (17)

where C_l is the l-th community, Ĉ_l is the l-th estimated community, and S_k is the group of permutations of k elements. Note that one advantage of this scoring metric is that it weighs differently-sized clusters equally (it does not place higher weight on larger communities). Figure 2 presents a comparison between our proposed method and existing spectral methods in a wide range of clustering setups.
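The score (17) can be computed by brute force over permutations for small k. This sketch is our own (communities are given as collections of node indices, which may overlap, matching the cluster/community distinction above):

```python
from itertools import permutations

def normalized_jaccard(true_comms, est_comms):
    """Score (17): best average Jaccard overlap between true and estimated
    communities over all matchings (brute force; feasible for small k)."""
    k = len(true_comms)
    T = [set(c) for c in true_comms]
    E = [set(c) for c in est_comms]
    return max(
        sum(len(t & E[s]) / len(t | E[s]) for t, s in zip(T, sigma)) / k
        for sigma in permutations(range(k))
    )
```

A perfect clustering scores 1 regardless of how the estimated communities are numbered, and each community contributes equally to the average whatever its size.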
Our proposed method performs consistently well, while other methods exhibit sensitive and inconsistent performance in different network clustering setups. For instance, in the case of two large clusters (b), the second-to-last eigenvector of the Normalized Laplacian fails to correlate with the community structure; in the case of one dense cluster (a), the Modularity normalization performs poorly; when there are many small clusters (c), the performance of the Bethe Hessian method is poor. In each case, the proposed method performs at least as well as the best alternative method, except in the case of several different-sized clusters (d), where the normalized Laplacian performs marginally better. In the case of overlapping clusters (e), our method performs significantly better than all competing methods. Spectral clustering on B without the scaling step also performs well in this setup; however, its performance is worse in panels (c)-(d), where d is larger, highlighting the importance of our logistic regression step. The values of k and d for the different simulations were: k = 2, d = 1; k = 2, d = 1; k = 25, d = 24; k = 7, d = 6; k = 4, d = 2; and k = 2, d = 1 for panels (a)-(f), respectively. The values of d are chosen based on the number of dimensions that would be informative about the community structure if one knew the true latent positions. All networks have the same number of nodes, with the same background density. While spectral methods are the most prominent network clustering methods owing to their accuracy and efficiency, other approaches have been proposed, notably including
Fig. 3. Performance comparison of our method (logistic RDPG) against semidefinite programming-based clustering methods. Hajek et al.'s method is designed for the case of two equally-sized partitions; thus it is not included in panels (b)-(d).

SDP-based methods, which solve relaxations of the maximum likelihood problem over the SBM. We compare the performance of our method with the SDP-based methods proposed by Hajek et al. (2015) and Amini et al. (2014) (Figure 3). In the symmetric SBM, meaning the SBM with two equally-dense, equally-large communities, we find that our method performs almost equally well as the method of Hajek et al. (2015), which is a simple semidefinite relaxation of the likelihood in that particular case. Our method also performs better than the method of Amini et al., which solves a more complicated relaxation of the SBM maximum-likelihood problem in the more general case (Figure 3).

4 PERFORMANCE EVALUATION OVER REAL NETWORKS

To assess the performance of the Logistic RDPG over well-characterized real networks, we apply it to two well-known real networks. First, we consider the Karate Club network [44]. Originally a single karate club with social ties between various members, the club split into two clubs after a dispute between the instructor and the president. The network contains 34 nodes with average degree 4.6, including two high-degree nodes corresponding to the instructor and the president. Applying our method to this network, we find that the first eigenvector separates the two true clusters perfectly (Figure 4-a). In the second experiment, we consider a network of political blogs, whose edges correspond to links between blogs [45]. This network is sparse (the average total degree is 27.4), with a number of very high degree nodes.
The nodes in this network have been labeled as either liberal or conservative. We apply our method to this network; Figure 4-b shows the inferred latent positions of its nodes. As illustrated in the figure, nodes with different labels are separated in the latent space. Note that some nodes are placed near the origin, indicating that they cannot be clustered confidently; this occurs owing to their low degrees, as node degrees were correlated with distances from the origin.

5 CODE

We provide code for the proposed method in the following link:

6 DISCUSSION

In this paper, we developed a spectral inference method over logistic Random Dot Product Graphs (RDPGs), and we showed that the proposed method is asymptotically equivalent to maximum likelihood latent-position inference. Previous justifications for spectral clustering have usually been either consistency results [25], [26] or partial-recovery results [7], [8]; to the best of our knowledge, our likelihood-based justification is the first of its kind for a spectral method. This type of justification is satisfying because maximum likelihood inference methods can generally be expected to have optimal asymptotic performance characteristics; for example, it is known that maximum likelihood estimators are consistent over the SBM [38], [39]. It remains an important future direction to characterize the asymptotic performance of the MLE over the Logistic RDPG. We have focused in this paper on the network clustering problem; however, latent space models such as the Logistic RDPG can be viewed as a more general tool for exploring and analyzing network structures. They can be used for visualization [4], [42] and for inference of partial-membership-type structures, similar to the mixed-membership stochastic blockmodel [5]. Our approach can also be generalized to multi-edge graphs, in which the number of edges between two nodes is binomially distributed.
Such data are emerging in areas including systems biology, in the form of cell-type- and tissue-specific networks [43].
Fig. 4. Estimated latent positions for nodes of two real networks. (a) Karate club social network (clubs 1 and 2). (b) Political blogs network (liberal and conservative blogs).

7 PROOFS

Proof 1 (Proof of Lemma 1). If not, then the optimal solution to Optimization (10) would be a better solution to Optimization (5) than X*.

Proof 2 (Proof of Lemma 2). The origin is in the feasible set for both optimizations. For each optimization, the objective function satisfies Tr(B(rX)) = r Tr(BX). Thus, the optimum is either at the origin (if there is no positive solution) or at the boundary of the feasible set. If the optimum is at the origin, we have s = s′ = 0. If not, let X̄ be any solution to (1/n²) Σ_{i,j} f(X̄_ij) = h̄. Let r̄ = ‖X̄‖_F, and let r′ be defined such that the quadratic term of the penalty at scale r′ equals h̄, i.e., (1/n²) Σ_{i,j} a_2 (r′ X̄_ij / r̄)² = h̄. Claim: fixing γ, r̄/r′ → 1 uniformly as h̄ → 0. Define

    F̄(a) := (1/n²) Σ_{i,j} f(a X̄_ij / ‖X̄‖_F)

for a > 0. Since r′ has been defined such that the quadratic term of F̄(r′) is h̄, we have

    F̄(r′) = h̄ + (1/n²) O( Σ_{i,j} (r′ X̄_ij / r̄)³ ).    (18)

Moreover, the Taylor series for f(·) converges in a neighborhood of zero. Because of the constraint max_{i,j} X̄_ij = max_i X̄_ii ≤ (γ/n) Σ_i X̄_ii, we can choose δ such that every entry X̄_ij falls within this neighborhood. This constraint also implies

    (1/n²) Σ_{i,j} |X̄_ij|³ = O( (1/n²) ‖X̄‖_F³ ).

Substituting this into (18), we have

    F̄(r′) = h̄ + (1/n²) O(r′³).    (19)

Therefore, we have

    (F̄(r′) − F̄(r̄)) / F̄(r̄) = (h̄ + O(r′³) − h̄) / h̄ = O(r′).    (20)

Note that f(·) is a convex function with f′(x) > 0 for all x > 0 and f′(x) < 0 for all x < 0. Thus F̄ is increasing, convex, and zero-valued at the origin: for any a ≥ b > 0,

    (a − b)/b ≤ (F̄(a) − F̄(b)) / F̄(b).    (21)

Thus (r̄ − r′)/r′ = O(r′) and r̄/r′ = 1 + O(r′). Let r_s be the norm of the argmax of Optimization (12); because the objective function is linear, we have s′ ≥ (r′/r̄) s. Let r_t be the distance to the intersection of the boundary of the feasible set with the ray from the origin through the argmax of Optimization (13); then s ≥ (r_t/r′) s′. We have shown that both ratios tend uniformly to one.
This completes the proof.

Proof 3 (Proof of Lemma 3). First, suppose we have prior knowledge of the eigenvalues of X. Denote its nonzero eigenvalues by λ_1, ..., λ_d. Then we would be able to recover the optimal solution to Optimization (3) by solving the following optimization:

max_X Tr(XB)
s.t. λ_i(X) = λ_i, 1 ≤ i ≤ d,
rank(X) = d. (22)

Note that the Frobenius norm of a matrix is determined by the eigenvalues of the matrix as follows:

||X||²_F = Tr(XᵀX) = Tr(X²) = Σ_i λ_i². (23)

Thus we can drop the Frobenius norm constraint in (3). Let X be an n × n psd matrix whose non-null eigenvectors are the columns of a matrix E ∈ R^{n×d}, and whose respective eigenvalues are λ_1, ..., λ_d. Let V := E diag(√λ_1, ..., √λ_d), so that X = VVᵀ. Rewrite the objective function as

Tr(XB) = Tr(VᵀBV) = Σ_{i=1}^d λ_i e_iᵀ B e_i.

With the λ_i fixed, this sum is maximized when e_1, ..., e_d are the top d eigenvectors of B. Therefore, at the optimum, the projector onto the column space of X is EEᵀ and X = VVᵀ.
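The core step of Proof 3 — with the eigenvalues held fixed, Tr(XB) is maximized by aligning the eigenvectors of X with the top eigenvectors of B — can be checked numerically. A minimal sketch, where the test matrix B and the eigenvalues are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 8, 2
M = rng.standard_normal((n, n))
B = (M + M.T) / 2                                  # arbitrary symmetric B
lam = np.array([3.0, 1.0])                         # fixed nonzero eigenvalues of X

vals, vecs = np.linalg.eigh(B)
E = vecs[:, np.argsort(vals)[::-1][:d]]            # top-d eigenvectors of B
X_opt = E @ np.diag(lam) @ E.T                     # candidate maximizer
best = float(np.trace(X_opt @ B))

# The claimed optimum equals sum_i lam_i * mu_i, mu_i the top eigenvalues of B.
assert np.isclose(best, lam @ np.sort(vals)[::-1][:d])

# No rank-d psd competitor with the same eigenvalues does better.
for _ in range(200):
    Q, _ = np.linalg.qr(rng.standard_normal((n, d)))   # random orthonormal columns
    X = Q @ np.diag(lam) @ Q.T
    assert np.trace(X @ B) <= best + 1e-9
```

The loop is a spot check of the von Neumann trace inequality that underlies the lemma, not a proof; the assertion inside it compares every random competitor against the aligned solution.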
Proof 4 (Proof of Lemma 4). The upper-triangular entries of A are independent Bernoulli random variables conditional on X and µ, with a logistic link function. The coefficients should be nonnegative, as X is constrained to be positive semidefinite.

Proof 5 (Proof of Theorem 1). By Lemma 1, we have that the solution to optimization (11) is equal to the log-likelihood, up to addition of a constant. By Lemma 2, we have that for a fixed γ, as h → 0, the quotient s/s' converges uniformly to one, where s is the solution to (11) and s' is the solution to optimization (3). The convergence is uniform over the choice of B that is needed for Theorem 1. Because s and s' do not diverge to ±∞, this also implies that s − s', and therefore the log-likelihood ratio, converges uniformly to zero. By Lemma 3, the non-null eigenvectors of the argmax of optimization (3) are equivalent (up to rotation) to the first eigenvectors of B. Finally, by Lemma 4, the eigenvalues that maximize the likelihood can be recovered using a logistic regression step. By Lemma 2, the theorem would hold if we recovered the eigenvalues solving the approximate optimization (3). By finding the eigenvalues that exactly maximize the likelihood, we achieve a likelihood value at least as large.

REFERENCES

[1] M. Girvan and M. E. J. Newman, "Community structure in social and biological networks," Proc. Natl. Acad. Sci. USA, vol. 99, no. 12, pp. 7821–7826, 2002.
[2] S. Butenko, Clustering Challenges in Biological Networks. World Scientific, 2009.
[3] N. Mishra, R. Schreiber, I. Stanton, and R. E. Tarjan, "Clustering social networks," in Algorithms and Models for the Web-Graph. Springer, 2007.
[4] C.-H. Lee, M. N. Hoehn-Weiss, and S. Karim, "Grouping interdependent tasks: Using spectral graph partitioning to study system modularity and performance," Available at SSRN, 2014.
[5] S. E. Schaeffer, "On the NP-completeness of some graph cluster measures," arXiv preprint cs/0506100, 2008.
[6] A. Y. Ng, M. I. Jordan, and Y. Weiss, "On spectral clustering: Analysis and an algorithm," in Advances in Neural Information Processing Systems, 2002, pp. 849–856.
[7] U. von Luxburg, "A tutorial on spectral clustering," Statistics and Computing, vol. 17, no. 4, pp. 395–416, 2007.
[8] M. E. J. Newman, "Finding community structure in networks using the eigenvectors of matrices," Physical Review E, vol. 74, no. 3, 2006.
[9] B. Mohar, "The Laplacian spectrum of graphs," in Graph Theory, Combinatorics, and Applications (Y. Alavi, G. Chartrand, and O. Oellermann, eds.), vol. 2, pp. 871–898, 1991.
[10] F. R. K. Chung, Spectral Graph Theory. American Mathematical Soc., 1997, vol. 92.
[11] J. Shi and J. Malik, "Normalized cuts and image segmentation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 8, pp. 888–905, 2000.
[12] A. Saade, F. Krzakala, and L. Zdeborová, "Spectral clustering of graphs with the Bethe Hessian," in Advances in Neural Information Processing Systems, 2014, pp. 406–414.
[13] L. Hagen and A. B. Kahng, "New spectral methods for ratio cut partitioning and clustering," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 11, no. 9, pp. 1074–1085, 1992.
[14] A. Decelle, F. Krzakala, C. Moore, and L. Zdeborová, "Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications," Physical Review E, vol. 84, no. 6, p. 066106, 2011.
[15] E. M. Airoldi, D. M. Blei, S. E. Fienberg, and E. P. Xing, "Mixed membership stochastic blockmodels," Journal of Machine Learning Research, vol. 9, pp. 1981–2014, 2008.
[16] T. Snijders and K. Nowicki, "Estimation and prediction for stochastic blockmodels for graphs with latent block structure," Journal of Classification, vol. 14, pp. 75–100, 1997.
[17] F. Krzakala, C. Moore, E. Mossel, J. Neeman, A. Sly, L. Zdeborová, and P. Zhang, "Spectral redemption in clustering sparse networks," Proceedings of the National Academy of Sciences, vol. 110, no. 52, pp. 20935–20940, 2013.
[18] R. R. Nadakuditi and M. E. J. Newman, "Graph spectra and the detectability of community structure in networks," Physical Review Letters, vol. 108, no. 18, p. 188701, 2012.
[19] E. Mossel, J. Neeman, and A. Sly, "Stochastic block models and reconstruction," arXiv preprint, 2012.
[20] L. Massoulié, "Community detection thresholds and the weak Ramanujan property," in Proceedings of the 46th Annual ACM Symposium on Theory of Computing. ACM, 2014.
[21] E. Mossel, J. Neeman, and A. Sly, "A proof of the block model threshold conjecture," arXiv preprint arXiv:1311.4115, 2013.
[22] A. A. Amini and E. Levina, "On semidefinite relaxations for the block model," arXiv preprint, 2014.
[23] B. Hajek, Y. Wu, and J. Xu, "Achieving exact cluster recovery threshold via semidefinite programming: Extensions," arXiv preprint, 2015.
[24] E. Abbe, A. S. Bandeira, and G. Hall, "Exact recovery in the stochastic block model," arXiv preprint, 2014.
[25] D. L. Sussman, M. Tang, D. E. Fishkind, and C. E. Priebe, "A consistent adjacency spectral embedding for stochastic blockmodel graphs," Journal of the American Statistical Association, vol. 107, no. 499, pp. 1119–1128, 2012.
[26] K. Rohe, S. Chatterjee, and B. Yu, "Spectral clustering and the high-dimensional stochastic blockmodel," The Annals of Statistics, vol. 39, no. 4, pp. 1878–1915, 2011.
[27] M. Kraetzl, C. Nickel, and E. R. Scheinerman, "Random dot product graphs: A model for social networks," Preliminary Manuscript, 2005.
[28] S. J. Young and E. R. Scheinerman, "Random dot product graph models for social networks," in Algorithms and Models for the Web-Graph. Springer, 2007.
[29] A. Athreya, V. Lyzinski, D. J. Marchette, C. E. Priebe, D. L. Sussman, and M. Tang, "A central limit theorem for scaled eigenvectors of random dot product graphs," arXiv preprint, 2013.
[30] P. W. Holland, K. B. Laskey, and S. Leinhardt, "Stochastic blockmodels: First steps," Social Networks, vol. 5, no. 2, pp. 109–137, 1983.
[31] P. D. Hoff, A. E. Raftery, and M. S. Handcock, "Latent space approaches to social network analysis," Journal of the American Statistical Association, vol. 97, no. 460, pp. 1090–1098, 2002.
[32] S. Shortreed, M. S. Handcock, and P. Hoff, "Positional estimation within a latent space model for networks," Methodology, vol. 2, no. 1, pp. 24–33, 2006.
[33] M. S. Handcock, A. E. Raftery, and J. M. Tantrum, "Model-based clustering for social networks," Journal of the Royal Statistical Society: Series A (Statistics in Society), vol. 170, no. 2, pp. 301–354, 2007.
[34] M. Salter-Townshend and T. B. Murphy, "Variational Bayesian inference for the latent position cluster model," in Analyzing Networks and Learning with Graphs Workshop at NIPS 2009, Whistler, December 2009.
[35] N. Friel, C. Ryan, and J. Wyse, "Bayesian model selection for the latent position cluster model for social networks," arXiv preprint, 2013.
[36] T. Qin and K. Rohe, "Regularized spectral clustering under the degree-corrected stochastic blockmodel," in Advances in Neural Information Processing Systems, 2013, pp. 3120–3128.
[37] S. Van Dongen and A. J. Enright, "Metric distances derived from cosine similarity and Pearson and Spearman correlations," arXiv preprint arXiv:1208.3145, 2012.
[38] P. Bickel, D. Choi, X. Chang, and H. Zhang, "Asymptotic normality of maximum likelihood and its variational approximation for stochastic blockmodels," The Annals of Statistics, vol. 41, no. 4, pp. 1922–1943, 2013.
[39] A. Celisse, J.-J. Daudin, and L. Pierre, "Consistency of maximum-likelihood and variational estimators in the stochastic block model," Electronic Journal of Statistics, vol. 6, pp. 1847–1899, 2012.
[40] L. Zelnik-Manor and P. Perona, "Self-tuning spectral clustering," in Advances in Neural Information Processing Systems, 2004, pp. 1601–1608.
[41] K. M. Hall, "An r-dimensional quadratic placement algorithm," Management Science, vol. 17, no. 3, pp. 219–229, 1970.
[42] Y. Koren, "Drawing graphs by eigenvectors: theory and practice," Computers & Mathematics with Applications, vol. 49, pp. 1867–1888, 2005.
[43] S. Neph et al., "Circuitry and dynamics of human transcription factor regulatory networks," Cell, vol. 150, no. 6, pp. 1274–1286, 2012.
[44] W. Zachary, "An information flow model for conflict and fission in small groups," Journal of Anthropological Research, vol. 33, no. 4, pp. 452–473, 1977.
[45] L. A. Adamic and N. Glance, "The political blogosphere and the 2004 US election: Divided they blog," in Proceedings of the 3rd International Workshop on Link Discovery. ACM, 2005.
More informationCLOSE-TO-CLEAN REGULARIZATION RELATES
Worshop trac - ICLR 016 CLOSE-TO-CLEAN REGULARIZATION RELATES VIRTUAL ADVERSARIAL TRAINING, LADDER NETWORKS AND OTHERS Mudassar Abbas, Jyri Kivinen, Tapani Raio Department of Computer Science, School of
More informationSpectral thresholds in the bipartite stochastic block model
Spectral thresholds in the bipartite stochastic block model Laura Florescu and Will Perkins NYU and U of Birmingham September 27, 2016 Laura Florescu and Will Perkins Spectral thresholds in the bipartite
More informationDoubly Stochastic Normalization for Spectral Clustering
Doubly Stochastic Normalization for Spectral Clustering Ron Zass and Amnon Shashua Abstract In this paper we focus on the issue of normalization of the affinity matrix in spectral clustering. We show that
More informationSpectral clustering. Two ideal clusters, with two points each. Spectral clustering algorithms
A simple example Two ideal clusters, with two points each Spectral clustering Lecture 2 Spectral clustering algorithms 4 2 3 A = Ideally permuted Ideal affinities 2 Indicator vectors Each cluster has an
More information10-725/36-725: Convex Optimization Prerequisite Topics
10-725/36-725: Convex Optimization Prerequisite Topics February 3, 2015 This is meant to be a brief, informal refresher of some topics that will form building blocks in this course. The content of the
More informationCommunities, Spectral Clustering, and Random Walks
Communities, Spectral Clustering, and Random Walks David Bindel Department of Computer Science Cornell University 26 Sep 2011 20 21 19 16 22 28 17 18 29 26 27 30 23 1 25 5 8 24 2 4 14 3 9 13 15 11 10 12
More informationLearning to Learn and Collaborative Filtering
Appearing in NIPS 2005 workshop Inductive Transfer: Canada, December, 2005. 10 Years Later, Whistler, Learning to Learn and Collaborative Filtering Kai Yu, Volker Tresp Siemens AG, 81739 Munich, Germany
More informationDoes Better Inference mean Better Learning?
Does Better Inference mean Better Learning? Andrew E. Gelfand, Rina Dechter & Alexander Ihler Department of Computer Science University of California, Irvine {agelfand,dechter,ihler}@ics.uci.edu Abstract
More informationRobust Principal Component Analysis
ELE 538B: Mathematics of High-Dimensional Data Robust Principal Component Analysis Yuxin Chen Princeton University, Fall 2018 Disentangling sparse and low-rank matrices Suppose we are given a matrix M
More informationGraph Clustering Algorithms
PhD Course on Graph Mining Algorithms, Università di Pisa February, 2018 Clustering: Intuition to Formalization Task Partition a graph into natural groups so that the nodes in the same cluster are more
More informationVariational Principal Components
Variational Principal Components Christopher M. Bishop Microsoft Research 7 J. J. Thomson Avenue, Cambridge, CB3 0FB, U.K. cmbishop@microsoft.com http://research.microsoft.com/ cmbishop In Proceedings
More informationRobust Motion Segmentation by Spectral Clustering
Robust Motion Segmentation by Spectral Clustering Hongbin Wang and Phil F. Culverhouse Centre for Robotics Intelligent Systems University of Plymouth Plymouth, PL4 8AA, UK {hongbin.wang, P.Culverhouse}@plymouth.ac.uk
More information