Hypothesis testing for automated community detection in networks


J. R. Statist. Soc. B (2016) 78, Part 1, pp. 253–273

Hypothesis testing for automated community detection in networks

Peter J. Bickel, University of California at Berkeley, USA, and Purnamrita Sarkar, University of Texas at Austin, USA

[Received November 2013. Revised February 2015]

Summary. Community detection in networks is a key exploratory tool with applications in a diverse set of areas, ranging from finding communities in social and biological networks to identifying link farms in the World Wide Web. The problem of finding communities or clusters in a network has received much attention from statistics, physics and computer science. However, most clustering algorithms assume knowledge of the number of clusters k. We propose to determine k automatically in a graph generated from a stochastic block model by using a hypothesis test of independent interest. Our main contribution is twofold: first, we theoretically establish the limiting distribution of the principal eigenvalue of the suitably centred and scaled adjacency matrix and use that distribution for our test of the hypothesis that a random graph is of Erdős–Rényi (noise) type. Secondly, we use this test to design a recursive bipartitioning algorithm, which naturally uncovers nested community structure. Using simulations and quantifiable classification tasks on real world networks with ground truth, we show that our algorithm outperforms state of the art methods.

Keywords: Asymptotic analysis; Community detection; Hypothesis testing; Networks; Stochastic block model; Tracy–Widom distribution

1. Introduction

Network structured data can be found in many real world problems. Facebook is an undirected network of entities where edges are formed by who knows whom. The World Wide Web is a giant directed network with Web pages as nodes and hyperlinks as edges. Finding community structure in network data is a key ingredient in many graph mining problems. For example, viral marketing targets tightly knit groups in social networks to increase the popularity of a brand or product. There are many clustering algorithms in the computer science and statistics literature. However, most suffer from a common issue: one has to assume that the number of clusters k is known a priori. For labelled data, a common approach for learning k is cross-validation using held-out data. However, cross-validation requires a large amount of computation, and for sparse graphs it is suboptimal to leave out data. In this paper we address this problem via a hypothesis testing framework based on random-matrix theory. This framework naturally leads to a recursive bipartitioning algorithm, yielding a hierarchical clustering structure of the data. For genetic data, Patterson et al. (2006) showed how to combine principal components analysis with random-matrix theory to discover whether the data have cluster structure.

Address for correspondence: Purnamrita Sarkar, Department of Statistics and Data Sciences, College of Natural Sciences, University of Texas at Austin, Austin, TX 78712, USA. purna.sarkar@austin.utexas.edu

© 2015 Royal Statistical Society

This work uses existing results on the limit distribution of the largest eigenvalue of large random covariance matrices. In the standard machine learning literature, where data points are represented by real-valued features, Pelleg and Moore (2000) jointly optimized over the set of cluster locations and the number of cluster centres in the k-means clustering algorithm to maximize the Bayesian information criterion. Hamerly and Elkan (2003) proposed a hierarchical clustering algorithm based on the Anderson–Darling statistic, which tests whether the data assigned to a cluster come from a Gaussian distribution. For network clustering, finding the number of clusters automatically via a series of hypothesis tests has been proposed by Zhao et al. (2011), who presented a label switching algorithm for extracting tight clusters from a graph sequentially, using a characterization of an associative cluster. Although the criterion is not probabilistically based, the stopping rule is based on parametric bootstraps from an underlying probability model. They showed attractive consistency properties of their method under block and related models.

We take a probabilistic approach, considering the problem of finding the number of clusters in a graph generated from a stochastic block model, which is a widely used model for generating labelled graphs (Holland et al., 1983). We begin by constructing a test of the null hypothesis based on the very rapid computation of the largest eigenvalue of an appropriately centred and scaled adjacency matrix. Our null hypothesis is that there is only one cluster, i.e. the network is generated from an Erdős–Rényi $G_{n,p}$-graph, where $n$ denotes the number of nodes and $p$ denotes the probability of linkage between a pair of nodes. Existing literature (Lee and Yin, 2014) can be used to show that this largest eigenvalue asymptotically has the Tracy–Widom distribution. Using recent theoretical results from random-matrix theory, we show that this limit also holds for our statistic when the probability of an edge $p$ is unknown, and the centring and scaling are done using an estimate of $p$. Our theory holds for $p$ constant with respect to $n$, i.e. the dense asymptotic regime where the average degree grows linearly with $n$. We are currently investigating the behaviour of the largest eigenvalue when $p$ decays with $n$.

We show how to obtain Bartlett-type corrections (Bartlett, 1937) for our test statistic when the graph is small or sparse. Although we cannot yet establish theory for this correction, we show its effectiveness by using simulations and labelled real world networks. On quantifiable classification tasks on real world networks with ground truth, our method outperforms McAuley and Leskovec's (2012) algorithm, which has been shown to perform better than known methods for obtaining overlapping clusters in networks. Further, we show that our recursive bipartitioning algorithm gives a multiscale view of smaller communities with different densities nested inside bigger communities. Although our theory applies only to block models, our simulations and data examples show that our method is quite robust to deviations from the block model assumptions (Section 4.1.3).

Our paper is organized as follows. Section 2 gives the background on block and other models to be considered in our theory and simulations. Section 3 presents our main results on the hypothesis test and the recursive bipartitioning scheme.
We present experimental results on simulated networks and labelled real world networks in Section 4. We conclude with a discussion in Section 5.

2. Background

Latent variable models have been explored by many researchers for modelling networks (Raftery et al., 2002; Bickel and Chen, 2009). The general set-up of a latent variable model assigns $n$ latent random variables $Z := (Z_1, Z_2, \ldots, Z_n)$ to the $n$ nodes in a network. These variables take values in a general space $\mathcal{Z}$. The linkage probability between two nodes is specified via a symmetric map $h : \mathcal{Z} \times \mathcal{Z} \to [0, 1]$.

Bickel and Chen (2009) took the $Z_i$ to be independent and identically distributed uniform(0,1) random variables. Raftery et al. (2002) modelled these as positions in some $d$-dimensional latent space. Handcock et al. (2007) proposed to use a mixture of multivariate Gaussian distributions, one for each cluster. A stochastic block model is a special class of these models, where $Z_i$ is a binary length-$k$ vector encoding the membership of a node in a cluster. This has been a widely popular model (Snijders and Nowicki, 1997; Bickel and Chen, 2009; Birmelé and Ambroise, 2011) for modelling community structure in networks.

2.1. Stochastic block model
For our theoretical results we focus on community detection in graphs generated from stochastic block models. Informally, a stochastic block model with $k$ classes assigns latent cluster memberships to every node in a graph. We shall denote by $n$ the number of nodes in a graph. Any pair of nodes has a probability of linkage that depends only on the cluster memberships of its end points, thus leading to stochastic equivalence. Let $Z$ denote an $n \times k$ binary matrix where each row has exactly one 1 and the $i$th column has $n_i$ 1s, i.e. the $i$th class has $n_i$ nodes with $\sum_i n_i = n$. For this paper, we shall assume that $Z$ is fixed and unknown. By definition there are no self-loops. Under this model, the conditional expectation of the adjacency matrix $A$ is
$$E[A \mid Z] = ZBZ^T - \mathrm{diag}(ZBZ^T), \qquad (1)$$
where $\mathrm{diag}(M)$ is the diagonal matrix with $\mathrm{diag}(M)_{ii} = M_{ii}$ for all $i$. $A$ is symmetric and the edges are independent Bernoulli trials. Thus, the subgraph that is induced by the nodes in the $i$th cluster is simply an Erdős–Rényi graph. This stochastic equivalence criterion of a stochastic block model leads to tractable analysis and inference. One may naturally question this criterion while modelling a real network. Depending on the definition of a block, it is possible to have variations within a block based on covariates; however, once we stratify a block by using covariates, it is reasonable to assume that nodes belonging to a stratum of a block behave similarly in terms of making connections, i.e. have stochastic equivalence.

2.2. Degree-corrected stochastic block model
A degree-corrected stochastic block model (Karrer and Newman, 2011) is an extension of a block model which allows for heterogeneity of degrees within a block. To be concrete, in addition to the class membership vectors $Z_i$ present in a block model, this model has an extra set of degree parameters $(\theta_i)_{i=1}^n$. Given these parameters, we have the relationship
$$P(A_{ij} = 1 \mid \theta, Z, B) = \theta_i \theta_j Z_i^T B Z_j. \qquad (2)$$
Zhao et al. (2012) showed that, if the degree parameters $\theta$ take values from a finite discrete set, then a wide range of statistical methods for community detection are consistent. This added condition ensures that $\theta$ cannot take many different values, which would be more difficult to learn from a small data set. Under this condition, however, the degree-corrected model becomes a parametric submodel of the standard stochastic block model, i.e. within a block we have more blocks arising from varied degrees. In Section 4, we show that our hypothesis test can be applied to identify the different degree strata of a degree-corrected block model with one block. We also show with simulations that, for a latent position cluster model, even though the stochastic equivalence condition does not hold, if the mixture components have small variance then our algorithm can identify the clusters correctly.
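To make equations (1) and (2) concrete, the following is a minimal Python sketch (ours, not the authors' code; the function name and interface are illustrative) that draws an adjacency matrix from a stochastic block model, with optional degree parameters:

```python
import numpy as np

def sample_sbm(z, B, theta=None, rng=None):
    """Draw a symmetric adjacency matrix with no self-loops from a
    (degree-corrected) stochastic block model.

    z     : length-n integer array of cluster labels in {0, ..., k-1}
    B     : k x k symmetric matrix of linkage probabilities
    theta : optional length-n degree parameters as in equation (2);
            omitting them recovers the plain block model of equation (1)
    """
    rng = np.random.default_rng(rng)
    n = len(z)
    theta = np.ones(n) if theta is None else np.asarray(theta)
    # P_ij = theta_i * theta_j * B_{z_i, z_j}
    P = np.outer(theta, theta) * B[np.ix_(z, z)]
    # Independent Bernoulli edges on the upper triangle, then symmetrize.
    upper = np.triu(rng.random((n, n)) < P, 1)
    return (upper | upper.T).astype(float)
```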
We conclude this section with a note on the applicability of stochastic block models for modelling real world networks.

2.3. Stochastic block models as a histogram approximation
While many extensions to a simple block model have been proposed (Karrer and Newman, 2011; Airoldi et al., 2008), a parallel line of research has focused on approximating real networks with block models with a growing number of clusters (Olhede and Wolfe, 2014; Airoldi et al., 2013). In particular, Olhede and Wolfe (2014) showed that a block model approximation of an unlabelled network is analogous to the use of histograms as non-parametric summaries of an unknown probability distribution. Varying the number or size of communities is in essence equivalent to varying the number of bins or the bandwidth. Their results imply that, under some mild regularity conditions on the limiting linkage probability function (which is also referred to as the 'graphon'), if we allow $k$ to grow with $n$, then the block model approximation converges to the true graphon (in mean integrated squared error). In addition, Olhede and Wolfe (2014) also showed that block model approximations of student friendship networks (Resnick et al., 1997) and political blog networks (Adamic and Glance, 2005) bring out interesting structure. The political blogs network is a symmetrized network of political blogs linking to each other, with two known (ground truth) clusters, i.e. the conservative and the liberal blogs. Olhede and Wolfe (2014) showed that a block model approximation of this network returns mostly homogeneous (all liberal or all conservative) blocks. For the student friendship network, they showed that the clusters returned by a block model approximation are often homogeneous in terms of available covariate information, e.g. race and school year of the students.

3. The hypothesis test
Deciding whether a stochastic block model has $k$ or $k + 1$ blocks can be thought of as inductively deciding whether there is one block or two. In essence, we develop a hypothesis test to determine whether or not a graph is generated from an Erdős–Rényi model with matching link probability. First we discuss some known properties of Erdős–Rényi graphs. Throughout this paper we assume that the number of clusters $k$ and the edge probabilities are constant, whereas the number of nodes $n$ grows to $\infty$. Thus the average degree grows linearly with $n$.

3.1. Properties of Erdős–Rényi graphs
Let $A$ denote the adjacency matrix of an Erdős–Rényi $(n, p)$ random graph, and let $P := E[A]$. We shall assume that there are no self-loops and hence $A_{ii} = 0$ for all $i$. Under the Erdős–Rényi model, $P$ is given by
$$P = np\,ee^T - pI, \qquad (3)$$
where $e$ is a length-$n$ vector with $e_i = 1/\sqrt{n}$ for all $i$, and $I$ is the $n \times n$ identity matrix. We also introduce the normalized matrix
$$\tilde{A} := \frac{A - P}{\sqrt{(n-1)p(1-p)}}. \qquad (4)$$
The eigenvalues of $\tilde{A}$ are denoted by $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$. Let us also define the density of the semicircle law. In particular we have the following definition.

Definition 1. Let $\rho_{sc}$ denote the density of the semicircle law, defined as
$$\rho_{sc}(x) := \frac{1}{2\pi}\sqrt{(4 - x^2)_+}, \qquad x \in \mathbb{R}. \qquad (5)$$
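As a quick numerical illustration of equations (3) and (4) (a sketch of ours, not part of the paper), one can check that the largest eigenvalue of the centred and scaled adjacency matrix of a simulated Erdős–Rényi graph sits near the spectral edge 2:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 500, 0.5

# Erdos-Renyi(n, p) adjacency matrix with no self-loops.
upper = np.triu(rng.random((n, n)) < p, 1)
A = (upper | upper.T).astype(float)

# Centre by P = np*ee^T - pI (equation (3)) and scale (equation (4)).
P = p * (np.ones((n, n)) - np.eye(n))
A_tilde = (A - P) / np.sqrt((n - 1) * p * (1 - p))

lam = np.linalg.eigvalsh(A_tilde)           # eigenvalues, ascending order
print(lam[-1])                              # close to the spectral edge 2
print(n ** (2 / 3) * (lam[-1] - 2))         # O_P(1): approximately a TW1 draw
```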

For Wigner matrices with entries having a symmetric law, the limiting behaviour of the empirical distribution of the eigenvalues was established by Wigner (1958). This distribution converges weakly to the semicircle law defined in equation (5). Also, Tracy and Widom (1994) proved that, for Gaussian orthogonal ensembles (GOEs), $\lambda_1$ and $\lambda_n$, after suitable shifting and scaling, converge to the Tracy–Widom distribution with index 1 (TW$_1$). Soshnikov (1999) proved that this universal result at the edge of the spectrum also holds for more general distributions, provided that the random variables have symmetric laws of distribution, all their moments are finite and $E[\tilde{A}_{ij}^m] \le (Cm)^m$ for some constant $C$ and all positive integers $m$. This shows that $n^{2/3}(\lambda_1 - 2)$ weakly converges to the limit distribution of GOE matrices, i.e. the Tracy–Widom law with index 1, for $p = \frac{1}{2}$. Recently, Erdős et al. (2012) have removed the symmetry condition and established the edge universality result for general Wigner ensembles. Further, Lee and Yin (2014) gave a necessary and sufficient condition for the limiting Tracy–Widom law, which shows that $n^{2/3}(\lambda_1 - 2)$ converges weakly to TW$_1$ in our setting also.

If we knew the true $p$, it would be easy to frame a hypothesis test which accepts or rejects the null hypothesis that a network is generated from an Erdős–Rényi graph. First we would compute $\theta := n^{2/3}(\lambda_1 - 2)$, and then estimate the $p$-value $P(X \ge \theta)$ from available tables of probabilities for the Tracy–Widom distribution. We reject the null hypothesis if the $p$-value falls below a predefined significance level $\alpha$. However, we do not know the true parameter $p$; we can only estimate it within $O_P(1/n)$ error by computing the proportion of pairs of nodes that form an edge. Let us denote this estimate by $\hat{p}$. Thus the matrix at hand is $A - \hat{P}$, where
$$\hat{P} = n\hat{p}\,ee^T - \hat{p}I. \qquad (6)$$
In this paper, we show that the largest eigenvalue of $A - \hat{P}$ also follows the TW$_1$-law after suitable shifting and scaling.

Theorem 1. Let
$$\tilde{A}' := \frac{A - \hat{P}}{\sqrt{(n-1)\hat{p}(1-\hat{p})}}. \qquad (7)$$
We have the following asymptotic distribution of our test statistic $\theta$:
$$\theta := n^{2/3}\{\lambda_1(\tilde{A}') - 2\} \xrightarrow{d} \mathrm{TW}_1, \qquad (8)$$
where TW$_1$ denotes the Tracy–Widom law with index 1. This is also the limiting law of the largest eigenvalue of GOEs.

We give a proof sketch in Appendix A; the details are deferred to the on-line supplementary material. For consistency we need to show that the above statistic $\theta$ does not have the Tracy–Widom distribution when $A$ is generated from a stochastic block model with $k > 1$ blocks. We show that $\theta \to \infty$ if $A$ is generated from a stochastic block model, as long as the class probability matrix $B$ is diagonally dominant. The diagonal dominance condition leads to clusters with more edges within than across. A similar condition can be found in Zhao et al. (2011) for proving asymptotic consistency of the extraction algorithm for stochastic block models with $k = 2$. Further, Bickel and Chen (2009) also noted that, for $k = 2$, the Newman–Girvan modularity is asymptotically consistent if diagonal dominance holds, although in general less is needed.

Proposition 1. Let $A$ be generated from a stochastic block model with hidden class assignment matrix $Z$ and probability matrix $B$ (as in equation (1)) whose elements are constants with respect to $n$. If $B_{ii} > \sum_{j \ne i} B_{ij}$ for all $i$, we have
$$\theta := n^{2/3}\{\lambda_1(\tilde{A}') - 2\} \ge C n^{7/6}, \qquad (9)$$
where $C$ is a deterministic positive constant independent of $n$.

On the basis of theorem 1, we present a preliminary version of our procedure for calculating the $p$-value of the test statistic (Table 1).

Table 1. Algorithm 1: preliminary hypothesis test

Step 1: $A \leftarrow$ adjacency matrix of $G$
Step 2: $\hat{p} \leftarrow \sum_{i,j} A_{ij}/\{n(n-1)\}$
Step 3: $\tilde{A}' \leftarrow (A - \hat{P})/\sqrt{(n-1)\hat{p}(1-\hat{p})}$
Step 4: $\theta \leftarrow n^{2/3}\{\lambda_1(\tilde{A}') - 2\}$
Step 5: pval $\leftarrow P_{\mathrm{TW}_1}(X > \theta)$

3.2. A small sample correction
Algorithm 1 uses the asymptotic distribution of the test statistic to obtain a $p$-value. Hence its performance on a finite network depends on how quickly the empirical distribution of the statistic approaches the limiting law. We performed simulation experiments to compare the speed of convergence of our test statistic with that of the scaled largest eigenvalue of GOE matrix ensembles. This is simply a reference point in our comparison, since Tracy–Widom distributions were discovered for Gaussian random-matrix ensembles. Our empirical investigation shows that, whereas the largest eigenvalues of GOE matrices converge to the Tracy–Widom distribution quite quickly, those of adjacency matrices do not. Moreover, the convergence is even slower if $p$ is small, which is so for sparse graphs. We elucidate this issue with some simulation experiments. We generate 1,000 GOE matrices $M$, where $M_{ij} \sim N(0, 1)$. In Fig. 1, we plot the empirical density of $n^{2/3}\{\lambda_1(M)/\sqrt{n} - 2\}$ against the true Tracy–Widom density. In Figs 1(a) and 1(b), we plot the GOE cases with $n$ equalling 500 and 1,000 respectively, whereas Figs 1(c) and 1(d) respectively show the Erdős–Rényi cases with $n = 500$, $p = 0.5$, and $n = 500$, $p = 0.05$. This suggests that computing the $p$-value by using the empirical distribution of $\lambda_1$ generated by a parametric bootstrap step would be better than using the limiting Tracy–Widom distribution. However, this would be computationally expensive, since it would have to be carried out at every level of the recursion in algorithm 3 later.

Fig. 1. Empirical distributions of largest eigenvalues plotted against the limiting Tracy–Widom law: (a) GOE matrices with $n = 500$; (b) GOE matrices with $n = 1{,}000$; (c) Erdős–Rényi graphs with $n = 500$ and $p = 0.5$; (d) Erdős–Rényi graphs with $n = 500$ and $p = 0.05$

Instead we note that, if one can learn the shift and scale of the bootstrapped empirical distribution, it can be well approximated by the limiting TW$_1$-law. Hence we propose to run a few simulations to compute the mean and the variance of the distribution, and then shift and scale the test statistic to match the first two moments of the limiting TW$_1$-law. In Fig. 2, we plot the empirical distribution of 1,000 bootstrap replicates. Figs 2(a) and 2(b) show how the empirical distribution of $\lambda_1$ differs from the limiting TW$_1$-law. In Figs 2(c) and 2(d) we show the shifted and scaled version of this empirical distribution, where the mean and variance of the empirical distribution are estimated by using 100 samples drawn from the respective Erdős–Rényi models. We can see that Figs 2(c) and 2(d) are a much better fit to the Tracy–Widom distribution. Finally, in Figs 2(e) and 2(f) we have the corrected empirical distributions where the mean and variance are estimated from 50 random samples. Although this is not as good a fit as Figs 2(c) and 2(d), it is not much worse. We note that these corrections are akin to Bartlett-type corrections (Bartlett, 1937) to likelihood ratio tests, which propose a family of limiting distributions, all scaled variants of the well-known $\chi^2$-limit, and estimate the best fit by using the data at hand. On the basis of this discussion, we now present algorithm 2 (Table 2), which is a modified version of algorithm 1.

Table 2. Algorithm 2: hypothesis test with correction

Step 1: $\hat{p} \leftarrow \sum_{i,j} A_{ij}/\{n(n-1)\}$
Step 2: $\theta \leftarrow n^{2/3}(\lambda_1[(A - \hat{P})/\sqrt{(n-1)\hat{p}(1-\hat{p})}] - 2)$
Step 3: $\mu_{\mathrm{TW}} \leftarrow E_{\mathrm{TW}_1}[X]$
Step 4: $\sigma_{\mathrm{TW}} \leftarrow \sqrt{\mathrm{var}_{\mathrm{TW}_1}(X)}$
Step 5: for $i = 1, \ldots, 50$ do
Step 6:   $A_i \leftarrow$ Erdős–Rényi$(n, \hat{p})$
Step 7:   $\theta_i \leftarrow n^{2/3}[\lambda_1((A_i - \hat{P})/\sqrt{(n-1)\hat{p}(1-\hat{p})}) - 2]$
Step 8: $\hat{\mu}_{n,\hat{p}} \leftarrow$ mean$(\{\theta_i\})$
Step 9: $\hat{\sigma}_{n,\hat{p}} \leftarrow$ standard deviation$(\{\theta_i\})$
Step 10: $\theta' \leftarrow \mu_{\mathrm{TW}} + \{(\theta - \hat{\mu}_{n,\hat{p}})/\hat{\sigma}_{n,\hat{p}}\}\sigma_{\mathrm{TW}}$
Step 11: pval $\leftarrow P_{\mathrm{TW}_1}(X > \theta')$
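For concreteness, here is one possible Python rendering of algorithms 1 and 2 (a sketch under stated assumptions, not the authors' implementation). The TW$_1$ mean and standard deviation below are tabulated constants (approximately $-1.2065$ and $\sqrt{1.6078} \approx 1.268$); an exact TW$_1$ tail probability is not available in standard libraries, so converting the corrected statistic $\theta'$ to a $p$-value is left to a user-supplied table or routine, exactly as in step 11 of Table 2.

```python
import numpy as np

TW1_MEAN, TW1_SD = -1.2065, 1.2680   # tabulated moments of the TW1 law

def test_statistic(A):
    """Algorithm 1, steps 1-4: theta = n^{2/3} {lambda_1(A') - 2}."""
    n = A.shape[0]
    p_hat = A.sum() / (n * (n - 1))
    P_hat = p_hat * (np.ones((n, n)) - np.eye(n))     # equation (6)
    A_prime = (A - P_hat) / np.sqrt((n - 1) * p_hat * (1 - p_hat))
    lam1 = np.linalg.eigvalsh(A_prime)[-1]
    return n ** (2 / 3) * (lam1 - 2)

def corrected_statistic(A, n_boot=50, rng=None):
    """Algorithm 2: moment-match the bootstrap null distribution to TW1."""
    rng = np.random.default_rng(rng)
    n = A.shape[0]
    p_hat = A.sum() / (n * (n - 1))
    theta = test_statistic(A)
    boots = []
    for _ in range(n_boot):
        # Parametric bootstrap: Erdos-Renyi(n, p_hat) with no self-loops.
        upper = np.triu(rng.random((n, n)) < p_hat, 1)
        boots.append(test_statistic((upper | upper.T).astype(float)))
    mu_hat, sigma_hat = np.mean(boots), np.std(boots)
    return TW1_MEAN + (theta - mu_hat) / sigma_hat * TW1_SD
```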

Fig. 2. Corrected empirical distributions of largest eigenvalues computed using 1,000 bootstrap replicates from an Erdős–Rényi graph with matching parameters against the limiting Tracy–Widom law: (a), (b) original uncorrected empirical distribution; (c), (d) corrected version with shift and scale estimated by using 100 samples; (e), (f) corrected version with shift and scale estimated by using 50 samples; (a), (c), (e) generated from $G_{500, 0.05}$; (b), (d), (f) generated from $G_{2000, 0.05}$

Table 3. Algorithm 3: recursive bipartitioning of networks by using Tracy–Widom theory

Step 1: function RecursiveBipartition$(G, \alpha)$
Step 2:   pval $\leftarrow$ HypothesisTest$(G)$
Step 3:   if pval $< \alpha$ then
Step 4:     $(G_1, G_2) \leftarrow$ Bipartition$(G)$
Step 5:     RecursiveBipartition$(G_1, \alpha)$
Step 6:     RecursiveBipartition$(G_2, \alpha)$

3.3. Recursive algorithm
We are now ready to present the recursive clustering scheme in algorithm 3 (Table 3). For the fourth step in algorithm 3 we use the regularized version of spectral clustering that was introduced in Amini et al. (2013). We want to emphasize that the choice of spectral clustering is not connected to the hypothesis test. One can use any other method for partitioning the graph.

3.4. Relationship to Zhao et al. (2011)
We conclude this section with a brief discussion of the similarities and differences of our work with the method of Zhao et al. (2011). The main difference is that they focused on finding and extracting communities which maximize a ratio-cut-type criterion. We, in contrast, do not prescribe a clustering algorithm. The clustering step in algorithm 3 is not tied to our hypothesis test and can easily be replaced by their community extraction algorithm. Computationally, our hypothesis testing step is faster, because we propose to estimate the mean and variance of the empirical distribution by using the bootstrap, not the distribution itself. This is possible because the limiting distribution is provably Tracy–Widom, and small sample corrections can be made cheaply by generating fewer bootstrap samples. Further, their extraction step is based on a label switching algorithm, which is inherently much slower than a spectral bipartitioning step on the first two eigenvectors of the data matrix, which is what we use. This gives us another computational boost. Finally, another difference is that they perform a sequential extraction; the hypothesis test is applied sequentially on the complement of the communities extracted so far. We, in contrast, find the communities recursively, thus leading to a natural hierarchical clustering. If there is a nested community structure inside an extracted community, the sequential strategy would miss it. There has been interesting work aimed at finding the number of blocks in a stochastic block model, which also does not look for nested structure (Chatterjee, 2015; Lei, 2014). Thus, our method has an added advantage in a restrictive albeit important setting, since many networks naturally have hierarchical cluster structure. We also demonstrate this in our simulated experiments.

4. Experiments
In this section, we present experiments on simulated data (Section 4.1), robustness to deviations from block model properties (Section 4.1.3) and real world networks (Section 4.2).

4.1. Various types of block models
Our simulations show two properties of our hypothesis test. First we show that it can differentiate an Erdős–Rényi graph from one with a small dense cluster planted in it. Secondly we show that, although our theory holds only for probability of linkage $p$ fixed with respect to $n$, our algorithm works for sparse graphs as well.
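Before the individual simulation studies, here is a minimal sketch of the recursion in algorithm 3 (again ours, with illustrative names: `tw1_sf` is an assumed user-supplied TW$_1$ survival function, and a plain sign split on the second eigenvector stands in for the regularized spectral clustering of Amini et al. (2013)); it reuses `corrected_statistic` from the sketch after Table 2:

```python
import numpy as np

def bipartition(A):
    """Split nodes by the sign of the eigenvector of the second-largest
    eigenvalue of A (a crude stand-in for regularized spectral clustering)."""
    _, vecs = np.linalg.eigh(A)        # eigenvalues in ascending order
    return vecs[:, -2] >= 0

def recursive_bipartition(A, nodes, tw1_sf, alpha=0.001, min_size=10):
    """Algorithm 3: keep splitting while the Erdos-Renyi null is rejected."""
    sub = A[np.ix_(nodes, nodes)]
    if len(nodes) <= min_size or tw1_sf(corrected_statistic(sub)) >= alpha:
        return [nodes]                 # null accepted: report one community
    mask = bipartition(sub)
    if mask.all() or (~mask).all():    # degenerate split: stop
        return [nodes]
    return (recursive_bipartition(A, nodes[mask], tw1_sf, alpha, min_size)
            + recursive_bipartition(A, nodes[~mask], tw1_sf, alpha, min_size))
```

Calling `recursive_bipartition(A, np.arange(A.shape[0]), tw1_sf)` returns a flat list of node sets; retaining the recursion tree instead yields the hierarchical structure shown in the figures below.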

4.1.1. Planted small cluster
Using the same set-up as in Zhao et al. (2011), we plant a densely connected small cluster in an Erdős–Rényi graph. In essence we are looking at a stochastic block model with $n = 1{,}000$ and $n_1$ nodes in cluster 1. The block model parameters are $B_{11} = 0.15$ and $B_{22} = B_{12} = 0.05$. We plot error bars from 50 random runs on the $p$-values against increasing $n_1$ in Fig. 3(a) and against increasing $B_{12}$ in Fig. 3(b). A larger $p$-value simply means that the hypothesis test considers the graph to be close to an Erdős–Rényi graph. In Fig. 3(a) we see that the $p$-values decrease as $n_1$ increases from 30 to 100. This is expected, since the planted cluster is easier to detect as $n_1$ grows. In contrast, in Fig. 3(b) we see that the $p$-values increase as $B_{12}$ is increased from 0.04 to 0.1. This is also expected, since the graph is indeed losing its block structure.

4.1.2. Nested stochastic block models
We present a nested stochastic block model with $n_1 = n_2 = 200$ and $n_3 = 600$, where the communities become increasingly dense. Specifically, $B_{11} = B_{22} = \rho a$, $B_{12} = \rho b$, $B_{13} = B_{23} = \rho c$ and $B_{33} = \rho d$, where $a = 0.02$, $b = 0.01$ and $c = 0.001$. As we increase $\rho$ from 0.5 to 5 in steps of 0.5, the average expected degree of an $n = 1{,}000$-node graph increases from 2.8 to 13.8. We plot error bars on $p$-values from 50 random runs. Similarly to Zhao et al. (2011), we use the adjusted Rand index, which is a well-known measure of closeness between two clusterings. Fig. 4 shows that the adjusted Rand index grows as the average degree increases. This also demonstrates that, although our theory holds only for $p$ fixed with respect to $n$, in practice our recursive bipartitioning algorithm works for sparse graphs as well. We used a $p$-value cut-off of 0.001 for the simulation experiments.

Finally, we compare our method with the community extraction method of Zhao et al. (2011) (referred to as algorithm E; our recursive bipartitioning algorithm is referred to as RB). In Table 4 we show the adjusted Rand index scores obtained by using the E and RB algorithms for our nested block model setting with the largest expected degree. In this particular case, the E algorithm first extracts the community containing communities 1 and 2, and then tries to extract another community from the remainder of the graph, leading to poor performance. This accuracy can be improved by replacing their sequential extraction strategy with a recursive one.

Fig. 3. $p$-values computed by using algorithm 2 in simulated networks of $n = 1{,}000$: (a) $B_{11} = 0.15$ and $B_{12} = B_{22} = 0.05$; (b) $n_1 = 100$
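A hypothetical replication of this planted-cluster set-up, reusing `sample_sbm` and `corrected_statistic` from the earlier sketches (the numerical values mirror those quoted above):

```python
import numpy as np

n, n1 = 1000, 100
z = np.array([0] * n1 + [1] * (n - n1))
B = np.array([[0.15, 0.05],
              [0.05, 0.05]])   # dense planted cluster on a 0.05 background

A = sample_sbm(z, B, rng=0)
theta_prime = corrected_statistic(A, rng=1)
print(theta_prime)  # far in the TW1 upper tail, so the null is rejected
```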

Fig. 4. Adjusted Rand index averaged over 50 random runs: a higher value indicates that the estimated clustering is closer to the true clustering

Table 4. Comparison with the community extraction algorithm E, averaged over 50 random runs

Algorithm | Adjusted Rand index
E | 0.55 ± 0.03
RB | 0.88

4.1.3. Robustness to deviation from block model assumptions
We conclude our experimental section with some simulations to demonstrate robustness. We shall first demonstrate this with an example of a degree-corrected stochastic block model with one block, where stochastic equivalence holds within every degree stratum (nodes in the same block with identical expected degrees). Next, we shall show that, even if stochastic equivalence does not hold exactly, i.e. the linkage probabilities of all pairs in a block are not identical but close, our hypothesis test and the recursive partitioning scheme lead to accurate partitions.

Degree-corrected stochastic block model. Consider a degree-corrected block model (equation (2)) where $\theta$ takes three different values. We applied our recursive algorithm, equipped with simple degree clustering, to data generated from this model with one block and the parameters given in equation (10). For identifiability, $\theta$ is arbitrary within a multiplicative constant, and we chose this constant such that the average degree is about 25. Our algorithm assigned 94% of the nodes correctly to their respective degree strata, averaged over 100 random runs. We show the hierarchical cluster structure that was obtained from our algorithm from one such network in Fig. 5. Note that this model is basically a parametric submodel of a block model with three blocks. However, the conditional expectation matrix of this model is rank 1, not rank 3. We give an intuitive explanation of why our algorithm works in this setting.

Fig. 5. Block structure of the adjacency matrix from a degree-corrected stochastic block model and the partitions made by the recursive algorithm (nz denotes twice the number of edges)

It is well known (Füredi and Komlós, 1981) that the principal eigenvector of an Erdős–Rényi graph is closely approximated by the all-1s vector. Intuitively, centring the adjacency matrix removes the contribution of the principal eigenvector. Thus, with $A \sim$ Erdős–Rényi$(n, p)$, the largest eigenvalue $\lambda_1(A)$ equals $np + O_P(1)$, whereas all other eigenvalues are $O_P\{\sqrt{(np)}\}$. However, $\lambda_1(A - \hat{p}11^T)$ is $O_P\{\sqrt{(np)}\}$. Further, we show that our test statistic (a scaled and centred version of the largest eigenvalue) is $O_P(1)$. In contrast, the principal empirical eigenvector of the adjacency matrix $A$ of the degree-corrected model specified above is not well approximated by the all-1s vector. Using standard concentration tools from random-matrix theory (Oliveira, 2009), it can be shown to be close (in Frobenius norm) to the population eigenvector, which is a blockwise constant vector with three blocks arising from the three degree strata. Thus, centring the adjacency matrix does not remove the contribution of the principal eigenvector. Empirically we see that the test statistic is of a larger order than it would be under the Erdős–Rényi model. As a result, the hypothesis test rejects the null hypothesis that the adjacency matrix generated from the degree-corrected model is an Erdős–Rényi graph. The test keeps splitting until we actually reach the subgraphs that are induced by the degree strata, which indeed are Erdős–Rényi graphs. We used $n = 1{,}000$ and $k = 1$ and the following model parameters:
$$\theta_i \propto \begin{cases} 1, & 1 \le i \le 200,\\ 5, & 201 \le i \le 500,\\ 10, & 501 \le i \le 1000. \end{cases} \qquad (10)$$

Latent space models. Next we apply our method to a model akin to latent position cluster models (Handcock et al., 2007). In particular, node $i$ is assigned a latent position $\psi_i$ in a two-dimensional space. The positions of the nodes in the $i$th cluster are generated from an $N(4i, \sigma^2)$ distribution. We specify
$$P(A_{ij} = 1 \mid \psi_i, \psi_j) := \frac{\exp(-\alpha d_{ij} + \beta)}{1 + \exp(-\alpha d_{ij} + \beta)},$$
where $d_{ij} = \|\psi_i - \psi_j\|_2$. Note that the linkage probabilities within a cluster are now no longer identical, but close, depending on the cluster variance $\sigma^2$.

As $\sigma$ increases, the disparity of linkage probabilities within a cluster increases, and we expect our algorithm to split a block further to find homogeneous structure. Also, with increasing $\sigma$ there is more overlap between clusters, leading to harder clustering. Fig. 6(a) shows the clear block structure of the adjacency matrix for $\sigma = 0.1$, whereas Fig. 6(b) shows the diminished block structure for $\sigma = 3.1$. In both Fig. 6(a) and Fig. 6(b) the rows and columns are reordered so that all nodes with latent positions generated from the same Gaussian distribution are placed together. In Fig. 6(c) we show the classification error rate as $\sigma$ grows. For small values of $\sigma$, the model is well approximated by a stochastic block model with four blocks, and our recursive partitioning algorithm finds the four clusters accurately. As $\sigma$ grows, the performance deteriorates as the clustering algorithm finds additional structure to capture the overlaps between the clusters.

Fig. 6. Block structure of latent space graphs, where the latent positions are generated from four Gaussian distributions with identical variance $\sigma^2$ and separated means: (a), (b) block structures for networks generated with small ($\sigma = 0.1$) and large ($\sigma = 3.1$) $\sigma$ respectively (nz denotes twice the number of edges); (c) classification error computed against the data-generating cluster assignments as $\sigma$ grows

4.2. Real networks
Now we present results on real world networks with known labels.

We compare our algorithm's performance with state of the art clustering methods on Facebook ego networks (Section 4.2.1) and on the karate club and political books networks (Section 4.2.2).

4.2.1. Facebook ego networks
We show our results on ego networks manually collected and labelled by McAuley and Leskovec (2012). Here we have a collection of nine networks, each of which is the induced subgraph formed by the neighbours of a node. The central node is called the ego node. The ground truth labels consist of overlapping cluster assignments, also known as circles. The hope is to identify social circles of the ego node by examining the network structure and features of nodes. Whereas McAuley and Leskovec's (2012) work takes node features into account, we work only with the network structure. For every network we remove nodes with zero degree and cluster the remaining nodes. Since ground truth clusters are sometimes incomplete, in the sense that not all nodes are assigned to some cluster, we use the F-score for comparing two clusterings. Consider the ground truth cluster $C$ and the computed cluster $\hat{C}$. The F-measure between these is defined as follows:
$$\mathrm{recall}(C, \hat{C}) = \frac{|C \cap \hat{C}|}{|C|}, \qquad \mathrm{precision}(C, \hat{C}) = \frac{|C \cap \hat{C}|}{|\hat{C}|},$$
$$F(C, \hat{C}) = \frac{2\,\mathrm{precision}(C, \hat{C})\,\mathrm{recall}(C, \hat{C})}{\mathrm{precision}(C, \hat{C}) + \mathrm{recall}(C, \hat{C})}.$$
This was extended to hierarchical clusterings by Larsen and Aone (1999). For ground truth cluster $C_i$, one computes $x_i = \max_j \{F(C_i, \hat{C}_j)\}$, where $\hat{C}_j$ is obtained by flattening out the subtree for node $j$ in the hierarchical clustering tree. The overall F-measure is then obtained by computing the weighted average $\sum_i x_i |C_i| / \sum_j |C_j|$.

For the real data we use a cut-off ($\alpha$ in algorithm 3) of 0.001. We can also stop dividing the graph when the subgraph size falls under a given number, say $n_\beta$. Although we report results without any such stopping condition added, we note that, for $n_\beta = 10$, the F-measures are similar, whereas the numbers of clusters are fewer. In Table 5, we compare our recursive bipartitioning algorithm RB with McAuley and Leskovec's (2012) algorithm, by using the code that was kindly shared by Julian McAuley. We see that we obtain better or comparable F-measures for most of the ego networks.

Table 5. F-measure comparison on nine Facebook ego networks (columns: nodes with non-zero degree; number of ground truth clusters; F-measure of McAuley and Leskovec (2012); number of clusters learned by RB; F-measure of RB)
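A small helper (our sketch, with illustrative names) computing the per-cluster F-measure and the Larsen–Aone weighted average described above:

```python
import numpy as np

def f_measure(truth, computed):
    """F-measure between one ground-truth cluster and one computed cluster,
    each given as a collection of node identifiers."""
    truth, computed = set(truth), set(computed)
    overlap = len(truth & computed)
    if overlap == 0:
        return 0.0
    recall = overlap / len(truth)
    precision = overlap / len(computed)
    return 2 * precision * recall / (precision + recall)

def hierarchical_f_measure(truth_clusters, subtree_clusters):
    """Match each ground-truth cluster C_i to its best flattened subtree
    (x_i = max_j F(C_i, C_hat_j)), then average weighted by |C_i|."""
    sizes = np.array([len(c) for c in truth_clusters], dtype=float)
    best = np.array([max(f_measure(c, t) for t in subtree_clusters)
                     for c in truth_clusters])
    return float((best * sizes).sum() / sizes.sum())
```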

To visualize the cluster structure uncovered by algorithm RB, we present Fig. 7, in which we show a density image of a matrix whose rows and columns are ordered such that all nodes in the same subtree appear consecutively. Thus nodes in every subtree correspond to a diagonal block in Fig. 7(a). Also, a subtree belonging to a parent subtree will give rise to a diagonal block contained inside that of the parent subtree. This helps us to see the hierarchical structure. Further, we shade every diagonal block by using the $\hat{p}$ computed from the subgraph induced by the nodes in the subtree corresponding to it. In Fig. 7(a), we plot this matrix for one of the ego networks on the log-scale. The lighter the shading in a block, the higher the corresponding $\hat{p}$. To match this image with the graph itself, we also plot the adjacency matrix with rows and columns ordered identically in Fig. 7(b). The density plot shows that the hierarchical splits find regions of varied densities.

Fig. 7. (a) Density plot for one ego network with rows ordered to have nodes from the same cluster consecutively and (b) adjacency matrix using the same order

4.2.2. Karate club and the political books network
The karate club data are a well-known network of 34 individuals belonging to a karate club whose members later split into two groups after a disagreement on class fees (Zachary, 1977). These two groups are considered the ground truth communities. In Fig. 8 we present the clusterings that are obtained by using the various algorithms.

Fig. 8. Clusters obtained from the karate club network by using (a) community extraction, (b) pseudolikelihood, (c) recursive bipartitioning with a p-value cut-off of 0.001 and (d) recursive bipartitioning with a p-value cut-off of 0.1

In particular, we show the clusterings that are obtained by using the extraction method (algorithm E) in Fig. 8(a), the pseudolikelihood method (algorithm PL) with $k = 3$ (Amini et al., 2013) in Fig. 8(b), our recursive bipartitioning algorithm RB using a $p$-value cut-off of 0.001 in Fig. 8(c) and finally algorithm RB with a $p$-value cut-off of 0.1 in Fig. 8(d). These results were generated by using the code of Yunpeng Zhao and Aiyou Chen. We see that algorithm E finds the cores of the two communities, whereas algorithm PL puts high degree nodes in one cluster (similarly to the Markov chain Monte Carlo method for fitting a stochastic block model in Zhao et al. (2011)). Our method achieves perfect clustering for a $p$-value cut-off of 0.001. However, our statistic computed from the dark grey group has a $p$-value of about 0.03, which is why we also show the clustering with a larger cut-off. Here the dark grey community is broken further into a clique-like subset of nodes and the rest. We also provide a density plot in Fig. 9(a) and an image of the adjacency matrix with rows and columns ordered similarly to the density plot in Fig. 9(b) to elucidate this issue.

Fig. 9. (a) Density plot of the karate club data with rows ordered to have nodes from the same cluster consecutively and (b) adjacency matrix using the same order

The political books network (Newman, 2006) is an undirected network of 105 books. Two books are connected if they are co-purchased frequently on Amazon. Although ground truth is not available for this data set, the common conjecture (Zhao et al., 2011) is that some books are strongly political, i.e. liberal or conservative, and the others are somewhat in between. Zhao et al. (2011) also showed that existing algorithms give reasonable results with $k = 3$ clusters, and that algorithm E returns the cores of the communities with $k = 2$. We show the clustering obtained by using algorithm PL with $k = 3$ in Fig. 10(a), the two communities that are extracted by algorithm E in Fig. 10(b), the clustering by algorithm RB in Fig. 10(c) and finally our density plot in Fig. 10(d). Algorithm E finds the core set of nodes from the medium grey and dark grey clusters found by algorithm PL. In contrast, algorithm RB breaks the graph into six parts. The first split separates the dark grey nodes from the rest. The second split separates the light grey nodes from the medium grey nodes. The next two splits divide the medium grey nodes and the dark grey nodes into further smaller clusters. We overlay the density plot with the row and column reordered adjacency matrix, so that the brightest pixels correspond to an edge. The ordering simply puts nodes from the same cluster consecutively, and clusters in the same subtree consecutively. Fig. 10 shows the hierarchically nested structure, where we pick up denser subgraphs.

Fig. 10. Clusterings of the political books data: (a) pseudolikelihood; (b) community extraction; (c) recursive bipartitioning; (d) subgraph density plot superimposed with the adjacency matrix

5. Discussion
In this paper we have proposed an algorithm which provably detects the number of blocks in a graph that is generated from a stochastic block model. Using the largest eigenvalue of the suitably shifted and scaled adjacency matrix, we develop a hypothesis test to decide whether the graph is generated from a stochastic block model with more than one block. Our approach is significantly different from existing work because we theoretically establish the limiting distribution of the statistic under the null hypothesis, which in our case is that the graph is an Erdős–Rényi graph. We also propose to obtain small sample corrections on the limiting distribution, which, together with the known form of the limiting law, alleviates the need for expensive parametric bootstrap replicates. Using this hypothesis test we design the recursive bipartitioning algorithm RB, which naturally yields a hierarchical cluster structure. Strictly speaking, we have proved the validity of our bipartitioning algorithm for $k = 2$ only.

The difficulty is that there is apparently no guarantee that, once we have rejected $k = 1$ and partitioned by using ordinary spectral biclustering, the resulting two partitions are disjoint unions of distinct sets of the true blocks. However, it can be shown that, in the dense regime, under diagonal dominance a slight modification of spectral biclustering can split the network into two partitions, such that each partition is a disjoint union of ground truth clusters with probability tending to 1. Hence, for finite $k$, it should be possible to show that, after each test, with high probability we are testing on unions of disjoint sets of the true blocks. This will be argued elsewhere. For this paper, we demonstrate that ordinary spectral clustering works well. On nine real data sets with ground truth from Facebook, algorithm RB outperforms the existing method that has been shown to have the best performance among other state of the art algorithms for finding overlapping clusters. We also show the nested cluster structure of varied densities that is discovered by algorithm RB on the karate club data and the political books data. Our experiments on the karate club and political books networks are not aimed at showing that we find better quality clusters, but that we find interesting structure matching existing work without having to specify $k$. We choose spectral clustering because of its good theoretical properties in the context of block models (Rohe et al., 2011) and its computational scalability.

Acknowledgements
We thank Elizaveta Levina, Yunpeng Zhao, Aiyou Chen and Julian McAuley for sharing their code. We are also grateful to Antti Knowles for pointing out the relevant literature for applying the result on isotropic delocalization of eigenvectors to our setting. This research was funded in part by National Science Foundation Focused Research Group on Networks grant DMS.

Appendix A: Proof of main result
In this section we present proof sketches of theorem 1 and proposition 1. The complete proof with details is included in the on-line supplementary material. Our proof uses machinery developed in random-matrix theory in recent years. For ease of understanding, we shall state some results without rigorous statements; these will be given in detail in the supplementary material. We begin with Weyl's interlacing inequality, which will be used heavily in our proof.

A.1. Weyl's interlacing inequality
Let $B_1$ be an $n \times n$ real symmetric matrix and $B_2 = B_1 + dxx^T$, where $d > 0$ and $x \in \mathbb{R}^n$. Denoting the $i$th largest eigenvalue of a matrix by $\lambda_i(\cdot)$, we have
$$\lambda_n(B_1) \le \lambda_n(B_2) \le \lambda_{n-1}(B_1) \le \cdots \le \lambda_2(B_2) \le \lambda_1(B_1) \le \lambda_1(B_2). \qquad (11)$$
An immediate corollary of this result is that, for $d < 0$,
$$\lambda_n(B_2) \le \lambda_n(B_1) \le \lambda_{n-1}(B_2) \le \cdots \le \lambda_2(B_1) \le \lambda_1(B_2) \le \lambda_1(B_1). \qquad (12)$$
Let $\hat{p} := \sum_{ij} A_{ij}/\{n(n-1)\}$, and let $e$ denote the normalized $n \times 1$ vector of all 1s. As in equation (6), $\hat{P}$ is the empirical version of $P$ (equation (3)).

Lemma 1. Let $\tilde{A}_1 := \tilde{A}' + n(p - \hat{p})ee^T/\sqrt{(n-1)p(1-p)}$. Also let $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$ be the eigenvalues of $\tilde{A}'$ and $\mu_1 \ge \mu_2 \ge \cdots \ge \mu_n$ be the eigenvalues of $\tilde{A}_1$. If $p$ is a constant with respect to $n$, we have $|\mu_1 - \lambda_1| = o_P(1/n)$.

A.2. Proof sketch of theorem 1
Let $\lambda_i$ and $v_i$ be the $i$th eigenvalues and eigenvectors of $\tilde{A}'$ respectively, where $\lambda_i \ge \lambda_{i+1}$ for $i \in \{1, \ldots, n-1\}$.

Also, let $\mu_i$ and $u_i$ be the $i$th eigenvalues and eigenvectors of $\tilde{A}_1$ respectively, also arranged in decreasing order of $\mu_i$. Let $G(z) := (\tilde{A}' - zI)^{-1}$ and $G_1(z) := (\tilde{A}_1 - zI)^{-1}$ be the resolvents of $\tilde{A}'$ and $\tilde{A}_1$. Let
$$c_n := \frac{n(\hat{p} - p)}{\sqrt{(n-1)p(1-p)}}.$$
We note that the matrices $\tilde{A}'$ and $\tilde{A}_1$ differ by a random multiple of the all-1s matrix:
$$\tilde{A}' = \tilde{A}_1 + c_n ee^T. \qquad (13)$$
This equation also gives
$$|\lambda_1 - \mu_1| \le |c_n| = O_P(1/\sqrt{n}), \qquad (14)$$
which is true because $\hat{p}$ is the average of $n(n-1)/2$ independent and identically distributed Bernoulli variables, and thus $c_n = O_P(1/\sqrt{n})$ for $p$ constant with respect to $n$. However, this error masks the $n^{-2/3}$-scale of the Tracy–Widom law. Equation (13) also gives the identity
$$e^T\{G(z) - G_1(z)\}e = -c_n\{e^T G(z)e\}\{e^T G_1(z)e\}, \qquad \frac{1}{e^T G(z)e} - \frac{1}{e^T G_1(z)e} = c_n.$$
Since $1/\{e^T G_1(\mu_1)e\} = 0$, we have $e^T G(\mu_1)e = 1/c_n$. Further, using Weyl's interlacing result in Appendix A.1 we see that the eigenvalues of $\tilde{A}'$ and $\tilde{A}_1$ interlace. Since the eigenvalues and eigenvectors of $G(z)$ are given by $1/(\lambda_i - z)$ and $v_i$ respectively, we have
$$1 = c_n e^T G(\mu_1)e = c_n \sum_i \frac{(e^T v_i)^2}{\lambda_i - \mu_1}. \qquad (15)$$
We shall now do a case-by-case analysis.

A.2.1. Case $c_n > 0$
In this case the interlacing result (equation (11)) tells us that $\lambda_1 \ge \mu_1 \ge \lambda_i$ for all $i > 1$. Thus we have
$$\frac{1}{c_n} = \frac{(e^T v_1)^2}{\lambda_1 - \mu_1} - \sum_{i>1} \frac{(e^T v_i)^2}{\mu_1 - \lambda_i} \le \frac{(e^T v_1)^2}{\lambda_1 - \mu_1}, \qquad \text{whence} \qquad \lambda_1 - \mu_1 \le c_n (e^T v_1)^2. \qquad (16)$$

A.2.2. Case $c_n < 0$
In this case the interlacing result (equation (12)) tells us that $0 \le \mu_1 - \lambda_1 \le \mu_1 - \lambda_i$ for all $i > 1$. We now divide the eigenvalues $\lambda_i$ into two groups: one with $\mu_1 - \lambda_i \le 2|c_n|$ (denoted by $S_{c_n}$), and the other with $\mu_1 - \lambda_i > 2|c_n|$. Since $\sum_i (v_i^T e)^2 = 1$, we have
$$\frac{1}{|c_n|} = \sum_i \frac{(e^T v_i)^2}{\mu_1 - \lambda_i} \le \sum_{i \in S_{c_n}} \frac{(e^T v_i)^2}{\mu_1 - \lambda_i} + \frac{1}{2|c_n|}.$$
Further, since $\mu_1 - \lambda_1 \le \mu_1 - \lambda_i$ for all $i > 1$,
$$\mu_1 - \lambda_1 \le 2|c_n| \sum_{i \in S_{c_n}} (e^T v_i)^2. \qquad (17)$$
Let $c_n^- = c_n \mathbf{1}(c_n < 0)$. Combining equations (16) and (17) we see that
$$|\lambda_1 - \mu_1| \le |c_n| \max\Big\{2\sum_{i \in S_{c_n^-}} (e^T v_i)^2,\; (e^T v_1)^2\Big\}. \qquad (18)$$
We can now use the fact that, in the bulk, it is possible to estimate the empirical eigenvalue density of general Wigner ensembles by using the semicircle law (Erdős et al., 2012). Using probabilistic bounds on $\mu_1$ and $\hat{p}$ we can show that
$$|S_{c_n^-}| = \tilde{O}_P(n^{1/4}). \qquad (19)$$

Fig. 11. Semicircle distribution

The details are presented in the on-line supplementary material. Now we shall use another result (theorem 2.16 of Bloemendal et al. (2014)), which shows that, under some broad conditions, the projection of a deterministic vector on any eigenvector of a symmetric Wigner ensemble is uniformly $\tilde{O}_P(1/\sqrt{n})$. Here the $\tilde{O}_P(\xi)$ notation denotes a sequence of random variables which are bounded in probability by some non-negative random variable $\xi$ up to small powers of $n$. This yields
$$\sum_{i \in S_{c_n^-}} (e^T v_i)^2 = \tilde{O}_P(n^{-3/4}). \qquad (20)$$
Since $(e^T v_1)^2 = \tilde{O}_P(1/n)$ by the aforementioned result, equation (18) in conjunction with equation (20) yields $|\lambda_1 - \mu_1| = \tilde{O}_P(n^{-5/4})$. The precise definition of the $\tilde{O}$-notation ensures that $\tilde{O}_P(n^{-5/4})$ is $o_P(1/n)$ for sufficiently large $n$. We make this more precise in the on-line supplementary material. Now theorem 1 follows by a series of simple algebraic manipulations, which we defer to the on-line supplementary material.

A.3. Proof of proposition 1
If $B_{ii} > \sum_{j \ne i} B_{ij}$, then $B$ is a positive definite matrix by diagonal dominance. Hence $ZBZ^T$ is also positive semidefinite. Since we are considering the dense regime of degrees, i.e. where the elements of $B$ are constant with respect to $n$, the $k$ largest eigenvalues of $E[A \mid Z]$ (equation (1)) are of the form $C_i n$, where $C_i$, $1 \le i \le k$, are positive constants. Oliveira (2009) showed that $\lambda_i(A) = \lambda_i(E[A \mid Z]) + O_P[\sqrt{\{n \log(n)\}}]$. Hence, with high probability, the $k$ largest eigenvalues of $A$ will be positive. Using Weyl's interlacing result we have $\lambda_2(A) \le \lambda_1(A - \hat{P}) \le \lambda_1(A)$. Thus, with high probability, $\lambda_1(A - \hat{P}) \ge Cn$ for some positive constant $C$. Thus, for large $n$, $\lambda_1(\tilde{A}') \ge C\sqrt{n}$ with high probability and, since $\theta := n^{2/3}\{\lambda_1(\tilde{A}') - 2\}$, the result is proved.

References

Adamic, L. A. and Glance, N. (2005) The political blogosphere and the 2004 U.S. election: divided they blog. In Proc. 3rd Int. Wrkshp Link Discovery. New York: Association for Computing Machinery.
Airoldi, E. M., Blei, D. M., Fienberg, S. E. and Xing, E. P. (2008) Mixed membership stochastic blockmodels. J. Mach. Learn. Res., 9.
Airoldi, E. M., Costa, T. B. and Chan, S. H. (2013) Stochastic blockmodel approximation of a graphon: theory and consistent estimation. In Advances in Neural Information Processing Systems (eds C. Burges, L. Bottou, M. Welling, Z. Ghahramani and K. Weinberger), vol. 26.
Amini, A. A., Chen, A., Bickel, P. J. and Levina, E. (2013) Pseudo-likelihood methods for community detection in large sparse networks. Ann. Statist., 41.
Bartlett, M. S. (1937) Properties of sufficiency and statistical tests. Proc. R. Soc. Lond. A, 160.
Bickel, P. J. and Chen, A. (2009) A nonparametric view of network models and Newman–Girvan and other modularities. Proc. Natn. Acad. Sci. USA, 106.
Bloemendal, A., Erdős, L., Knowles, A., Yau, H.-T. and Yin, J. (2014) Isotropic local laws for sample covariance and generalized Wigner matrices. Electron. J. Probab., 19.
Chatterjee, S. (2015) Matrix estimation by universal singular value thresholding. Ann. Statist., 43.
Erdős, L., Yau, H.-T. and Yin, J. (2012) Rigidity of eigenvalues of generalized Wigner matrices. Adv. Math., 229.
Füredi, Z. and Komlós, J. (1981) The eigenvalues of random symmetric matrices. Combinatorica, 1.


More information

Certifying the Global Optimality of Graph Cuts via Semidefinite Programming: A Theoretic Guarantee for Spectral Clustering

Certifying the Global Optimality of Graph Cuts via Semidefinite Programming: A Theoretic Guarantee for Spectral Clustering Certifying the Global Optimality of Graph Cuts via Semidefinite Programming: A Theoretic Guarantee for Spectral Clustering Shuyang Ling Courant Institute of Mathematical Sciences, NYU Aug 13, 2018 Joint

More information

Final Exam, Machine Learning, Spring 2009

Final Exam, Machine Learning, Spring 2009 Name: Andrew ID: Final Exam, 10701 Machine Learning, Spring 2009 - The exam is open-book, open-notes, no electronics other than calculators. - The maximum possible score on this exam is 100. You have 3

More information

A graph contains a set of nodes (vertices) connected by links (edges or arcs)

A graph contains a set of nodes (vertices) connected by links (edges or arcs) BOLTZMANN MACHINES Generative Models Graphical Models A graph contains a set of nodes (vertices) connected by links (edges or arcs) In a probabilistic graphical model, each node represents a random variable,

More information

Factor Analysis (10/2/13)

Factor Analysis (10/2/13) STA561: Probabilistic machine learning Factor Analysis (10/2/13) Lecturer: Barbara Engelhardt Scribes: Li Zhu, Fan Li, Ni Guan Factor Analysis Factor analysis is related to the mixture models we have studied.

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 7 Approximate

More information

A Random Dot Product Model for Weighted Networks arxiv: v1 [stat.ap] 8 Nov 2016

A Random Dot Product Model for Weighted Networks arxiv: v1 [stat.ap] 8 Nov 2016 A Random Dot Product Model for Weighted Networks arxiv:1611.02530v1 [stat.ap] 8 Nov 2016 Daryl R. DeFord 1 Daniel N. Rockmore 1,2,3 1 Department of Mathematics, Dartmouth College, Hanover, NH, USA 03755

More information

SVD, Power method, and Planted Graph problems (+ eigenvalues of random matrices)

SVD, Power method, and Planted Graph problems (+ eigenvalues of random matrices) Chapter 14 SVD, Power method, and Planted Graph problems (+ eigenvalues of random matrices) Today we continue the topic of low-dimensional approximation to datasets and matrices. Last time we saw the singular

More information

Lecture 21: Spectral Learning for Graphical Models

Lecture 21: Spectral Learning for Graphical Models 10-708: Probabilistic Graphical Models 10-708, Spring 2016 Lecture 21: Spectral Learning for Graphical Models Lecturer: Eric P. Xing Scribes: Maruan Al-Shedivat, Wei-Cheng Chang, Frederick Liu 1 Motivation

More information

1 Tridiagonal matrices

1 Tridiagonal matrices Lecture Notes: β-ensembles Bálint Virág Notes with Diane Holcomb 1 Tridiagonal matrices Definition 1. Suppose you have a symmetric matrix A, we can define its spectral measure (at the first coordinate

More information

CS534 Machine Learning - Spring Final Exam

CS534 Machine Learning - Spring Final Exam CS534 Machine Learning - Spring 2013 Final Exam Name: You have 110 minutes. There are 6 questions (8 pages including cover page). If you get stuck on one question, move on to others and come back to the

More information

Connections between score matching, contrastive divergence, and pseudolikelihood for continuous-valued variables. Revised submission to IEEE TNN

Connections between score matching, contrastive divergence, and pseudolikelihood for continuous-valued variables. Revised submission to IEEE TNN Connections between score matching, contrastive divergence, and pseudolikelihood for continuous-valued variables Revised submission to IEEE TNN Aapo Hyvärinen Dept of Computer Science and HIIT University

More information

Communities, Spectral Clustering, and Random Walks

Communities, Spectral Clustering, and Random Walks Communities, Spectral Clustering, and Random Walks David Bindel Department of Computer Science Cornell University 26 Sep 2011 20 21 19 16 22 28 17 18 29 26 27 30 23 1 25 5 8 24 2 4 14 3 9 13 15 11 10 12

More information

arxiv: v1 [math.st] 16 Aug 2011

arxiv: v1 [math.st] 16 Aug 2011 Retaining positive definiteness in thresholded matrices Dominique Guillot Stanford University Bala Rajaratnam Stanford University August 17, 2011 arxiv:1108.3325v1 [math.st] 16 Aug 2011 Abstract Positive

More information

Network Cross-Validation for Determining the Number of Communities in Network Data

Network Cross-Validation for Determining the Number of Communities in Network Data Network Cross-Validation for Determining the Number of Communities in Network Data Kehui Chen and Jing Lei University of Pittsburgh and Carnegie Mellon University August 1, 2016 Abstract The stochastic

More information

An indicator for the number of clusters using a linear map to simplex structure

An indicator for the number of clusters using a linear map to simplex structure An indicator for the number of clusters using a linear map to simplex structure Marcus Weber, Wasinee Rungsarityotin, and Alexander Schliep Zuse Institute Berlin ZIB Takustraße 7, D-495 Berlin, Germany

More information

ELE 538B: Mathematics of High-Dimensional Data. Spectral methods. Yuxin Chen Princeton University, Fall 2018

ELE 538B: Mathematics of High-Dimensional Data. Spectral methods. Yuxin Chen Princeton University, Fall 2018 ELE 538B: Mathematics of High-Dimensional Data Spectral methods Yuxin Chen Princeton University, Fall 2018 Outline A motivating application: graph clustering Distance and angles between two subspaces Eigen-space

More information

Matrix estimation by Universal Singular Value Thresholding

Matrix estimation by Universal Singular Value Thresholding Matrix estimation by Universal Singular Value Thresholding Courant Institute, NYU Let us begin with an example: Suppose that we have an undirected random graph G on n vertices. Model: There is a real symmetric

More information

Link Prediction. Eman Badr Mohammed Saquib Akmal Khan

Link Prediction. Eman Badr Mohammed Saquib Akmal Khan Link Prediction Eman Badr Mohammed Saquib Akmal Khan 11-06-2013 Link Prediction Which pair of nodes should be connected? Applications Facebook friend suggestion Recommendation systems Monitoring and controlling

More information

A sequence of triangle-free pseudorandom graphs

A sequence of triangle-free pseudorandom graphs A sequence of triangle-free pseudorandom graphs David Conlon Abstract A construction of Alon yields a sequence of highly pseudorandom triangle-free graphs with edge density significantly higher than one

More information

CS224W: Social and Information Network Analysis

CS224W: Social and Information Network Analysis CS224W: Social and Information Network Analysis Reaction Paper Adithya Rao, Gautam Kumar Parai, Sandeep Sripada Keywords: Self-similar networks, fractality, scale invariance, modularity, Kronecker graphs.

More information

Introduction. Chapter 1

Introduction. Chapter 1 Chapter 1 Introduction In this book we will be concerned with supervised learning, which is the problem of learning input-output mappings from empirical data (the training dataset). Depending on the characteristics

More information

ECE521 week 3: 23/26 January 2017

ECE521 week 3: 23/26 January 2017 ECE521 week 3: 23/26 January 2017 Outline Probabilistic interpretation of linear regression - Maximum likelihood estimation (MLE) - Maximum a posteriori (MAP) estimation Bias-variance trade-off Linear

More information

Learning from Sensor Data: Set II. Behnaam Aazhang J.S. Abercombie Professor Electrical and Computer Engineering Rice University

Learning from Sensor Data: Set II. Behnaam Aazhang J.S. Abercombie Professor Electrical and Computer Engineering Rice University Learning from Sensor Data: Set II Behnaam Aazhang J.S. Abercombie Professor Electrical and Computer Engineering Rice University 1 6. Data Representation The approach for learning from data Probabilistic

More information

Network Representation Using Graph Root Distributions

Network Representation Using Graph Root Distributions Network Representation Using Graph Root Distributions Jing Lei Department of Statistics and Data Science Carnegie Mellon University 2018.04 Network Data Network data record interactions (edges) between

More information

Local Kesten McKay law for random regular graphs

Local Kesten McKay law for random regular graphs Local Kesten McKay law for random regular graphs Roland Bauerschmidt (with Jiaoyang Huang and Horng-Tzer Yau) University of Cambridge Weizmann Institute, January 2017 Random regular graph G N,d is the

More information

Random regular digraphs: singularity and spectrum

Random regular digraphs: singularity and spectrum Random regular digraphs: singularity and spectrum Nick Cook, UCLA Probability Seminar, Stanford University November 2, 2015 Universality Circular law Singularity probability Talk outline 1 Universality

More information

Randomized Algorithms

Randomized Algorithms Randomized Algorithms Prof. Tapio Elomaa tapio.elomaa@tut.fi Course Basics A new 4 credit unit course Part of Theoretical Computer Science courses at the Department of Mathematics There will be 4 hours

More information

8.1 Concentration inequality for Gaussian random matrix (cont d)

8.1 Concentration inequality for Gaussian random matrix (cont d) MGMT 69: Topics in High-dimensional Data Analysis Falll 26 Lecture 8: Spectral clustering and Laplacian matrices Lecturer: Jiaming Xu Scribe: Hyun-Ju Oh and Taotao He, October 4, 26 Outline Concentration

More information

Random Matrices: Invertibility, Structure, and Applications

Random Matrices: Invertibility, Structure, and Applications Random Matrices: Invertibility, Structure, and Applications Roman Vershynin University of Michigan Colloquium, October 11, 2011 Roman Vershynin (University of Michigan) Random Matrices Colloquium 1 / 37

More information

Modeling heterogeneity in random graphs

Modeling heterogeneity in random graphs Modeling heterogeneity in random graphs Catherine MATIAS CNRS, Laboratoire Statistique & Génome, Évry (Soon: Laboratoire de Probabilités et Modèles Aléatoires, Paris) http://stat.genopole.cnrs.fr/ cmatias

More information

A Generalization of Wigner s Law

A Generalization of Wigner s Law A Generalization of Wigner s Law Inna Zakharevich June 2, 2005 Abstract We present a generalization of Wigner s semicircle law: we consider a sequence of probability distributions (p, p 2,... ), with mean

More information

Robust Principal Component Analysis

Robust Principal Component Analysis ELE 538B: Mathematics of High-Dimensional Data Robust Principal Component Analysis Yuxin Chen Princeton University, Fall 2018 Disentangling sparse and low-rank matrices Suppose we are given a matrix M

More information

Undirected Graphical Models

Undirected Graphical Models Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Introduction 2 Properties Properties 3 Generative vs. Conditional

More information

Learning Tetris. 1 Tetris. February 3, 2009

Learning Tetris. 1 Tetris. February 3, 2009 Learning Tetris Matt Zucker Andrew Maas February 3, 2009 1 Tetris The Tetris game has been used as a benchmark for Machine Learning tasks because its large state space (over 2 200 cell configurations are

More information

Probabilistic Graphical Models

Probabilistic Graphical Models 2016 Robert Nowak Probabilistic Graphical Models 1 Introduction We have focused mainly on linear models for signals, in particular the subspace model x = Uθ, where U is a n k matrix and θ R k is a vector

More information

Spectral Clustering for Dynamic Block Models

Spectral Clustering for Dynamic Block Models Spectral Clustering for Dynamic Block Models Sharmodeep Bhattacharyya Department of Statistics Oregon State University January 23, 2017 Research Computing Seminar, OSU, Corvallis (Joint work with Shirshendu

More information

Undirected Graphical Models

Undirected Graphical Models Undirected Graphical Models 1 Conditional Independence Graphs Let G = (V, E) be an undirected graph with vertex set V and edge set E, and let A, B, and C be subsets of vertices. We say that C separates

More information

Near-domination in graphs

Near-domination in graphs Near-domination in graphs Bruce Reed Researcher, Projet COATI, INRIA and Laboratoire I3S, CNRS France, and Visiting Researcher, IMPA, Brazil Alex Scott Mathematical Institute, University of Oxford, Oxford

More information

A Bayesian Criterion for Clustering Stability

A Bayesian Criterion for Clustering Stability A Bayesian Criterion for Clustering Stability B. Clarke 1 1 Dept of Medicine, CCS, DEPH University of Miami Joint with H. Koepke, Stat. Dept., U Washington 26 June 2012 ISBA Kyoto Outline 1 Assessing Stability

More information

arxiv: v1 [stat.me] 6 Nov 2014

arxiv: v1 [stat.me] 6 Nov 2014 Network Cross-Validation for Determining the Number of Communities in Network Data Kehui Chen 1 and Jing Lei arxiv:1411.1715v1 [stat.me] 6 Nov 014 1 Department of Statistics, University of Pittsburgh Department

More information

Impact of regularization on Spectral Clustering

Impact of regularization on Spectral Clustering Impact of regularization on Spectral Clustering Antony Joseph and Bin Yu December 5, 2013 Abstract The performance of spectral clustering is considerably improved via regularization, as demonstrated empirically

More information

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University CS224W: Social and Information Network Analysis Jure Leskovec Stanford University Jure Leskovec, Stanford University http://cs224w.stanford.edu Task: Find coalitions in signed networks Incentives: European

More information

Probabilistic Graphical Models

Probabilistic Graphical Models School of Computer Science Probabilistic Graphical Models Gaussian graphical models and Ising models: modeling networks Eric Xing Lecture 0, February 7, 04 Reading: See class website Eric Xing @ CMU, 005-04

More information

The exam is closed book, closed notes except your one-page (two sides) or two-page (one side) crib sheet.

The exam is closed book, closed notes except your one-page (two sides) or two-page (one side) crib sheet. CS 189 Spring 013 Introduction to Machine Learning Final You have 3 hours for the exam. The exam is closed book, closed notes except your one-page (two sides) or two-page (one side) crib sheet. Please

More information

Isotropic local laws for random matrices

Isotropic local laws for random matrices Isotropic local laws for random matrices Antti Knowles University of Geneva With Y. He and R. Rosenthal Random matrices Let H C N N be a large Hermitian random matrix, normalized so that H. Some motivations:

More information

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review STATS 200: Introduction to Statistical Inference Lecture 29: Course review Course review We started in Lecture 1 with a fundamental assumption: Data is a realization of a random process. The goal throughout

More information

Discrete Signal Processing on Graphs: Sampling Theory

Discrete Signal Processing on Graphs: Sampling Theory IEEE TRANS. SIGNAL PROCESS. TO APPEAR. 1 Discrete Signal Processing on Graphs: Sampling Theory Siheng Chen, Rohan Varma, Aliaksei Sandryhaila, Jelena Kovačević arxiv:153.543v [cs.it] 8 Aug 15 Abstract

More information

Overlapping Communities

Overlapping Communities Overlapping Communities Davide Mottin HassoPlattner Institute Graph Mining course Winter Semester 2017 Acknowledgements Most of this lecture is taken from: http://web.stanford.edu/class/cs224w/slides GRAPH

More information

Learning discrete graphical models via generalized inverse covariance matrices

Learning discrete graphical models via generalized inverse covariance matrices Learning discrete graphical models via generalized inverse covariance matrices Duzhe Wang, Yiming Lv, Yongjoon Kim, Young Lee Department of Statistics University of Wisconsin-Madison {dwang282, lv23, ykim676,

More information

Probabilistic Latent Semantic Analysis

Probabilistic Latent Semantic Analysis Probabilistic Latent Semantic Analysis Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr

More information

Preliminaries and Complexity Theory

Preliminaries and Complexity Theory Preliminaries and Complexity Theory Oleksandr Romanko CAS 746 - Advanced Topics in Combinatorial Optimization McMaster University, January 16, 2006 Introduction Book structure: 2 Part I Linear Algebra

More information

Probability and Information Theory. Sargur N. Srihari

Probability and Information Theory. Sargur N. Srihari Probability and Information Theory Sargur N. srihari@cedar.buffalo.edu 1 Topics in Probability and Information Theory Overview 1. Why Probability? 2. Random Variables 3. Probability Distributions 4. Marginal

More information

Jure Leskovec Joint work with Jaewon Yang, Julian McAuley

Jure Leskovec Joint work with Jaewon Yang, Julian McAuley Jure Leskovec (@jure) Joint work with Jaewon Yang, Julian McAuley Given a network, find communities! Sets of nodes with common function, role or property 2 3 Q: How and why do communities form? A: Strength

More information

Unsupervised machine learning

Unsupervised machine learning Chapter 9 Unsupervised machine learning Unsupervised machine learning (a.k.a. cluster analysis) is a set of methods to assign objects into clusters under a predefined distance measure when class labels

More information

27 : Distributed Monte Carlo Markov Chain. 1 Recap of MCMC and Naive Parallel Gibbs Sampling

27 : Distributed Monte Carlo Markov Chain. 1 Recap of MCMC and Naive Parallel Gibbs Sampling 10-708: Probabilistic Graphical Models 10-708, Spring 2014 27 : Distributed Monte Carlo Markov Chain Lecturer: Eric P. Xing Scribes: Pengtao Xie, Khoa Luu In this scribe, we are going to review the Parallel

More information

Machine Learning Techniques for Computer Vision

Machine Learning Techniques for Computer Vision Machine Learning Techniques for Computer Vision Part 2: Unsupervised Learning Microsoft Research Cambridge x 3 1 0.5 0.2 0 0.5 0.3 0 0.5 1 ECCV 2004, Prague x 2 x 1 Overview of Part 2 Mixture models EM

More information

Principal Component Analysis

Principal Component Analysis Machine Learning Michaelmas 2017 James Worrell Principal Component Analysis 1 Introduction 1.1 Goals of PCA Principal components analysis (PCA) is a dimensionality reduction technique that can be used

More information

Algebraic Representation of Networks

Algebraic Representation of Networks Algebraic Representation of Networks 0 1 2 1 1 0 0 1 2 0 0 1 1 1 1 1 Hiroki Sayama sayama@binghamton.edu Describing networks with matrices (1) Adjacency matrix A matrix with rows and columns labeled by

More information

Chris Bishop s PRML Ch. 8: Graphical Models

Chris Bishop s PRML Ch. 8: Graphical Models Chris Bishop s PRML Ch. 8: Graphical Models January 24, 2008 Introduction Visualize the structure of a probabilistic model Design and motivate new models Insights into the model s properties, in particular

More information

Lecture 14: SVD, Power method, and Planted Graph problems (+ eigenvalues of random matrices) Lecturer: Sanjeev Arora

Lecture 14: SVD, Power method, and Planted Graph problems (+ eigenvalues of random matrices) Lecturer: Sanjeev Arora princeton univ. F 13 cos 521: Advanced Algorithm Design Lecture 14: SVD, Power method, and Planted Graph problems (+ eigenvalues of random matrices) Lecturer: Sanjeev Arora Scribe: Today we continue the

More information

Laplacian Integral Graphs with Maximum Degree 3

Laplacian Integral Graphs with Maximum Degree 3 Laplacian Integral Graphs with Maximum Degree Steve Kirkland Department of Mathematics and Statistics University of Regina Regina, Saskatchewan, Canada S4S 0A kirkland@math.uregina.ca Submitted: Nov 5,

More information

Cheng Soon Ong & Christian Walder. Canberra February June 2018

Cheng Soon Ong & Christian Walder. Canberra February June 2018 Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 218 Outlines Overview Introduction Linear Algebra Probability Linear Regression 1

More information

Markov Chains and Spectral Clustering

Markov Chains and Spectral Clustering Markov Chains and Spectral Clustering Ning Liu 1,2 and William J. Stewart 1,3 1 Department of Computer Science North Carolina State University, Raleigh, NC 27695-8206, USA. 2 nliu@ncsu.edu, 3 billy@ncsu.edu

More information

Web Structure Mining Nodes, Links and Influence

Web Structure Mining Nodes, Links and Influence Web Structure Mining Nodes, Links and Influence 1 Outline 1. Importance of nodes 1. Centrality 2. Prestige 3. Page Rank 4. Hubs and Authority 5. Metrics comparison 2. Link analysis 3. Influence model 1.

More information

An Introduction to Bayesian Machine Learning

An Introduction to Bayesian Machine Learning 1 An Introduction to Bayesian Machine Learning José Miguel Hernández-Lobato Department of Engineering, Cambridge University April 8, 2013 2 What is Machine Learning? The design of computational systems

More information

Analysis of Spectral Kernel Design based Semi-supervised Learning

Analysis of Spectral Kernel Design based Semi-supervised Learning Analysis of Spectral Kernel Design based Semi-supervised Learning Tong Zhang IBM T. J. Watson Research Center Yorktown Heights, NY 10598 Rie Kubota Ando IBM T. J. Watson Research Center Yorktown Heights,

More information

CPSC 340: Machine Learning and Data Mining. More PCA Fall 2017

CPSC 340: Machine Learning and Data Mining. More PCA Fall 2017 CPSC 340: Machine Learning and Data Mining More PCA Fall 2017 Admin Assignment 4: Due Friday of next week. No class Monday due to holiday. There will be tutorials next week on MAP/PCA (except Monday).

More information

Cheng Soon Ong & Christian Walder. Canberra February June 2018

Cheng Soon Ong & Christian Walder. Canberra February June 2018 Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 Outlines Overview Introduction Linear Algebra Probability Linear Regression

More information

Dissertation Defense

Dissertation Defense Clustering Algorithms for Random and Pseudo-random Structures Dissertation Defense Pradipta Mitra 1 1 Department of Computer Science Yale University April 23, 2008 Mitra (Yale University) Dissertation

More information

1 Matrix notation and preliminaries from spectral graph theory

1 Matrix notation and preliminaries from spectral graph theory Graph clustering (or community detection or graph partitioning) is one of the most studied problems in network analysis. One reason for this is that there are a variety of ways to define a cluster or community.

More information

GraphRNN: A Deep Generative Model for Graphs (24 Feb 2018)

GraphRNN: A Deep Generative Model for Graphs (24 Feb 2018) GraphRNN: A Deep Generative Model for Graphs (24 Feb 2018) Jiaxuan You, Rex Ying, Xiang Ren, William L. Hamilton, Jure Leskovec Presented by: Jesse Bettencourt and Harris Chan March 9, 2018 University

More information