Modularity and Graph Algorithms
|
|
- Ernest Burns
- 5 years ago
- Views:
Transcription
1 Modularity and Graph Algorithms David Bader Georgia Institute of Technology Joe McCloskey National Security Agency 12 July
2 Outline Modularity Optimization and the Clauset, Newman, and Moore Algorithm 2
3 Modularity Optimization Modularity Optimization: Directed Graphs Modularity Optimization: Undirected Graphs Let G represent a directed, unweighted graph with adjacency matrix A. Assume that L is the total number of edges in the network and that G has been partitioned into m disjoint sets of nodes M s, s = 1, 2,..., m. Consider the set M s and let l ss represent the number of observed transitions from M s to M s. Now define l s s to be the total number of transitions from within to outside the node set M s. The quantities l ss and l s s are defined analogously. Note that we have collapsed the adjacency matrix into a 2 2 table of transition counts where L = l ss + l s s + l ss + l s s. 3
4 Modularity Optimization Modularity Optimization: Directed Graphs Modularity Optimization: Undirected Graphs Let the row and column marginal sums of the 2 2 transition table be defined as l +s = l ss + l ss, and l s+ = l ss + l s s. We observe l ss transitions within node set M s. Under the hypothesis of independence the estimated number of transitions is as follows: ˆlss = L ( l+s L ) ( ) ls+ = l s+l +s L L. The residual is defined as R ss = l ss ˆl ss. If R ss > 0 then we observe more transitions than expected. If R ss is significantly larger than zero then we might suspect that M s is a module (cluster). 4
5 Modularity Optimization Modularity Optimization: Directed Graphs Modularity Optimization: Undirected Graphs Following CNM we let Q s = R ss /L. The set of nodes M s to said to be a module if and only if Q s > 0. Next we define the modularity function Q as follows: Q = m Q s = 1 L s=1 m R ss = 1 L s=1 m s=1 (l ss ˆl ss ). The goal of modularity optimization is to efficiently find a decomposition of G into a set of m modules M s, s = 1, 2,..., m that maximizes the modularity function Q. 5
6 Modularity Optimization: Directed Graphs Modularity Optimization: Undirected Graphs It follows that the residual R ss = l ss ˆl ss = l ss ls+l+s L only if l ss l s s l s s l ss > 0. Moreover, > 0 if and Q = 1 L 2 m s=1 (l ss l s s l s s l ss ), and M s is a module if and only if c 12 = lssl s s l ssl s s > 1, where c 12 is the cross-product ratio of a 2 2 contingency table. The expected value of c 12 equals one if and only if the rows and columns of the table are statistically independent. c 12 > 1 indicates positive dependence. 6
7 Modularity Optimization Modularity Optimization: Directed Graphs Modularity Optimization: Undirected Graphs Now let G represent an undirected graph. The adjacency matrix A is now symmetric and it follows that l s s = l ss, and L = l ss + l ss + l s s. Since there are L edges in the graph, the sum of all of the entries of the adjacency matrix is equal to N = 2L. Let X represent the 2 2 contingency table obtained by collapsing the adjacency matrix with respect to module M s. Because of symmetry x ss = 2l ss, x s s = 2l s s, and x ss = x s s = l ss = l s s. 7
8 Modularity Optimization Modularity Optimization: Directed Graphs Modularity Optimization: Undirected Graphs Let ˆm ij represent the estimated entries of table X under independence. The marginal sums of X with respect to module M s are x s = x +s = x s+ = x ss + x s s = 2l ss + l s s. The expected number of transitions within M s is seen to be ˆlss = ˆm ( ss 2 = (2L) x+s 2L As before Q = m Q s = 1 L s=1 ) ( xs+ 2L m R ss = 1 L s=1 ) ( ) 1 = (x s) 2 2 4L. m s=1 (l ss ˆl ss ). 8
9 Modularity Optimization: Directed Graphs Modularity Optimization: Undirected Graphs The residual R ss = l ss ˆl ss = l ss x2 s 4L, and M s is a module if and only if 4L ss > x 2 s. Equivalently, M s is a module if and only if 4l ss l s s l 2 s s > 0. The cross-product ratio becomes ( ) ( ) xss x s s c 12 = = x s s x s s ( 2lss l s s ) ( ) 2l s s l s s = 4l ssl s s ls s 2. 9
10 Modularity Optimization: Directed Graphs Modularity Optimization: Undirected Graphs The modularity function Q can be rewritten in the following useful form where the cross-product ratio is more apparent: Q = m Q s = 1 L s=1 m s=1 R ss = 1 4L 2 m s=1 ( 4lss l s s ls s 2 ). Clauset, Newman, and Moore (2004) have developed a scalable, agglomerative algorithm to maximize the modularity function. Their greedy algorithm makes use of efficient data structures and for certain networks can run in time O ( L log 2 L ). 10
11 The Resolution Limit The Most Modular Connected Network An Improved Resolution Limit A Different Resolution It is a-priori impossible to tell whether a module (large or small), detected through modularity optimization, is indeed a single module or a cluster of smaller modules. Fortunato and Barthelemy (2007) 11
12 The Resolution Limit The Most Modular Connected Network An Improved Resolution Limit A Different Resolution The Most Modular Connected Network Let G be composed of n cliques, n even, where each clique has m nodes. Each clique is connected by a single edge to an adjacent clique. The number of edges in this graph is equal to L = 1 2 nm(m 1) + n = n 2 (m(m 1) + 2). Consider two partitions of G into hypothesized modules. In the first partition every module is a clique, while in the second partition every module is a pair of adjacent cliques. Then Q 1 = 1 2 m(m 1) n, Q 1 2 = 1 m(m 1) n. 12
13 The Resolution Limit The Most Modular Connected Network An Improved Resolution Limit A Different Resolution The Most Modular Connected Network It follows that Q 2 > Q 1 if n > m(m 1) + 2. Since L = n 2 (m(m 1) + 2) this further implies that Q 2 > Q 1 if n 2 > 2L. If n = 25, and m = 5, then Q 2 = > Q 1 = and modularity optimization has not determined the best partition. Fortunato and Barthelemy have analyzed real world data sets that exhibit the property that the best partition does not have the maximum modularity score. 13
14 An Improved Resolution Limit The Resolution Limit The Most Modular Connected Network An Improved Resolution Limit A Different Resolution Berry, Hendrickson, Laviolette, and Phillips (2010) use the local topology of the graph to infer edge weights. Then they generalize the results of Fortunato and Barthelemy to show it is possible to resolve much smaller communities. They also suggest that the dendrogram constructed by the CNM algorithm can be mined to resolve smaller communities. This might be even more effective with their modified CNM algorithm. 14
15 A Different Resolution? The Resolution Limit The Most Modular Connected Network An Improved Resolution Limit A Different Resolution Claim: The failure of the CNM algorithm to resolve communities is due to the properties of the modularity function Q and not the information that it uses. Can CNM be salvaged? Is it possible to optimize a different objective function that uses the same information and resolves communities? 15
16 Two Equivalent Representations of Q Consider a graph with L edges and three modules. Let l ij represent the number of transitions from module M i to module M j. Then L = l 11 + l 12 + l 13 + l 22 + l 23 + l 33. What happens to the modularity function if modules M 1 and M 2 are merged together? We find two equivalent representations of the difference Q 1,2,3 Q 1 2,3 of the modularity functions due to the merge of M 1 and M 2. If Q 1 2,3 > Q 1,2,3 then CNM will merge M 1 and M 2. 16
17 Two Equivalent Representations of Q Use the three modules M 1, M 2 and M 3 to collapse the undirected graph into a 3 3 table X 1,2,3 of counts x ij. The x ij are determined from the transition counts l ij as follows: x jj = 2l jj, x ij = x ji = l ij for i j, x 1 = x 1+ = x +1 = 2l 11 + l 12 + l 13, x 2 = x 2+ = x +2 = l l 22 + l 23. Because of symmetry the total sample size of the x ij s is N = 2L. 17
18 Two Equivalent Representations of Q If we further merge M 1 and M 2 together then the collapsed graph has transition counts of the form l 11 = l 11 + l 12 + l 22, l 12 = l 21 = l 13 + l 23, l 22 = l 33. The resulting 2 2 symmetric table X 1 2,3 of counts x ij has entries x 11 = 2l l l 22, x 12 = x 21 = l 13 + l 23, x 22 = 2l
19 Properties of the Modularity Function Two Equivalent Representations of Q Let Q 1,2,3 be the modularity function when modules M 1, M 2, M 3 are considered to be separate modules, and Q 1 2,3 the modularity function when modules M 1 and M 2 are merged together. Let Q = Q 1,2,3 Q 1 2,3 be the difference of the two modularity functions. If Q 1 2,3 > Q 1,2,3 then Q < 0, and CNM will merge M 1 and M 2. To understand the properties of the modularity function we find two equivalent representations of Q = Q 1,2,3 Q 1 2,3. 19
20 Properties of the Modularity Function Two Equivalent Representations of Q Module M 3 makes the same contribution Q 3 = R 33 /L to each of the modularity functions Q 1,2,3 and Q 1 2,3. Thus, L Q = L (Q 1,2,3 Q 1 2,3 ) = LQ 1,2,3 LQ 1 2,3 = (R 11 + R 22 + R 33 ) (R 1 2,1 2 + R 33 ) = R 11 + R 22 R 1 2,1 2. We have left to compute R 1 2,1 2 = l 11 ˆl
21 Properties of the Modularity Function Two Equivalent Representations of Q Now ˆl 11 = 1 4L x 2 1, where x 1 = 2l l l 22 + l 13 + l 23. A straightforward calculation yields where we define 2L 2 Q = c 0 + c 1 + c 2 + c 3, c 0 = 4l 11 l 22 l 2 12, c 1 = 2l 11 l 23 l 12 l 13, c 2 = 2l 22 l 13 l 12 l 23, c 3 = l 13 l 23 2l 12 l 33. The c j s have cross-product interpretations as 2 2 subtables of X 1,2,3. 21
22 Properties of the Modularity Function Two Equivalent Representations of Q With this representation it is easy to manufacture examples where Q < 0 and modules M 1 and M 2 will be merged in error by the CNM algorithm. Note that l 33 is the number of transitions in the rest of the graph and recall that c 3 = l 13 l 23 l 12 l 33. If l 12 > 0 and l 33 is large then c 3 << 0 and Q < 0. The most modular connected network is a simple example where this happens. 22
23 Properties of the Modularity Function Two Equivalent Representations of Q A second representation of Q leads to a simple observation that addresses several flaws in the current CNM algorithm. Standard properties of symmetric contingency tables can be used to show that the residuals of the 3 3 contingency table X 1,2,3 satisfy 2R 11 + R 12 + R 13 = 0, R R 22 + R 23 = 0, R 13 + R R 33 = 0. Analogously, the residuals of the 2 2 collapsed contingency table X 1 2,3 satisfy R 1 2,1 2 = R
24 Properties of the Modularity Function Two Equivalent Representations of Q These equations lead to another representation of L Q. L Q = L (Q 1,2,3 Q 1 2,3 ) = R 11 + R 22 R 1 2,1 2 = R 11 + R 22 R 33 = 1 2 ( R 12 R 13 ) ( R 12 R 23 ) 1 2 ( R 13 R 23 ) = R 12. Hence, R 12 > 0 if and only if Q < 0. 24
25 Two Equivalent Representations of Q This last inequality means that two modules M 1 and M 2 are candidates to be merged by the CNM algorithm if their residual R 12 > 0. The greedy CNM algorithm merges the two modules M i and M j with the largest residual R ij > 0. CNM Modification: Before two modules M i and M j are merged their residual R ij > 0 should be deemed statistically significant. The residuals R ij are not comparable since their variances are not necessarily equal. Let S ij = be standardized residuals. R ij std(r ij) 25
26 Two Equivalent Representations of Q CNM Modification: Before two modules M i and M j are merged their standardized residual S ij > 0 should be deemed statistically significant. The greedy modified CNM algorithm will merge the two modules M i and M j with the largest statistically significant standardized residual S ij > 0. This adjustment to the CNM algorithm will alleviate the resolution limit problem, and the tendency for CNM to favor the merging of large modules. 26
27 Modularity optimization can fail to resolve communities. This failure is due to the properties of the modularity function. The CNM algorithm can be modified to avoid the resolution limit problem. Algorithms that use spectral information from the modularity matrix can detect modules when CNM fails. 27
The Trouble with Community Detection
The Trouble with Community Detection Aaron Clauset Santa Fe Institute 7 April 2010 Nonlinear Dynamics of Networks Workshop U. Maryland, College Park Thanks to National Science Foundation REU Program James
More information1 Matrix notation and preliminaries from spectral graph theory
Graph clustering (or community detection or graph partitioning) is one of the most studied problems in network analysis. One reason for this is that there are a variety of ways to define a cluster or community.
More information1 Matrix notation and preliminaries from spectral graph theory
Graph clustering (or community detection or graph partitioning) is one of the most studied problems in network analysis. One reason for this is that there are a variety of ways to define a cluster or community.
More informationGraph Clustering Algorithms
PhD Course on Graph Mining Algorithms, Università di Pisa February, 2018 Clustering: Intuition to Formalization Task Partition a graph into natural groups so that the nodes in the same cluster are more
More informationNonparametric Bayesian Matrix Factorization for Assortative Networks
Nonparametric Bayesian Matrix Factorization for Assortative Networks Mingyuan Zhou IROM Department, McCombs School of Business Department of Statistics and Data Sciences The University of Texas at Austin
More informationLecture 12: May 09, Decomposable Graphs (continues from last time)
596 Pat. Recog. II: Introduction to Graphical Models University of Washington Spring 00 Dept. of lectrical ngineering Lecture : May 09, 00 Lecturer: Jeff Bilmes Scribe: Hansang ho, Izhak Shafran(000).
More informationComparing linear modularization criteria using the relational notation
Comparing linear modularization criteria using the relational notation Université Pierre et Marie Curie Laboratoire de Statistique Théorique et Appliquée April 30th, 2014 1/35 Table of contents 1 Introduction
More informationA Dimensionality Reduction Framework for Detection of Multiscale Structure in Heterogeneous Networks
Shen HW, Cheng XQ, Wang YZ et al. A dimensionality reduction framework for detection of multiscale structure in heterogeneous networks. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY 27(2): 341 357 Mar. 2012.
More informationGraph coloring, perfect graphs
Lecture 5 (05.04.2013) Graph coloring, perfect graphs Scribe: Tomasz Kociumaka Lecturer: Marcin Pilipczuk 1 Introduction to graph coloring Definition 1. Let G be a simple undirected graph and k a positive
More informationarxiv:cond-mat/ v1 [cond-mat.dis-nn] 27 Mar 2006
Statistical Mechanics of Community Detection arxiv:cond-mat/0603718v1 [cond-mat.dis-nn] 27 Mar 2006 Jörg Reichardt 1 and Stefan Bornholdt 1 1 Institute for Theoretical Physics, University of Bremen, Otto-Hahn-Allee,
More informationMassachusetts Institute of Technology Department of Electrical Engineering and Computer Science Algorithms For Inference Fall 2014
Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.438 Algorithms For Inference Fall 2014 Problem Set 3 Issued: Thursday, September 25, 2014 Due: Thursday,
More informationStatistical mechanics of community detection
Statistical mechanics of community detection Jörg Reichardt and Stefan Bornholdt Institute for Theoretical Physics, University of Bremen, Otto-Hahn-Allee, D-28359 Bremen, Germany Received 22 December 2005;
More informationGraph Detection and Estimation Theory
Introduction Detection Estimation Graph Detection and Estimation Theory (and algorithms, and applications) Patrick J. Wolfe Statistics and Information Sciences Laboratory (SISL) School of Engineering and
More informationFinding normalized and modularity cuts by spectral clustering. Ljubjana 2010, October
Finding normalized and modularity cuts by spectral clustering Marianna Bolla Institute of Mathematics Budapest University of Technology and Economics marib@math.bme.hu Ljubjana 2010, October Outline Find
More informationSociology 6Z03 Review II
Sociology 6Z03 Review II John Fox McMaster University Fall 2016 John Fox (McMaster University) Sociology 6Z03 Review II Fall 2016 1 / 35 Outline: Review II Probability Part I Sampling Distributions Probability
More informationMATH 829: Introduction to Data Mining and Analysis Graphical Models I
MATH 829: Introduction to Data Mining and Analysis Graphical Models I Dominique Guillot Departments of Mathematical Sciences University of Delaware May 2, 2016 1/12 Independence and conditional independence:
More information3 Undirected Graphical Models
Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.438 Algorithms For Inference Fall 2014 3 Undirected Graphical Models In this lecture, we discuss undirected
More informationSemidefinite and Second Order Cone Programming Seminar Fall 2001 Lecture 5
Semidefinite and Second Order Cone Programming Seminar Fall 2001 Lecture 5 Instructor: Farid Alizadeh Scribe: Anton Riabov 10/08/2001 1 Overview We continue studying the maximum eigenvalue SDP, and generalize
More informationChapter 11. Matrix Algorithms and Graph Partitioning. M. E. J. Newman. June 10, M. E. J. Newman Chapter 11 June 10, / 43
Chapter 11 Matrix Algorithms and Graph Partitioning M. E. J. Newman June 10, 2016 M. E. J. Newman Chapter 11 June 10, 2016 1 / 43 Table of Contents 1 Eigenvalue and Eigenvector Eigenvector Centrality The
More informationDover- Sherborn High School Mathematics Curriculum Probability and Statistics
Mathematics Curriculum A. DESCRIPTION This is a full year courses designed to introduce students to the basic elements of statistics and probability. Emphasis is placed on understanding terminology and
More informationIndependencies. Undirected Graphical Models 2: Independencies. Independencies (Markov networks) Independencies (Bayesian Networks)
(Bayesian Networks) Undirected Graphical Models 2: Use d-separation to read off independencies in a Bayesian network Takes a bit of effort! 1 2 (Markov networks) Use separation to determine independencies
More informationQuestions 3.83, 6.11, 6.12, 6.17, 6.25, 6.29, 6.33, 6.35, 6.50, 6.51, 6.53, 6.55, 6.59, 6.60, 6.65, 6.69, 6.70, 6.77, 6.79, 6.89, 6.
Chapter 7 Reading 7.1, 7.2 Questions 3.83, 6.11, 6.12, 6.17, 6.25, 6.29, 6.33, 6.35, 6.50, 6.51, 6.53, 6.55, 6.59, 6.60, 6.65, 6.69, 6.70, 6.77, 6.79, 6.89, 6.112 Introduction In Chapter 5 and 6, we emphasized
More informationarxiv: v1 [physics.soc-ph] 20 Oct 2010
arxiv:00.098v [physics.soc-ph] 20 Oct 200 Spectral methods for the detection of network community structure: A comparative analysis Hua-Wei Shen and Xue-Qi Cheng Institute of Computing Technology, Chinese
More informationGroups of vertices and Core-periphery structure. By: Ralucca Gera, Applied math department, Naval Postgraduate School Monterey, CA, USA
Groups of vertices and Core-periphery structure By: Ralucca Gera, Applied math department, Naval Postgraduate School Monterey, CA, USA Mostly observed real networks have: Why? Heavy tail (powerlaw most
More informationON THE NP-COMPLETENESS OF SOME GRAPH CLUSTER MEASURES
ON THE NP-COMPLETENESS OF SOME GRAPH CLUSTER MEASURES JIŘÍ ŠÍMA AND SATU ELISA SCHAEFFER Academy of Sciences of the Czech Republic Helsinki University of Technology, Finland elisa.schaeffer@tkk.fi SOFSEM
More informationUndirected Graphical Models 4 Bayesian Networks and Markov Networks. Bayesian Networks to Markov Networks
Undirected Graphical Models 4 ayesian Networks and Markov Networks 1 ayesian Networks to Markov Networks 2 1 Ns to MNs X Y Z Ns can represent independence constraints that MN cannot MNs can represent independence
More informationLecture 12 : Graph Laplacians and Cheeger s Inequality
CPS290: Algorithmic Foundations of Data Science March 7, 2017 Lecture 12 : Graph Laplacians and Cheeger s Inequality Lecturer: Kamesh Munagala Scribe: Kamesh Munagala Graph Laplacian Maybe the most beautiful
More informationOn Resolution Limit of the Modularity in Community Detection
The Ninth International Symposium on Operations Research and Its Applications (ISORA 10) Chengdu-Jiuzhaigou, China, August 19 23, 2010 Copyright 2010 ORSC & APORC, pp. 492 499 On Resolution imit of the
More informationLecture 3. 1 Polynomial-time algorithms for the maximum flow problem
ORIE 633 Network Flows August 30, 2007 Lecturer: David P. Williamson Lecture 3 Scribe: Gema Plaza-Martínez 1 Polynomial-time algorithms for the maximum flow problem 1.1 Introduction Let s turn now to considering
More informationRapid Introduction to Machine Learning/ Deep Learning
Rapid Introduction to Machine Learning/ Deep Learning Hyeong In Choi Seoul National University 1/24 Lecture 5b Markov random field (MRF) November 13, 2015 2/24 Table of contents 1 1. Objectives of Lecture
More information8.1 Concentration inequality for Gaussian random matrix (cont d)
MGMT 69: Topics in High-dimensional Data Analysis Falll 26 Lecture 8: Spectral clustering and Laplacian matrices Lecturer: Jiaming Xu Scribe: Hyun-Ju Oh and Taotao He, October 4, 26 Outline Concentration
More informationGraphical Models and Independence Models
Graphical Models and Independence Models Yunshu Liu ASPITRG Research Group 2014-03-04 References: [1]. Steffen Lauritzen, Graphical Models, Oxford University Press, 1996 [2]. Christopher M. Bishop, Pattern
More informationAn ADMM Algorithm for Clustering Partially Observed Networks
An ADMM Algorithm for Clustering Partially Observed Networks Necdet Serhat Aybat Industrial Engineering Penn State University 2015 SIAM International Conference on Data Mining Vancouver, Canada Problem
More informationSmall Forbidden Configurations III
Small Forbidden Configurations III R. P. Anstee and N. Kamoosi Mathematics Department The University of British Columbia Vancouver, B.C. Canada V6T Z anstee@math.ubc.ca Submitted: Nov, 005; Accepted: Nov
More informationUndirected Graphical Models
Undirected Graphical Models 1 Conditional Independence Graphs Let G = (V, E) be an undirected graph with vertex set V and edge set E, and let A, B, and C be subsets of vertices. We say that C separates
More informationData Mining Techniques
Data Mining Techniques CS 622 - Section 2 - Spring 27 Pre-final Review Jan-Willem van de Meent Feedback Feedback https://goo.gl/er7eo8 (also posted on Piazza) Also, please fill out your TRACE evaluations!
More informationGraphs and Network Flows IE411. Lecture 12. Dr. Ted Ralphs
Graphs and Network Flows IE411 Lecture 12 Dr. Ted Ralphs IE411 Lecture 12 1 References for Today s Lecture Required reading Sections 21.1 21.2 References AMO Chapter 6 CLRS Sections 26.1 26.2 IE411 Lecture
More informationComplex Networks CSYS/MATH 303, Spring, Prof. Peter Dodds
Complex Networks CSYS/MATH 303, Spring, 2011 Prof. Peter Dodds Department of Mathematics & Statistics Center for Complex Systems Vermont Advanced Computing Center University of Vermont Licensed under the
More informationSpectral Theorem for Self-adjoint Linear Operators
Notes for the undergraduate lecture by David Adams. (These are the notes I would write if I was teaching a course on this topic. I have included more material than I will cover in the 45 minute lecture;
More informationTABLES AND FORMULAS FOR MOORE Basic Practice of Statistics
TABLES AND FORMULAS FOR MOORE Basic Practice of Statistics Exploring Data: Distributions Look for overall pattern (shape, center, spread) and deviations (outliers). Mean (use a calculator): x = x 1 + x
More informationSVD, discrepancy, and regular structure of contingency tables
SVD, discrepancy, and regular structure of contingency tables Marianna Bolla Institute of Mathematics Technical University of Budapest (Research supported by the TÁMOP-4.2.2.B-10/1-2010-0009 project) marib@math.bme.hu
More informationThis section is an introduction to the basic themes of the course.
Chapter 1 Matrices and Graphs 1.1 The Adjacency Matrix This section is an introduction to the basic themes of the course. Definition 1.1.1. A simple undirected graph G = (V, E) consists of a non-empty
More informationCS168: The Modern Algorithmic Toolbox Lectures #11 and #12: Spectral Graph Theory
CS168: The Modern Algorithmic Toolbox Lectures #11 and #12: Spectral Graph Theory Tim Roughgarden & Gregory Valiant May 2, 2016 Spectral graph theory is the powerful and beautiful theory that arises from
More information44.2. Two-Way Analysis of Variance. Introduction. Prerequisites. Learning Outcomes
Two-Way Analysis of Variance 44 Introduction In the one-way analysis of variance (Section 441) we consider the effect of one factor on the values taken by a variable Very often, in engineering investigations,
More informationCommunities, Spectral Clustering, and Random Walks
Communities, Spectral Clustering, and Random Walks David Bindel Department of Computer Science Cornell University 26 Sep 2011 20 21 19 16 22 28 17 18 29 26 27 30 23 1 25 5 8 24 2 4 14 3 9 13 15 11 10 12
More informationLaplacian Integral Graphs with Maximum Degree 3
Laplacian Integral Graphs with Maximum Degree Steve Kirkland Department of Mathematics and Statistics University of Regina Regina, Saskatchewan, Canada S4S 0A kirkland@math.uregina.ca Submitted: Nov 5,
More informationThe doubly negative matrix completion problem
The doubly negative matrix completion problem C Mendes Araújo, Juan R Torregrosa and Ana M Urbano CMAT - Centro de Matemática / Dpto de Matemática Aplicada Universidade do Minho / Universidad Politécnica
More informationFaloutsos, Tong ICDE, 2009
Large Graph Mining: Patterns, Tools and Case Studies Christos Faloutsos Hanghang Tong CMU Copyright: Faloutsos, Tong (29) 2-1 Outline Part 1: Patterns Part 2: Matrix and Tensor Tools Part 3: Proximity
More informationMatching Polynomials of Graphs
Spectral Graph Theory Lecture 25 Matching Polynomials of Graphs Daniel A Spielman December 7, 2015 Disclaimer These notes are not necessarily an accurate representation of what happened in class The notes
More informationData Mining and Analysis: Fundamental Concepts and Algorithms
: Fundamental Concepts and Algorithms dataminingbook.info Mohammed J. Zaki 1 Wagner Meira Jr. 2 1 Department of Computer Science Rensselaer Polytechnic Institute, Troy, NY, USA 2 Department of Computer
More information10.3 Matroids and approximation
10.3 Matroids and approximation 137 10.3 Matroids and approximation Given a family F of subsets of some finite set X, called the ground-set, and a weight function assigning each element x X a non-negative
More informationProf. Dr. Lars Schmidt-Thieme, L. B. Marinho, K. Buza Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany, Course
Course on Bayesian Networks, winter term 2007 0/31 Bayesian Networks Bayesian Networks I. Bayesian Networks / 1. Probabilistic Independence and Separation in Graphs Prof. Dr. Lars Schmidt-Thieme, L. B.
More information1 T 1 = where 1 is the all-ones vector. For the upper bound, let v 1 be the eigenvector corresponding. u:(u,v) E v 1(u)
CME 305: Discrete Mathematics and Algorithms Instructor: Reza Zadeh (rezab@stanford.edu) Final Review Session 03/20/17 1. Let G = (V, E) be an unweighted, undirected graph. Let λ 1 be the maximum eigenvalue
More informationORIE 4741: Learning with Big Messy Data. Spectral Graph Theory
ORIE 4741: Learning with Big Messy Data Spectral Graph Theory Mika Sumida Operations Research and Information Engineering Cornell September 15, 2017 1 / 32 Outline Graph Theory Spectral Graph Theory Laplacian
More informationContainment restrictions
Containment restrictions Tibor Szabó Extremal Combinatorics, FU Berlin, WiSe 207 8 In this chapter we switch from studying constraints on the set operation intersection, to constraints on the set relation
More informationClass President: A Network Approach to Popularity. Due July 18, 2014
Class President: A Network Approach to Popularity Due July 8, 24 Instructions. Due Fri, July 8 at :59 PM 2. Work in groups of up to 3 3. Type up the report, and submit as a pdf on D2L 4. Attach the code
More informationEffective Resistance and Schur Complements
Spectral Graph Theory Lecture 8 Effective Resistance and Schur Complements Daniel A Spielman September 28, 2015 Disclaimer These notes are not necessarily an accurate representation of what happened in
More informationCS224W: Social and Information Network Analysis Jure Leskovec, Stanford University
CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University http://cs224w.stanford.edu Non-overlapping vs. overlapping communities 11/10/2010 Jure Leskovec, Stanford CS224W: Social
More informationMarkov Chains and Spectral Clustering
Markov Chains and Spectral Clustering Ning Liu 1,2 and William J. Stewart 1,3 1 Department of Computer Science North Carolina State University, Raleigh, NC 27695-8206, USA. 2 nliu@ncsu.edu, 3 billy@ncsu.edu
More informationWeighted Least Squares
Weighted Least Squares The standard linear model assumes that Var(ε i ) = σ 2 for i = 1,..., n. As we have seen, however, there are instances where Var(Y X = x i ) = Var(ε i ) = σ2 w i. Here w 1,..., w
More informationCSI 445/660 Part 6 (Centrality Measures for Networks) 6 1 / 68
CSI 445/660 Part 6 (Centrality Measures for Networks) 6 1 / 68 References 1 L. Freeman, Centrality in Social Networks: Conceptual Clarification, Social Networks, Vol. 1, 1978/1979, pp. 215 239. 2 S. Wasserman
More informationMining of Massive Datasets Jure Leskovec, AnandRajaraman, Jeff Ullman Stanford University
Note to other teachers and users of these slides: We would be delighted if you found this our material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit
More informationCommunity Detection. Data Analytics - Community Detection Module
Community Detection Data Analytics - Community Detection Module Zachary s karate club Members of a karate club (observed for 3 years). Edges represent interactions outside the activities of the club. Community
More informationCS224W: Social and Information Network Analysis Jure Leskovec, Stanford University
CS224W: Social and Information Network Analysis Jure Leskovec Stanford University Jure Leskovec, Stanford University http://cs224w.stanford.edu Task: Find coalitions in signed networks Incentives: European
More informationAnalysis of Variance: Part 1
Analysis of Variance: Part 1 Oneway ANOVA When there are more than two means Each time two means are compared the probability (Type I error) =α. When there are more than two means Each time two means are
More informationLECTURE 5 HYPOTHESIS TESTING
October 25, 2016 LECTURE 5 HYPOTHESIS TESTING Basic concepts In this lecture we continue to discuss the normal classical linear regression defined by Assumptions A1-A5. Let θ Θ R d be a parameter of interest.
More informationROOT SYSTEMS AND DYNKIN DIAGRAMS
ROOT SYSTEMS AND DYNKIN DIAGRAMS DAVID MEHRLE In 1969, Murray Gell-Mann won the nobel prize in physics for his contributions and discoveries concerning the classification of elementary particles and their
More informationApplications to network analysis: Eigenvector centrality indices Lecture notes
Applications to network analysis: Eigenvector centrality indices Lecture notes Dario Fasino, University of Udine (Italy) Lecture notes for the second part of the course Nonnegative and spectral matrix
More informationEvaluation. Andrea Passerini Machine Learning. Evaluation
Andrea Passerini passerini@disi.unitn.it Machine Learning Basic concepts requires to define performance measures to be optimized Performance of learning algorithms cannot be evaluated on entire domain
More informationCategorical Data Analysis Chapter 3
Categorical Data Analysis Chapter 3 The actual coverage probability is usually a bit higher than the nominal level. Confidence intervals for association parameteres Consider the odds ratio in the 2x2 table,
More informationOne-Way ANOVA. Some examples of when ANOVA would be appropriate include:
One-Way ANOVA 1. Purpose Analysis of variance (ANOVA) is used when one wishes to determine whether two or more groups (e.g., classes A, B, and C) differ on some outcome of interest (e.g., an achievement
More informationDefinition (The carefully thought-out calculus version based on limits).
4.1. Continuity and Graphs Definition 4.1.1 (Intuitive idea used in algebra based on graphing). A function, f, is continuous on the interval (a, b) if the graph of y = f(x) can be drawn over the interval
More informationLecture 3: Simulating social networks
Lecture 3: Simulating social networks Short Course on Social Interactions and Networks, CEMFI, May 26-28th, 2014 Bryan S. Graham 28 May 2014 Consider a network with adjacency matrix D = d and corresponding
More informationSVD, Power method, and Planted Graph problems (+ eigenvalues of random matrices)
Chapter 14 SVD, Power method, and Planted Graph problems (+ eigenvalues of random matrices) Today we continue the topic of low-dimensional approximation to datasets and matrices. Last time we saw the singular
More informationLecture 13: Spectral Graph Theory
CSE 521: Design and Analysis of Algorithms I Winter 2017 Lecture 13: Spectral Graph Theory Lecturer: Shayan Oveis Gharan 11/14/18 Disclaimer: These notes have not been subjected to the usual scrutiny reserved
More informationEvaluation requires to define performance measures to be optimized
Evaluation Basic concepts Evaluation requires to define performance measures to be optimized Performance of learning algorithms cannot be evaluated on entire domain (generalization error) approximation
More informationhypothesis a claim about the value of some parameter (like p)
Testing hypotheses hypothesis a claim about the value of some parameter (like p) significance test procedure to assess the strength of evidence provided by a sample of data against the claim of a hypothesized
More informationFour new upper bounds for the stability number of a graph
Four new upper bounds for the stability number of a graph Miklós Ujvári Abstract. In 1979, L. Lovász defined the theta number, a spectral/semidefinite upper bound on the stability number of a graph, which
More informationK 4 -free graphs with no odd holes
K 4 -free graphs with no odd holes Maria Chudnovsky 1 Columbia University, New York NY 10027 Neil Robertson 2 Ohio State University, Columbus, Ohio 43210 Paul Seymour 3 Princeton University, Princeton
More informationLINEAR SPACES. Define a linear space to be a near linear space in which any two points are on a line.
LINEAR SPACES Define a linear space to be a near linear space in which any two points are on a line. A linear space is an incidence structure I = (P, L) such that Axiom LS1: any line is incident with at
More informationLecture 10 February 4, 2013
UBC CPSC 536N: Sparse Approximations Winter 2013 Prof Nick Harvey Lecture 10 February 4, 2013 Scribe: Alexandre Fréchette This lecture is about spanning trees and their polyhedral representation Throughout
More informationDRAGAN STEVANOVIĆ. Key words. Modularity matrix; Community structure; Largest eigenvalue; Complete multipartite
A NOTE ON GRAPHS WHOSE LARGEST EIGENVALUE OF THE MODULARITY MATRIX EQUALS ZERO SNJEŽANA MAJSTOROVIĆ AND DRAGAN STEVANOVIĆ Abstract. Informally, a community within a graph is a subgraph whose vertices are
More informationACO Comprehensive Exam March 17 and 18, Computability, Complexity and Algorithms
1. Computability, Complexity and Algorithms (a) Let G(V, E) be an undirected unweighted graph. Let C V be a vertex cover of G. Argue that V \ C is an independent set of G. (b) Minimum cardinality vertex
More informationSTAT 135 Lab 11 Tests for Categorical Data (Fisher s Exact test, χ 2 tests for Homogeneity and Independence) and Linear Regression
STAT 135 Lab 11 Tests for Categorical Data (Fisher s Exact test, χ 2 tests for Homogeneity and Independence) and Linear Regression Rebecca Barter April 20, 2015 Fisher s Exact Test Fisher s Exact Test
More informationModular representations of symmetric groups: An Overview
Modular representations of symmetric groups: An Overview Bhama Srinivasan University of Illinois at Chicago Regina, May 2012 Bhama Srinivasan (University of Illinois at Chicago) Modular Representations
More informationarxiv: v2 [physics.soc-ph] 13 Jun 2009
Alternative approach to community detection in networks A.D. Medus and C.O. Dorso Departamento de Física, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Pabellón 1, Ciudad Universitaria,
More informationMassachusetts Institute of Technology Department of Electrical Engineering and Computer Science Algorithms For Inference Fall 2014
Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.438 Algorithms For Inference Fall 2014 Recitation 3 1 Gaussian Graphical Models: Schur s Complement Consider
More informationCocliques in the Kneser graph on line-plane flags in PG(4, q)
Cocliques in the Kneser graph on line-plane flags in PG(4, q) A. Blokhuis & A. E. Brouwer Abstract We determine the independence number of the Kneser graph on line-plane flags in the projective space PG(4,
More informationParameterized Algorithms for the H-Packing with t-overlap Problem
Journal of Graph Algorithms and Applications http://jgaa.info/ vol. 18, no. 4, pp. 515 538 (2014) DOI: 10.7155/jgaa.00335 Parameterized Algorithms for the H-Packing with t-overlap Problem Alejandro López-Ortiz
More information1 Primals and Duals: Zero Sum Games
CS 124 Section #11 Zero Sum Games; NP Completeness 4/15/17 1 Primals and Duals: Zero Sum Games We can represent various situations of conflict in life in terms of matrix games. For example, the game shown
More informationData Analysis and Manifold Learning Lecture 9: Diffusion on Manifolds and on Graphs
Data Analysis and Manifold Learning Lecture 9: Diffusion on Manifolds and on Graphs Radu Horaud INRIA Grenoble Rhone-Alpes, France Radu.Horaud@inrialpes.fr http://perception.inrialpes.fr/ Outline of Lecture
More informationFirst we look at some terms to be used in this section.
8 Hypothesis Testing 8.1 Introduction MATH1015 Biostatistics Week 8 In Chapter 7, we ve studied the estimation of parameters, point or interval estimates. The construction of CI relies on the sampling
More informationChapter 12 Comparing Two or More Means
12.1 Introduction 277 Chapter 12 Comparing Two or More Means 12.1 Introduction In Chapter 8 we considered methods for making inferences about the relationship between two population distributions based
More informationCMPSCI 611: Advanced Algorithms
CMPSCI 611: Advanced Algorithms Lecture 12: Network Flow Part II Andrew McGregor Last Compiled: December 14, 2017 1/26 Definitions Input: Directed Graph G = (V, E) Capacities C(u, v) > 0 for (u, v) E and
More informationEnumeration Schemes for Words Avoiding Permutations
Enumeration Schemes for Words Avoiding Permutations Lara Pudwell November 27, 2007 Abstract The enumeration of permutation classes has been accomplished with a variety of techniques. One wide-reaching
More informationRandom Lifts of Graphs
27th Brazilian Math Colloquium, July 09 Plan of this talk A brief introduction to the probabilistic method. A quick review of expander graphs and their spectrum. Lifts, random lifts and their properties.
More informationRao s degree sequence conjecture
Rao s degree sequence conjecture Maria Chudnovsky 1 Columbia University, New York, NY 10027 Paul Seymour 2 Princeton University, Princeton, NJ 08544 July 31, 2009; revised December 10, 2013 1 Supported
More informationLink Analysis Ranking
Link Analysis Ranking How do search engines decide how to rank your query results? Guess why Google ranks the query results the way it does How would you do it? Naïve ranking of query results Given query
More informationLecture 13 Spectral Graph Algorithms
COMS 995-3: Advanced Algorithms March 6, 7 Lecture 3 Spectral Graph Algorithms Instructor: Alex Andoni Scribe: Srikar Varadaraj Introduction Today s topics: Finish proof from last lecture Example of random
More informationRecitation 9: Loopy BP
Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.438 Algorithms For Inference Fall 204 Recitation 9: Loopy BP General Comments. In terms of implementation,
More information