Modularity and Graph Algorithms

Size: px
Start display at page:

Download "Modularity and Graph Algorithms"

Transcription

1 Modularity and Graph Algorithms David Bader Georgia Institute of Technology Joe McCloskey National Security Agency 12 July

2 Outline Modularity Optimization and the Clauset, Newman, and Moore Algorithm 2

3 Modularity Optimization Modularity Optimization: Directed Graphs Modularity Optimization: Undirected Graphs Let G represent a directed, unweighted graph with adjacency matrix A. Assume that L is the total number of edges in the network and that G has been partitioned into m disjoint sets of nodes M s, s = 1, 2,..., m. Consider the set M s and let l ss represent the number of observed transitions from M s to M s. Now define l s s to be the total number of transitions from within to outside the node set M s. The quantities l ss and l s s are defined analogously. Note that we have collapsed the adjacency matrix into a 2 2 table of transition counts where L = l ss + l s s + l ss + l s s. 3

4 Modularity Optimization Modularity Optimization: Directed Graphs Modularity Optimization: Undirected Graphs Let the row and column marginal sums of the 2 2 transition table be defined as l +s = l ss + l ss, and l s+ = l ss + l s s. We observe l ss transitions within node set M s. Under the hypothesis of independence the estimated number of transitions is as follows: ˆlss = L ( l+s L ) ( ) ls+ = l s+l +s L L. The residual is defined as R ss = l ss ˆl ss. If R ss > 0 then we observe more transitions than expected. If R ss is significantly larger than zero then we might suspect that M s is a module (cluster). 4

5 Modularity Optimization Modularity Optimization: Directed Graphs Modularity Optimization: Undirected Graphs Following CNM we let Q s = R ss /L. The set of nodes M s to said to be a module if and only if Q s > 0. Next we define the modularity function Q as follows: Q = m Q s = 1 L s=1 m R ss = 1 L s=1 m s=1 (l ss ˆl ss ). The goal of modularity optimization is to efficiently find a decomposition of G into a set of m modules M s, s = 1, 2,..., m that maximizes the modularity function Q. 5

6 Modularity Optimization: Directed Graphs Modularity Optimization: Undirected Graphs It follows that the residual R ss = l ss ˆl ss = l ss ls+l+s L only if l ss l s s l s s l ss > 0. Moreover, > 0 if and Q = 1 L 2 m s=1 (l ss l s s l s s l ss ), and M s is a module if and only if c 12 = lssl s s l ssl s s > 1, where c 12 is the cross-product ratio of a 2 2 contingency table. The expected value of c 12 equals one if and only if the rows and columns of the table are statistically independent. c 12 > 1 indicates positive dependence. 6

7 Modularity Optimization Modularity Optimization: Directed Graphs Modularity Optimization: Undirected Graphs Now let G represent an undirected graph. The adjacency matrix A is now symmetric and it follows that l s s = l ss, and L = l ss + l ss + l s s. Since there are L edges in the graph, the sum of all of the entries of the adjacency matrix is equal to N = 2L. Let X represent the 2 2 contingency table obtained by collapsing the adjacency matrix with respect to module M s. Because of symmetry x ss = 2l ss, x s s = 2l s s, and x ss = x s s = l ss = l s s. 7

8 Modularity Optimization Modularity Optimization: Directed Graphs Modularity Optimization: Undirected Graphs Let ˆm ij represent the estimated entries of table X under independence. The marginal sums of X with respect to module M s are x s = x +s = x s+ = x ss + x s s = 2l ss + l s s. The expected number of transitions within M s is seen to be ˆlss = ˆm ( ss 2 = (2L) x+s 2L As before Q = m Q s = 1 L s=1 ) ( xs+ 2L m R ss = 1 L s=1 ) ( ) 1 = (x s) 2 2 4L. m s=1 (l ss ˆl ss ). 8

9 Modularity Optimization: Directed Graphs Modularity Optimization: Undirected Graphs The residual R ss = l ss ˆl ss = l ss x2 s 4L, and M s is a module if and only if 4L ss > x 2 s. Equivalently, M s is a module if and only if 4l ss l s s l 2 s s > 0. The cross-product ratio becomes ( ) ( ) xss x s s c 12 = = x s s x s s ( 2lss l s s ) ( ) 2l s s l s s = 4l ssl s s ls s 2. 9

10 Modularity Optimization: Directed Graphs Modularity Optimization: Undirected Graphs The modularity function Q can be rewritten in the following useful form where the cross-product ratio is more apparent: Q = m Q s = 1 L s=1 m s=1 R ss = 1 4L 2 m s=1 ( 4lss l s s ls s 2 ). Clauset, Newman, and Moore (2004) have developed a scalable, agglomerative algorithm to maximize the modularity function. Their greedy algorithm makes use of efficient data structures and for certain networks can run in time O ( L log 2 L ). 10

11 The Resolution Limit The Most Modular Connected Network An Improved Resolution Limit A Different Resolution It is a-priori impossible to tell whether a module (large or small), detected through modularity optimization, is indeed a single module or a cluster of smaller modules. Fortunato and Barthelemy (2007) 11

12 The Resolution Limit The Most Modular Connected Network An Improved Resolution Limit A Different Resolution The Most Modular Connected Network Let G be composed of n cliques, n even, where each clique has m nodes. Each clique is connected by a single edge to an adjacent clique. The number of edges in this graph is equal to L = 1 2 nm(m 1) + n = n 2 (m(m 1) + 2). Consider two partitions of G into hypothesized modules. In the first partition every module is a clique, while in the second partition every module is a pair of adjacent cliques. Then Q 1 = 1 2 m(m 1) n, Q 1 2 = 1 m(m 1) n. 12

13 The Resolution Limit The Most Modular Connected Network An Improved Resolution Limit A Different Resolution The Most Modular Connected Network It follows that Q 2 > Q 1 if n > m(m 1) + 2. Since L = n 2 (m(m 1) + 2) this further implies that Q 2 > Q 1 if n 2 > 2L. If n = 25, and m = 5, then Q 2 = > Q 1 = and modularity optimization has not determined the best partition. Fortunato and Barthelemy have analyzed real world data sets that exhibit the property that the best partition does not have the maximum modularity score. 13

14 An Improved Resolution Limit The Resolution Limit The Most Modular Connected Network An Improved Resolution Limit A Different Resolution Berry, Hendrickson, Laviolette, and Phillips (2010) use the local topology of the graph to infer edge weights. Then they generalize the results of Fortunato and Barthelemy to show it is possible to resolve much smaller communities. They also suggest that the dendrogram constructed by the CNM algorithm can be mined to resolve smaller communities. This might be even more effective with their modified CNM algorithm. 14

15 A Different Resolution? The Resolution Limit The Most Modular Connected Network An Improved Resolution Limit A Different Resolution Claim: The failure of the CNM algorithm to resolve communities is due to the properties of the modularity function Q and not the information that it uses. Can CNM be salvaged? Is it possible to optimize a different objective function that uses the same information and resolves communities? 15

16 Two Equivalent Representations of Q Consider a graph with L edges and three modules. Let l ij represent the number of transitions from module M i to module M j. Then L = l 11 + l 12 + l 13 + l 22 + l 23 + l 33. What happens to the modularity function if modules M 1 and M 2 are merged together? We find two equivalent representations of the difference Q 1,2,3 Q 1 2,3 of the modularity functions due to the merge of M 1 and M 2. If Q 1 2,3 > Q 1,2,3 then CNM will merge M 1 and M 2. 16

17 Two Equivalent Representations of Q Use the three modules M 1, M 2 and M 3 to collapse the undirected graph into a 3 3 table X 1,2,3 of counts x ij. The x ij are determined from the transition counts l ij as follows: x jj = 2l jj, x ij = x ji = l ij for i j, x 1 = x 1+ = x +1 = 2l 11 + l 12 + l 13, x 2 = x 2+ = x +2 = l l 22 + l 23. Because of symmetry the total sample size of the x ij s is N = 2L. 17

18 Two Equivalent Representations of Q If we further merge M 1 and M 2 together then the collapsed graph has transition counts of the form l 11 = l 11 + l 12 + l 22, l 12 = l 21 = l 13 + l 23, l 22 = l 33. The resulting 2 2 symmetric table X 1 2,3 of counts x ij has entries x 11 = 2l l l 22, x 12 = x 21 = l 13 + l 23, x 22 = 2l

19 Properties of the Modularity Function Two Equivalent Representations of Q Let Q 1,2,3 be the modularity function when modules M 1, M 2, M 3 are considered to be separate modules, and Q 1 2,3 the modularity function when modules M 1 and M 2 are merged together. Let Q = Q 1,2,3 Q 1 2,3 be the difference of the two modularity functions. If Q 1 2,3 > Q 1,2,3 then Q < 0, and CNM will merge M 1 and M 2. To understand the properties of the modularity function we find two equivalent representations of Q = Q 1,2,3 Q 1 2,3. 19

20 Properties of the Modularity Function Two Equivalent Representations of Q Module M 3 makes the same contribution Q 3 = R 33 /L to each of the modularity functions Q 1,2,3 and Q 1 2,3. Thus, L Q = L (Q 1,2,3 Q 1 2,3 ) = LQ 1,2,3 LQ 1 2,3 = (R 11 + R 22 + R 33 ) (R 1 2,1 2 + R 33 ) = R 11 + R 22 R 1 2,1 2. We have left to compute R 1 2,1 2 = l 11 ˆl

21 Properties of the Modularity Function Two Equivalent Representations of Q Now ˆl 11 = 1 4L x 2 1, where x 1 = 2l l l 22 + l 13 + l 23. A straightforward calculation yields where we define 2L 2 Q = c 0 + c 1 + c 2 + c 3, c 0 = 4l 11 l 22 l 2 12, c 1 = 2l 11 l 23 l 12 l 13, c 2 = 2l 22 l 13 l 12 l 23, c 3 = l 13 l 23 2l 12 l 33. The c j s have cross-product interpretations as 2 2 subtables of X 1,2,3. 21

22 Properties of the Modularity Function Two Equivalent Representations of Q With this representation it is easy to manufacture examples where Q < 0 and modules M 1 and M 2 will be merged in error by the CNM algorithm. Note that l 33 is the number of transitions in the rest of the graph and recall that c 3 = l 13 l 23 l 12 l 33. If l 12 > 0 and l 33 is large then c 3 << 0 and Q < 0. The most modular connected network is a simple example where this happens. 22

23 Properties of the Modularity Function Two Equivalent Representations of Q A second representation of Q leads to a simple observation that addresses several flaws in the current CNM algorithm. Standard properties of symmetric contingency tables can be used to show that the residuals of the 3 3 contingency table X 1,2,3 satisfy 2R 11 + R 12 + R 13 = 0, R R 22 + R 23 = 0, R 13 + R R 33 = 0. Analogously, the residuals of the 2 2 collapsed contingency table X 1 2,3 satisfy R 1 2,1 2 = R

24 Properties of the Modularity Function Two Equivalent Representations of Q These equations lead to another representation of L Q. L Q = L (Q 1,2,3 Q 1 2,3 ) = R 11 + R 22 R 1 2,1 2 = R 11 + R 22 R 33 = 1 2 ( R 12 R 13 ) ( R 12 R 23 ) 1 2 ( R 13 R 23 ) = R 12. Hence, R 12 > 0 if and only if Q < 0. 24

25 Two Equivalent Representations of Q This last inequality means that two modules M 1 and M 2 are candidates to be merged by the CNM algorithm if their residual R 12 > 0. The greedy CNM algorithm merges the two modules M i and M j with the largest residual R ij > 0. CNM Modification: Before two modules M i and M j are merged their residual R ij > 0 should be deemed statistically significant. The residuals R ij are not comparable since their variances are not necessarily equal. Let S ij = be standardized residuals. R ij std(r ij) 25

26 Two Equivalent Representations of Q CNM Modification: Before two modules M i and M j are merged their standardized residual S ij > 0 should be deemed statistically significant. The greedy modified CNM algorithm will merge the two modules M i and M j with the largest statistically significant standardized residual S ij > 0. This adjustment to the CNM algorithm will alleviate the resolution limit problem, and the tendency for CNM to favor the merging of large modules. 26

27 Modularity optimization can fail to resolve communities. This failure is due to the properties of the modularity function. The CNM algorithm can be modified to avoid the resolution limit problem. Algorithms that use spectral information from the modularity matrix can detect modules when CNM fails. 27

The Trouble with Community Detection

The Trouble with Community Detection The Trouble with Community Detection Aaron Clauset Santa Fe Institute 7 April 2010 Nonlinear Dynamics of Networks Workshop U. Maryland, College Park Thanks to National Science Foundation REU Program James

More information

1 Matrix notation and preliminaries from spectral graph theory

1 Matrix notation and preliminaries from spectral graph theory Graph clustering (or community detection or graph partitioning) is one of the most studied problems in network analysis. One reason for this is that there are a variety of ways to define a cluster or community.

More information

1 Matrix notation and preliminaries from spectral graph theory

1 Matrix notation and preliminaries from spectral graph theory Graph clustering (or community detection or graph partitioning) is one of the most studied problems in network analysis. One reason for this is that there are a variety of ways to define a cluster or community.

More information

Graph Clustering Algorithms

Graph Clustering Algorithms PhD Course on Graph Mining Algorithms, Università di Pisa February, 2018 Clustering: Intuition to Formalization Task Partition a graph into natural groups so that the nodes in the same cluster are more

More information

Nonparametric Bayesian Matrix Factorization for Assortative Networks

Nonparametric Bayesian Matrix Factorization for Assortative Networks Nonparametric Bayesian Matrix Factorization for Assortative Networks Mingyuan Zhou IROM Department, McCombs School of Business Department of Statistics and Data Sciences The University of Texas at Austin

More information

Lecture 12: May 09, Decomposable Graphs (continues from last time)

Lecture 12: May 09, Decomposable Graphs (continues from last time) 596 Pat. Recog. II: Introduction to Graphical Models University of Washington Spring 00 Dept. of lectrical ngineering Lecture : May 09, 00 Lecturer: Jeff Bilmes Scribe: Hansang ho, Izhak Shafran(000).

More information

Comparing linear modularization criteria using the relational notation

Comparing linear modularization criteria using the relational notation Comparing linear modularization criteria using the relational notation Université Pierre et Marie Curie Laboratoire de Statistique Théorique et Appliquée April 30th, 2014 1/35 Table of contents 1 Introduction

More information

A Dimensionality Reduction Framework for Detection of Multiscale Structure in Heterogeneous Networks

A Dimensionality Reduction Framework for Detection of Multiscale Structure in Heterogeneous Networks Shen HW, Cheng XQ, Wang YZ et al. A dimensionality reduction framework for detection of multiscale structure in heterogeneous networks. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY 27(2): 341 357 Mar. 2012.

More information

Graph coloring, perfect graphs

Graph coloring, perfect graphs Lecture 5 (05.04.2013) Graph coloring, perfect graphs Scribe: Tomasz Kociumaka Lecturer: Marcin Pilipczuk 1 Introduction to graph coloring Definition 1. Let G be a simple undirected graph and k a positive

More information

arxiv:cond-mat/ v1 [cond-mat.dis-nn] 27 Mar 2006

arxiv:cond-mat/ v1 [cond-mat.dis-nn] 27 Mar 2006 Statistical Mechanics of Community Detection arxiv:cond-mat/0603718v1 [cond-mat.dis-nn] 27 Mar 2006 Jörg Reichardt 1 and Stefan Bornholdt 1 1 Institute for Theoretical Physics, University of Bremen, Otto-Hahn-Allee,

More information

Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science Algorithms For Inference Fall 2014

Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science Algorithms For Inference Fall 2014 Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.438 Algorithms For Inference Fall 2014 Problem Set 3 Issued: Thursday, September 25, 2014 Due: Thursday,

More information

Statistical mechanics of community detection

Statistical mechanics of community detection Statistical mechanics of community detection Jörg Reichardt and Stefan Bornholdt Institute for Theoretical Physics, University of Bremen, Otto-Hahn-Allee, D-28359 Bremen, Germany Received 22 December 2005;

More information

Graph Detection and Estimation Theory

Graph Detection and Estimation Theory Introduction Detection Estimation Graph Detection and Estimation Theory (and algorithms, and applications) Patrick J. Wolfe Statistics and Information Sciences Laboratory (SISL) School of Engineering and

More information

Finding normalized and modularity cuts by spectral clustering. Ljubjana 2010, October

Finding normalized and modularity cuts by spectral clustering. Ljubjana 2010, October Finding normalized and modularity cuts by spectral clustering Marianna Bolla Institute of Mathematics Budapest University of Technology and Economics marib@math.bme.hu Ljubjana 2010, October Outline Find

More information

Sociology 6Z03 Review II

Sociology 6Z03 Review II Sociology 6Z03 Review II John Fox McMaster University Fall 2016 John Fox (McMaster University) Sociology 6Z03 Review II Fall 2016 1 / 35 Outline: Review II Probability Part I Sampling Distributions Probability

More information

MATH 829: Introduction to Data Mining and Analysis Graphical Models I

MATH 829: Introduction to Data Mining and Analysis Graphical Models I MATH 829: Introduction to Data Mining and Analysis Graphical Models I Dominique Guillot Departments of Mathematical Sciences University of Delaware May 2, 2016 1/12 Independence and conditional independence:

More information

3 Undirected Graphical Models

3 Undirected Graphical Models Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.438 Algorithms For Inference Fall 2014 3 Undirected Graphical Models In this lecture, we discuss undirected

More information

Semidefinite and Second Order Cone Programming Seminar Fall 2001 Lecture 5

Semidefinite and Second Order Cone Programming Seminar Fall 2001 Lecture 5 Semidefinite and Second Order Cone Programming Seminar Fall 2001 Lecture 5 Instructor: Farid Alizadeh Scribe: Anton Riabov 10/08/2001 1 Overview We continue studying the maximum eigenvalue SDP, and generalize

More information

Chapter 11. Matrix Algorithms and Graph Partitioning. M. E. J. Newman. June 10, M. E. J. Newman Chapter 11 June 10, / 43

Chapter 11. Matrix Algorithms and Graph Partitioning. M. E. J. Newman. June 10, M. E. J. Newman Chapter 11 June 10, / 43 Chapter 11 Matrix Algorithms and Graph Partitioning M. E. J. Newman June 10, 2016 M. E. J. Newman Chapter 11 June 10, 2016 1 / 43 Table of Contents 1 Eigenvalue and Eigenvector Eigenvector Centrality The

More information

Dover- Sherborn High School Mathematics Curriculum Probability and Statistics

Dover- Sherborn High School Mathematics Curriculum Probability and Statistics Mathematics Curriculum A. DESCRIPTION This is a full year courses designed to introduce students to the basic elements of statistics and probability. Emphasis is placed on understanding terminology and

More information

Independencies. Undirected Graphical Models 2: Independencies. Independencies (Markov networks) Independencies (Bayesian Networks)

Independencies. Undirected Graphical Models 2: Independencies. Independencies (Markov networks) Independencies (Bayesian Networks) (Bayesian Networks) Undirected Graphical Models 2: Use d-separation to read off independencies in a Bayesian network Takes a bit of effort! 1 2 (Markov networks) Use separation to determine independencies

More information

Questions 3.83, 6.11, 6.12, 6.17, 6.25, 6.29, 6.33, 6.35, 6.50, 6.51, 6.53, 6.55, 6.59, 6.60, 6.65, 6.69, 6.70, 6.77, 6.79, 6.89, 6.

Questions 3.83, 6.11, 6.12, 6.17, 6.25, 6.29, 6.33, 6.35, 6.50, 6.51, 6.53, 6.55, 6.59, 6.60, 6.65, 6.69, 6.70, 6.77, 6.79, 6.89, 6. Chapter 7 Reading 7.1, 7.2 Questions 3.83, 6.11, 6.12, 6.17, 6.25, 6.29, 6.33, 6.35, 6.50, 6.51, 6.53, 6.55, 6.59, 6.60, 6.65, 6.69, 6.70, 6.77, 6.79, 6.89, 6.112 Introduction In Chapter 5 and 6, we emphasized

More information

arxiv: v1 [physics.soc-ph] 20 Oct 2010

arxiv: v1 [physics.soc-ph] 20 Oct 2010 arxiv:00.098v [physics.soc-ph] 20 Oct 200 Spectral methods for the detection of network community structure: A comparative analysis Hua-Wei Shen and Xue-Qi Cheng Institute of Computing Technology, Chinese

More information

Groups of vertices and Core-periphery structure. By: Ralucca Gera, Applied math department, Naval Postgraduate School Monterey, CA, USA

Groups of vertices and Core-periphery structure. By: Ralucca Gera, Applied math department, Naval Postgraduate School Monterey, CA, USA Groups of vertices and Core-periphery structure By: Ralucca Gera, Applied math department, Naval Postgraduate School Monterey, CA, USA Mostly observed real networks have: Why? Heavy tail (powerlaw most

More information

ON THE NP-COMPLETENESS OF SOME GRAPH CLUSTER MEASURES

ON THE NP-COMPLETENESS OF SOME GRAPH CLUSTER MEASURES ON THE NP-COMPLETENESS OF SOME GRAPH CLUSTER MEASURES JIŘÍ ŠÍMA AND SATU ELISA SCHAEFFER Academy of Sciences of the Czech Republic Helsinki University of Technology, Finland elisa.schaeffer@tkk.fi SOFSEM

More information

Undirected Graphical Models 4 Bayesian Networks and Markov Networks. Bayesian Networks to Markov Networks

Undirected Graphical Models 4 Bayesian Networks and Markov Networks. Bayesian Networks to Markov Networks Undirected Graphical Models 4 ayesian Networks and Markov Networks 1 ayesian Networks to Markov Networks 2 1 Ns to MNs X Y Z Ns can represent independence constraints that MN cannot MNs can represent independence

More information

Lecture 12 : Graph Laplacians and Cheeger s Inequality

Lecture 12 : Graph Laplacians and Cheeger s Inequality CPS290: Algorithmic Foundations of Data Science March 7, 2017 Lecture 12 : Graph Laplacians and Cheeger s Inequality Lecturer: Kamesh Munagala Scribe: Kamesh Munagala Graph Laplacian Maybe the most beautiful

More information

On Resolution Limit of the Modularity in Community Detection

On Resolution Limit of the Modularity in Community Detection The Ninth International Symposium on Operations Research and Its Applications (ISORA 10) Chengdu-Jiuzhaigou, China, August 19 23, 2010 Copyright 2010 ORSC & APORC, pp. 492 499 On Resolution imit of the

More information

Lecture 3. 1 Polynomial-time algorithms for the maximum flow problem

Lecture 3. 1 Polynomial-time algorithms for the maximum flow problem ORIE 633 Network Flows August 30, 2007 Lecturer: David P. Williamson Lecture 3 Scribe: Gema Plaza-Martínez 1 Polynomial-time algorithms for the maximum flow problem 1.1 Introduction Let s turn now to considering

More information

Rapid Introduction to Machine Learning/ Deep Learning

Rapid Introduction to Machine Learning/ Deep Learning Rapid Introduction to Machine Learning/ Deep Learning Hyeong In Choi Seoul National University 1/24 Lecture 5b Markov random field (MRF) November 13, 2015 2/24 Table of contents 1 1. Objectives of Lecture

More information

8.1 Concentration inequality for Gaussian random matrix (cont d)

8.1 Concentration inequality for Gaussian random matrix (cont d) MGMT 69: Topics in High-dimensional Data Analysis Falll 26 Lecture 8: Spectral clustering and Laplacian matrices Lecturer: Jiaming Xu Scribe: Hyun-Ju Oh and Taotao He, October 4, 26 Outline Concentration

More information

Graphical Models and Independence Models

Graphical Models and Independence Models Graphical Models and Independence Models Yunshu Liu ASPITRG Research Group 2014-03-04 References: [1]. Steffen Lauritzen, Graphical Models, Oxford University Press, 1996 [2]. Christopher M. Bishop, Pattern

More information

An ADMM Algorithm for Clustering Partially Observed Networks

An ADMM Algorithm for Clustering Partially Observed Networks An ADMM Algorithm for Clustering Partially Observed Networks Necdet Serhat Aybat Industrial Engineering Penn State University 2015 SIAM International Conference on Data Mining Vancouver, Canada Problem

More information

Small Forbidden Configurations III

Small Forbidden Configurations III Small Forbidden Configurations III R. P. Anstee and N. Kamoosi Mathematics Department The University of British Columbia Vancouver, B.C. Canada V6T Z anstee@math.ubc.ca Submitted: Nov, 005; Accepted: Nov

More information

Undirected Graphical Models

Undirected Graphical Models Undirected Graphical Models 1 Conditional Independence Graphs Let G = (V, E) be an undirected graph with vertex set V and edge set E, and let A, B, and C be subsets of vertices. We say that C separates

More information

Data Mining Techniques

Data Mining Techniques Data Mining Techniques CS 622 - Section 2 - Spring 27 Pre-final Review Jan-Willem van de Meent Feedback Feedback https://goo.gl/er7eo8 (also posted on Piazza) Also, please fill out your TRACE evaluations!

More information

Graphs and Network Flows IE411. Lecture 12. Dr. Ted Ralphs

Graphs and Network Flows IE411. Lecture 12. Dr. Ted Ralphs Graphs and Network Flows IE411 Lecture 12 Dr. Ted Ralphs IE411 Lecture 12 1 References for Today s Lecture Required reading Sections 21.1 21.2 References AMO Chapter 6 CLRS Sections 26.1 26.2 IE411 Lecture

More information

Complex Networks CSYS/MATH 303, Spring, Prof. Peter Dodds

Complex Networks CSYS/MATH 303, Spring, Prof. Peter Dodds Complex Networks CSYS/MATH 303, Spring, 2011 Prof. Peter Dodds Department of Mathematics & Statistics Center for Complex Systems Vermont Advanced Computing Center University of Vermont Licensed under the

More information

Spectral Theorem for Self-adjoint Linear Operators

Spectral Theorem for Self-adjoint Linear Operators Notes for the undergraduate lecture by David Adams. (These are the notes I would write if I was teaching a course on this topic. I have included more material than I will cover in the 45 minute lecture;

More information

TABLES AND FORMULAS FOR MOORE Basic Practice of Statistics

TABLES AND FORMULAS FOR MOORE Basic Practice of Statistics TABLES AND FORMULAS FOR MOORE Basic Practice of Statistics Exploring Data: Distributions Look for overall pattern (shape, center, spread) and deviations (outliers). Mean (use a calculator): x = x 1 + x

More information

SVD, discrepancy, and regular structure of contingency tables

SVD, discrepancy, and regular structure of contingency tables SVD, discrepancy, and regular structure of contingency tables Marianna Bolla Institute of Mathematics Technical University of Budapest (Research supported by the TÁMOP-4.2.2.B-10/1-2010-0009 project) marib@math.bme.hu

More information

This section is an introduction to the basic themes of the course.

This section is an introduction to the basic themes of the course. Chapter 1 Matrices and Graphs 1.1 The Adjacency Matrix This section is an introduction to the basic themes of the course. Definition 1.1.1. A simple undirected graph G = (V, E) consists of a non-empty

More information

CS168: The Modern Algorithmic Toolbox Lectures #11 and #12: Spectral Graph Theory

CS168: The Modern Algorithmic Toolbox Lectures #11 and #12: Spectral Graph Theory CS168: The Modern Algorithmic Toolbox Lectures #11 and #12: Spectral Graph Theory Tim Roughgarden & Gregory Valiant May 2, 2016 Spectral graph theory is the powerful and beautiful theory that arises from

More information

44.2. Two-Way Analysis of Variance. Introduction. Prerequisites. Learning Outcomes

44.2. Two-Way Analysis of Variance. Introduction. Prerequisites. Learning Outcomes Two-Way Analysis of Variance 44 Introduction In the one-way analysis of variance (Section 441) we consider the effect of one factor on the values taken by a variable Very often, in engineering investigations,

More information

Communities, Spectral Clustering, and Random Walks

Communities, Spectral Clustering, and Random Walks Communities, Spectral Clustering, and Random Walks David Bindel Department of Computer Science Cornell University 26 Sep 2011 20 21 19 16 22 28 17 18 29 26 27 30 23 1 25 5 8 24 2 4 14 3 9 13 15 11 10 12

More information

Laplacian Integral Graphs with Maximum Degree 3

Laplacian Integral Graphs with Maximum Degree 3 Laplacian Integral Graphs with Maximum Degree Steve Kirkland Department of Mathematics and Statistics University of Regina Regina, Saskatchewan, Canada S4S 0A kirkland@math.uregina.ca Submitted: Nov 5,

More information

The doubly negative matrix completion problem

The doubly negative matrix completion problem The doubly negative matrix completion problem C Mendes Araújo, Juan R Torregrosa and Ana M Urbano CMAT - Centro de Matemática / Dpto de Matemática Aplicada Universidade do Minho / Universidad Politécnica

More information

Faloutsos, Tong ICDE, 2009

Faloutsos, Tong ICDE, 2009 Large Graph Mining: Patterns, Tools and Case Studies Christos Faloutsos Hanghang Tong CMU Copyright: Faloutsos, Tong (29) 2-1 Outline Part 1: Patterns Part 2: Matrix and Tensor Tools Part 3: Proximity

More information

Matching Polynomials of Graphs

Matching Polynomials of Graphs Spectral Graph Theory Lecture 25 Matching Polynomials of Graphs Daniel A Spielman December 7, 2015 Disclaimer These notes are not necessarily an accurate representation of what happened in class The notes

More information

Data Mining and Analysis: Fundamental Concepts and Algorithms

Data Mining and Analysis: Fundamental Concepts and Algorithms : Fundamental Concepts and Algorithms dataminingbook.info Mohammed J. Zaki 1 Wagner Meira Jr. 2 1 Department of Computer Science Rensselaer Polytechnic Institute, Troy, NY, USA 2 Department of Computer

More information

10.3 Matroids and approximation

10.3 Matroids and approximation 10.3 Matroids and approximation 137 10.3 Matroids and approximation Given a family F of subsets of some finite set X, called the ground-set, and a weight function assigning each element x X a non-negative

More information

Prof. Dr. Lars Schmidt-Thieme, L. B. Marinho, K. Buza Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany, Course

Prof. Dr. Lars Schmidt-Thieme, L. B. Marinho, K. Buza Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany, Course Course on Bayesian Networks, winter term 2007 0/31 Bayesian Networks Bayesian Networks I. Bayesian Networks / 1. Probabilistic Independence and Separation in Graphs Prof. Dr. Lars Schmidt-Thieme, L. B.

More information

1 T 1 = where 1 is the all-ones vector. For the upper bound, let v 1 be the eigenvector corresponding. u:(u,v) E v 1(u)

1 T 1 = where 1 is the all-ones vector. For the upper bound, let v 1 be the eigenvector corresponding. u:(u,v) E v 1(u) CME 305: Discrete Mathematics and Algorithms Instructor: Reza Zadeh (rezab@stanford.edu) Final Review Session 03/20/17 1. Let G = (V, E) be an unweighted, undirected graph. Let λ 1 be the maximum eigenvalue

More information

ORIE 4741: Learning with Big Messy Data. Spectral Graph Theory

ORIE 4741: Learning with Big Messy Data. Spectral Graph Theory ORIE 4741: Learning with Big Messy Data Spectral Graph Theory Mika Sumida Operations Research and Information Engineering Cornell September 15, 2017 1 / 32 Outline Graph Theory Spectral Graph Theory Laplacian

More information

Containment restrictions

Containment restrictions Containment restrictions Tibor Szabó Extremal Combinatorics, FU Berlin, WiSe 207 8 In this chapter we switch from studying constraints on the set operation intersection, to constraints on the set relation

More information

Class President: A Network Approach to Popularity. Due July 18, 2014

Class President: A Network Approach to Popularity. Due July 18, 2014 Class President: A Network Approach to Popularity Due July 8, 24 Instructions. Due Fri, July 8 at :59 PM 2. Work in groups of up to 3 3. Type up the report, and submit as a pdf on D2L 4. Attach the code

More information

Effective Resistance and Schur Complements

Effective Resistance and Schur Complements Spectral Graph Theory Lecture 8 Effective Resistance and Schur Complements Daniel A Spielman September 28, 2015 Disclaimer These notes are not necessarily an accurate representation of what happened in

More information

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University http://cs224w.stanford.edu Non-overlapping vs. overlapping communities 11/10/2010 Jure Leskovec, Stanford CS224W: Social

More information

Markov Chains and Spectral Clustering

Markov Chains and Spectral Clustering Markov Chains and Spectral Clustering Ning Liu 1,2 and William J. Stewart 1,3 1 Department of Computer Science North Carolina State University, Raleigh, NC 27695-8206, USA. 2 nliu@ncsu.edu, 3 billy@ncsu.edu

More information

Weighted Least Squares

Weighted Least Squares Weighted Least Squares The standard linear model assumes that Var(ε i ) = σ 2 for i = 1,..., n. As we have seen, however, there are instances where Var(Y X = x i ) = Var(ε i ) = σ2 w i. Here w 1,..., w

More information

CSI 445/660 Part 6 (Centrality Measures for Networks) 6 1 / 68

CSI 445/660 Part 6 (Centrality Measures for Networks) 6 1 / 68 CSI 445/660 Part 6 (Centrality Measures for Networks) 6 1 / 68 References 1 L. Freeman, Centrality in Social Networks: Conceptual Clarification, Social Networks, Vol. 1, 1978/1979, pp. 215 239. 2 S. Wasserman

More information

Mining of Massive Datasets Jure Leskovec, AnandRajaraman, Jeff Ullman Stanford University

Mining of Massive Datasets Jure Leskovec, AnandRajaraman, Jeff Ullman Stanford University Note to other teachers and users of these slides: We would be delighted if you found this our material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit

More information

Community Detection. Data Analytics - Community Detection Module

Community Detection. Data Analytics - Community Detection Module Community Detection Data Analytics - Community Detection Module Zachary s karate club Members of a karate club (observed for 3 years). Edges represent interactions outside the activities of the club. Community

More information

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University CS224W: Social and Information Network Analysis Jure Leskovec Stanford University Jure Leskovec, Stanford University http://cs224w.stanford.edu Task: Find coalitions in signed networks Incentives: European

More information

Analysis of Variance: Part 1

Analysis of Variance: Part 1 Analysis of Variance: Part 1 Oneway ANOVA When there are more than two means Each time two means are compared the probability (Type I error) =α. When there are more than two means Each time two means are

More information

LECTURE 5 HYPOTHESIS TESTING

LECTURE 5 HYPOTHESIS TESTING October 25, 2016 LECTURE 5 HYPOTHESIS TESTING Basic concepts In this lecture we continue to discuss the normal classical linear regression defined by Assumptions A1-A5. Let θ Θ R d be a parameter of interest.

More information

ROOT SYSTEMS AND DYNKIN DIAGRAMS

ROOT SYSTEMS AND DYNKIN DIAGRAMS ROOT SYSTEMS AND DYNKIN DIAGRAMS DAVID MEHRLE In 1969, Murray Gell-Mann won the nobel prize in physics for his contributions and discoveries concerning the classification of elementary particles and their

More information

Applications to network analysis: Eigenvector centrality indices Lecture notes

Applications to network analysis: Eigenvector centrality indices Lecture notes Applications to network analysis: Eigenvector centrality indices Lecture notes Dario Fasino, University of Udine (Italy) Lecture notes for the second part of the course Nonnegative and spectral matrix

More information

Evaluation. Andrea Passerini Machine Learning. Evaluation

Evaluation. Andrea Passerini Machine Learning. Evaluation Andrea Passerini passerini@disi.unitn.it Machine Learning Basic concepts requires to define performance measures to be optimized Performance of learning algorithms cannot be evaluated on entire domain

More information

Categorical Data Analysis Chapter 3

Categorical Data Analysis Chapter 3 Categorical Data Analysis Chapter 3 The actual coverage probability is usually a bit higher than the nominal level. Confidence intervals for association parameteres Consider the odds ratio in the 2x2 table,

More information

One-Way ANOVA. Some examples of when ANOVA would be appropriate include:

One-Way ANOVA. Some examples of when ANOVA would be appropriate include: One-Way ANOVA 1. Purpose Analysis of variance (ANOVA) is used when one wishes to determine whether two or more groups (e.g., classes A, B, and C) differ on some outcome of interest (e.g., an achievement

More information

Definition (The carefully thought-out calculus version based on limits).

Definition (The carefully thought-out calculus version based on limits). 4.1. Continuity and Graphs Definition 4.1.1 (Intuitive idea used in algebra based on graphing). A function, f, is continuous on the interval (a, b) if the graph of y = f(x) can be drawn over the interval

More information

Lecture 3: Simulating social networks

Lecture 3: Simulating social networks Lecture 3: Simulating social networks Short Course on Social Interactions and Networks, CEMFI, May 26-28th, 2014 Bryan S. Graham 28 May 2014 Consider a network with adjacency matrix D = d and corresponding

More information

SVD, Power method, and Planted Graph problems (+ eigenvalues of random matrices)

SVD, Power method, and Planted Graph problems (+ eigenvalues of random matrices) Chapter 14 SVD, Power method, and Planted Graph problems (+ eigenvalues of random matrices) Today we continue the topic of low-dimensional approximation to datasets and matrices. Last time we saw the singular

More information

Lecture 13: Spectral Graph Theory

Lecture 13: Spectral Graph Theory CSE 521: Design and Analysis of Algorithms I Winter 2017 Lecture 13: Spectral Graph Theory Lecturer: Shayan Oveis Gharan 11/14/18 Disclaimer: These notes have not been subjected to the usual scrutiny reserved

More information

Evaluation requires to define performance measures to be optimized

Evaluation requires to define performance measures to be optimized Evaluation Basic concepts Evaluation requires to define performance measures to be optimized Performance of learning algorithms cannot be evaluated on entire domain (generalization error) approximation

More information

hypothesis a claim about the value of some parameter (like p)

hypothesis a claim about the value of some parameter (like p) Testing hypotheses hypothesis a claim about the value of some parameter (like p) significance test procedure to assess the strength of evidence provided by a sample of data against the claim of a hypothesized

More information

Four new upper bounds for the stability number of a graph

Four new upper bounds for the stability number of a graph Four new upper bounds for the stability number of a graph Miklós Ujvári Abstract. In 1979, L. Lovász defined the theta number, a spectral/semidefinite upper bound on the stability number of a graph, which

More information

K 4 -free graphs with no odd holes

K 4 -free graphs with no odd holes K 4 -free graphs with no odd holes Maria Chudnovsky 1 Columbia University, New York NY 10027 Neil Robertson 2 Ohio State University, Columbus, Ohio 43210 Paul Seymour 3 Princeton University, Princeton

More information

LINEAR SPACES. Define a linear space to be a near linear space in which any two points are on a line.

LINEAR SPACES. Define a linear space to be a near linear space in which any two points are on a line. LINEAR SPACES Define a linear space to be a near linear space in which any two points are on a line. A linear space is an incidence structure I = (P, L) such that Axiom LS1: any line is incident with at

More information

Lecture 10 February 4, 2013

Lecture 10 February 4, 2013 UBC CPSC 536N: Sparse Approximations Winter 2013 Prof Nick Harvey Lecture 10 February 4, 2013 Scribe: Alexandre Fréchette This lecture is about spanning trees and their polyhedral representation Throughout

More information

DRAGAN STEVANOVIĆ. Key words. Modularity matrix; Community structure; Largest eigenvalue; Complete multipartite

DRAGAN STEVANOVIĆ. Key words. Modularity matrix; Community structure; Largest eigenvalue; Complete multipartite A NOTE ON GRAPHS WHOSE LARGEST EIGENVALUE OF THE MODULARITY MATRIX EQUALS ZERO SNJEŽANA MAJSTOROVIĆ AND DRAGAN STEVANOVIĆ Abstract. Informally, a community within a graph is a subgraph whose vertices are

More information

ACO Comprehensive Exam March 17 and 18, Computability, Complexity and Algorithms

ACO Comprehensive Exam March 17 and 18, Computability, Complexity and Algorithms 1. Computability, Complexity and Algorithms (a) Let G(V, E) be an undirected unweighted graph. Let C V be a vertex cover of G. Argue that V \ C is an independent set of G. (b) Minimum cardinality vertex

More information

STAT 135 Lab 11 Tests for Categorical Data (Fisher s Exact test, χ 2 tests for Homogeneity and Independence) and Linear Regression

STAT 135 Lab 11 Tests for Categorical Data (Fisher s Exact test, χ 2 tests for Homogeneity and Independence) and Linear Regression STAT 135 Lab 11 Tests for Categorical Data (Fisher s Exact test, χ 2 tests for Homogeneity and Independence) and Linear Regression Rebecca Barter April 20, 2015 Fisher s Exact Test Fisher s Exact Test

More information

Modular representations of symmetric groups: An Overview

Modular representations of symmetric groups: An Overview Modular representations of symmetric groups: An Overview Bhama Srinivasan University of Illinois at Chicago Regina, May 2012 Bhama Srinivasan (University of Illinois at Chicago) Modular Representations

More information

arxiv: v2 [physics.soc-ph] 13 Jun 2009

arxiv: v2 [physics.soc-ph] 13 Jun 2009 Alternative approach to community detection in networks A.D. Medus and C.O. Dorso Departamento de Física, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Pabellón 1, Ciudad Universitaria,

More information

Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science Algorithms For Inference Fall 2014

Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science Algorithms For Inference Fall 2014 Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.438 Algorithms For Inference Fall 2014 Recitation 3 1 Gaussian Graphical Models: Schur s Complement Consider

More information

Cocliques in the Kneser graph on line-plane flags in PG(4, q)

Cocliques in the Kneser graph on line-plane flags in PG(4, q) Cocliques in the Kneser graph on line-plane flags in PG(4, q) A. Blokhuis & A. E. Brouwer Abstract We determine the independence number of the Kneser graph on line-plane flags in the projective space PG(4,

More information

Parameterized Algorithms for the H-Packing with t-overlap Problem

Parameterized Algorithms for the H-Packing with t-overlap Problem Journal of Graph Algorithms and Applications http://jgaa.info/ vol. 18, no. 4, pp. 515 538 (2014) DOI: 10.7155/jgaa.00335 Parameterized Algorithms for the H-Packing with t-overlap Problem Alejandro López-Ortiz

More information

1 Primals and Duals: Zero Sum Games

1 Primals and Duals: Zero Sum Games CS 124 Section #11 Zero Sum Games; NP Completeness 4/15/17 1 Primals and Duals: Zero Sum Games We can represent various situations of conflict in life in terms of matrix games. For example, the game shown

More information

Data Analysis and Manifold Learning Lecture 9: Diffusion on Manifolds and on Graphs

Data Analysis and Manifold Learning Lecture 9: Diffusion on Manifolds and on Graphs Data Analysis and Manifold Learning Lecture 9: Diffusion on Manifolds and on Graphs Radu Horaud INRIA Grenoble Rhone-Alpes, France Radu.Horaud@inrialpes.fr http://perception.inrialpes.fr/ Outline of Lecture

More information

First we look at some terms to be used in this section.

First we look at some terms to be used in this section. 8 Hypothesis Testing 8.1 Introduction MATH1015 Biostatistics Week 8 In Chapter 7, we ve studied the estimation of parameters, point or interval estimates. The construction of CI relies on the sampling

More information

Chapter 12 Comparing Two or More Means

Chapter 12 Comparing Two or More Means 12.1 Introduction 277 Chapter 12 Comparing Two or More Means 12.1 Introduction In Chapter 8 we considered methods for making inferences about the relationship between two population distributions based

More information

CMPSCI 611: Advanced Algorithms

CMPSCI 611: Advanced Algorithms CMPSCI 611: Advanced Algorithms Lecture 12: Network Flow Part II Andrew McGregor Last Compiled: December 14, 2017 1/26 Definitions Input: Directed Graph G = (V, E) Capacities C(u, v) > 0 for (u, v) E and

More information

Enumeration Schemes for Words Avoiding Permutations

Enumeration Schemes for Words Avoiding Permutations Enumeration Schemes for Words Avoiding Permutations Lara Pudwell November 27, 2007 Abstract The enumeration of permutation classes has been accomplished with a variety of techniques. One wide-reaching

More information

Random Lifts of Graphs

Random Lifts of Graphs 27th Brazilian Math Colloquium, July 09 Plan of this talk A brief introduction to the probabilistic method. A quick review of expander graphs and their spectrum. Lifts, random lifts and their properties.

More information

Rao s degree sequence conjecture

Rao s degree sequence conjecture Rao s degree sequence conjecture Maria Chudnovsky 1 Columbia University, New York, NY 10027 Paul Seymour 2 Princeton University, Princeton, NJ 08544 July 31, 2009; revised December 10, 2013 1 Supported

More information

Link Analysis Ranking

Link Analysis Ranking Link Analysis Ranking How do search engines decide how to rank your query results? Guess why Google ranks the query results the way it does How would you do it? Naïve ranking of query results Given query

More information

Lecture 13 Spectral Graph Algorithms

Lecture 13 Spectral Graph Algorithms COMS 995-3: Advanced Algorithms March 6, 7 Lecture 3 Spectral Graph Algorithms Instructor: Alex Andoni Scribe: Srikar Varadaraj Introduction Today s topics: Finish proof from last lecture Example of random

More information

Recitation 9: Loopy BP

Recitation 9: Loopy BP Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.438 Algorithms For Inference Fall 204 Recitation 9: Loopy BP General Comments. In terms of implementation,

More information