Cohesive subgroups. MGT 780 Spring 2014 Steve Borgatti
1 Cohesive subgroups MGT 780 Spring 2014 Steve Borgatti
2 What are cohesive subgroups? Loosely speaking, network clumps: regions of high density and low distance. Initially conceived of as formalizations of fundamental sociological concepts (primary groups, emergent groups). Now typically thought of in terms of a technique for identifying groups within networks.
3 Terminology. Groups are sociological constructs, typically assumed to be recognized by their members: a sense of legitimate membership (boundaries), some sense of voluntarity, internal cohesion. Classes are categories of actors, such as socio-economic classes or genders, defined by similarity rather than cohesion. Clusters are mathematical objects; sets of similar books, being inanimate objects, form clusters rather than groups. Communities are what physicists call clusters that are intended to correspond to groups. None of these distinctions is hard and fast. 8 March 2013 Copyright (c) 2013 by Steve Borgatti. All rights reserved.
4 So do clusters. [Figure: Amazon co-purchasing of books in 2008. Thanks to Valdis Krebs.]
5 Partitions and classes. A partition is an exhaustive division of items into a set of mutually exclusive classes. Not all subgroup methods yield partitions. Partitions are analytically convenient, but real groups can be overlapping. [Figure: this partition has three classes, identified by color.]
6 Key output of subgroup analysis. Group membership: a node-level variable that indicates, for each node in the network, which group it belongs to (if any). Used as an independent or dependent variable in subsequent analysis. [Table: columns Node, Gender, Role, Group; rows HOLLY, BRAZEY, CAROL, PAM, MICHAEL, BILL, HARRY, GERY, STEVE, BERT, RUSS.]
7 Why we care. They exist! Real networks are often clumpy, and clumps are subgroups. The number of groups can predict the ability of the network to accomplish something as a whole. Need for data reduction: can analyze each group separately; can aggregate subgroups into supernodes; can build a simplified model of a complex network (blockmodeling). They affect social processes we care about: groups are powerful influencers/constrainers of members; they create homogeneity with respect to attitudes and behaviors (i.e., contagion); they identify roles such as inner group member or boundary spanner. They are the outcomes of social processes we care about: emergent patterns from local rules (e.g., balance theory).
8 Canonical hypothesis. Members of a group will have similar outcomes (ideas, attitudes, illnesses, behaviors), due to interpersonal transmission / transference, influence / persuasion, and co-construction of beliefs and practices, as in communities of practice.
9 Network regions. Weakly cohesive sections of a network that may contain subgroups. Subgroups don't span different regions; they are wholly within regions. Components: maximal sets of nodes that are mutually reachable (weak and strong). K-cores: maximal sets of nodes such that each member is connected to at least k other members of the k-core.
10 K-cores. Areas in which each actor is connected to at least k other actors, all of whom have degree at least k. Not cohesive, but cohesive groups are contained within k-cores. Finds large regions within which cohesive subgroups may be found. Identifies fault lines across which cohesive subgroups do not span. Useful for large datasets. [Figure: example graph with its 2-core and 3-core marked.]
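11 [Figure: network colored by core membership.] Black nodes are in the 1-core only; red nodes are in the 1- and 2-cores only; blue nodes are in the 1-, 2-, and 3-cores; grey nodes are in the 1-, 2-, 3-, and 4-cores.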
12 K-core: more technical definition. A maximal set of nodes S such that for all u in S, α(u,S) >= k, where α(u,S) is the number of ties node u has with members of set S. In the example, G is the 1-core and 2-core; {1..8} is the 3-core. There is no 4-core or higher. [Figure: example graph with its 2-core and 3-core marked.]
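The definition above suggests a simple pruning procedure: keep deleting nodes that have fewer than k ties into the surviving set until nothing changes. The following is a minimal sketch of that idea, not the routine used in any particular package; the helper names `build_adj` and `k_core` and the toy edge list are assumptions made for illustration and are not the graph on the slide.

```python
def build_adj(edges):
    """Build an undirected adjacency map {node: set of neighbours}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def k_core(adj, k):
    """Return the node set of the k-core (possibly empty)."""
    alive = set(adj)
    changed = True
    while changed:
        changed = False
        for u in list(alive):
            # alpha(u, S): number of ties from u into the surviving set S
            if len(adj[u] & alive) < k:
                alive.remove(u)
                changed = True
    return alive


# Hypothetical toy graph: a 4-clique {1,2,3,4}, node 5 tied to 3 and 4,
# and node 6 hanging off node 5.
edges = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4),
         (3, 5), (4, 5), (5, 6)]
adj = build_adj(edges)

for k in range(1, 5):
    print(k, sorted(k_core(adj, k)))
```

In this toy graph the cores are nested, as the slide describes: every node is in the 1-core, node 6 drops out of the 2-core, and only the 4-clique survives as the 3-core.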
13 Components. A subgraph S of a graph G is a component if S is maximal and connected. If G is a digraph, then S is a weak component if it is a component of the underlying (undirected) graph; S is a strong component if for all u,v in S, there is a path from u to v.
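To make the weak/strong distinction concrete, here is a small sketch that computes both for a digraph. It uses plain reachability searches rather than a linear-time algorithm such as Tarjan's, which is a simplification chosen for readability; the function names and the example arcs are assumptions for illustration.

```python
def reachable(adj, start):
    """Set of nodes reachable from start by following adj."""
    seen = {start}
    stack = [start]
    while stack:
        v = stack.pop()
        for w in adj[v]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def weak_components(nodes, arcs):
    """Components of the underlying undirected graph."""
    und = {v: set() for v in nodes}
    for u, v in arcs:
        und[u].add(v)
        und[v].add(u)
    comps, assigned = [], set()
    for v in nodes:
        if v not in assigned:
            comp = reachable(und, v)
            comps.append(comp)
            assigned |= comp
    return comps


def strong_components(nodes, arcs):
    """Maximal sets in which every node can reach every other node."""
    out = {v: set() for v in nodes}
    inc = {v: set() for v in nodes}
    for u, v in arcs:
        out[u].add(v)
        inc[v].add(u)
    comps, assigned = [], set()
    for v in nodes:
        if v not in assigned:
            # nodes that v can reach AND that can reach v
            comp = reachable(out, v) & reachable(inc, v)
            comps.append(comp)
            assigned |= comp
    return comps


# Hypothetical digraph: a directed cycle a->b->c->a, an arc c->d, isolate e.
nodes = ["a", "b", "c", "d", "e"]
arcs = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]

print(weak_components(nodes, arcs))
print(strong_components(nodes, arcs))
```

Node d belongs to the same weak component as the cycle but forms its own strong component, because nothing leads back from d.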
14 Notes on components. Finding components is often the first step in the analysis of large graphs. Often analyze each component separately, or discard very small components.
15 Two approaches to subgroups. Means (algorithmic / process / evolutionary): define a process/recipe/algorithm for extracting groups; most methods drawn from multivariate stats are of this type. Ends (formal / mathematical / ideal type): define what you mean by a group, then leave it up to the researcher to construct an algorithm to find them. Hybrids: have a formal definition of an ideal clustering, but are defined by the (heuristic) algorithm used to get there.
16 Or, four approaches to subgroups. Network analysis, outcome: clique, relaxed by distance (n-clique, n-clan, n-club) or by density (k-core, k-plex, ls-set, lambda set, component). Network analysis, process: Newman-Girvan, Negopy. Multivariate stats, process: Hiclus, Kmeans, Newman. Multivariate stats, outcome: factions, combinatorial optimization.
17 Typology of subgroups. Process, network / graph theory: Newman-Girvan. Process, multivariate / clustering: Johnson's hierarchical clustering; k-means; scaling. Outcome, network / graph theory: clique, n-clique, n-clan, n-club, k-plex, ls-set, lambda-set, k-core. Outcome, multivariate / clustering: combinatorial optimization.
18 From multivariate statistics: Johnson's hierarchical clustering. Output is a set of nested partitions, starting with the identity partition and ending with the complete partition. Each partition is represented as a vector that associates each node with one and only one group (mutually exclusive). Different flavors based on how the distance from a cluster to an outside point/node is defined: single linkage (connectedness; minimum), complete linkage (diameter; maximum), average, median, etc.
19 [Figure: map and distance matrix for nine US cities: Boston, NY, DC, Miami, Chicago, Seattle, SF, LA, Denver.] Closest distance is NY-BOS = 206, so merge these.
20 [Figure: distance matrix with BOS/NY merged.] Closest pair is DC to BOS/NY = 233, so merge these.
21 [Figure: clusters BOS/NY/DC, MIA, CHI, SEA, SF, LA, DEN.]
22 [Figure: clusters BOS/NY/DC, MIA, CHI, SEA, SF/LA, DEN.]
23 [Figure: clusters BOS/NY/DC/CHI, MIA, SEA, SF/LA, DEN.]
24 [Figure: clusters BOS/NY/DC/CHI, MIA, SF/LA/SEA, DEN.]
25 [Figure: clusters BOS/NY/DC/CHI/DEN, MIA, SF/LA/SEA.]
26 [Figure: clusters BOS/NY/DC/CHI/DEN/SF/LA/SEA, MIA.] Done!
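The merge-the-closest-pair procedure walked through on the city slides can be sketched in a few lines. This is an illustrative implementation of agglomerative clustering with a pluggable linkage rule (minimum for single linkage, maximum for complete linkage), not Johnson's original code. The four-point distance table is made up for the example, because the city distance matrix did not survive transcription.

```python
def hiclus(labels, dist, linkage=min):
    """Agglomerative clustering.

    dist maps frozenset({a, b}) to the distance between a and b.
    linkage=min gives single linkage, linkage=max complete linkage.
    Returns a list of (merge level, partition after the merge).
    """
    clusters = [frozenset([x]) for x in labels]
    steps = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # distance between two clusters under the chosen linkage
                d = linkage(dist[frozenset((a, b))]
                            for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merged = clusters[i] | clusters[j]
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)]
        clusters.append(merged)
        steps.append((d, sorted(sorted(c) for c in clusters)))
    return steps


# Hypothetical distances among four items (not the city data).
labels = ["A", "B", "C", "D"]
dist = {
    frozenset("AB"): 1, frozenset("AC"): 4, frozenset("AD"): 5,
    frozenset("BC"): 3, frozenset("BD"): 6, frozenset("CD"): 2,
}

single = hiclus(labels, dist, linkage=min)
complete = hiclus(labels, dist, linkage=max)
for level, partition in single:
    print(level, partition)
```

Both linkage rules merge A with B and then C with D here; they differ only in the level at which the final two clusters join (the smallest cross-cluster distance for single linkage, the largest for complete linkage).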
27 Applying HiClus to network data. Wants symmetric data. Doesn't work well with a matrix of 0s and 1s: not enough variation to play with. Compute geodesic distances first, then cluster the distance matrix; or cluster some other valued matrix derived from the network, such as friends in common. [Table: geodesic distances among HOLLY, BRAZEY, CAROL, PAM, PAT, JENNIE, PAULINE, ANN, MICHAEL, BILL, LEE, DON, JOHN, HARRY, GERY, STEVE, BERT, RUSS.]
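Turning a 0/1 adjacency structure into a valued matrix of geodesic distances is a breadth-first search from every node. A minimal sketch, assuming an undirected graph stored as an adjacency map; the function name and the four-node path used as input are illustrative assumptions, not the Campnet data.

```python
from collections import deque


def geodesic_distances(adj):
    """All-pairs shortest path lengths in an unweighted graph.

    Returns {source: {target: distance}}; unreachable targets are absent.
    """
    dist = {}
    for s in adj:
        d = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in d:
                    d[w] = d[v] + 1
                    queue.append(w)
        dist[s] = d
    return dist


# Hypothetical path graph a - b - c - d.
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
dist = geodesic_distances(adj)
print(dist["a"])
```

The resulting matrix has more variation than the raw 0/1 ties and can be passed to a hierarchical clustering routine that expects distances.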
28 Hiclus of geodesic distances. [Figure: dendrogram of the 18 actors clustered on geodesic distance; level values not recoverable.]
29 Measuring cluster adequacy. Several off-the-shelf measures are available. All implicitly assume a concept of what subgroups are or what an optimal partition is, e.g., more ties within groups than between groups. Within this criterion there are multiple measures: Newman Q modularity; eta coefficient (analysis of variance); Krackhardt & Stern E-I index; 12,000 others.
30 Variations on hierarchical clustering. Different flavors based on how the distance from a cluster to an outside point/node is defined: single linkage (connectedness; minimum), complete linkage (diameter; maximum), average, median, etc. (Also Newman community detection, ncd.)
31 Newman community detection (ncd). Built for 1/0 data, specifically networks. A variation on hierarchical clustering: at each point, join the pair of nodes that would result in the biggest Q modularity score. Q measures the proportion of ties that are within groups, minus the proportion expected by chance given group sizes.
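As a rough illustration of what Q measures, the sketch below computes it for a given partition. It assumes the commonly used degree-based form, Q = sum over groups of (within-group ties / m) minus (group degree total / 2m) squared, where m is the number of ties; that formula is an assumption of this example rather than something stated on the slide. The function name and toy graph are also hypothetical.

```python
def modularity(edges, group):
    """Newman's Q for an undirected edge list and a {node: group} map."""
    m = len(edges)
    within = {}      # ties with both endpoints in the same group
    degree_sum = {}  # total degree of each group
    for u, v in edges:
        for x in (u, v):
            degree_sum[group[x]] = degree_sum.get(group[x], 0) + 1
        if group[u] == group[v]:
            within[group[u]] = within.get(group[u], 0) + 1
    return sum(within.get(g, 0) / m - (d / (2 * m)) ** 2
               for g, d in degree_sum.items())


# Hypothetical graph: two triangles joined by a single bridge (2-3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
two_groups = {0: "x", 1: "x", 2: "x", 3: "y", 4: "y", 5: "y"}
one_group = {n: "all" for n in range(6)}

q_two = modularity(edges, two_groups)
q_one = modularity(edges, one_group)
print(round(q_two, 4), round(q_one, 4))
```

Splitting the graph at the bridge gives a positive Q, while lumping everyone into one group gives zero, since all ties are then "within group" exactly as chance would predict.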
32 ncd. [Figure: dendrogram by node ID and level; level values not recoverable.] Q =
33 Factions. Partition the data into a predetermined number of groups. Has a fit function which measures how well this is done, and applies combinatorial optimization to improve the fit. Computationally difficult. User has to define the number of groups. No overlap.
34 Factions. Combinatorial optimization (tabu search) to find the partition that minimizes the Hamming distance between the observed adjacency matrix and an ideal matrix, i.e., counts errors. [Figure: ideal matrix, errors, and best-fitting partition.]
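The fit function described above can be sketched as an error count against the ideal pattern (ties present within groups, absent between groups). To keep the example short, it searches all assignments by brute force instead of tabu search, which is only feasible for tiny graphs; that substitution, the function names, and the toy graph are all assumptions of this sketch. Each unordered pair is counted once, so the count is half the Hamming distance between the full symmetric matrices.

```python
from itertools import product


def faction_errors(nodes, edges, group):
    """Number of node pairs whose tie disagrees with the ideal block pattern."""
    edge_set = {frozenset(e) for e in edges}
    errors = 0
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            ideal = group[u] == group[v]            # tie expected within groups
            observed = frozenset((u, v)) in edge_set
            if ideal != observed:
                errors += 1
    return errors


def best_factions(nodes, edges, n_groups):
    """Exhaustive search over assignments (stand-in for tabu search)."""
    best = None
    for labels in product(range(n_groups), repeat=len(nodes)):
        group = dict(zip(nodes, labels))
        err = faction_errors(nodes, edges, group)
        if best is None or err < best[0]:
            best = (err, group)
    return best


# Hypothetical graph: two triangles joined by a single bridge (2-3).
nodes = [0, 1, 2, 3, 4, 5]
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]

err, group = best_factions(nodes, edges, 2)
print(err, group)
```

The best two-group solution separates the triangles and leaves one error, the bridge tie that crosses groups.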
35 Factions, 3- and 4-cluster solutions. 3 clusters: 8 errors, Q = [value missing]. 4 clusters: 9 errors, Q = [value missing]. In this case, Q seems less intuitive.
36 Campnet example. Group assignments: 1: HOLLY MICHAEL BILL DON HARRY; 2: CAROL PAM PAT JENNIE PAULINE ANN; 3: BRAZEY LEE JOHN GERY STEVE BERT RUSS. [Figure: adjacency matrix permuted by group, and network diagram of the 18 actors.]
37 Subgroups: networks vs. multivariate stats. [Diagram of the four approaches, repeated from slide 16.]
38 Newman-Girvan. (1) Calculate edge betweenness for the graph. (2) Remove the edge with the highest edge betweenness. (3) If the number of components increases, record the partition. (4) Recalculate edge betweenness and repeat until all nodes are isolates or the maximum number of clusters is reached/exceeded.
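The four steps above can be sketched as follows. Edge betweenness is computed with a Brandes-style accumulation over breadth-first searches; that choice, the function names, and the toy graph are assumptions of this sketch rather than details from the slides. Because each pair of nodes is visited from both ends, the betweenness values are doubled, which does not affect which edge ranks highest.

```python
from collections import deque


def components(adj):
    """Connected components of an undirected adjacency map."""
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = {s}, [s]
        while stack:
            v = stack.pop()
            for w in adj[v]:
                if w not in comp:
                    comp.add(w)
                    stack.append(w)
        seen |= comp
        comps.append(comp)
    return comps


def edge_betweenness(adj):
    """Shortest-path betweenness of every edge (values doubled)."""
    bet = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        dist, order = {s: 0}, []
        sigma = {v: 0 for v in adj}   # number of shortest paths from s
        sigma[s] = 1
        preds = {v: [] for v in adj}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):     # accumulate from the farthest nodes back
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1 + delta[w])
                bet[frozenset((v, w))] += share
                delta[v] += share
    return bet


def newman_girvan(edges, max_clusters):
    """Record a partition each time an edge removal splits a component."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    partitions = []
    n_comp = len(components(adj))
    while any(adj.values()):          # until all nodes are isolates
        bet = edge_betweenness(adj)
        u, v = max(bet, key=bet.get)
        adj[u].discard(v)
        adj[v].discard(u)
        comps = components(adj)
        if len(comps) > n_comp:
            n_comp = len(comps)
            partitions.append(comps)
            if n_comp >= max_clusters:
                break
    return partitions


# Hypothetical graph: two triangles joined by a single bridge (2-3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
parts = newman_girvan(edges, max_clusters=2)
print(parts[0])
```

In the toy graph the bridge carries every shortest path between the two triangles, so it is removed first and the first recorded partition is the two triangles.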
39 Subgroups: graph vs. clustering. [Diagram of the four approaches, repeated from slide 16.]
40 Clique. A maximal complete subgraph: everyone is adjacent to everyone else; average distance and diameter are 1; density is 1. Limitations: undirected. Issues: multiple, overlapping; too strict. 10 cliques found: 1: HOLLY MICHAEL DON HARRY; 2: BRAZEY LEE STEVE BERT; 3: CAROL PAT PAULINE; 4: CAROL PAM PAULINE; 5: PAM JENNIE ANN; 6: PAM PAULINE ANN; 7: MICHAEL BILL DON HARRY; 8: JOHN GERY RUSS; 9: GERY STEVE RUSS; 10: STEVE BERT RUSS. [Figure: network diagram of the 18 actors.]
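One standard way to enumerate maximal complete subgraphs is the Bron-Kerbosch recursion, sketched below without pivoting. This illustrates the definition and is not necessarily the routine that produced the ten cliques listed above; the function name, the minimum-size filter, and the toy graph are assumptions of this sketch.

```python
def maximal_cliques(adj, min_size=3):
    """Enumerate maximal cliques with at least min_size members."""
    found = []

    def expand(r, p, x):
        # r: current clique, p: candidates to add, x: already-explored nodes
        if not p and not x:
            if len(r) >= min_size:
                found.append(r)
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return found


# Hypothetical graph: triangles {1,2,3} and {3,4,5} sharing node 3.
adj = {
    1: {2, 3},
    2: {1, 3},
    3: {1, 2, 4, 5},
    4: {3, 5},
    5: {3, 4},
}

cliques = maximal_cliques(adj)
print(sorted(map(sorted, cliques)))
```

Node 3 appears in both cliques, which is the overlap issue noted on the slide.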
41 Overlap. A graph with 21 vertices can have as many as 2187 cliques. Secondary analysis looks at the overlap matrices, e.g., form a node-by-node co-membership matrix O such that o_ij gives the number of cliques that i and j are both in. Run hierarchical clustering on O.
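The co-membership matrix O can be built directly from a clique list. The sketch below uses the ten cliques listed on the clique slide as input and stores only the off-diagonal entries, keyed by alphabetically sorted name pairs; the function name and that storage layout are choices made for this example.

```python
from collections import Counter
from itertools import combinations

# The ten cliques listed on the clique slide.
cliques = [
    ["HOLLY", "MICHAEL", "DON", "HARRY"],
    ["BRAZEY", "LEE", "STEVE", "BERT"],
    ["CAROL", "PAT", "PAULINE"],
    ["CAROL", "PAM", "PAULINE"],
    ["PAM", "JENNIE", "ANN"],
    ["PAM", "PAULINE", "ANN"],
    ["MICHAEL", "BILL", "DON", "HARRY"],
    ["JOHN", "GERY", "RUSS"],
    ["GERY", "STEVE", "RUSS"],
    ["STEVE", "BERT", "RUSS"],
]


def comembership(cliques):
    """o[(i, j)] = number of cliques containing both i and j (i < j)."""
    o = Counter()
    for clique in cliques:
        for i, j in combinations(sorted(clique), 2):
            o[(i, j)] += 1
    return o


o = comembership(cliques)
print(o[("ANN", "PAM")], o[("RUSS", "STEVE")], o[("BILL", "HOLLY")])
```

Pairs that never share a clique simply read as 0 from the counter. To cluster on O with a distance-based routine, the counts would first need converting to dissimilarities, for example by subtracting each from the maximum count.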
42 Types of relaxations. Distance (length of paths): n-clique, n-clan, n-club. Density (number of ties): k-plex, ls-set, lambda set, k-core, component.
43 N-cliques. Definition: a maximal subset S such that for all u,v in S, d(u,v) <= n; that is, the distance among members is no greater than a specified maximum. When n = 1, we have a clique. Properties: relaxes the notion of clique; average distance can be greater than 1. Is {a,b,c,f,e} a 2-clique?
44 Issues with n-cliques. Overlapping: {a,b,c,f,e} and {b,c,d,f,e} are both 2-cliques. The membership criterion is satisfiable through non-members. Even 2-cliques can be fairly non-cohesive: the red nodes belong to the same 2-clique but none are adjacent. [Figure: example graph on nodes a to f.]
45 Subgraphs. A set of nodes is just a set of nodes. A subgraph is a set of nodes together with ties among them. An induced subgraph is the subgraph defined by a set of nodes: like pulling the nodes and ties out of the original graph. [Figure: example graph on nodes a to f, and the subgraph induced by {a,b,c,f,e}.]
46 N-clan. Definition: an n-clique S whose diameter in the subgraph induced by S is <= n; members of the set are within n links of each other without using outsiders. Properties: more cohesive than n-cliques. Is {a,b,c,f,e} a 2-clan? [Figure: example graph on nodes a to f.]
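The difference between the two definitions is only where distances are measured: in the whole graph for an n-clique, and inside the induced subgraph for an n-clan. The sketch below checks both on a small numbered graph that is a hypothetical stand-in, since the edge list of the a to f example did not survive transcription; the function names are likewise assumptions.

```python
from collections import deque
from itertools import combinations


def bfs_dist(adj, s, allowed=None):
    """Distances from s; if allowed is given, paths may only use those nodes."""
    d = {s: 0}
    queue = deque([s])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in d and (allowed is None or w in allowed):
                d[w] = d[v] + 1
                queue.append(w)
    return d


def within_n(adj, S, n, induced):
    """True if every pair in S is at distance <= n."""
    allowed = set(S) if induced else None
    for u, v in combinations(S, 2):
        if bfs_dist(adj, u, allowed).get(v, float("inf")) > n:
            return False
    return True


def is_n_clique(adj, S, n):
    """Pairwise distance <= n in the whole graph, and S is maximal."""
    S = set(S)
    if not within_n(adj, S, n, induced=False):
        return False
    # The distance condition is inherited by subsets, so testing
    # single-node extensions is enough to establish maximality.
    return not any(within_n(adj, S | {x}, n, induced=False)
                   for x in set(adj) - S)


def is_n_clan(adj, S, n):
    """An n-clique whose induced subgraph has diameter <= n."""
    return is_n_clique(adj, S, n) and within_n(adj, S, n, induced=True)


# Hypothetical graph on nodes 1..6.
edges = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 5), (4, 6), (5, 6)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

print(is_n_clique(adj, {1, 2, 3, 4, 5}, 2), is_n_clan(adj, {1, 2, 3, 4, 5}, 2))
print(is_n_clique(adj, {2, 3, 4, 5, 6}, 2), is_n_clan(adj, {2, 3, 4, 5, 6}, 2))
```

In this graph {1,2,3,4,5} is a 2-clique only because nodes 4 and 5 are linked through the outsider 6, so it fails the 2-clan test; {2,3,4,5,6} passes both.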
47 N-clan issues. The n-clique membership requirement is a bother. Few found in data. Overlapping. Is {a,b,c,f} a 2-clan? List all 2-clans. [Figure: example graph on nodes a to f.]
48 N-club. Definition: a maximal subset S whose diameter in the subgraph induced by S is <= n. No n-clique requirement. Properties: painful to compute; more plentiful than n-clans; overlapping. Is {a,b,c,f} a 2-club? [Figure: example graph on nodes a to f.]
49 Subgroups: graph vs. clustering. [Diagram of the four approaches, repeated from slide 16.]
50 K-plex. Maximal groups such that each member is connected to all but k others in the group. A set of 4 nodes (including node 7) is a 2-plex: each member is connected to at least 4 minus 2 other members. {1,2,3,4,5} is a 3-plex: each member is connected to at least 5 minus 3 others.
51 K-plex. Is {a,b,d,e} a 2-plex? Is {a,b,c,d,e} a 2-plex? A 3-plex? [Figure: example graph on nodes a to e.]
52 K-plexes: technical definition. A k-plex is a [maximal] subset S such that for all u in S, α(u,S) >= |S| - k, where |S| is the size of set S. Properties: subsets of k-plexes are k-plexes; limited diameter (i.e., get distance as a freebie): if k < (|S|+2)/2 then diameter <= 2; very numerous and overlapping; better match to intuition.
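The membership condition α(u,S) >= |S| - k translates directly into a check. This sketch tests only that condition and deliberately ignores maximality (the bracketed part of the definition); the function name and the four-node cycle used as input are assumptions for illustration.

```python
def is_k_plex(adj, S, k):
    """True if every member of S has at least |S| - k ties inside S."""
    S = set(S)
    # alpha(u, S) is the number of neighbours of u that lie in S
    return all(len(adj[u] & S) >= len(S) - k for u in S)


# Hypothetical 4-cycle a - b - c - d - a (no diagonals).
adj = {
    "a": {"b", "d"},
    "b": {"a", "c"},
    "c": {"b", "d"},
    "d": {"a", "c"},
}

print(is_k_plex(adj, {"a", "b", "c", "d"}, 2))  # each member has 4 - 2 = 2 ties
print(is_k_plex(adj, {"a", "b", "c", "d"}, 1))  # a 1-plex would be a clique
```

With k = 1 the condition requires ties to all |S| - 1 other members, so a 1-plex is a clique; the cycle fails that test but passes for k = 2.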
53 LS-sets. Definition: let H be a subset of V, and let K be any proper subset of H. H is LS if α(K,H-K) > α(K,V-H) for every such K: subsets are more connected to the other members of the LS set than to outsiders. Or, H is LS if α(K) > α(H): subsets are better off joining the LS set. This one's usually easier to compute.
54 LS-sets. H is LS if α(K,H-K) > α(K,V-H): use when K is large. Or, H is LS if α(K) > α(H): use when K is small. [Figure: example graph.]
55 LS-sets: properties. Very cohesive. Wholly nested or disjoint: no partial overlaps. More ties within than between (doesn't just consider density inside). Contain no minimum-weight cutsets (lie on either side of "fault lines"). Multiple edge-independent paths within. High edge-connectivity.
56 Lambda operator. Let λ(u,v) be the number of edge-independent paths from node u to node v. λ(u,v) is also the minimum number of ties that must be removed from the network in order to disconnect u and v.
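Because λ(u,v) equals both the number of edge-independent paths and the size of the smallest set of ties whose removal disconnects u and v, it can be computed as a maximum flow with unit capacities. The sketch below uses repeated breadth-first augmentation (the Edmonds-Karp scheme); the function name `lam` and the toy graph are assumptions of this example, and the function expects two distinct nodes.

```python
from collections import defaultdict, deque


def lam(edges, s, t):
    """Number of edge-independent paths between distinct nodes s and t."""
    cap = defaultdict(int)   # residual capacity of each directed arc
    adj = defaultdict(set)
    for u, v in edges:
        # an undirected tie can carry one unit in either direction
        cap[(u, v)] += 1
        cap[(v, u)] += 1
        adj[u].add(v)
        adj[v].add(u)
    flow = 0
    while True:
        # breadth-first search for an augmenting path in the residual graph
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            v = queue.popleft()
            for w in adj[v]:
                if w not in parent and cap[(v, w)] > 0:
                    parent[w] = v
                    queue.append(w)
        if t not in parent:
            return flow
        # push one unit of flow back along the path found
        w = t
        while parent[w] is not None:
            v = parent[w]
            cap[(v, w)] -= 1
            cap[(w, v)] += 1
            w = v
        flow += 1


# Hypothetical graph: two triangles joined by a single bridge (2-3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
print(lam(edges, 0, 1), lam(edges, 0, 4))
```

Nodes in the same triangle are joined by two edge-independent paths, while any pair on opposite sides of the bridge has only one, so cutting the single bridge tie disconnects them.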
57 Lambda sets. Definition: a set of nodes S is a lambda set if for all a,b,c in S and d not in S, λ(a,b) > λ(c,d): more independent paths to other group members than to outsiders. Properties: robust (very difficult to disconnect even with intelligent attack); mutually exclusive or wholly inclusive (no partially overlapping groups); pure (like n-clubs, defined on a single attribute).
58 Lambda sets. Non-trivial LS-sets: {1,2,3,4}, {1,2,3,4,5,6,7,8}, {9,10,11,12}. Non-trivial lambda sets: {1,2,3,4}, {1,2,3,4,5,6,7,8}, {9,10,11,12}, {5,6,7,8}. [Figure: example graph on nodes 1 to 12.]
59 Subgroups: graph vs. clustering. [Diagram of the four approaches, repeated from slide 16.]
60 Newman-Girvan. Successively delete the tie with the most edge betweenness and identify components. Yields a hierarchical clustering.
61 Cohesion family of concepts. [Table: actor-by-actor matrix for HOLLY, BILL, DON, HARRY, MICHAEL, PAM, JENNIE, ANN, PAULINE, PAT, CAROL, LEE, JOHN, BRAZEY, GERY, STEVE, BERT, RUSS, with row and column sums; values not recoverable.] Concept: relational, centrality, subgroups, network (average). Copyright 2006 Steve Borgatti. All rights reserved.
More informationClaw-free Graphs. III. Sparse decomposition
Claw-free Graphs. III. Sparse decomposition Maria Chudnovsky 1 and Paul Seymour Princeton University, Princeton NJ 08544 October 14, 003; revised May 8, 004 1 This research was conducted while the author
More informationTheory and Methods for the Analysis of Social Networks
Theory and Methods for the Analysis of Social Networks Alexander Volfovsky Department of Statistical Science, Duke University Lecture 1: January 16, 2018 1 / 35 Outline Jan 11 : Brief intro and Guest lecture
More informationOn Resolution Limit of the Modularity in Community Detection
The Ninth International Symposium on Operations Research and Its Applications (ISORA 10) Chengdu-Jiuzhaigou, China, August 19 23, 2010 Copyright 2010 ORSC & APORC, pp. 492 499 On Resolution imit of the
More informationHeuristics for The Whitehead Minimization Problem
Heuristics for The Whitehead Minimization Problem R.M. Haralick, A.D. Miasnikov and A.G. Myasnikov November 11, 2004 Abstract In this paper we discuss several heuristic strategies which allow one to solve
More informationClustering using Mixture Models
Clustering using Mixture Models The full posterior of the Gaussian Mixture Model is p(x, Z, µ,, ) =p(x Z, µ, )p(z )p( )p(µ, ) data likelihood (Gaussian) correspondence prob. (Multinomial) mixture prior
More informationFinding normalized and modularity cuts by spectral clustering. Ljubjana 2010, October
Finding normalized and modularity cuts by spectral clustering Marianna Bolla Institute of Mathematics Budapest University of Technology and Economics marib@math.bme.hu Ljubjana 2010, October Outline Find
More informationICS 252 Introduction to Computer Design
ICS 252 fall 2006 Eli Bozorgzadeh Computer Science Department-UCI References and Copyright Textbooks referred [Mic94] G. De Micheli Synthesis and Optimization of Digital Circuits McGraw-Hill, 1994. [CLR90]
More informationk-symmetry Model: A General Framework To Achieve Identity Anonymization In Social Networks
k-symmetry Model: A General Framework To Achieve Identity Anonymization In Social Networks Wentao Wu School of Computer Science and Technology, Fudan University, Shanghai, China 1 Introduction Social networks
More informationAIM HIGH SCHOOL. Curriculum Map W. 12 Mile Road Farmington Hills, MI (248)
AIM HIGH SCHOOL Curriculum Map 2923 W. 12 Mile Road Farmington Hills, MI 48334 (248) 702-6922 www.aimhighschool.com COURSE TITLE: Statistics DESCRIPTION OF COURSE: PREREQUISITES: Algebra 2 Students will
More informationStatistical mechanics of community detection
Statistical mechanics of community detection Jörg Reichardt and Stefan Bornholdt Institute for Theoretical Physics, University of Bremen, Otto-Hahn-Allee, D-28359 Bremen, Germany Received 22 December 2005;
More informationROBERTO BATTITI, MAURO BRUNATO. The LION Way: Machine Learning plus Intelligent Optimization. LIONlab, University of Trento, Italy, Apr 2015
ROBERTO BATTITI, MAURO BRUNATO. The LION Way: Machine Learning plus Intelligent Optimization. LIONlab, University of Trento, Italy, Apr 2015 http://intelligentoptimization.org/lionbook Roberto Battiti
More informationModularity and Graph Algorithms
Modularity and Graph Algorithms David Bader Georgia Institute of Technology Joe McCloskey National Security Agency 12 July 2010 1 Outline Modularity Optimization and the Clauset, Newman, and Moore Algorithm
More informationSpecification and estimation of exponential random graph models for social (and other) networks
Specification and estimation of exponential random graph models for social (and other) networks Tom A.B. Snijders University of Oxford March 23, 2009 c Tom A.B. Snijders (University of Oxford) Models for
More informationAn Introduction to Exponential-Family Random Graph Models
An Introduction to Exponential-Family Random Graph Models Luo Lu Feb.8, 2011 1 / 11 Types of complications in social network Single relationship data A single relationship observed on a set of nodes at
More informationData Preprocessing. Cluster Similarity
1 Cluster Similarity Similarity is most often measured with the help of a distance function. The smaller the distance, the more similar the data objects (points). A function d: M M R is a distance on M
More informationCHAPTER 3. THE IMPERFECT CUMULATIVE SCALE
CHAPTER 3. THE IMPERFECT CUMULATIVE SCALE 3.1 Model Violations If a set of items does not form a perfect Guttman scale but contains a few wrong responses, we do not necessarily need to discard it. A wrong
More informationPartial cubes: structures, characterizations, and constructions
Partial cubes: structures, characterizations, and constructions Sergei Ovchinnikov San Francisco State University, Mathematics Department, 1600 Holloway Ave., San Francisco, CA 94132 Abstract Partial cubes
More informationChapter 11. Matrix Algorithms and Graph Partitioning. M. E. J. Newman. June 10, M. E. J. Newman Chapter 11 June 10, / 43
Chapter 11 Matrix Algorithms and Graph Partitioning M. E. J. Newman June 10, 2016 M. E. J. Newman Chapter 11 June 10, 2016 1 / 43 Table of Contents 1 Eigenvalue and Eigenvector Eigenvector Centrality The
More informationThe minimum G c cut problem
The minimum G c cut problem Abstract In this paper we define and study the G c -cut problem. Given a complete undirected graph G = (V ; E) with V = n, edge weighted by w(v i, v j ) 0 and an undirected
More informationUndirected Graphical Models
Readings: K&F 4. 4.2 4.3 4.4 Undirected Graphical Models Lecture 4 pr 6 20 SE 55 Statistical Methods Spring 20 Instructor: Su-In Lee University of Washington Seattle ayesian Network Representation irected
More informationUNCORRECTED PROOF. A Graph-theoretic perspective on centrality. Stephen P. Borgatti a,, Martin G. Everett b,1. 1. Introduction
3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 Abstract Social Networks xxx (2005) xxx xxx A Graph-theoretic perspective on centrality Stephen P. Borgatti a,, Martin G. Everett
More informationThe Lefthanded Local Lemma characterizes chordal dependency graphs
The Lefthanded Local Lemma characterizes chordal dependency graphs Wesley Pegden March 30, 2012 Abstract Shearer gave a general theorem characterizing the family L of dependency graphs labeled with probabilities
More informationProject in Computational Game Theory: Communities in Social Networks
Project in Computational Game Theory: Communities in Social Networks Eldad Rubinstein November 11, 2012 1 Presentation of the Original Paper 1.1 Introduction In this section I present the article [1].
More informationData Exploration and Unsupervised Learning with Clustering
Data Exploration and Unsupervised Learning with Clustering Paul F Rodriguez,PhD San Diego Supercomputer Center Predictive Analytic Center of Excellence Clustering Idea Given a set of data can we find a
More informationFinal Exam, Machine Learning, Spring 2009
Name: Andrew ID: Final Exam, 10701 Machine Learning, Spring 2009 - The exam is open-book, open-notes, no electronics other than calculators. - The maximum possible score on this exam is 100. You have 3
More informationONE-THREE JOIN: A GRAPH OPERATION AND ITS CONSEQUENCES
Discussiones Mathematicae Graph Theory 37 (2017) 633 647 doi:10.7151/dmgt.1948 ONE-THREE JOIN: A GRAPH OPERATION AND ITS CONSEQUENCES M.A. Shalu and S. Devi Yamini IIITD & M, Kancheepuram, Chennai-600127,
More information3.2 Configuration model
3.2 Configuration model 3.2.1 Definition. Basic properties Assume that the vector d = (d 1,..., d n ) is graphical, i.e., there exits a graph on n vertices such that vertex 1 has degree d 1, vertex 2 has
More informationLecture 5: Clustering, Linear Regression
Lecture 5: Clustering, Linear Regression Reading: Chapter 10, Sections 3.1-3.2 STATS 202: Data mining and analysis October 4, 2017 1 / 22 .0.0 5 5 1.0 7 5 X2 X2 7 1.5 1.0 0.5 3 1 2 Hierarchical clustering
More informationPreliminaries. Graphs. E : set of edges (arcs) (Undirected) Graph : (i, j) = (j, i) (edges) V = {1, 2, 3, 4, 5}, E = {(1, 3), (3, 2), (2, 4)}
Preliminaries Graphs G = (V, E), V : set of vertices E : set of edges (arcs) (Undirected) Graph : (i, j) = (j, i) (edges) 1 2 3 5 4 V = {1, 2, 3, 4, 5}, E = {(1, 3), (3, 2), (2, 4)} 1 Directed Graph (Digraph)
More informationMy data doesn t look like that..
Testing assumptions My data doesn t look like that.. We have made a big deal about testing model assumptions each week. Bill Pine Testing assumptions Testing assumptions We have made a big deal about testing
More informationLecture 12: May 09, Decomposable Graphs (continues from last time)
596 Pat. Recog. II: Introduction to Graphical Models University of Washington Spring 00 Dept. of lectrical ngineering Lecture : May 09, 00 Lecturer: Jeff Bilmes Scribe: Hansang ho, Izhak Shafran(000).
More informationK 4 -free graphs with no odd holes
K 4 -free graphs with no odd holes Maria Chudnovsky 1 Columbia University, New York NY 10027 Neil Robertson 2 Ohio State University, Columbus, Ohio 43210 Paul Seymour 3 Princeton University, Princeton
More informationData Mining Techniques
Data Mining Techniques CS 622 - Section 2 - Spring 27 Pre-final Review Jan-Willem van de Meent Feedback Feedback https://goo.gl/er7eo8 (also posted on Piazza) Also, please fill out your TRACE evaluations!
More informationThe Lovász Local Lemma : A constructive proof
The Lovász Local Lemma : A constructive proof Andrew Li 19 May 2016 Abstract The Lovász Local Lemma is a tool used to non-constructively prove existence of combinatorial objects meeting a certain conditions.
More information3.4 Relaxations and bounds
3.4 Relaxations and bounds Consider a generic Discrete Optimization problem z = min{c(x) : x X} with an optimal solution x X. In general, the algorithms generate not only a decreasing sequence of upper
More information