Graph Detection and Estimation Theory


1 Graph Detection and Estimation Theory (and algorithms, and applications)
Patrick J. Wolfe
Statistics and Information Sciences Laboratory (SISL)
School of Engineering and Applied Sciences
Department of Statistics, Harvard University
sisl.seas.harvard.edu
Graph Exploitation Symposium, MIT Lincoln Laboratory, 13 April 2010

2 Outline: Graph detection and estimation theory
1 Introduction: Identifying structure in network data; Erdős-Rényi and related models; Network modularity as residuals analysis
2 Residuals-Based Detection: Network test statistics; Subgraph detection; Simulation study
3 Estimation for Point Process Graphs (Perry & W, 2010): Point process graph model; Parameter estimation; Data analysis example

3 Introduction to Graphs and Networks (A brief overview)
Network data are increasingly prevalent across fields, yet even basic analyses prove computationally demanding.
Though random graph theory has been put to use in algorithms and combinatorics, we typically lack a detection and estimation theory for classes of popular graph models.
In this talk we will extend a simple model due to Chung & Lu, and investigate a popular method of residuals-based analysis as a form of testing for graph structure.

4 Introduction to Graphs and Networks (A brief overview)
Typically, network topology is considered a function, intentional or otherwise, of the data acquisition procedure.
However, one may think of a graph-valued data set itself as a random instantiation of the network structure.
A concern of practitioners is to identify such structure through heterogeneity of the observed data set; for example, networks of people may cluster depending on hobbies, political leaning, and so on.
Given a (potentially massive) network, or some sub-network thereof, how can we identify the existence of structure?

5 The Mechanics of Working with Graph-Valued Data (Adjacency matrices and the like)
An (undirected, unweighted) graph is a set G = (V, E) of vertices and edges. The order n of the graph is its number of vertices, and its number of edges is called its size.
We may represent a graph G via its adjacency matrix: a symmetric matrix whose ijth element is 1 if vertices i and j share an edge, and 0 otherwise (see figure).
Figure: Example adjacency matrix.
One may define a suitable Laplacian operator whose spectrum contains much important information about the graph.
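
As a concrete illustration of these mechanics, here is a minimal sketch (a hypothetical toy graph, not from the talk; Python with numpy assumed) that forms an adjacency matrix from an edge list and computes the combinatorial Laplacian and its spectrum:

    import numpy as np

    # Hypothetical edge list of a small undirected, unweighted graph on vertices 0..4
    edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
    n = 5

    # Symmetric 0/1 adjacency matrix: entry (i, j) is 1 iff i and j share an edge
    A = np.zeros((n, n), dtype=int)
    for i, j in edges:
        A[i, j] = A[j, i] = 1

    degrees = A.sum(axis=1)              # vertex degrees
    L = np.diag(degrees) - A             # combinatorial Laplacian D - A
    spectrum = np.linalg.eigvalsh(L)     # eigenvalues; the multiplicity of 0 counts
                                         # connected components
    print(A)
    print(spectrum)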

6 An Example: Erdős-Rényi (A model exhibiting no structure)
In the classical Erdős-Rényi model, links are formed independently with (global) probability p.
Rows and columns of the adjacency matrix thus represent independent random samples of Bernoulli random variables (as in the previous figure).
The degree of any vertex, i.e. the number of edges emanating from it, is a Binomial(n - 1, p) random variable.
Such a graph is said to be simple: self-loops and multiple edges are disallowed.
Random graph theory tells us much about this classical model...
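
A short sketch of sampling from this model (numpy assumed; the values of n and p are illustrative only), confirming that each degree behaves like a Binomial(n - 1, p) variable:

    import numpy as np

    rng = np.random.default_rng(0)
    n, p = 1024, 0.01                    # illustrative values

    # Include each of the n(n-1)/2 possible edges independently with probability p
    upper = rng.random((n, n)) < p
    A = np.triu(upper, k=1)
    A = (A | A.T).astype(int)            # simple graph: symmetric, zero diagonal

    degrees = A.sum(axis=1)
    print(degrees.mean(), (n - 1) * p)   # empirical mean degree vs. Binomial mean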

7 Properties of the Erdős-Rényi Model
Everything hinges on the growth of $(n-1)p$, the expected number of edges per vertex.
Almost-regularity: if $np > 144 \log n$, then with high probability the number of edges $e(u)$ spanned by any fixed subgraph of order $u$ is close to its expectation $p\binom{u}{2}$, with deviations controlled by a $\sqrt{7p\log n}$ term.
Connectivity: if $np$ grows faster than $\log n$, then almost every graph is connected; the converse also holds.
Degrees: if $n^2 p \to \infty$ but $np \to 0$, then almost every graph has close to $\tfrac{1}{2} n^2 p$ vertices of degree 1, and the rest of degree 0.
For $M$ close to $\binom{n}{2}p$, the set of graphs having exactly $M$ edges, taken equiprobably, behaves similarly.
However, in these cases $np$ is growing rapidly, something not necessarily evident in natural data sets.
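
The connectivity statement above is easy to probe empirically; the following sketch (networkx assumed; the graph size and number of repetitions are illustrative) compares np below and above log n:

    import math
    import networkx as nx

    n, trials = 500, 50
    for c in (0.5, 1.5):                 # np = c * log n: below vs. above the threshold
        p = c * math.log(n) / n
        connected = sum(nx.is_connected(nx.gnp_random_graph(n, p, seed=s))
                        for s in range(trials))
        print(f"np = {c} log n: {connected}/{trials} samples connected")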

8 Related Models (Generalizations of Erdős-Rényi)
With only a single parameter, the Erdős-Rényi model is rarely a good fit for real-world data.
Numerous generalizations have been proposed, often in the process of trying to match a particular data set via its degree sequence $(k_1, k_2, \ldots, k_n)$:
Configuration model (Bender & Canfield, 1978): randomly rewire a given graph so as to preserve its degree sequence.
Given-expected-degrees model (Chung & Lu, 2002): edges are (conditionally) independent, with $P(A_{ij} = 1) \propto k_i k_j$.
As constructed, neither of these graphs is simple: they admit self-loops and/or multiple edges. Even counting the number of simple graphs with a given graphical degree sequence is nontrivial (Blitzstein & Diaconis).
The Chung-Lu model has the advantage that it retains dyadic independence, though it now depends on n parameters.

9 The Given-Expected-Degrees Model (A generative model retaining dyadic independence)
It is easy to show that the given-expected-degrees model does in fact achieve expected degrees $(k_1, k_2, \ldots, k_n)$; more on that point later.
However, it is somewhat more general: any positive sequence $(k_1, k_2, \ldots, k_n)$ such that $\max_i k_i^2 \le \sum_{j=1}^{n} k_j$ will serve as a generator for the Chung-Lu model.
The model also extends directly to the case of directed graphs, with $P(A_{ij} = 1) \propto k_i^{\mathrm{out}} k_j^{\mathrm{in}}$, and consequently a specification in terms of 2n parameters.
This model is often fitted to data in the context of what is termed network modularity...
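
A sketch of sampling from the given-expected-degrees model (a hypothetical degree sequence; numpy assumed), with the generator condition checked explicitly:

    import numpy as np

    rng = np.random.default_rng(1)
    k = np.array([2., 2., 3., 3., 4., 4., 5., 5.])    # illustrative expected degrees
    S = k.sum()
    assert k.max() ** 2 <= S                          # ensures every k_i * k_j / S <= 1

    # Dyadically independent edges with P(A_ij = 1) = k_i k_j / sum_l k_l
    P = np.outer(k, k) / S
    upper = rng.random(P.shape) < np.triu(P, k=1)     # sample the upper triangle only
    A = (upper | upper.T).astype(int)                 # symmetrize; no self-loops here

    print(A.sum(axis=1))      # realized degrees
    print(k)                  # expected degrees (matched on average, up to the
                              # self-loop exclusion noted in the talk)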

10 Modularity and Community Detection (Newman's clustering approach)
In community detection, Newman's concept of maximizing network modularity Q is often invoked:
$$Q = \frac{1}{\sum_{i=1}^{n} k_i} \sum_{i=1}^{n} \sum_{j=1}^{n} \left( A_{ij} - \frac{k_i k_j}{\sum_{i'=1}^{n} k_{i'}} \right) \delta(i, j),$$
with $\delta(i, j) = 1$ if and only if vertices i and j are in the same community.
In a proper likelihood-based formulation of the Chung-Lu model, this boils down to a graph-based residuals analysis in terms of observed-minus-expected degrees, with
$$A - \frac{k k^{T}}{\sum_{i=1}^{n} k_i}$$
the so-called modularity matrix.
Maximizing modularity Q is thus equivalent to a community assignment that maximizes the signed residuals relative to the Chung-Lu model!
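
A sketch of the residuals view (a hypothetical six-vertex graph; numpy assumed): form the modularity matrix and evaluate Q for a candidate community assignment.

    import numpy as np

    # Two triangles joined by a single edge (hypothetical example)
    A = np.array([[0, 1, 1, 0, 0, 0],
                  [1, 0, 1, 0, 0, 0],
                  [1, 1, 0, 1, 0, 0],
                  [0, 0, 1, 0, 1, 1],
                  [0, 0, 0, 1, 0, 1],
                  [0, 0, 0, 1, 1, 0]])
    k = A.sum(axis=1)
    total = k.sum()                          # sum_i k_i (twice the number of edges)

    B = A - np.outer(k, k) / total           # modularity (observed-minus-expected) matrix

    def modularity(B, labels, total):
        """Q = (1 / sum_i k_i) * sum_ij B_ij * 1[labels_i == labels_j]."""
        same = labels[:, None] == labels[None, :]
        return (B * same).sum() / total

    labels = np.array([0, 0, 0, 1, 1, 1])    # candidate community assignment
    print(modularity(B, labels, total))      # larger Q = larger signed residuals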

11 Outline (Part 2): Residuals-Based Detection: Network test statistics; Subgraph detection; Simulation study

12 Formulating a Graph Detection Problem (The search for good test statistics)
Our earlier observation enables us to develop tests for graphs (or embedded subgraphs) whose variability is left mostly unexplained by the Chung-Lu model.
We'll limit ourselves here to the specific task of subgraph detection: given a background graph, corresponding to some generative model or indeed a real-world data set, how detectable is a foreground object such as a clique?
In general, densely structured subgraphs are correspondingly unlikely under the basic assumption of dyadic independence; consequently, dense embedded subgraphs should be detectable.
First, however, we'll investigate how difficult the problem appears to be...

13 Degrees as Summary Statistics (Erdős-Rényi model)
Figure: Adjacency matrix (left) and degree distribution (right) of a 1024-vertex Erdős-Rényi graph.
Large cliques are easily detectable when embedded in Erdős-Rényi graphs, owing to their high degrees.

14 Degrees as Summary Statistics (R-MAT model)
Figure: Adjacency matrix (left) and degree distribution (right) of a 1024-vertex R-MAT graph.
An R-MAT graph (Chakrabarti et al., 2004) is instead endowed with independent edge probabilities obtained as the $\log_2 n$-fold Kronecker product of a 2 x 2 seed matrix of edge probabilities,
$$\begin{bmatrix} p_1 & p_2 \\ p_3 & p_4 \end{bmatrix}^{\otimes \log_2 n}.$$
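
A sketch of this Kronecker construction (illustrative seed probabilities; numpy assumed; the dense probability matrix is only practical at small n):

    from functools import reduce
    import numpy as np

    rng = np.random.default_rng(2)

    # 2x2 seed of edge probabilities (illustrative values); the skew toward one
    # corner is what produces the heavy-tailed degree distribution
    seed = np.array([[0.9, 0.6],
                     [0.6, 0.2]])

    levels = 8                               # n = 2**levels = 256 vertices
    P = reduce(np.kron, [seed] * levels)     # log2(n)-fold Kronecker product of the seed

    # Independent edges with these probabilities, symmetrized to an undirected graph
    upper = rng.random(P.shape) < np.triu(P, k=1)
    A = (upper | upper.T).astype(int)

    degrees = A.sum(axis=1)
    print(degrees.min(), degrees.mean(), degrees.max())   # far more spread than Erdős-Rényi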

15 Detecting Embedded Subgraphs (A question of p-values)
Figure: Clique (left) and background R-MAT graph (right) combine to yield a detection task.
Locating (with high probability) an embedded clique in a given random graph relies on its low likelihood under the background model.
It is nontrivial to detect such embeddings directly via the empirical degree distribution.

16 Algorithmic Approach (Subgraph detection via the modularity matrix)
Figure: Vertex embeddings without (top) and with (bottom) an embedded subgraph.
We may observe the embedding of vertices induced by the first two principal eigenvectors of the modularity matrix.
A chi-squared test on the expected proportion of vertices embedded into each of the four quadrants yields good performance.
Maximizing over rotation angle in the plane can further improve test power.
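
A sketch of this procedure (numpy and scipy assumed; equal expected quadrant proportions are used here purely for illustration, and the rotation-angle maximization is omitted):

    import numpy as np
    from scipy.stats import chisquare

    def quadrant_chi2(A):
        """Chi-squared statistic for the quadrant counts of vertices projected onto
        the two principal eigenvectors of the modularity matrix."""
        k = A.sum(axis=1)
        B = A - np.outer(k, k) / k.sum()       # modularity matrix
        _, vecs = np.linalg.eigh(B)
        u1, u2 = vecs[:, -1], vecs[:, -2]      # two principal eigenvectors
        counts = np.array([np.sum((u1 >= 0) & (u2 >= 0)),
                           np.sum((u1 >= 0) & (u2 < 0)),
                           np.sum((u1 < 0) & (u2 >= 0)),
                           np.sum((u1 < 0) & (u2 < 0))])
        return chisquare(counts)               # large statistic flags unexplained structure

    # Usage: stat, pval = quadrant_chi2(A); a small p-value suggests an embedded subgraph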

17 Brief Simulation Study (Detection results across various subgraph densities)
Figure: Operating characteristics (ROC) of the subgraph detection test against an R-MAT background, shown for subgraph densities ranging from 70% to 100% (left), together with empirical sampling distributions of the $\chi^2_{\max}$ test statistic under background alone and background plus foreground, shown for the case of a 12-vertex clique (right).

18 Outline (Part 3): Estimation for Point Process Graphs (Perry & W, 2010): Point process graph model; Parameter estimation; Data analysis example

19 A Point Process Model for Graphs (Likelihood-based formulation)
In many applications, a data matrix X comes in the form of counts with associated time stamps: for example, pairwise e-mail exchanges, text messages, or phone conversations.
Here, given a set of individuals labeled $1, 2, \ldots, n$, under observation for times $t \in (0, T]$, we choose to model the interactions between individuals i and j as a point process
$$\xi_{ij}(t) = \#\{\, s \in (0, t] : \text{node } i \text{ interacts with node } j \text{ at time } s \,\}.$$
For $s < t$ we set $\xi_{ij}(s, t] \equiv \xi_{ij}(t) - \xi_{ij}(s)$ to be the number of times that i and j interact in $(s, t]$, and we assume the interactions to be instantaneous and dyadically independent.

20 A Point Process Model for Graphs (Likelihood-based formulation)
Under mild assumptions, the intensity exists at all times $t \in (0, T]$:
$$\lambda_{ij}(t) \equiv \lim_{\delta \to 0} \frac{E\left[\xi_{ij}(t + \delta) - \xi_{ij}(t)\right]}{\delta}.$$
Independence of interactions in non-overlapping intervals results in $\xi_{ij}(s, t]$ being a Poisson random variable; we refer to its rate parameter simply as $\lambda_{ij}$.
The total number of counts $X_{ij} \equiv \xi_{ij}(T)$ in a given interval is thus a sufficient statistic for $\lambda_{ij}$, and the elements of the data matrix X are independent with
$$P(X_{ij} = x_{ij}) = \frac{(\lambda_{ij} T)^{x_{ij}}}{x_{ij}!} \, e^{-\lambda_{ij} T}.$$
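
A sketch of the resulting likelihood computation (a hypothetical count matrix; numpy and scipy assumed), summing independent Poisson log-probabilities over dyads:

    import numpy as np
    from scipy.stats import poisson

    T = 10.0
    X = np.array([[0, 3, 1],
                  [3, 0, 0],
                  [1, 0, 0]])                  # hypothetical symmetric count matrix

    def log_likelihood(lam, X, T):
        """Sum of independent Poisson(lambda_ij * T) log-pmfs over dyads i < j."""
        iu = np.triu_indices_from(X, k=1)
        return poisson.logpmf(X[iu], lam[iu] * T).sum()

    lam = np.full(X.shape, 0.2)                # a candidate intensity matrix
    print(log_likelihood(lam, X, T))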

21 A Log-Linear Model for Intensities (Related to the gravity model and network traffic analysis)
In this modeling context, the adjacency matrix A is simply a right-censored version of the data matrix X. Under our assumptions, then, $A_{ij}$ is a Bernoulli random variable with
$$P(A_{ij} = 1) = 1 - e^{-\lambda_{ij} T}.$$
A simple log-linear model is given by
$$\log \lambda_{ij} = \eta + \alpha_i + \alpha_j, \qquad i \neq j,$$
with the identifiability constraint that $\sum_i \alpha_i = 0$.
In general, it is possible to prove consistency and asymptotic normality of the maximum-likelihood estimators for this model as $T \to \infty$, and to derive the explicit form of the Fisher information.
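
A sketch of simulating from this log-linear model (parameter values are illustrative; numpy assumed), together with the right-censoring that produces the adjacency matrix:

    import numpy as np

    rng = np.random.default_rng(3)
    n, T, eta = 6, 10.0, -3.0                  # illustrative values
    alpha = rng.normal(size=n)
    alpha -= alpha.mean()                      # identifiability: sum_i alpha_i = 0

    # Log-linear intensities: log lambda_ij = eta + alpha_i + alpha_j, i != j
    lam = np.exp(eta + alpha[:, None] + alpha[None, :])
    np.fill_diagonal(lam, 0.0)                 # no self-loops in this sketch

    iu = np.triu_indices(n, k=1)
    X = np.zeros((n, n), dtype=int)
    X[iu] = rng.poisson(lam[iu] * T)           # Poisson counts over (0, T]
    X = X + X.T

    A = (X > 0).astype(int)                    # right-censored counts: the adjacency matrix
    # P(A_ij = 1) = 1 - exp(-lambda_ij * T), consistent with the censoring above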

22 Maximum-Likelihood Inference (Closed-form solution and a connection to Chung-Lu)
If self-loops are allowed, then maximum-likelihood estimates are obtainable in closed form, via the standard exponential-family parameterization of a Poisson mean.
In fact, such estimates are easily seen to be
$$\hat{\lambda}_{ij} = \frac{1}{T} \, \frac{k_i k_j}{\sum_i k_i},$$
showing the relation to our earlier Chung-Lu model!
A simple Taylor argument shows that, as long as $\hat{\lambda}_{ij}$ is small, the corresponding adjacency matrix has
$$P(A_{ij} = 1) \approx \frac{k_i k_j}{\sum_i k_i}.$$
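
A sketch of the closed-form estimate (numpy assumed; here k denotes the row sums of the count matrix X), together with the Taylor-expansion link back to the Chung-Lu form:

    import numpy as np

    def chung_lu_mle(X, T):
        """Closed-form ML intensities when self-loops are allowed:
        lambda_hat_ij = (1 / T) * k_i * k_j / sum_l k_l, with k the row sums of X."""
        k = X.sum(axis=1).astype(float)
        return np.outer(k, k) / (T * k.sum())

    # For small lambda_hat_ij * T, a first-order expansion gives
    #   P(A_ij = 1) = 1 - exp(-lambda_hat_ij * T) ~ k_i * k_j / sum_l k_l,
    # i.e. the Chung-Lu given-expected-degrees probability.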

23 Maximum-Likelihood Inference (Structural zeros)
Asymptotic results are also available for fixed T, with $n \to \infty$.
So-called structural zeros (as they arise in contingency table analysis) complicate matters; for instance, the prohibition of self-loops.
We can estimate such zeros directly from the observed graph data, or they may come directly from the application context.
These typically preclude a closed-form MLE, but our earlier method of Chung-Lu fitting can be used to initialize a sparse solver. We are currently building an R package for the general case of directed graph fitting.

24 Enron E-mail Data (Per-week volume)
Figure: E-mail volume per week from the Enron corpus (left) and a suitably time-homogeneous period (right).
The Enron corpus comprises 189 weeks of e-mail exchanges amongst 156 employees.

25 Estimated Parameters (Point process model fitted to the Enron corpus)
Figure: Normal quantile plots of the estimated send and receive parameters for the Enron corpus.
We estimate $\theta = (\eta, \alpha, \beta)$ via maximum likelihood, using 90 identifiability constraints on the 313 parameters, giving 223 degrees of freedom.
For an arbitrary node i, there is no apparent relationship between the estimated send parameter $\hat{\alpha}_i$ and receive parameter $\hat{\beta}_i$.
In this modeling framework, it is also possible to include covariates (for instance, employee organizational hierarchy).

26 Summary: Graph detection and estimation theory
1 Introduction: Identifying structure in network data; Erdős-Rényi and related models; Network modularity as residuals analysis
2 Residuals-Based Detection: Network test statistics; Subgraph detection; Simulation study
3 Estimation for Point Process Graphs (Perry & W, 2010): Point process graph model; Parameter estimation; Data analysis example
MIT Lincoln Laboratory, NSF-DMS/MSBS/CISE, DARPA, and ARO PECASE support is gratefully acknowledged.
