Graph Detection and Estimation Theory

Graph Detection and Estimation Theory
(and algorithms, and applications)

Patrick J. Wolfe
Statistics and Information Sciences Laboratory (SISL)
School of Engineering and Applied Sciences and Department of Statistics, Harvard University
sisl.seas.harvard.edu

Graph Exploitation Symposium, MIT Lincoln Laboratory, 13 April 2010

Outline

1. Introduction: identifying structure in network data; Erdős-Rényi and related models; network modularity as residuals analysis
2. Residuals-Based Detection: network test statistics; subgraph detection; simulation study
3. Estimation for Point Process Graphs (Perry & Wolfe, 2010): point process graph model; parameter estimation; data analysis example

Introduction to Graphs and Networks
A brief overview

Network data are increasingly prevalent across fields, yet even basic analyses prove computationally demanding. And though random graph theory has been put to use in algorithms and combinatorics, we typically lack a detection and estimation theory for popular classes of graph models. In this talk we will extend a simple model due to Chung & Lu, and investigate a popular method of residuals-based analysis as a form of testing for graph structure.

Introduction to Graphs and Networks
A brief overview

Typically, network topology is considered a function, intentional or otherwise, of the data acquisition procedure. However, one may also think of a graph-valued data set itself as a random instantiation of the network structure. A concern of practitioners is to identify such structure through heterogeneity of the observed data set; for example, networks of people may cluster depending on hobbies, political leanings, and so on. Given a (potentially massive) network, or some sub-network thereof, how can we identify the existence of structure?

The Mechanics of Working with Graph-Valued Data
Adjacency matrices and the like

An (undirected, unweighted) graph is a set G = (V, E) of vertices and edges. The order n of the graph is its number of vertices, and its number of edges is called its size. We may represent a graph G via its adjacency matrix: a symmetric matrix whose ij-th element is 1 if vertices i and j share an edge, and 0 otherwise.

[Figure: an example 100 x 100 adjacency matrix]

One may also define a suitable Laplacian operator, whose spectrum contains much important information about the graph.
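
To make the adjacency-matrix mechanics concrete, here is a minimal numpy sketch (an illustration, not code from the talk): it builds the adjacency matrix of a toy graph and inspects the spectrum of the combinatorial Laplacian L = D - A, whose zero-eigenvalue multiplicity counts connected components.

```python
import numpy as np

edges = [(0, 1), (1, 2), (2, 0), (3, 4)]  # a triangle plus one disjoint edge
n = 5

A = np.zeros((n, n), dtype=int)
for i, j in edges:
    A[i, j] = A[j, i] = 1   # symmetric, since the graph is undirected

D = np.diag(A.sum(axis=1))  # degree matrix
L = D - A                   # combinatorial Laplacian

# The multiplicity of eigenvalue 0 equals the number of connected components.
print(np.round(np.linalg.eigvalsh(L), 6))   # two zeros -> two components
```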

An Example: Erdős-Rényi
A model exhibiting no structure

In the classical Erdős-Rényi model, links are formed independently with (global) probability p. Rows/columns of the adjacency matrix thus represent independent samples of Bernoulli random variables (as in the previous figure). The degree of any vertex (the number of edges emanating from it) is a Binomial(n - 1, p) random variable. Such a graph is said to be simple: self-loops and multiple edges are disallowed. Random graph theory tells us much about this classical model...
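
A small sketch of Erdős-Rényi sampling, with arbitrary parameter values, confirming that each degree behaves as a Binomial(n - 1, p) random variable:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 1000, 0.01

# Sample the upper triangle only, then symmetrize: no self-loops, no multi-edges.
U = np.triu(rng.random((n, n)) < p, k=1)
A = (U | U.T).astype(int)

deg = A.sum(axis=1)
print(deg.mean(), (n - 1) * p)            # empirical vs. expected mean degree
print(deg.var(), (n - 1) * p * (1 - p))   # empirical vs. binomial variance
```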

Properties of the Erdős-Rényi Model
All hinges on the growth of (n - 1)p, the expected number of edges per vertex

Almost-regularity: if np > 144 log n, then the number of edges e(u) of a fixed subgraph of order u satisfies $P\left(\left|e(u) - p\binom{u}{2}\right| \geq 7\binom{u}{2}\sqrt{p \log n / n}\right) < u^{-1}$.

Connectivity: if np grows faster than log n, then almost every graph is connected; the complementary statement also holds.

Degrees: if $n^2 p \to \infty$ but $np \to 0$, then almost every graph has close to $\frac{1}{2} n^2 p$ vertices of degree 1, and the rest of degree 0.

For M close to $\binom{n}{2} p$, the set of graphs having M edges, taken equiprobably, behaves similarly. However, in these cases np is growing rapidly, something not necessarily evident in natural data sets.
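
As an illustrative numerical check of the connectivity property (a simulation sketch assuming a threshold rate of log n, not code from the talk), the fraction of connected samples with np = c log n should jump from near 0 to near 1 as c passes 1:

```python
import numpy as np

rng = np.random.default_rng(1)
n, trials = 200, 100

def is_connected(A):
    # Ad hoc graph search from vertex 0; connected iff all vertices are reached.
    seen, stack = {0}, [0]
    while stack:
        v = stack.pop()
        for u in np.flatnonzero(A[v]):
            if u not in seen:
                seen.add(u)
                stack.append(u)
    return len(seen) == A.shape[0]

for c in (0.5, 1.0, 2.0):
    p = c * np.log(n) / n
    hits = 0
    for _ in range(trials):
        U = np.triu(rng.random((n, n)) < p, k=1)
        hits += is_connected(U | U.T)
    print(f"c = {c}: fraction connected = {hits / trials:.2f}")
```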

Related Models
Generalizations of Erdős-Rényi

With only a single parameter, the Erdős-Rényi model is rarely a good fit for real-world data. Numerous generalizations have been proposed, often in the course of trying to match a particular data set via its degree sequence $(k_1, k_2, \ldots, k_n)$:

Configuration model (Bender & Canfield, 1978): randomly rewire a given graph while preserving its degree sequence.

Given-expected-degrees model (Chung & Lu, 2002): edges are (conditionally) independent, with $P(A_{ij} = 1) \propto k_i k_j$.

As constructed, neither of these graphs is simple: they admit self-loops and/or multiple edges (see the sketch below). Indeed, even counting the number of simple graphs with a given graphical degree sequence is nontrivial (Blitzstein & Diaconis). The Chung-Lu model has the advantage that it retains dyadic independence, though it now depends on n parameters.
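
The non-simplicity point can be checked directly with networkx's configuration-model sampler, which returns a multigraph; the degree sequence below is arbitrary.

```python
import networkx as nx

deg_seq = [3, 3, 2, 2, 1, 1]                 # arbitrary; must have an even sum
G = nx.configuration_model(deg_seq, seed=0)  # returns a MultiGraph

print("self-loops:", nx.number_of_selfloops(G))
print("parallel edges lost on simplification:",
      G.number_of_edges() - nx.Graph(G).number_of_edges())
```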

The Given-Expected-Degrees Model
A generative model retaining dyadic independence

It is easy to show that the given-expected-degrees model does in fact achieve expected degrees $(k_1, k_2, \ldots, k_n)$; more on that point later. However, the model is somewhat more general: any positive sequence $(k_1, k_2, \ldots, k_n)$ such that $\max_i k_i^2 \leq \sum_{j=1}^n k_j$ will serve as a generator for the Chung-Lu model. The model also extends directly to the case of directed graphs, with $P(A_{ij} = 1) \propto k_i^{\mathrm{out}} k_j^{\mathrm{in}}$ and consequently a specification in terms of 2n parameters. This model is often fitted to data in the context of what is termed network modularity...
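
A minimal sketch of the Chung-Lu sampler under the stated generator condition; the target degree sequence is made up, and the diagonal is zeroed here so the sample stays simple.

```python
import numpy as np

rng = np.random.default_rng(2)
k = np.array([20.0, 15.0, 10.0] + [5.0] * 197)  # made-up target expected degrees
assert k.max() ** 2 <= k.sum()                   # the generator condition

P = np.outer(k, k) / k.sum()                     # P(A_ij = 1) = k_i k_j / sum(k)
np.fill_diagonal(P, 0)                           # no self-loops in this sketch
U = np.triu(rng.random(P.shape) < P, k=1)
A = (U | U.T).astype(int)

print("realized:", A.sum(axis=1)[:3], "targets:", k[:3])
```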

Modularity and Community Detection
Newman's clustering approach

In community detection, Newman's concept of maximizing the network modularity Q is often invoked:

$Q \propto \sum_{i=1}^{n} \sum_{j=1}^{n} \left( A_{ij} - \frac{k_i k_j}{\sum_{l=1}^{n} k_l} \right) \delta(i, j)$,

with $\delta(i, j) = 1$ if and only if vertices i and j are in the same community. In a proper likelihood-based formulation of the Chung-Lu model, this boils down to a graph-based residuals analysis in terms of observed-minus-expected degrees, with $A - \frac{k k^T}{\sum_{i=1}^{n} k_i}$ the so-called modularity matrix. Maximizing the modularity Q is thus equivalent to finding the community assignment that maximizes the signed residuals relative to the Chung-Lu model!
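
As a worked example of the residuals view, the following sketch (with a made-up six-vertex graph, and the 1/sum(k) normalization as an assumed convention) forms the modularity matrix and scores two candidate community assignments:

```python
import numpy as np

# Two triangles {0,1,2} and {3,4,5} joined by the single edge (2, 3).
A = np.array([[0, 1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]])

k = A.sum(axis=1)
B = A - np.outer(k, k) / k.sum()   # modularity matrix: observed minus expected

def Q(labels):
    same = labels[:, None] == labels[None, :]   # delta(i, j)
    return (B * same).sum() / k.sum()           # one common normalization

print(Q(np.array([0, 0, 0, 1, 1, 1])))  # the natural split: high modularity
print(Q(np.array([0, 1, 0, 1, 0, 1])))  # an arbitrary split: negative modularity
```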

Part 2: Residuals-Based Detection
Network test statistics; subgraph detection; simulation study

Formulating a Graph Detection Problem
The search for good test statistics

Our earlier observation enables us to develop tests for graphs (or embedded subgraphs) whose variability is left mostly unexplained by the Chung-Lu model. We'll limit ourselves here to the specific task of subgraph detection: given a background graph corresponding to some generative model (or indeed a real-world data set), how detectable is a foreground object such as a clique? In general, densely structured subgraphs are correspondingly unlikely under the basic assumption of dyadic independence; consequently, dense embedded subgraphs should be detectable. First, however, we'll investigate how difficult the problem appears to be...

Degrees as Summary Statistics
Erdős-Rényi model

[Figure: adjacency matrix (left) and degree distribution (right) of a 1024-vertex Erdős-Rényi graph]

Large cliques are easily detectable when embedded in Erdős-Rényi graphs, owing to their high degrees.

Degrees as Summary Statistics
R-MAT model

[Figure: adjacency matrix (left) and degree distribution (right) of a 1024-vertex R-MAT graph]

An R-MAT graph (Chakrabarti et al., 2004) is instead endowed with independent edge probabilities obtained as repeated Kronecker products of a 2 x 2 seed matrix of edge probabilities:

$\begin{bmatrix} p_1 & p_2 \\ p_3 & p_4 \end{bmatrix}^{\otimes \log_2 n}$
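
A sketch of this Kronecker construction, assuming the exponent is log2(n) and using illustrative seed probabilities rather than values from the talk:

```python
import numpy as np

seed = np.array([[0.9, 0.5],
                 [0.5, 0.2]])             # illustrative (assumed) seed values

P = seed.copy()
for _ in range(9):                        # 10 Kronecker factors: n = 2**10 = 1024
    P = np.kron(P, seed)

rng = np.random.default_rng(3)
A = (rng.random(P.shape) < P).astype(int) # independent Bernoulli edges
print(P.shape, A.sum())                   # (1024, 1024) and the realized edge count
```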

Detecting Embedded Subgraphs
A question of p-values

[Figure: clique (left) and background R-MAT graph (right) combine to yield a detection task]

Locating (with high probability) a clique embedded in a given random graph relies on its low likelihood under the background model. It is nontrivial to detect such embeddings directly via the empirical degree distribution.
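
A toy version of the embedding step, OR-ing a clique's edges into a sampled background; an Erdős-Rényi background is substituted here for brevity in place of R-MAT.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, c = 256, 0.03, 12

U = np.triu(rng.random((n, n)) < p, k=1)
A = (U | U.T).astype(int)                    # background graph

members = rng.choice(n, size=c, replace=False)
A[np.ix_(members, members)] = 1              # overlay the clique's edges
A[members, members] = 0                      # but keep the diagonal zero

print("clique vertices:", np.sort(members))
```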

Algorithmic Approach
Subgraph detection via the modularity matrix

[Figure: vertices embedded in the plane of the first two principal eigenvectors, without (top) and with (bottom) an embedded subgraph]

We may observe the embedding of vertices induced by the first two principal eigenvectors of the modularity matrix. A chi-squared test on the expected proportion of vertices embedded into each of the four quadrants yields good performance, and maximizing over rotation angle in the plane can further improve test power (a sketch follows below).
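
Below is a hedged sketch of this test, with simplifying assumptions not taken from the talk: equal expected quadrant proportions under the null, no maximization over rotation angle, and an Erdős-Rényi rather than R-MAT background.

```python
import numpy as np
from scipy.stats import chisquare

def quadrant_statistic(A):
    k = A.sum(axis=1)
    B = A - np.outer(k, k) / k.sum()            # modularity matrix
    vals, vecs = np.linalg.eigh(B)
    X = vecs[:, np.argsort(vals)[-2:]]          # top two eigenvectors

    quad = 2 * (X[:, 0] >= 0) + (X[:, 1] >= 0)  # quadrant index 0..3 per vertex
    counts = np.bincount(quad.astype(int), minlength=4)
    return chisquare(counts).statistic          # equal expected counts (assumption)

rng = np.random.default_rng(5)
n, p = 256, 0.05
U = np.triu(rng.random((n, n)) < p, k=1)
A0 = (U | U.T).astype(int)                      # background alone

A1 = A0.copy()                                  # background plus a 16-vertex clique
m = rng.choice(n, size=16, replace=False)
A1[np.ix_(m, m)] = 1
A1[m, m] = 0

print(quadrant_statistic(A0), quadrant_statistic(A1))  # larger with embedding
```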

Brief Simulation Study
Detection results across various subgraph densities

[Figure: operating characteristics (ROC) of the subgraph detection test against an R-MAT background, shown for subgraph densities ranging from 70% to 100% (left), with empirical sampling distributions of the test statistic $\chi^2_{\max}$, for background alone versus background plus foreground, shown for the case of a 12-vertex clique (right)]

Part 3: Estimation for Point Process Graphs (Perry & Wolfe, 2010)
Point process graph model; parameter estimation; data analysis example

A Point Process Model for Graphs
Likelihood-based formulation

In many applications, a data matrix X comes in the form of counts with associated time stamps: for example, pairwise e-mail exchanges, text messages, or phone conversations. Here, given a set of individuals labeled 1, 2, ..., n under observation for times $t \in (0, T]$, we choose to model the interactions between individuals i and j as a point process

$\xi_{ij}(t) = \#\{s \in (0, t] : \text{node } i \text{ interacts with node } j \text{ at time } s\}$.

For s < t we set $\xi_{ij}(s, t] \triangleq \xi_{ij}(t) - \xi_{ij}(s)$ to be the number of times that i and j interact in (s, t], and we assume the interactions to be instantaneous and dyadically independent.

A Point Process Model for Graphs
Likelihood-based formulation

Under mild assumptions, the intensity

$\lambda_{ij}(t) \triangleq \lim_{\delta \to 0} \frac{E[\xi_{ij}(t + \delta) - \xi_{ij}(t)]}{\delta}$

exists at all times $t \in (0, T]$. Independence of interactions in non-overlapping intervals results in $\xi_{ij}(s, t]$ being a Poisson random variable; we refer to its rate parameter simply as $\lambda_{ij}$. The total number of counts $X_{ij} \triangleq \xi_{ij}(T)$ in a given interval is thus a sufficient statistic for $\lambda_{ij}$, and the elements of the data matrix X are independent with

$P(X_{ij} = x_{ij}) = \frac{(\lambda_{ij} T)^{x_{ij}}}{x_{ij}!} \, e^{-\lambda_{ij} T}$.
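
A small simulation of the count model with made-up intensities, confirming that the data reduce to a matrix of independent Poisson(lambda_ij T) counts:

```python
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(6)
n, T = 5, 10.0

lam = rng.uniform(0.05, 0.5, size=(n, n))   # made-up intensities lambda_ij
np.fill_diagonal(lam, 0)                    # no self-interaction here

X = rng.poisson(lam * T)                    # sufficient statistics X_ij = xi_ij(T)

off = ~np.eye(n, dtype=bool)                # score the off-diagonal dyads only
loglik = poisson.logpmf(X[off], lam[off] * T).sum()
print(X)
print("log-likelihood at the true intensities:", loglik)
```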

A Log-Linear Model for Intensities
Related to the gravity model and network traffic analysis

In this modeling context, the adjacency matrix A is simply a right-censored version of the data matrix X. Under our assumptions, then, $A_{ij}$ is a Bernoulli random variable with $P(A_{ij} = 1) = 1 - e^{-\lambda_{ij} T}$. A simple log-linear model is given by

$\log \lambda_{ij} = \eta + \alpha_i + \alpha_j, \quad i \neq j$,

with the identifiability constraint $\sum_i \alpha_i = 0$. In general, it is possible to prove consistency and asymptotic normality of maximum-likelihood estimators for this model as $T \to \infty$, and to derive the explicit form of the Fisher information.
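
A sketch of the log-linear parameterization and the induced edge probabilities; eta and the alpha_i below are arbitrary illustrative values.

```python
import numpy as np

rng = np.random.default_rng(7)
n, T = 6, 5.0

eta = -2.0                                   # arbitrary illustrative values
alpha = rng.normal(size=n)
alpha -= alpha.mean()                        # identifiability: sum(alpha) = 0

lam = np.exp(eta + alpha[:, None] + alpha[None, :])
np.fill_diagonal(lam, 0)                     # the model is defined for i != j

p_edge = 1 - np.exp(-lam * T)                # right-censoring: A_ij = 1{X_ij > 0}
print(np.round(p_edge, 3))
```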

Maximum-Likelihood Inference
Closed-form solution and a connection to Chung-Lu

If self-loops are allowed, then maximum-likelihood estimates are obtainable in closed form, via the standard exponential-family parameterization of a Poisson mean. In fact, such estimates are easily seen to be

$\hat{\lambda}_{ij} = \frac{1}{T} \cdot \frac{k_i k_j}{\sum_i k_i}$,

showing the relation to our earlier Chung-Lu model! A simple Taylor argument shows that as long as $\hat{\lambda}_{ij} T$ is small, the corresponding adjacency matrix has $P(A_{ij} = 1) \approx \frac{k_i k_j}{\sum_i k_i}$.
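
A numerical check of the closed-form estimator on synthetic data with a rank-one, Chung-Lu-like intensity matrix; this is an illustration of the formula, not the paper's fitting code.

```python
import numpy as np

rng = np.random.default_rng(8)
n, T = 50, 20.0

a = rng.uniform(0.1, 1.0, size=n)
true_lam = np.outer(a, a) / 5               # rank-one truth, self-loops allowed

X = rng.poisson(true_lam * T)               # observed count matrix
k = X.sum(axis=1)                           # margins play the role of degrees

lam_hat = np.outer(k, k) / (T * k.sum())    # the closed-form ML estimate
print("mean abs error:", np.abs(lam_hat - true_lam).mean())
```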

Maximum-Likelihood Inference
Structural zeros

Asymptotic results are also available for fixed T, with $n \to \infty$. So-called structural zeros (as they arise in contingency table analysis) complicate matters, for instance the prohibition of self-loops. We can estimate such zeros directly from the observed graph data, or they may come directly from the application context. These typically preclude a closed-form MLE, but our earlier method of Chung-Lu fitting can be used to initialize a sparse solver. We are currently building an R package for the general case of directed graph fitting.

Enron E-mail Data
Per-week e-mail volume

[Figure: e-mail volume per week from the Enron corpus, 1999-2002 (left), and a suitably time-homogeneous period beginning 2001-04-01 (right)]

The Enron e-mail corpus comprises 189 weeks of e-mail exchanges amongst 156 employees.

Estimated Parameters
Point process model fitted to the Enron corpus

[Figure: normal quantile plots of the estimated send and receive parameters for the Enron corpus]

We estimate $\theta = (\eta, \alpha, \beta)$ via maximum likelihood, using 90 identifiability constraints on the $2 \times 156 + 1 = 313$ parameters, giving $313 - 90 = 223$ degrees of freedom. For an arbitrary node i, there is no apparent relationship between the estimated send parameter $\hat{\alpha}_i$ and receive parameter $\hat{\beta}_i$. In this modeling framework, it is also possible to include covariates (for instance, employee organizational hierarchy).

Summary

1. Introduction: identifying structure in network data; Erdős-Rényi and related models; network modularity as residuals analysis
2. Residuals-Based Detection: network test statistics; subgraph detection; simulation study
3. Estimation for Point Process Graphs (Perry & Wolfe, 2010): point process graph model; parameter estimation; data analysis example

MIT Lincoln Laboratory, NSF-DMS/MSBS/CISE, DARPA, and ARO PECASE support is gratefully acknowledged.