Learning latent structure in complex networks


Learning latent structure in complex networks
Lars Kai Hansen, www.imm.dtu.dk/~lkh
Current network research issues: Social Media, Neuroinformatics, Machine Learning
Joint work with Morten Mørup and Sune Lehmann

Networks: N nodes/vertices and links/edges; directed / undirected; weighted / un-weighted. Here A_{ij} is a symmetric matrix of 1/0's.
Network models: link distributions (random, long tail), hubs and authorities, "friends of friends are friends", assortative mixing, the rich club, communities.
A community is a set of densely linked nodes. Typically the community structure is hidden, i.e., latent.

The main points
Community detection can be formulated as an inference problem.
The success of the inference depends on the link sampling process: there is a phase-transition-like detection threshold, and its location can be estimated with a mean field analysis.
The phase transition shifts (sharpens?) if we simultaneously learn the parameters of a generative model.
For good link prediction we need more complex latent structures: simple community models do not beat basic heuristics.

Why look for latent community structure? Communities may represent different mechanisms, hence different statistics: the network is non-stationary. Communities detected by spectral clustering. M.E.J. Newman and M. Girvan. Finding and evaluating community structure in networks. Phys. Rev. E 69, 026113 (2004)

Why look for latent community structure? Communities may be predictive of dynamics and structural (in-)stability, e.g., Palla et al. (2007): small communities depend on a stable core of members, while large communities can persist longer if they renew their membership. Communities found by the clique percolation method (CPM) for detection of overlapping communities.

Why look for latent community structure? Community structure may assist link prediction. M. Mørup, L.K. Hansen: Learning latent structure in complex networks. NIPS Workshop Analyzing Networks and Learning With Graphs (2009)

Outline
Community detection: detection as a combinatorial optimization problem; mean field approximation and Gibbs sampling; a phase transition leads to a sharp detection threshold.
Learning community parameters: the Hofman-Wiggins model; the detection threshold with learning; stochastic block models.
Link prediction: robustness to link dilution.

Formal community detection: Newman's modularity. The modularity is expressed as a sum over links, such that we reward excess links within communities relative to a baseline measure P_{ij}:

Q = \frac{1}{2m} \sum_{ij} (A_{ij} - P_{ij}) \, \delta(c_i, c_j)

with c_i = k if node i is in community k, and a total of m links, 2m = \sum_{ij} A_{ij}. The baseline assumes independence, P_{ij} = k_i k_j / (2m), with k_i = \sum_j A_{ij}. Maximizing Q is a combinatorial optimization problem.
M.E.J. Newman and M. Girvan. Finding and evaluating community structure in networks. Physical Review E, 69:026113, 2004, cond-mat/0308217.

Potts representation. Introduce a C x N binary matrix S encoding the community assignments:

\delta(c_i, c_j) = \sum_k S_{ki} S_{kj}

Q = \frac{1}{2m} \sum_{ijk} B_{ij} S_{ki} S_{kj} = \frac{\mathrm{Tr}(S B S')}{2m}, \qquad B_{ij} = A_{ij} - P_{ij}
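To make the matrix form concrete, here is a minimal NumPy sketch (my own illustration, not code from the talk) that evaluates Q through the Potts form Tr(S B S')/(2m); the function name and the toy graph are hypothetical.

```python
import numpy as np

def modularity(A, c, C):
    """Newman-Girvan modularity Q for a symmetric 0/1 adjacency A and assignments c in {0,...,C-1}."""
    k = A.sum(axis=1)                      # node degrees k_i
    two_m = A.sum()                        # 2m = sum_ij A_ij
    B = A - np.outer(k, k) / two_m         # modularity matrix B_ij = A_ij - k_i k_j / (2m)
    N = A.shape[0]
    S = np.zeros((C, N))                   # C x N Potts assignment matrix
    S[c, np.arange(N)] = 1.0
    return np.trace(S @ B @ S.T) / two_m   # Q = Tr(S B S') / (2m)

# tiny example: two 4-cliques joined by a single link
A = np.kron(np.eye(2), np.ones((4, 4))) - np.eye(8)
A[3, 4] = A[4, 3] = 1
c = np.array([0, 0, 0, 0, 1, 1, 1, 1])
print(modularity(A, c, C=2))               # modularity of the planted split (about 0.42)
```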

Spectral heuristic. Newman relaxes the optimization problem to the unit sphere:

Q = \frac{1}{2m} \sum_{ijk} B_{ij} S_{ki} S_{kj} = \frac{\mathrm{Tr}(S B S')}{2m}, \qquad B S' = S' \Lambda

Procedure: solve the eigenvalue problem to obtain a Fiedler vector (this can be repeated), convert it to assignments, and improve the resulting pre-community structure by local Lin-Kernighan post-processing.
M.E.J. Newman and M. Girvan. Finding and evaluating community structure in networks. Physical Review E, 69:026113, 2004, cond-mat/0308217.
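A minimal sketch of the spectral relaxation (again my own illustration, not Newman's full recursive algorithm): split on the sign of the leading eigenvector of the modularity matrix B; the Lin-Kernighan-style refinement step is omitted.

```python
import numpy as np

def spectral_bisection(A):
    """Two-way split from the leading eigenvector of the modularity matrix B = A - k k' / (2m)."""
    k = A.sum(axis=1)
    two_m = A.sum()
    B = A - np.outer(k, k) / two_m
    vals, vecs = np.linalg.eigh(B)        # B is symmetric, so eigh is appropriate
    leading = vecs[:, np.argmax(vals)]    # eigenvector of the largest eigenvalue
    return (leading >= 0).astype(int)     # assign communities by the sign pattern

# reuse the two-clique example: the sign split recovers the two cliques (up to label swap)
A = np.kron(np.eye(2), np.ones((4, 4))) - np.eye(8)
A[3, 4] = A[4, 3] = 1
print(spectral_bisection(A))
```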

Combinatorial optimization. Alternatively, use Gibbs sampling with simulated annealing (Kirkpatrick et al. 1983; Geman & Geman 1984):

P(S \mid A, T) \propto \exp\!\left( \frac{Q(S)}{T} \right) = \exp\!\left( \frac{\mathrm{Tr}(S B S')}{2mT} \right)

A Monte Carlo realization of a Markov process in which each variable is randomly reassigned according to its conditional distribution:

P(S_{\cdot j} \mid S_{\cdot, -j}, A, T) = \frac{P(S \mid A, T)}{\sum_{S_{\cdot j}} P(S \mid A, T)}

S. Geman, D. Geman: Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images. IEEE Transactions on Pattern Analysis and Machine Intelligence 6(6): 721-741 (1984)

Gibbs sampling

\varphi_{ki} = \sum_j \frac{B_{ij}}{2m} S_{kj} = \sum_j \frac{A_{ij}}{2m} S_{kj} - \sum_j \frac{k_i k_j}{(2m)^2} S_{kj}

\mu_{ki} = \frac{\exp(\varphi_{ki}/T)}{\sum_{k'} \exp(\varphi_{k'i}/T)}, \qquad S_{\cdot i} \sim \mathrm{discrete}(\mu_{\cdot i})
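A compact sketch of single-node Gibbs sampling with a simulated-annealing (geometric cooling) schedule on T, following the update above; this is my own rendering, and the schedule parameters are arbitrary.

```python
import numpy as np

def gibbs_anneal(A, C, T0=1.0, Tmin=1e-3, cool=0.95, sweeps_per_T=10, rng=None):
    """Gibbs sampling with simulated annealing for the modularity Potts model (sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    N = A.shape[0]
    k = A.sum(axis=1)
    two_m = A.sum()
    B = A - np.outer(k, k) / two_m
    c = rng.integers(0, C, size=N)                 # random initial assignments
    S = np.zeros((C, N)); S[c, np.arange(N)] = 1.0
    T = T0
    while T > Tmin:
        for _ in range(sweeps_per_T):
            for i in rng.permutation(N):
                phi = S @ B[:, i] / two_m          # phi_ki = sum_j B_ij S_kj / (2m)
                phi -= S[:, i] * B[i, i] / two_m   # drop node i's own (j = i) contribution
                p = np.exp((phi - phi.max()) / T)
                p /= p.sum()
                knew = rng.choice(C, p=p)          # resample node i's community
                S[:, i] = 0.0; S[knew, i] = 1.0
        T *= cool                                  # annealing: lower the temperature
    return S.argmax(axis=0)
```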

Potts model: single node. Discrete probability distribution over states k = 1, ..., C:

P(S_{ki} = 1 \mid S_{\cdot, -i}, T) = \frac{\exp(\varphi_{ki}/T)}{\sum_{k'} \exp(\varphi_{k'i}/T)} = \mu_{ki}

Mean Field method: approximate the posterior by a product of discrete distributions and minimize the KL divergence between P(S | \mu) and P(S | A, T):

P(S \mid \mu) = \prod_{ki} \mu_{ki}^{S_{ki}}, \qquad \mu_{ki} = \frac{\exp(\varphi_{ki}/T)}{\sum_{k'} \exp(\varphi_{k'i}/T)}

\varphi_{ki} = \sum_j \frac{B_{ij}}{2m} \mu_{kj} = \sum_j \frac{A_{ij}}{2m} \mu_{kj} - \sum_j \frac{k_i k_j}{(2m)^2} \mu_{kj}

S. Lehmann, L.K. Hansen: Deterministic modularity optimization. European Physical Journal B 60(1) 83-88 (2007).

Deterministic annealing. Iterative solution with a decreasing sequence of temperatures to reach the ground state (the MAP solution):

\mu_{ki}^{(t+1)} = \frac{\exp(\varphi_{ki}^{(t)}/T)}{\sum_{k'} \exp(\varphi_{k'i}^{(t)}/T)}, \qquad \varphi_{ki}^{(t)} = \sum_j \frac{B_{ij}}{2m} \mu_{kj}^{(t)} = \sum_j \frac{A_{ij}}{2m} \mu_{kj}^{(t)} - \sum_j \frac{k_i k_j}{(2m)^2} \mu_{kj}^{(t)}
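A sketch of the deterministic-annealing mean-field iteration (my own rendering of the updates above, with an assumed geometric cooling schedule).

```python
import numpy as np

def mean_field_annealing(A, C, T0=1.0, Tmin=1e-3, cool=0.9, iters_per_T=50, rng=None):
    """Deterministic annealing for modularity: soft assignments mu (C x N) updated at decreasing T."""
    rng = np.random.default_rng() if rng is None else rng
    N = A.shape[0]
    k = A.sum(axis=1)
    two_m = A.sum()
    B = A - np.outer(k, k) / two_m
    mu = rng.dirichlet(np.ones(C), size=N).T       # C x N soft assignments, columns sum to 1
    T = T0
    while T > Tmin:
        for _ in range(iters_per_T):
            phi = mu @ B / two_m                   # phi_ki = sum_j (B_ij / 2m) mu_kj
            mu = np.exp((phi - phi.max(axis=0)) / T)
            mu /= mu.sum(axis=0)                   # column-wise softmax
        T *= cool                                  # lower the temperature
    return mu.argmax(axis=0)                       # near-MAP hard assignments at low T
```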

Experimental evaluation. Create a simple testbed with within-community link probability p and between-community noise link probability q, with q = f p.

S. Lehmann, L.K. Hansen: Deterministic modularity optimization European Physical Journal B 60(1) 83-88 (2007).

The Hofman-Wiggins model (2008). The H&W model is a generalization of the modularity heuristic to a proper statistical model with Bayesian inference:

P(A \mid S, p, q) = \prod_{i>j} \left( p^{A_{ij}} (1-p)^{1-A_{ij}} \right)^{d_{ij}} \left( q^{A_{ij}} (1-q)^{1-A_{ij}} \right)^{1-d_{ij}}, \qquad d_{ij} = \sum_k S_{ki} S_{kj}

Consider the within link probability p and the between link probability q as unknown parameters.
J.M. Hofman and C.H. Wiggins. A Bayesian approach to network modularity. Phys. Rev. Lett. 100:258701, 2008.

The Hofman-Wiggins model (2008). Critical behavior for fixed parameters:

P(S \mid A, p, q) \propto \prod_{i>j} \left( p^{A_{ij}} (1-p)^{1-A_{ij}} \right)^{d_{ij}} \left( q^{A_{ij}} (1-q)^{1-A_{ij}} \right)^{1-d_{ij}}

P(S \mid A, p, q) = Z^{-1} \exp\!\left( \sum_{ij} \sum_k S_{ki} S_{kj} \left[ A_{ij} \log\frac{p}{q} + (1 - A_{ij}) \log\frac{1-p}{1-q} \right] \right)

The log-ratios act as an effective inverse temperature.
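To make the effective-temperature reading concrete, a small sketch (my own, not from the paper) that evaluates the unnormalized log-posterior of a hard assignment for fixed p and q; the two log-ratios are the couplings that play the role of an inverse temperature.

```python
import numpy as np

def hw_log_posterior(A, c, p, q):
    """Unnormalized log P(S | A, p, q) for the Hofman-Wiggins likelihood, with hard assignments c."""
    N = A.shape[0]
    same = (c[:, None] == c[None, :])      # d_ij = 1 if nodes i and j share a community
    iu = np.triu_indices(N, k=1)           # each pair counted once (i > j)
    a, d = A[iu], same[iu]
    logp = np.where(d, a * np.log(p) + (1 - a) * np.log(1 - p),
                       a * np.log(q) + (1 - a) * np.log(1 - q))
    return logp.sum()

# the couplings that define the effective inverse temperature
p, q = 0.3, 0.05
print(np.log(p / q))                       # reward per observed within-community link
print(np.log((1 - p) / (1 - q)))           # (negative) contribution per missing within-community link
```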

The Hofman-Wiggins model (2008). Mean field critical behavior for fixed parameters, as a function of p, q and A:

\mu_{ki}^{(t+1)} = \frac{\exp(\varphi_{ki}^{(t)}/T(p,q))}{\sum_{k'} \exp(\varphi_{k'i}^{(t)}/T(p,q))}, \qquad \varphi_{ki}^{(t)} = \sum_j \mu_{kj}^{(t)} A_{ji}

The iteration converges to \mu = 1/C for T > T_c, with the effective temperature given by

\frac{1}{T(p,q)} = \log\frac{p\,(1-q)}{(1-p)\,q} \approx \log\frac{p}{q} = \log \mathrm{SNR}

The community detection threshold: how many links are needed to detect the structure? Jörg Reichardt and Michele Leone, (Un)detectable Cluster Structure in Sparse Networks. Phys. Rev. Lett. 101, 078701 (2008).

The Hofman-Wiggins model: mean field critical link density. Assume that A is indeed drawn with parameters p and q = f p = p/SNR. The iteration scheme converges to the uniform random solution below a critical density, determined by

\frac{1}{T(p_c, q_c)} = \log\frac{p_c\,(1-q_c)}{(1-p_c)\,q_c} = \frac{C}{\lambda_{\max}(A)}

where \lambda_{\max}(A) is the largest eigenvalue of the adjacency matrix, which for the planted model depends on N, p_c and q_c.

Learning the parameters of the generative model. Hofman & Wiggins (2008) use variational Bayes: Dirichlet/beta prior and posterior distributions for the probabilities, and independent binomials for the assignment variables. Here: maximum likelihood for the parameters and Gibbs sampling for the assignments.
Jake M. Hofman and Chris H. Wiggins, A Bayesian Approach to Network Modularity. Phys. Rev. Lett. 100, 258701 (2008).

Experimental design. Planted solution: N = 1000 nodes, C_true = 5. Quality measure: mutual information between the planted assignments and the best identified assignments. Gibbs sampling: no annealing, burn-in 200 iterations, averaging 800 iterations. Parameter learning: Q = 10 iterations.
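A sketch of the planted-partition testbed and quality measure described above (my own illustration; scikit-learn's normalized mutual information stands in for the talk's mutual-information score, and spectral clustering is used only as a convenient stand-in detector, not the Gibbs sampler of the talk).

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.metrics import normalized_mutual_info_score

def planted_partition(N=1000, C=5, p=0.05, snr=5.0, rng=None):
    """Symmetric 0/1 adjacency with within-community probability p and between probability q = p/snr."""
    rng = np.random.default_rng() if rng is None else rng
    q = p / snr
    c = rng.integers(0, C, size=N)                    # planted assignments
    prob = np.where(c[:, None] == c[None, :], p, q)   # pairwise link probabilities
    A = (rng.random((N, N)) < prob).astype(float)
    A = np.triu(A, 1); A = A + A.T                    # undirected, no self-links
    return A, c

A, c_true = planted_partition(rng=np.random.default_rng(0))
# any detector can be plugged in here in place of spectral clustering
c_hat = SpectralClustering(n_clusters=5, affinity="precomputed").fit_predict(A)
print(normalized_mutual_info_score(c_true, c_hat))    # quality: mutual information with the planted labels
```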

Community detection, fully informed on the number of communities and the probabilities.

Now what happens to the phase transition if we learn the parameters with a too complex model (C > C_true = 5)?

More complex latent structures. There is a very rich statistics literature on closely related models. Review: Goldenberg, Zheng, Fienberg, Airoldi (2010). The equivalent of the H&W model was analysed by Snijders & Nowicki (1997) using both EM and Gibbs sampling. Stochastic block membership models parameterize the link density using a C x C matrix of parameters describing the potentially different link probabilities between two given communities. MMSB, the mixed membership stochastic blockmodel, was recently proposed and analyzed by Airoldi et al. (2008).

The general link density model. Stochastic block model with a (learned) node-specific link probability (R_{ij} = r_i r_j), à la modularity:

A_{ij} \sim \mathrm{Bern}\!\left( R_{ij} \, P_{c(i), c(j)} \right)

Key research questions: How do we evaluate these representations? Link prediction. Can we speed up the inference process to make large graphs feasible? A new NMF-like relaxation to the simplex avoids annealing.
M. Mørup, L.K. Hansen: Learning latent structure in complex networks. NIPS Workshop Analyzing Networks and Learning With Graphs (2009)
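A generative sketch of this block model with node-specific link propensities r_i (my own rendering of A_ij ~ Bern(R_ij P_{c(i),c(j)}); the parameter values are arbitrary).

```python
import numpy as np

def block_model_sample(r, c, P, rng=None):
    """Sample A_ij ~ Bernoulli(r_i r_j P_{c(i), c(j)}) for node propensities r and a C x C block matrix P."""
    rng = np.random.default_rng() if rng is None else rng
    R = np.outer(r, r)                                 # R_ij = r_i r_j
    prob = np.clip(R * P[np.ix_(c, c)], 0.0, 1.0)      # per-pair link probabilities
    A = (rng.random(prob.shape) < prob).astype(float)
    A = np.triu(A, 1); A = A + A.T                     # undirected, no self-links
    return A

rng = np.random.default_rng(0)
c = rng.integers(0, 3, size=200)                       # three communities
r = rng.uniform(0.5, 1.0, size=200)                    # node-specific propensities
P = 0.02 + 0.2 * np.eye(3)                             # denser within than between communities
A = block_model_sample(r, c, P, rng)
```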

Link prediction. Inspired by Clauset et al. (2008) we use a cross-validation-like procedure in which we predict the presence of held-out links in a number of networks.
A. Clauset, C. Moore, and M.E.J. Newman. Hierarchical structure and the prediction of missing links in networks. Nature, 453:98-101 (2008).
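A sketch of the held-out link-prediction protocol in the spirit of Clauset et al. (2008): hide a fraction of observed links and an equal number of non-links, refit on the remaining graph, and score the held-out pairs by AUC. The score_fn argument is a hypothetical stand-in for any model-based link probability; a common-neighbours heuristic is included as a baseline.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def heldout_link_prediction(A, score_fn, frac=0.1, rng=None):
    """Hide a fraction of links and an equal number of non-links; return the AUC of score_fn on held-out pairs."""
    rng = np.random.default_rng() if rng is None else rng
    iu = np.array(np.triu_indices_from(A, k=1)).T       # all candidate pairs (i < j)
    links = iu[A[iu[:, 0], iu[:, 1]] > 0]
    nonlinks = iu[A[iu[:, 0], iu[:, 1]] == 0]
    n_test = int(frac * len(links))
    test_pos = links[rng.choice(len(links), n_test, replace=False)]
    test_neg = nonlinks[rng.choice(len(nonlinks), n_test, replace=False)]
    A_train = A.copy()
    A_train[test_pos[:, 0], test_pos[:, 1]] = 0          # remove held-out links from the training graph
    A_train[test_pos[:, 1], test_pos[:, 0]] = 0
    pairs = np.vstack([test_pos, test_neg])
    y = np.r_[np.ones(n_test), np.zeros(n_test)]
    scores = score_fn(A_train, pairs)                    # model-based link scores (hypothetical interface)
    return roc_auc_score(y, scores)

def common_neighbours(A_train, pairs):
    """Baseline heuristic: number of common neighbours for each candidate pair."""
    return np.array([A_train[i] @ A_train[j] for i, j in pairs])

# usage: heldout_link_prediction(A, common_neighbours)
```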

Link prediction results (fixed community #) M. Mørup, L.K. Hansen: Learning latent structure in complex networks NIPS Workshop Analyzing Networks and Learning With Graphs (2009)

Potential critique of link prediction: i) cross-validation: is the structure robust to link dilution? ii) can we relax the fixed cap on the number of communities? Examples: free word association; yeast. Large communities seem very robust to link dilution. These runs use non-parametric Bayes (Dirichlet process) priors; the number of communities is on the order of C = 50, as in earlier results, and drops to about 10-20 when the community structure deteriorates.

Conclusions. Community detection can be formulated as an inference problem. For fixed SNR the sampling process has a phase-transition-like detection threshold; we can estimate the threshold from a mean field analysis. The phase transition remains (sharpens?) if we learn the parameters of a generative model with unknown complexity. For link prediction more complex latent structures are necessary: modularity and H&W do not beat simple non-parametric models.

Acknowledgements Morten Mørup www.imm.dtu.dk/~mm Sune Lehmann sune.barabasilab.com Funding Danish Research Councils The Lundbeck Foundation