Lecture 6 (supplemental): Stochastic Block Models
1 Lecture 6 (supplemental): Stochastic Block Models. Aaron Clauset, Assistant Professor of Computer Science, University of Colorado Boulder; External Faculty, Santa Fe Institute. 2017 Aaron Clauset
2 what is structure?
3 what is structure? makes data different from noise makes a network different from a random graph
4 what is structure? makes data different from noise makes a network different from a random graph helps us compress the data describe the network succinctly capture most relevant patterns
5 what is structure? makes data different from noise; makes a network different from a random graph. helps us compress the data: describe the network succinctly, capture the most relevant patterns. helps us generalize, from data we've seen to data we haven't seen: i. from one part of the network to another; ii. from one network to others of the same type; iii. from small scale to large scale (coarse-grained structure); iv. from past to future (dynamics)
6 statistical inference. imagine graph G is drawn from an ensemble or generative model: a probability distribution Pr(G | θ) with parameters θ. θ can be continuous or discrete; θ represents the structure of the graph
7 statistical inference. imagine graph G is drawn from an ensemble or generative model: a probability distribution Pr(G | θ) with parameters θ. θ can be continuous or discrete; θ represents the structure of the graph. inference (MLE): given G, find the θ that maximizes Pr(G | θ). inference (Bayes): compute or sample from the posterior distribution Pr(θ | G)
8 statistical inference. imagine graph G is drawn from an ensemble or generative model: a probability distribution Pr(G | θ) with parameters θ. inference (MLE): given G, find the θ that maximizes Pr(G | θ). inference (Bayes): compute or sample from the posterior distribution Pr(θ | G). if θ is partly known, constrain the inference and determine the rest. if G is partly known, infer θ and use Pr(G | θ) to generate the rest. if the model is a good fit (application dependent), we can generate synthetic graphs structurally similar to G. if part of G has low probability under the model, flag it as a possible anomaly
9 statistical inference. imagine graph G is drawn from an ensemble or generative model: a probability distribution Pr(G | θ) with parameters θ. inference (MLE): given G, find the θ that maximizes Pr(G | θ). inference (Bayes): compute or sample from the posterior distribution Pr(θ | G). statistical inference = a principled approach to learning from data; combines tools from statistics, machine learning, information theory, and statistical physics. if θ is partly known, constrain the inference and determine the rest. if G is partly known, infer θ and use Pr(G | θ) to generate the rest. if the model is a good fit (application dependent), we can generate synthetic graphs structurally similar to G. if part of G has low probability under the model, flag it as a possible anomaly. quantifies uncertainty; separates the model from the learning
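As a minimal illustration of the MLE recipe above (a sketch of my own, using the simplest generative model, the Erdős-Rényi graph G(n, p), rather than any model from these slides): maximizing Pr(G | p) gives p̂ = the observed edge density.

```python
import numpy as np

def er_mle(A):
    """MLE of p for the Erdos-Renyi model G(n, p), given adjacency matrix A.

    Maximizing Pr(G | p) = p^m (1-p)^(n(n-1)/2 - m) over p gives
    p_hat = m / (n choose 2), the observed edge density.
    """
    n = A.shape[0]
    m = A.sum() / 2          # undirected: each edge appears twice in A
    return m / (n * (n - 1) / 2)

# a triangle plus an isolated node: 3 edges among 4 nodes -> p_hat = 0.5
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 0],
              [0, 0, 0, 0]])
print(er_mle(A))  # 0.5
```

The same recipe (write down Pr(G | θ), maximize over θ) carries over to the block models below, just with richer θ.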
10 statistical inference: key ideas. interpretability: model parameters have meaning for scientific questions. auxiliary information: node & edge attributes, temporal dynamics (beyond static binary graphs). scalability: fast algorithms for fitting models to big data (methods from physics, machine learning). model selection: which model is better? is this model bad? how many communities? partial or noisy data: extrapolation, interpolation, hidden data, missing data. anomaly detection: low-probability events under the generative model
11 generative models for complex networks. define a parametric probability distribution over networks, Pr(G | θ). generation: given θ, draw G from this distribution. inference: given G, choose the θ that makes G likely. [diagram: model Pr(G | θ) -generation-> data G = (V, E); data -inference-> model]
12 generative models for complex networks. general form: Pr(G | θ) = ∏_{ij} Pr(A_ij | θ), where Pr(A_ij | θ) is the edge generation function; assumptions about structure go into it. consistency, lim_{n→∞} Pr(θ̂ ≠ θ) = 0, requires that edges be conditionally independent [Shalizi, Rinaldo 2011]. two general classes of these models
13 generative models for complex networks. stochastic block models: k types of vertices; Pr(A_ij | M, z) depends only on the node types z_i, z_j. originally invented by sociologists [Holland, Laskey, Leinhardt 1983]. many, many flavors, including: binomial SBM [Holland et al. 1983, Wang & Wong 1987]; simple assortative SBM [Hofman & Wiggins 2008]; mixed-membership SBM [Airoldi et al. 2008]; hierarchical SBM [Clauset et al. 2006, 2008, Peixoto 2014]; fractal SBM [Leskovec et al. 2005]; infinite relational model [Kemp et al. 2006]; degree-corrected SBM [Karrer & Newman 2011]; SBM + topic models [Ball et al. 2011]; SBM + vertex covariates [Mariadassou et al. 2010, Newman & Clauset 2016]; SBM + edge weights [Aicher et al. 2013, 2014, Peixoto 2015]; bipartite SBM [Larremore et al. 2014]; multilayer SBM [Peixoto 2015, Valles-Catala et al. 2016]; and many others
14 generative models for complex networks. latent space models: nodes live in a latent space; Pr(A_ij | f(x_i, x_j)) depends only on vertex-vertex proximity. originally invented by statisticians [Hoff, Raftery, Handcock 2002]. many, many flavors, including: logistic function on vertex features [Hoff et al. 2002]; social status / ranking [Ball, Newman 2013]; nonparametric metadata relations [Kim et al. 2012]; multiplicative attribute graphs [Kim & Leskovec 2010]; nonparametric latent feature model [Miller et al. 2009]; infinite multiple memberships [Morup et al. 2011]; ecological niche model [Williams et al. 2010]; hyperbolic latent spaces [Boguna et al. 2010]; and many others
15 opportunities and challenges. richly annotated data: edge weights, node attributes, time, etc. = new classes of generative models. generalize from n = 1 to an ensemble: useful for model checking, simulating other processes, etc. many familiar techniques: frequentist and Bayesian frameworks make probabilistic statements about observations and models; predicting missing links via leave-k-out cross validation; approximate inference techniques (EM, VB, BP, etc.); sampling techniques (MCMC, Gibbs, etc.). learn from partial or noisy data: extrapolation, interpolation, hidden data, missing data
16 opportunities and challenges. only two classes of models: stochastic block models (categorical latent variables) and latent space models (ordinal / continuous latent variables). bootstrap / resampling for network data: a critical missing piece; depends on what is independent in the data. model comparison: naive AIC, BIC, marginalization, LRT can be wrong for networks. what is the goal of modeling: realistic representation or accurate prediction? model assessment / checking: how do we know a model has done well? what do we check? what is v-fold cross-validation for networks? omit n²/v edges? omit n/v nodes? what?
17 the stochastic block model. each vertex i has a type z_i ∈ {1,...,k} (k vertex types or groups). stochastic block matrix M of group-level connection probabilities: the probability that i, j are connected = M_{z_i,z_j}. community = vertices with the same pattern of inter-community connections. [figure: network and inferred M]
18 the stochastic block model. assortative: edges within groups. disassortative: edges between groups. ordered: linear group hierarchy. core-periphery: dense core, sparse periphery
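To make the generative direction concrete, here is a minimal sketch (not from the slides; function and variable names are my own) that samples a simple undirected graph from an SBM, shown here with an assortative block matrix:

```python
import numpy as np

def sample_sbm(z, M, seed=None):
    """Draw a simple undirected graph: each pair (i, j) gets an edge
    independently with probability M[z_i, z_j]."""
    rng = np.random.default_rng(seed)
    n = len(z)
    A = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < M[z[i], z[j]]:
                A[i, j] = A[j, i] = 1
    return A

# assortative block matrix: dense within groups, sparse between
M = np.array([[0.5, 0.05],
              [0.05, 0.5]])
z = np.array([0] * 20 + [1] * 20)
A = sample_sbm(z, M, seed=42)
```

Swapping the diagonal and off-diagonal entries of M gives a disassortative graph; an upper-triangular M gives an ordered hierarchy; a dense first row/column gives core-periphery.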
19 the stochastic block model. likelihood function: the probability of G given labeling z and block matrix M: Pr(G | z, M) = ∏_{(i,j)∈E} M_{z_i,z_j} × ∏_{(i,j)∉E} (1 − M_{z_i,z_j}) (edge probability × non-edge probability)
20 the stochastic block model. likelihood function: the probability of G given labeling z and block matrix M: Pr(G | z, M) = ∏_{(i,j)∈E} M_{z_i,z_j} × ∏_{(i,j)∉E} (1 − M_{z_i,z_j}) = ∏_{r,s} M_{r,s}^{e_{r,s}} (1 − M_{r,s})^{n_r n_s − e_{r,s}} (Bernoulli edges): each block pair is a Bernoulli random graph with parameter M_{r,s}
21 the stochastic block model. the most general SBM: Pr(A | z, θ) = ∏_{i,j} f( A_ij ; θ_{R(z_i,z_j)} ), where A_ij is the value of the adjacency, R is a partition of the adjacencies, f is a probability function, and θ_a is the pattern for a-type adjacencies. Binomial f = simple graphs; Poisson f = multigraphs; Normal f = weighted graphs; etc.
22 the stochastic block model degree-corrected SBM ( f = Poisson) Karrer, Newman, Phys. Rev. E (2011)
23 the stochastic block model. degree-corrected SBM (f = Poisson). key assumption: Pr(i → j) = θ_i θ_j ω_{z_i,z_j}, where ω_{r,s} is the stochastic block matrix and θ_i is the (degree) propensity of node i. likelihood: Pr(A | z, θ, ω) = ∏_{i<j} [ (θ_i θ_j ω_{z_i,z_j})^{A_ij} / A_ij! ] exp( −θ_i θ_j ω_{z_i,z_j} ), maximized by θ̂_i = k_i / Σ_j k_j δ_{z_j,z_i} (the fraction of i's group's stubs on i) and ω̂_{rs} = m_{rs} = Σ_{ij} A_ij δ_{z_i,r} δ_{z_j,s} (the total number of edges between r and s). Karrer, Newman, Phys. Rev. E (2011)
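A sketch of those two MLE formulas in code (function names are my own; assumes every group contains at least one edge endpoint so the normalization is nonzero, and counts ω̂_rr via edge endpoints, so within-group entries count each edge twice, matching the e_rr convention):

```python
import numpy as np

def dcsbm_mle(A, z, k):
    """MLE parameters for the degree-corrected SBM (Karrer & Newman 2011):
    theta_i = k_i / (total degree of i's group), and
    omega_rs = number of edge endpoints running between groups r and s."""
    z = np.asarray(z)
    deg = A.sum(axis=1)
    group_deg = np.array([deg[z == r].sum() for r in range(k)])
    theta = deg / group_deg[z]            # fraction of its group's stubs on i
    omega = np.zeros((k, k))
    for r in range(k):
        for s in range(k):
            omega[r, s] = A[np.ix_(z == r, z == s)].sum()
    return theta, omega

# 4-cycle split into two groups of two: every node holds half its group's stubs
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]])
theta, omega = dcsbm_mle(A, [0, 0, 1, 1], 2)
```

Plugging θ̂ and ω̂ back into the Poisson likelihood gives the profile likelihood that is then maximized over the labeling z alone.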
24 the stochastic block model comparing SBM vs. DC-SBM : Zachary karate club Karrer, Newman, Phys. Rev. E (2011)
25 the stochastic block model comparing SBM vs. DC-SBM : Zachary karate club SBM leader/follower division DC-SBM social group division Karrer, Newman, Phys. Rev. E (2011)
26 the stochastic block model comparing SBM vs. DC-SBM : Zachary karate club SBM leader/follower division DC-SBM social group division Karrer, Newman, Phys. Rev. E (2011)
27 the stochastic block model. comparing SBM vs. DC-SBM: Zachary karate club. [figure: log-likelihood comparison] SBM: leader/follower division; DC-SBM: social group division. Peel, Larremore, Clauset, forthcoming (2016)
28 the stochastic block model. comparing SBM vs. DC-SBM: Zachary karate club. [figure: log-likelihood landscapes for both models] SBM: leader/follower division; DC-SBM: social group division. Peel, Larremore, Clauset, forthcoming (2016)
29 extending the SBM. many variants! we'll cover three: bipartite community structure, weighted community structure, hierarchical community structure
30 bipartite networks. many networks are bipartite: scientists and papers (co-authorship networks); actors and movies (co-appearance networks); words and documents (topic modeling); plants and pollinators; genes and genomes; etc. assortative structure? Larremore et al., Phys. Rev. E 90 (2014)
31 bipartite networks. many networks are bipartite: scientists and papers (co-authorship networks); actors and movies (co-appearance networks); words and documents (topic modeling); plants and pollinators; genes and genomes; etc. most analyses focus on one-mode projections, which discard information. assortative structure? no, bipartite structure. Larremore et al., Phys. Rev. E 90 (2014)
32 bipartite networks. bipartite stochastic block model (biSBM): exactly the SBM, but the model knows the network is bipartite: if type(z_i) = type(z_j), then require M_{z_i,z_j} = 0. inference proceeds as before. Larremore et al., Phys. Rev. E 90 (2014)
33 bipartite networks. the SBM can learn bipartite structure on its own, but it often overfits or returns mixed-type groups. [figure: likelihood scores, Eq. (16), comparing degree-corrected fits: SBM with K = 5 or K = 4 vs. biSBM with K_a, K_b = 3, 2] Larremore et al., Phys. Rev. E 90 (2014)
34 bipartite networks bipartite stochastic block model (bisbm) how do we know it works well? Larremore et al., Phys. Rev. E 90, (2014)
35 bipartite networks. bipartite stochastic block model (biSBM): how do we know it works well? synthetic networks with planted partitions, ω_planted. easy case: 4×4 easy-to-see communities. hard case: 3×2 hard-to-see communities. Larremore et al., Phys. Rev. E 90 (2014)
36 bipartite networks. bipartite stochastic block model (biSBM): how do we know it works well? synthetic networks with planted partitions, ω_planted. easy case: 4×4 easy-to-see communities. hard case: 3×2 hard-to-see communities. Larremore et al., Phys. Rev. E 90 (2014)
37 bipartite networks. bipartite stochastic block model (biSBM): how do we know it works well? synthetic networks with planted partitions. [figure: normalized mutual information vs. λ for (A) the easy case (4×4 easy-to-see communities) and (B) the hard case (3×2 hard-to-see communities); curves compare the biSBM with and without degree correction against the SBM and degree-corrected SBM applied to both weighted and unweighted one-mode projections] Larremore et al., Phys. Rev. E 90 (2014)
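Recovery in tests like these is scored by the normalized mutual information between the planted and inferred labelings. A small sketch of my own (using the geometric-mean normalization, which is one common choice, not necessarily the paper's):

```python
from collections import Counter
import math

def nmi(a, b):
    """Normalized mutual information I(a; b) / sqrt(H(a) H(b)) between two
    labelings of the same nodes: 1 = perfect recovery (up to relabeling),
    0 = labelings carry no information about each other."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    I = sum((c / n) * math.log((c / n) / ((pa[x] / n) * (pb[y] / n)))
            for (x, y), c in pab.items())
    Ha = -sum((c / n) * math.log(c / n) for c in pa.values())
    Hb = -sum((c / n) * math.log(c / n) for c in pb.values())
    return I / math.sqrt(Ha * Hb) if Ha > 0 and Hb > 0 else 1.0

print(nmi([0, 0, 1, 1], [1, 1, 0, 0]))  # ≈ 1.0: labels match up to renaming
print(nmi([0, 0, 1, 1], [0, 1, 0, 1]))  # ≈ 0.0: labels are independent
```

NMI is invariant to permuting group labels, which matters because the model's group indices are arbitrary.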
38 bipartite networks. bipartite stochastic block model (biSBM): always finds pure-type communities; more accurate than modeling one-mode projections (even weighted projections); finds communities in both modes. Larremore et al., Phys. Rev. E 90 (2014)
39 bipartite networks. example 1: malaria gene network. [figure: bipartite network of genes and substrings, with inferred groups in both modes] Larremore et al., Phys. Rev. E 90 (2014)
40 bipartite networks. example 2: Southern Women network. [figure: bipartite network of people and the events they attended, with inferred groups in both modes] Larremore et al., Phys. Rev. E 90 (2014)
41 bipartite networks. example 3: IMDb. [figure: bipartite network of 53,158 actors and 39,768 movies; inferred groups track language (Czech/Danish/German; French/Dutch/Cantonese/Greek/Filipino/Finnish; Japanese/Russian; English) and content (Adult)] Larremore et al., Phys. Rev. E 90 (2014)
42 bipartite networks. other approaches: (1) the minimum description length (MDL) principle learns that a network is bipartite ["Parsimonious Module Inference in Large Networks," Peixoto, Phys. Rev. Lett. 110 (2013)]; (2) marginalize over bipartite SBM parameterizations ["Predicting Human Preferences Using the Block Structure of Complex Social Networks," Guimerà, Llorente, Moro, Sales-Pardo, PLOS ONE 7(9), e44620 (2012)]
43 bipartite networks. [first page of Peixoto, "Parsimonious Module Inference in Large Networks," Phys. Rev. Lett. 110 (2013): uses the minimum description length principle to infer the number of blocks without fixing it in advance and to avoid overfitting; derives general bounds on the detectability of any prescribed block structure; shows the maximum number of detectable blocks scales as √N for fixed average degree; and gives an efficient multilevel Monte Carlo inference algorithm with complexity O(τ N log N) when the number of blocks is unknown and O(τ N) when it is known, where τ is the Markov-chain mixing time; illustrated on a large actor-film network with over 10^6 edges and a disassortative, bipartite block structure]
44 weighted networks. most interactions are weighted: frequency of interaction; strength of interaction; outcome of interaction; etc. but! thresholding discards information and can obscure underlying structure
45 weighted networks Aicher et al., J. Complex Networks 3, (2015).
46 weighted networks. [figure: the same weighted network binarized at thresholds 1, 2, 3, and 4 produces qualitatively different structures] Aicher et al., J. Complex Networks 3 (2015).
47 weighted networks. how will the results depend on the threshold? what impact does noise have under thresholding? Thomas & Blitzstein, arXiv (2011).
48 recall: the most general SBM: Pr(A | z, θ) = ∏_{i,j} f( A_ij ; θ_{R(z_i,z_j)} ), where A_ij is the value of the adjacency, R is a partition of the adjacencies, f is a probability function, and θ_a is the pattern for a-type adjacencies. Binomial f = simple graphs; Poisson f = multigraphs; Normal f = weighted graphs; etc.
49 weighted networks. weighted stochastic block model (WSBM): model edge existence and edge weight separately. edge existence: SBM; edge weights: exponential-family distribution. log-likelihood: ln Pr(G | M, z, θ, f) = α ln Pr(G | M, z) + (1 − α) ln Pr(G | θ, z, f), with edge existence governed by M_{z_i,z_j} [binomial distribution] and edge weights by θ_{z_i,z_j} [exponential-family distribution: Poisson, Normal, Gamma, Exponential, Pareto, etc.]. Aicher et al., J. Complex Networks 3 (2015).
50 weighted networks. weighted stochastic block model (WSBM): model edge existence and edge weight separately. log-likelihood: ln Pr(G | M, z, θ, f) = α ln Pr(G | M, z) + (1 − α) ln Pr(G | θ, z, f). mixing parameter α: α = 1, only model edge existence (ignore weights); α = 0, only model edge weights (ignore non-edges). Aicher et al., J. Complex Networks 3 (2015).
51 weighted networks. American football, NFL 2009 season: 32 teams, 2 divisions, 4 subdivisions. edge existence: who plays whom; edge weight: mean score difference. Aicher et al., J. Complex Networks 3 (2015).
52 weighted networks. American football, NFL 2009 season: 32 teams, 2 divisions, 4 subdivisions. SBM (α = 1) recovers the subdivisions perfectly. [figure: block matrix M and labeling z, with the NFC and AFC divisions visible] Aicher et al., J. Complex Networks 3 (2015).
53 weighted networks. American football, NFL 2009 season: 32 teams, 2 divisions, 4 subdivisions. WSBM (α = 0) recovers the team skill hierarchy: top teams; wild cards (Steelers & Dolphins); middle teams; bottom teams. [figure: block matrix M and labeling z] Aicher et al., J. Complex Networks 3 (2015).
54 weighted networks. adding weights to the SBM: what does A_ij = 0 mean? no edge, or an edge with weight 0? an edge, or a non-observed edge? how will we model the distribution of edge weights? edge existences and edge weights may contain different large-scale structure (conference structure vs. skill hierarchy)
55 weighted networks other approaches
56 weighted networks. other approaches: "Inferring the mesoscale structure of layered, edge-valued, and time-varying networks," Peixoto, Phys. Rev. E 92, 042807 (2015). bin the edge weights into C ranges; C bins = C layers of a multilayer SBM; common node labels; each layer has its own block matrix; infer C, M_c, z
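The binning step can be sketched as follows (my own implementation; equal-count quantile bins are one reasonable discretization, not necessarily the paper's, and the actual method also infers C rather than fixing it):

```python
import numpy as np

def bin_edge_weights(weights, C):
    """Discretize edge weights into C quantile bins; each bin index then
    labels the layer of a multilayer SBM that the edge belongs to."""
    edges = np.quantile(weights, np.linspace(0, 1, C + 1))
    # digitize against the interior bin edges gives labels 0..C-1
    return np.digitize(weights, edges[1:-1])

# six flight distances (km) split into three layers, two edges per layer
layers = bin_edge_weights([120, 300, 450, 800, 2500, 9000], C=3)
```

Each layer is then fit with its own block matrix M_c while all layers share the node labeling z.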
57 weighted networks. other approaches (discretized weights = SBM layers). OpenFlights airport network, edge weights = distance (km). [figure: probability density of edge distances (km), with the inferred weight bins (a)-(e) marked] Peixoto, Phys. Rev. E 92, 042807 (2015)
58 weighted networks. other approaches (discretized weights = SBM layers): OpenFlights airport network. Peixoto, Phys. Rev. E 92, 042807 (2015)
59 fin
More informationQuilting Stochastic Kronecker Graphs to Generate Multiplicative Attribute Graphs
Quilting Stochastic Kronecker Graphs to Generate Multiplicative Attribute Graphs Hyokun Yun (work with S.V.N. Vishwanathan) Department of Statistics Purdue Machine Learning Seminar November 9, 2011 Overview
More informationDefault Priors and Effcient Posterior Computation in Bayesian
Default Priors and Effcient Posterior Computation in Bayesian Factor Analysis January 16, 2010 Presented by Eric Wang, Duke University Background and Motivation A Brief Review of Parameter Expansion Literature
More informationBayesian data analysis in practice: Three simple examples
Bayesian data analysis in practice: Three simple examples Martin P. Tingley Introduction These notes cover three examples I presented at Climatea on 5 October 0. Matlab code is available by request to
More informationA graph contains a set of nodes (vertices) connected by links (edges or arcs)
BOLTZMANN MACHINES Generative Models Graphical Models A graph contains a set of nodes (vertices) connected by links (edges or arcs) In a probabilistic graphical model, each node represents a random variable,
More informationBayesian Learning in Undirected Graphical Models
Bayesian Learning in Undirected Graphical Models Zoubin Ghahramani Gatsby Computational Neuroscience Unit University College London, UK http://www.gatsby.ucl.ac.uk/ and Center for Automated Learning and
More informationSTAT 518 Intro Student Presentation
STAT 518 Intro Student Presentation Wen Wei Loh April 11, 2013 Title of paper Radford M. Neal [1999] Bayesian Statistics, 6: 475-501, 1999 What the paper is about Regression and Classification Flexible
More informationBayesian Inference and MCMC
Bayesian Inference and MCMC Aryan Arbabi Partly based on MCMC slides from CSC412 Fall 2018 1 / 18 Bayesian Inference - Motivation Consider we have a data set D = {x 1,..., x n }. E.g each x i can be the
More informationComputational Genomics
Computational Genomics http://www.cs.cmu.edu/~02710 Introduction to probability, statistics and algorithms (brief) intro to probability Basic notations Random variable - referring to an element / event
More informationStat 542: Item Response Theory Modeling Using The Extended Rank Likelihood
Stat 542: Item Response Theory Modeling Using The Extended Rank Likelihood Jonathan Gruhl March 18, 2010 1 Introduction Researchers commonly apply item response theory (IRT) models to binary and ordinal
More informationChapter 16. Structured Probabilistic Models for Deep Learning
Peng et al.: Deep Learning and Practice 1 Chapter 16 Structured Probabilistic Models for Deep Learning Peng et al.: Deep Learning and Practice 2 Structured Probabilistic Models way of using graphs to describe
More informationDeep unsupervised learning
Deep unsupervised learning Advanced data-mining Yongdai Kim Department of Statistics, Seoul National University, South Korea Unsupervised learning In machine learning, there are 3 kinds of learning paradigm.
More informationModeling heterogeneity in random graphs
Modeling heterogeneity in random graphs Catherine MATIAS CNRS, Laboratoire Statistique & Génome, Évry (Soon: Laboratoire de Probabilités et Modèles Aléatoires, Paris) http://stat.genopole.cnrs.fr/ cmatias
More informationCS839: Probabilistic Graphical Models. Lecture 7: Learning Fully Observed BNs. Theo Rekatsinas
CS839: Probabilistic Graphical Models Lecture 7: Learning Fully Observed BNs Theo Rekatsinas 1 Exponential family: a basic building block For a numeric random variable X p(x ) =h(x)exp T T (x) A( ) = 1
More informationIntroduction: MLE, MAP, Bayesian reasoning (28/8/13)
STA561: Probabilistic machine learning Introduction: MLE, MAP, Bayesian reasoning (28/8/13) Lecturer: Barbara Engelhardt Scribes: K. Ulrich, J. Subramanian, N. Raval, J. O Hollaren 1 Classifiers In this
More informationIntroduction to Probability and Statistics (Continued)
Introduction to Probability and Statistics (Continued) Prof. icholas Zabaras Center for Informatics and Computational Science https://cics.nd.edu/ University of otre Dame otre Dame, Indiana, USA Email:
More informationNonparametric Bayesian Methods: Models, Algorithms, and Applications (Day 5)
Nonparametric Bayesian Methods: Models, Algorithms, and Applications (Day 5) Tamara Broderick ITT Career Development Assistant Professor Electrical Engineering & Computer Science MIT Bayes Foundations
More informationReview. DS GA 1002 Statistical and Mathematical Models. Carlos Fernandez-Granda
Review DS GA 1002 Statistical and Mathematical Models http://www.cims.nyu.edu/~cfgranda/pages/dsga1002_fall16 Carlos Fernandez-Granda Probability and statistics Probability: Framework for dealing with
More informationCPSC 540: Machine Learning
CPSC 540: Machine Learning MCMC and Non-Parametric Bayes Mark Schmidt University of British Columbia Winter 2016 Admin I went through project proposals: Some of you got a message on Piazza. No news is
More informationCSCE 478/878 Lecture 6: Bayesian Learning
Bayesian Methods Not all hypotheses are created equal (even if they are all consistent with the training data) Outline CSCE 478/878 Lecture 6: Bayesian Learning Stephen D. Scott (Adapted from Tom Mitchell
More informationIntroduction to Bayesian Statistics and Markov Chain Monte Carlo Estimation. EPSY 905: Multivariate Analysis Spring 2016 Lecture #10: April 6, 2016
Introduction to Bayesian Statistics and Markov Chain Monte Carlo Estimation EPSY 905: Multivariate Analysis Spring 2016 Lecture #10: April 6, 2016 EPSY 905: Intro to Bayesian and MCMC Today s Class An
More informationRecurrent Latent Variable Networks for Session-Based Recommendation
Recurrent Latent Variable Networks for Session-Based Recommendation Panayiotis Christodoulou Cyprus University of Technology paa.christodoulou@edu.cut.ac.cy 27/8/2017 Panayiotis Christodoulou (C.U.T.)
More information3 : Representation of Undirected GM
10-708: Probabilistic Graphical Models 10-708, Spring 2016 3 : Representation of Undirected GM Lecturer: Eric P. Xing Scribes: Longqi Cai, Man-Chia Chang 1 MRF vs BN There are two types of graphical models:
More informationPattern Recognition and Machine Learning
Christopher M. Bishop Pattern Recognition and Machine Learning ÖSpri inger Contents Preface Mathematical notation Contents vii xi xiii 1 Introduction 1 1.1 Example: Polynomial Curve Fitting 4 1.2 Probability
More informationGaussian Processes (10/16/13)
STA561: Probabilistic machine learning Gaussian Processes (10/16/13) Lecturer: Barbara Engelhardt Scribes: Changwei Hu, Di Jin, Mengdi Wang 1 Introduction In supervised learning, we observe some inputs
More informationOn the errors introduced by the naive Bayes independence assumption
On the errors introduced by the naive Bayes independence assumption Author Matthijs de Wachter 3671100 Utrecht University Master Thesis Artificial Intelligence Supervisor Dr. Silja Renooij Department of
More informationReview: Probabilistic Matrix Factorization. Probabilistic Matrix Factorization (PMF)
Case Study 4: Collaborative Filtering Review: Probabilistic Matrix Factorization Machine Learning for Big Data CSE547/STAT548, University of Washington Emily Fox February 2 th, 214 Emily Fox 214 1 Probabilistic
More informationBased on slides by Richard Zemel
CSC 412/2506 Winter 2018 Probabilistic Learning and Reasoning Lecture 3: Directed Graphical Models and Latent Variables Based on slides by Richard Zemel Learning outcomes What aspects of a model can we
More informationLatent Dirichlet Allocation (LDA)
Latent Dirichlet Allocation (LDA) A review of topic modeling and customer interactions application 3/11/2015 1 Agenda Agenda Items 1 What is topic modeling? Intro Text Mining & Pre-Processing Natural Language
More informationIntroduction to Gaussian Process
Introduction to Gaussian Process CS 778 Chris Tensmeyer CS 478 INTRODUCTION 1 What Topic? Machine Learning Regression Bayesian ML Bayesian Regression Bayesian Non-parametric Gaussian Process (GP) GP Regression
More informationMultilevel Statistical Models: 3 rd edition, 2003 Contents
Multilevel Statistical Models: 3 rd edition, 2003 Contents Preface Acknowledgements Notation Two and three level models. A general classification notation and diagram Glossary Chapter 1 An introduction
More informationLeast Squares Regression
E0 70 Machine Learning Lecture 4 Jan 7, 03) Least Squares Regression Lecturer: Shivani Agarwal Disclaimer: These notes are a brief summary of the topics covered in the lecture. They are not a substitute
More informationCS Homework 3. October 15, 2009
CS 294 - Homework 3 October 15, 2009 If you have questions, contact Alexandre Bouchard (bouchard@cs.berkeley.edu) for part 1 and Alex Simma (asimma@eecs.berkeley.edu) for part 2. Also check the class website
More informationIntroduction to Probabilistic Machine Learning
Introduction to Probabilistic Machine Learning Piyush Rai Dept. of CSE, IIT Kanpur (Mini-course 1) Nov 03, 2015 Piyush Rai (IIT Kanpur) Introduction to Probabilistic Machine Learning 1 Machine Learning
More informationNonparametric Bayesian Methods - Lecture I
Nonparametric Bayesian Methods - Lecture I Harry van Zanten Korteweg-de Vries Institute for Mathematics CRiSM Masterclass, April 4-6, 2016 Overview of the lectures I Intro to nonparametric Bayesian statistics
More informationDistributed Estimation, Information Loss and Exponential Families. Qiang Liu Department of Computer Science Dartmouth College
Distributed Estimation, Information Loss and Exponential Families Qiang Liu Department of Computer Science Dartmouth College Statistical Learning / Estimation Learning generative models from data Topic
More information13: Variational inference II
10-708: Probabilistic Graphical Models, Spring 2015 13: Variational inference II Lecturer: Eric P. Xing Scribes: Ronghuo Zheng, Zhiting Hu, Yuntian Deng 1 Introduction We started to talk about variational
More informationDecision-making, inference, and learning theory. ECE 830 & CS 761, Spring 2016
Decision-making, inference, and learning theory ECE 830 & CS 761, Spring 2016 1 / 22 What do we have here? Given measurements or observations of some physical process, we ask the simple question what do
More informationIEOR E4570: Machine Learning for OR&FE Spring 2015 c 2015 by Martin Haugh. The EM Algorithm
IEOR E4570: Machine Learning for OR&FE Spring 205 c 205 by Martin Haugh The EM Algorithm The EM algorithm is used for obtaining maximum likelihood estimates of parameters when some of the data is missing.
More informationProbabilistic Graphical Models. Guest Lecture by Narges Razavian Machine Learning Class April
Probabilistic Graphical Models Guest Lecture by Narges Razavian Machine Learning Class April 14 2017 Today What is probabilistic graphical model and why it is useful? Bayesian Networks Basic Inference
More informationProbabilistic Graphical Models
Probabilistic Graphical Models Brown University CSCI 295-P, Spring 213 Prof. Erik Sudderth Lecture 11: Inference & Learning Overview, Gaussian Graphical Models Some figures courtesy Michael Jordan s draft
More informationIntroduction to Bayesian Learning
Course Information Introduction Introduction to Bayesian Learning Davide Bacciu Dipartimento di Informatica Università di Pisa bacciu@di.unipi.it Apprendimento Automatico: Fondamenti - A.A. 2016/2017 Outline
More informationIntroduction to Machine Learning Midterm, Tues April 8
Introduction to Machine Learning 10-701 Midterm, Tues April 8 [1 point] Name: Andrew ID: Instructions: You are allowed a (two-sided) sheet of notes. Exam ends at 2:45pm Take a deep breath and don t spend
More informationChaos, Complexity, and Inference (36-462)
Chaos, Complexity, and Inference (36-462) Lecture 21 Cosma Shalizi 3 April 2008 Models of Networks, with Origin Myths Erdős-Rényi Encore Erdős-Rényi with Node Types Watts-Strogatz Small World Graphs Exponential-Family
More informationBayesian Nonparametrics for Speech and Signal Processing
Bayesian Nonparametrics for Speech and Signal Processing Michael I. Jordan University of California, Berkeley June 28, 2011 Acknowledgments: Emily Fox, Erik Sudderth, Yee Whye Teh, and Romain Thibaux Computer
More informationSupporting Statistical Hypothesis Testing Over Graphs
Supporting Statistical Hypothesis Testing Over Graphs Jennifer Neville Departments of Computer Science and Statistics Purdue University (joint work with Tina Eliassi-Rad, Brian Gallagher, Sergey Kirshner,
More informationEE562 ARTIFICIAL INTELLIGENCE FOR ENGINEERS
EE562 ARTIFICIAL INTELLIGENCE FOR ENGINEERS Lecture 16, 6/1/2005 University of Washington, Department of Electrical Engineering Spring 2005 Instructor: Professor Jeff A. Bilmes Uncertainty & Bayesian Networks
More informationCS 2750: Machine Learning. Bayesian Networks. Prof. Adriana Kovashka University of Pittsburgh March 14, 2016
CS 2750: Machine Learning Bayesian Networks Prof. Adriana Kovashka University of Pittsburgh March 14, 2016 Plan for today and next week Today and next time: Bayesian networks (Bishop Sec. 8.1) Conditional
More informationLeast Squares Regression
CIS 50: Machine Learning Spring 08: Lecture 4 Least Squares Regression Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture. They may or may not cover all the
More informationRaRE: Social Rank Regulated Large-scale Network Embedding
RaRE: Social Rank Regulated Large-scale Network Embedding Authors: Yupeng Gu 1, Yizhou Sun 1, Yanen Li 2, Yang Yang 3 04/26/2018 The Web Conference, 2018 1 University of California, Los Angeles 2 Snapchat
More informationQuantitative Biology II Lecture 4: Variational Methods
10 th March 2015 Quantitative Biology II Lecture 4: Variational Methods Gurinder Singh Mickey Atwal Center for Quantitative Biology Cold Spring Harbor Laboratory Image credit: Mike West Summary Approximate
More informationProbabilistic Graphical Networks: Definitions and Basic Results
This document gives a cursory overview of Probabilistic Graphical Networks. The material has been gleaned from different sources. I make no claim to original authorship of this material. Bayesian Graphical
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 11 Project
More information11. Learning graphical models
Learning graphical models 11-1 11. Learning graphical models Maximum likelihood Parameter learning Structural learning Learning partially observed graphical models Learning graphical models 11-2 statistical
More informationRonald Christensen. University of New Mexico. Albuquerque, New Mexico. Wesley Johnson. University of California, Irvine. Irvine, California
Texts in Statistical Science Bayesian Ideas and Data Analysis An Introduction for Scientists and Statisticians Ronald Christensen University of New Mexico Albuquerque, New Mexico Wesley Johnson University
More informationUndirected Graphical Models: Markov Random Fields
Undirected Graphical Models: Markov Random Fields 40-956 Advanced Topics in AI: Probabilistic Graphical Models Sharif University of Technology Soleymani Spring 2015 Markov Random Field Structure: undirected
More informationCS Lecture 19. Exponential Families & Expectation Propagation
CS 6347 Lecture 19 Exponential Families & Expectation Propagation Discrete State Spaces We have been focusing on the case of MRFs over discrete state spaces Probability distributions over discrete spaces
More informationStat 5101 Lecture Notes
Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random
More informationNetwork Event Data over Time: Prediction and Latent Variable Modeling
Network Event Data over Time: Prediction and Latent Variable Modeling Padhraic Smyth University of California, Irvine Machine Learning with Graphs Workshop, July 25 th 2010 Acknowledgements PhD students:
More information