Lecture 6 (supplemental): Stochastic Block Models


1 Lecture 6 (supplemental): Stochastic Block Models. Aaron Clauset, Assistant Professor of Computer Science, University of Colorado Boulder; External Faculty, Santa Fe Institute. 2017 Aaron Clauset

2 what is structure?

3 what is structure? makes data different from noise makes a network different from a random graph

4 what is structure? makes data different from noise makes a network different from a random graph helps us compress the data describe the network succinctly capture most relevant patterns

5 what is structure? makes data different from noise makes a network different from a random graph helps us compress the data describe the network succinctly capture most relevant patterns helps us generalize, from data we've seen to data we haven't seen: i. from one part of network to another ii. from one network to others of same type iii. from small scale to large scale (coarse-grained structure) iv. from past to future (dynamics)

6 statistical inference imagine graph G is drawn from an ensemble or generative model: a probability distribution Pr(G | θ) with parameters θ; θ can be continuous or discrete; θ represents the structure of the graph

7 statistical inference imagine graph G is drawn from an ensemble or generative model: a probability distribution Pr(G | θ) with parameters θ; θ can be continuous or discrete; θ represents the structure of the graph. inference (MLE): given G, find the θ that maximizes Pr(G | θ). inference (Bayes): compute or sample from the posterior distribution Pr(θ | G)
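To make the two modes concrete, consider the simplest generative model, an Erdős-Rényi graph G(n, p), where θ is the single edge probability p. A minimal Python sketch; the function names and the flat Beta(1,1) prior are illustrative choices, not from the slides:

```python
import numpy as np

def mle_p(A):
    """MLE: the p that maximizes Pr(G | p) for an undirected simple graph."""
    n = A.shape[0]
    m = A[np.triu_indices(n, k=1)].sum()      # observed edge count
    return m / (n * (n - 1) / 2)              # edges over possible edges

def posterior_p_samples(A, size=1000, rng=None):
    """Bayes: samples from Pr(p | G) under a flat Beta(1, 1) prior on p."""
    rng = rng or np.random.default_rng()
    n = A.shape[0]
    pairs = n * (n - 1) // 2
    m = int(A[np.triu_indices(n, k=1)].sum())
    return rng.beta(1 + m, 1 + pairs - m, size=size)
```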

8 statistical inference imagine graph G is drawn from an ensemble or generative model: a probability distribution Pr(G | θ) with parameters θ; θ can be continuous or discrete; θ represents the structure of the graph. inference (MLE): given G, find the θ that maximizes Pr(G | θ). inference (Bayes): compute or sample from the posterior distribution Pr(θ | G). if θ is partly known, constrain inference and determine the rest; if G is partly known, infer θ and use Pr(G | θ) to generate the rest; if the model is a good fit (application dependent), we can generate synthetic graphs structurally similar to G; if part of G has low probability under the model, flag it as a possible anomaly

9 statistical inference imagine graph G is drawn from an ensemble or generative model: a probability distribution Pr(G | θ) with parameters θ; θ can be continuous or discrete; θ represents the structure of the graph. inference (MLE): given G, find the θ that maximizes Pr(G | θ). inference (Bayes): compute or sample from the posterior distribution Pr(θ | G). if θ is partly known, constrain inference and determine the rest; if G is partly known, infer θ and use Pr(G | θ) to generate the rest; if the model is a good fit (application dependent), we can generate synthetic graphs structurally similar to G; if part of G has low probability under the model, flag it as a possible anomaly. statistical inference = a principled approach to learning from data; combines tools from statistics, machine learning, information theory, and statistical physics; quantifies uncertainty; separates the model from the learning

10 statistical inference: key ideas. interpretability: model parameters have meaning for scientific questions; auxiliary information: node & edge attributes, temporal dynamics (beyond static binary graphs); scalability: fast algorithms for fitting models to big data (methods from physics, machine learning); model selection: which model is better? is this model bad? how many communities?; partial or noisy data: extrapolation, interpolation, hidden data, missing data; anomaly detection: low probability events under the generative model

11 generative models for complex networks define a parametric probability distribution over networks Pr(G | θ). generation: given θ, draw G from this distribution. inference: given G, choose θ that makes G likely. [diagram: model Pr(G | θ) → generation → data G = (V, E), with inference pointing back]

12 generative models for complex networks. general form: Pr(G | θ) = ∏_{ij} Pr(A_ij | θ), where Pr(A_ij | θ) is the edge generation function; assumptions about structure go into θ. consistency, lim_{n→∞} Pr(θ̂ ≠ θ) = 0, requires that edges be conditionally independent [Shalizi, Rinaldo 2011]. two general classes of these models. [diagram: model Pr(G | θ) → generation → data G = (V, E); inference]
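Because the dyads are conditionally independent given θ, the log-likelihood is a sum of per-dyad terms. A minimal sketch, where edge_logprob is a stand-in for whatever ln Pr(A_ij | θ) a specific model defines:

```python
import numpy as np

def graph_log_likelihood(A, edge_logprob, theta):
    """Sum of per-dyad log-probabilities: this decomposition is valid only
    because edges are conditionally independent given theta."""
    n = A.shape[0]
    ll = 0.0
    for i in range(n):
        for j in range(i + 1, n):              # undirected: visit each dyad once
            ll += edge_logprob(A[i, j], i, j, theta)
    return ll
```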

13 generative models for complex networks. [diagram: model Pr(G | θ) → generation → data G = (V, E); inference] stochastic block models: k types of vertices; Pr(A_ij | M, z) depends only on the node types z_i, z_j; originally invented by sociologists [Holland, Laskey, Leinhardt 1983]. many, many flavors, including: binomial SBM [Holland et al. 1983, Wang & Wong 1987]; simple assortative SBM [Hofman & Wiggins 2008]; mixed-membership SBM [Airoldi et al. 2008]; hierarchical SBM [Clauset et al. 2006, 2008, Peixoto 2014]; fractal SBM [Leskovec et al. 2005]; infinite relational model [Kemp et al. 2006]; degree-corrected SBM [Karrer & Newman 2011]; SBM + topic models [Ball et al. 2011]; SBM + vertex covariates [Mariadassou et al. 2010, Newman & Clauset 2016]; SBM + edge weights [Aicher et al. 2013, 2014, Peixoto 2015]; bipartite SBM [Larremore et al. 2014]; multilayer SBM [Peixoto 2015, Valles-Catala et al. 2016]; and many others

14 generative models for complex networks. [diagram: model Pr(G | θ) → generation → data G = (V, E); inference] latent space models: nodes live in a latent space; Pr(A_ij | f(x_i, x_j)) depends only on vertex-vertex proximity; originally invented by statisticians [Hoff, Raftery, Handcock 2002]. many, many flavors, including: logistic function on vertex features [Hoff et al. 2002]; social status / ranking [Ball, Newman 2013]; nonparametric metadata relations [Kim et al. 2012]; multiplicative attribute graphs [Kim & Leskovec 2010]; nonparametric latent feature model [Miller et al. 2009]; infinite multiple memberships [Morup et al. 2011]; ecological niche model [Williams et al. 2010]; hyperbolic latent spaces [Boguna et al. 2010]; and many others

15 opportunities and challenges. [diagram: model Pr(G | θ) → generation → data G = (V, E); inference] richly annotated data: edge weights, node attributes, time, etc. = new classes of generative models; generalize from n = 1 to an ensemble: useful for model checking, simulating other processes, etc.; many familiar techniques: frequentist and Bayesian frameworks, probabilistic statements about observations and models, predicting missing links via leave-k-out cross validation, approximate inference techniques (EM, VB, BP, etc.), sampling techniques (MCMC, Gibbs, etc.); learn from partial or noisy data: extrapolation, interpolation, hidden data, missing data

16 opportunities and challenges. [diagram: model Pr(G | θ) → generation → data G = (V, E); inference] only two classes of models: stochastic block models (categorical latent variables) and latent space models (ordinal / continuous latent variables); bootstrap / resampling for network data: a critical missing piece, depends on what is independent in the data; model comparison: naive AIC, BIC, marginalization, LRT can be wrong for networks; what is the goal of modeling: realistic representation or accurate prediction?; model assessment / checking: how do we know a model has done well? what do we check?; what is v-fold cross-validation for networks? omit n²/v edges? omit n/v nodes? what?

17 the stochastic block model. each vertex i has a type z_i ∈ {1,...,k} (k vertex types or groups); a stochastic block matrix M of group-level connection probabilities; the probability that i, j are connected = M_{z_i, z_j}; community = vertices with the same pattern of inter-community connections. [figure: example network and inferred M]
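Generation under this model is direct: each dyad is an independent Bernoulli coin flip with a group-dependent bias. A minimal sketch; the group sizes and block matrix below are illustrative:

```python
import numpy as np

def sample_sbm(z, M, rng=None):
    """Draw an undirected simple graph: dyad (i, j) is Bernoulli(M[z_i, z_j])."""
    rng = rng or np.random.default_rng()
    n = len(z)
    A = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            A[i, j] = A[j, i] = int(rng.random() < M[z[i], z[j]])
    return A

z = np.array([0] * 10 + [1] * 10)             # two planted groups
M = np.array([[0.5, 0.05],
              [0.05, 0.5]])                   # assortative block matrix
A = sample_sbm(z, M)
```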

18 the stochastic block model. assortative: edges within groups; disassortative: edges between groups; ordered: linear group hierarchy; core-periphery: dense core, sparse periphery
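Each of these patterns is just a particular shape of the block matrix M. A sketch with arbitrary illustrative probabilities (none of these numbers come from the slides):

```python
import numpy as np

M_assortative    = np.array([[0.5, 0.05],
                             [0.05, 0.5]])    # dense within groups
M_disassortative = np.array([[0.05, 0.5],
                             [0.5, 0.05]])    # dense between groups
M_ordered        = np.array([[0.5, 0.3, 0.1],
                             [0.3, 0.5, 0.3],  # density decays with distance
                             [0.1, 0.3, 0.5]]) # along a linear hierarchy
M_core_periphery = np.array([[0.5, 0.3],
                             [0.3, 0.05]])    # dense core, sparse periphery
```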

19 the stochastic block model. likelihood function: the probability of G given labeling z and block matrix M: Pr(G | z, M) = ∏_{(i,j)∈E} M_{z_i,z_j} × ∏_{(i,j)∉E} (1 − M_{z_i,z_j}) (edge / non-edge probability)

20 the stochastic block model. likelihood function: the probability of G given labeling z and block matrix M: Pr(G | z, M) = ∏_{(i,j)∈E} M_{z_i,z_j} × ∏_{(i,j)∉E} (1 − M_{z_i,z_j}) = ∏_{r,s} M_{r,s}^{e_{r,s}} (1 − M_{r,s})^{n_r n_s − e_{r,s}} (Bernoulli edges), i.e., a Bernoulli random graph with parameter M_{r,s} for each pair of groups
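In code, the block-count form needs only the counts e_rs and the group sizes n_r. A sketch assuming 0 < M_rs < 1 and a directed-adjacency convention; self-pairs and the undirected factor of 2 are glossed over here:

```python
import numpy as np

def sbm_log_likelihood(A, z, M):
    """ln Pr(G | z, M) in block-count form for integer labels z in 0..k-1."""
    k = M.shape[0]
    Z = np.eye(k)[z]                          # one-hot memberships, n x k
    e = Z.T @ A @ Z                           # e[r, s]: adjacencies from r to s
    n_r = np.bincount(z, minlength=k)
    pairs = np.outer(n_r, n_r)                # n_r * n_s possible adjacencies
    return np.sum(e * np.log(M) + (pairs - e) * np.log(1 - M))
```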

21 the stochastic block model. the most general SBM: Pr(A | z, θ) = ∏_{i,j} f( A_ij | θ_{R(z_i, z_j)} ), where A_ij is the value of an adjacency, R is a partition of adjacencies, f is a probability function, and θ_a is the pattern for a-type adjacencies. Binomial f = simple graphs; Poisson f = multi-graphs; Normal f = weighted graphs; etc.

22 the stochastic block model degree-corrected SBM ( f = Poisson) Karrer, Newman, Phys. Rev. E (2011)

23 the stochastic block model. degree-corrected SBM (f = Poisson). key assumption: Pr(i → j) = θ_i θ_j ω_{z_i, z_j}, where ω_{r,s} is the stochastic block matrix and θ_i is the (degree) propensity of node i. likelihood: Pr(A | z, θ, ω) = ∏_{i<j} [ (θ_i θ_j ω_{z_i,z_j})^{A_ij} / A_ij! ] exp( −θ_i θ_j ω_{z_i,z_j} ), where the MLEs are θ̂_i = k_i / Σ_j k_j δ_{z_i, z_j} (the fraction of i's group's stubs on i) and ω̂_{rs} = m_{rs} = Σ_{ij} A_ij δ_{z_i, r} δ_{z_j, s} (the total number of edges between groups r and s). Karrer, Newman, Phys. Rev. E (2011)
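Plugging these MLEs back into the likelihood collapses it, up to constants, to Karrer and Newman's profile objective over partitions alone, ln L(z) = Σ_rs m_rs ln( m_rs / (κ_r κ_s) ), where κ_r is the total degree of group r. A hedged numpy sketch; factors of 2 and self-loop conventions are glossed over:

```python
import numpy as np

def dcsbm_objective(A, z, k):
    """Unnormalized DC-SBM objective with the MLEs plugged in."""
    Z = np.eye(k)[z]                          # one-hot memberships
    m = Z.T @ A @ Z                           # m[r, s]: edges between r and s
    kappa = m.sum(axis=1)                     # total degree of each group
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = m * np.log(m / np.outer(kappa, kappa))
    return np.nansum(terms)                   # treat 0 * ln 0 as 0
```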

24 the stochastic block model comparing SBM vs. DC-SBM : Zachary karate club Karrer, Newman, Phys. Rev. E (2011)

25 the stochastic block model comparing SBM vs. DC-SBM : Zachary karate club SBM leader/follower division DC-SBM social group division Karrer, Newman, Phys. Rev. E (2011)

26 the stochastic block model comparing SBM vs. DC-SBM : Zachary karate club SBM leader/follower division DC-SBM social group division Karrer, Newman, Phys. Rev. E (2011)

27 the stochastic block model. comparing SBM vs. DC-SBM: Zachary karate club. [figure: log-likelihood comparison] SBM leader/follower division; DC-SBM social group division. Peel, Larremore, Clauset, forthcoming (2016)

28 the stochastic block model. comparing SBM vs. DC-SBM: Zachary karate club. [figure: log-likelihood distributions for both models] SBM leader/follower division; DC-SBM social group division. Peel, Larremore, Clauset, forthcoming (2016)

29 extending the SBM many variants! we'll cover three: bipartite community structure weighted community structure hierarchical community structure

30 bipartite networks many networks are bipartite scientists and papers (co-authorship networks) actors and movies (co-appearance networks) words and documents (topic modeling) plants and pollinators genes and genomes etc. assortative structure? Larremore et al., Phys. Rev. E 90, (2014)

31 bipartite networks many networks are bipartite scientists and papers (co-authorship networks) actors and movies (co-appearance networks) words and documents (topic modeling) plants and pollinators genes and genomes etc. most analyses focus on one-mode projections which discard information assortative structure? no, bipartite structure Larremore et al., Phys. Rev. E 90, (2014)

32 bipartite networks. bipartite stochastic block model (bisbm): exactly the SBM, but the model knows the network is bipartite: if type(z_i) = type(z_j), then require M_{z_i, z_j} = 0; inference proceeds as before. [figure: assortative structure? no, bipartite structure] Larremore et al., Phys. Rev. E 90, (2014)
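A sketch of the constraint itself: given a map from group labels to node types (an assumed input), zero out the same-type entries of M; the rest of the fitting machinery is unchanged.

```python
import numpy as np

def apply_bipartite_constraint(M, group_type):
    """Zero out block probabilities between groups on the same side;
    group_type[r] gives the node type (0 or 1) of group r (assumed input)."""
    M = M.copy()
    for r in range(M.shape[0]):
        for s in range(M.shape[1]):
            if group_type[r] == group_type[s]:
                M[r, s] = 0.0                 # same-type groups cannot connect
    return M
```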

33 bipartite networks. the SBM can learn bipartite structure on its own, but often overfits or returns mixed-type groups. [figure: degree-corrected bisbm likelihood score, Eq. (16), comparing SBM fits (K = 4, 5) against bisbm fits (K_a, K_b = 3, 2)] Larremore et al., Phys. Rev. E 90, (2014)

34 bipartite networks bipartite stochastic block model (bisbm) how do we know it works well? Larremore et al., Phys. Rev. E 90, (2014)

35 bipartite networks. bipartite stochastic block model (bisbm): how do we know it works well? synthetic networks with planted partitions, ω^planted. [figure: two planted block matrices ω^planted] easy case: 4x4 easy-to-see communities; hard case: 3x2 hard-to-see communities. Larremore et al., Phys. Rev. E 90, (2014)

36 bipartite networks. bipartite stochastic block model (bisbm): how do we know it works well? synthetic networks with planted partitions, ω^planted. [figure: planted block matrix ω^planted] easy case: 4x4 easy-to-see communities; hard case: 3x2 hard-to-see communities. Larremore et al., Phys. Rev. E 90, (2014)

37 bipartite networks. bipartite stochastic block model (bisbm): how do we know it works well? synthetic networks with planted partitions. [figure: normalized mutual information vs. λ; panel A, easy case (4x4 easy-to-see communities); panel B, hard case (3x2 hard-to-see communities); curves compare the bisbm with and without degree correction against the SBM and degree-corrected SBM applied to both weighted and unweighted one-mode projections] Larremore et al., Phys. Rev. E 90, (2014)
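The y-axis in these experiments, normalized mutual information, scores an inferred partition against the planted one while ignoring label permutations. A minimal sketch using scikit-learn's implementation for illustration (not necessarily what the paper used):

```python
from sklearn.metrics import normalized_mutual_info_score

z_planted  = [0, 0, 0, 1, 1, 1, 2, 2, 2]      # planted partition
z_inferred = [2, 2, 2, 0, 0, 0, 1, 1, 1]      # a relabeled perfect recovery
print(normalized_mutual_info_score(z_planted, z_inferred))   # -> 1.0
```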

38 bipartite networks. bipartite stochastic block model (bisbm): always finds pure-type communities; more accurate than modeling one-mode projections (even weighted projections); finds communities in both modes. Larremore et al., Phys. Rev. E 90, (2014)

39 bipartite networks. example 1: malaria gene network. [figure: bipartite network of genes and substrings, with inferred communities] Larremore et al., Phys. Rev. E 90, (2014)

40 bipartite networks. example 2: Southern Women network. [figure: bipartite network of people and the events they attended, with inferred communities] Larremore et al., Phys. Rev. E 90, (2014)

41 bipartite networks. example 3: IMDb, actors (53158) and movies (39768). [figure: inferred movie groups organized largely by language: Czech/Danish/German; French/Dutch/Cantonese/Greek/Filipino/Finnish; Japanese/Russian; Adult; English] Larremore et al., Phys. Rev. E 90, (2014)

42 bipartite networks. other approaches: the minimum description length (MDL) principle learns that a network is bipartite [Peixoto, "Parsimonious Module Inference in Large Networks", Phys. Rev. Lett. 110, 148701 (2013)]; marginalize over bipartite SBM parameterizations [Guimerà, Llorente, Moro, Sales-Pardo, "Predicting Human Preferences Using the Block Structure of Complex Social Networks", PLOS ONE 7(9), e44620 (2012)]

43 bipartite networks. [slide reproduces the abstract of Peixoto, "Parsimonious Module Inference in Large Networks", Phys. Rev. Lett. 110, 148701 (2013): the paper investigates the detectability of modules when the number of modules is not known in advance, using the minimum description length principle to avoid overfitting; it derives general bounds on the detectability of any prescribed block structure, shows that the maximum number of detectable blocks scales as √N for fixed average degree, and gives an efficient multilevel Monte Carlo inference algorithm with complexity O(τN log N) when the number of blocks is unknown and O(τN) when it is known, where τ is the mixing time of the Markov chain; illustrated on an actor-film network with over 10^6 edges and a disassortative, bipartite block structure]

44 weighted networks most interactions are weighted frequency of interaction strength of interaction outcome of interaction etc. but! thresholding discards information and can obscure underlying structure

45 weighted networks. [figure: an example weighted network] Aicher et al., J. Complex Networks 3, (2015).

46 weighted networks. [figure: the same network thresholded at increasing weights (Threshold = 1, 2, 3, ...)] Aicher et al., J. Complex Networks 3, (2015).

47 weighted networks. how will the results depend on the threshold? what impact does noise have under thresholding? Thomas & Blitzstein, arXiv (2011).

48 recall the most general SBM: Pr(A | z, θ) = ∏_{i,j} f( A_ij | θ_{R(z_i, z_j)} ), where A_ij is the value of an adjacency, R is a partition of adjacencies, f is a probability function, and θ_a is the pattern for a-type adjacencies. Binomial f = simple graphs; Poisson f = multi-graphs; Normal f = weighted graphs; etc.

49 weighted networks. weighted stochastic block model (WSBM): model edge existence and edge weight separately. edge existence: SBM; edge weights: exponential family distribution. log-likelihood: ln Pr(G | M, z, θ, f) = α ln Pr(G_E | M, z) + (1 − α) ln Pr(G_W | θ, z, f), where the edge-existence term is binomial with parameters M_{z_i, z_j} and the edge-weight term is an exponential-family distribution with parameters θ_{z_i, z_j} (Poisson, Normal, Gamma, Exponential, Pareto, etc.). Aicher et al., J. Complex Networks 3, (2015).

50 weighted networks. weighted stochastic block model (WSBM): model edge existence and edge weight separately. edge existence: SBM; edge weights: exponential family distribution. log-likelihood: ln Pr(G | M, z, θ, f) = α ln Pr(G_E | M, z) + (1 − α) ln Pr(G_W | θ, z, f). mixing parameter α: α = 1 only models edge existence (ignores weights); α = 0 only models edge weights (ignores non-edges). Aicher et al., J. Complex Networks 3, (2015).
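A hedged sketch of the combined objective, taking Normal edge weights as the exponential-family choice; the parameter names (mu, sigma, alpha) and the convention that weights are modeled only on existing edges are illustrative assumptions:

```python
import numpy as np
from scipy.stats import norm

def wsbm_log_likelihood(A, W, z, M, mu, sigma, alpha):
    """alpha * (edge-existence term) + (1 - alpha) * (edge-weight term)."""
    n = len(z)
    ll_e = ll_w = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            r, s = z[i], z[j]
            ll_e += np.log(M[r, s]) if A[i, j] else np.log(1 - M[r, s])
            if A[i, j]:                       # weights modeled on edges only
                ll_w += norm.logpdf(W[i, j], loc=mu[r, s], scale=sigma[r, s])
    return alpha * ll_e + (1 - alpha) * ll_w
```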

51 weighted networks American football, NFL 2009 season 32 teams, 2 divisions, 4 subdivisions edge existence: who plays whom edge weight: mean score difference Aicher et al., J. Complex Networks 3, (2015).

52 weighted networks. American football, NFL 2009 season: 32 teams, 2 divisions, 4 subdivisions. SBM (α = 1) recovers the subdivisions perfectly. [figure: inferred block matrix M and labeling z, with NFC and AFC blocks] Aicher et al., J. Complex Networks 3, (2015).

53 weighted networks. American football, NFL 2009 season: 32 teams, 2 divisions, 4 subdivisions. WSBM (α = 0) recovers the team skill hierarchy. [figure: inferred block matrix M and labeling z, ordered from top teams through wild cards (Steelers & Dolphins) and middle teams to bottom teams] Aicher et al., J. Complex Networks 3, (2015).

54 weighted networks. adding weights to the SBM: what does A_ij = 0 mean? no edge, a weight-0 edge, or a non-observed edge? how will we model the distribution of edge weights? edge existences and edge weights may contain different large-scale structure (conference structure vs. skill hierarchy)

55 weighted networks other approaches

56 weighted networks. other approaches: Peixoto, "Inferring the mesoscale structure of layered, edge-valued, and time-varying networks", Phys. Rev. E 92, 042807 (2015). bin the edge weights into C ranges; C bins = C layers of a multi-layer SBM with common node labels z; each layer has its own block matrix; infer C, M_c, z. Peixoto, Phys. Rev. E 92, 042807 (2015)
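A small sketch of the discretization step, using quantile bins as one illustrative way to choose the C ranges (the paper's own binning scheme may differ):

```python
import numpy as np

def weights_to_layers(W, C):
    """Split a weighted adjacency W into C binary layers by weight quantile."""
    w = W[W > 0]                              # observed edge weights
    cuts = np.quantile(w, np.linspace(0, 1, C + 1)[1:-1])  # C - 1 interior cuts
    layer_of = np.digitize(W, cuts)           # bin index 0..C-1 per entry
    return [(W > 0) & (layer_of == c) for c in range(C)]   # C binary layers
```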

57 weighted networks. other approaches (discretized weights = SBM layers): OpenFlights airport network, edge weights = distance (km). [figure, panels (a)-(e): probability density of edge distance (km) and the inferred layers] Peixoto, Phys. Rev. E 92, 042807 (2015)

58 weighted networks. other approaches (discretized weights = SBM layers): OpenFlights airport network. Peixoto, Phys. Rev. E 92, 042807 (2015)

59 fin
