CPSC 540: Machine Learning

Size: px
Start display at page:

Download "CPSC 540: Machine Learning"

Transcription

1 CPSC 540: Machine Learning Mark Schmidt University of British Columbia Winter 2018

2 Last Time: Learning and Inference in DAGs We discussed learning in DAG models, log p(x W ) = n d log p(x i j x i pa(j), wj ), i=1 j=1 which becomes a supervised learning problem for each feature j. Tabular parameterization is common but requires small number of parents. Gaussian belief networks use least squares (defines a multivariate Gaussian). Sigmoid belief networks use logistic regression. For inference in DAGs (decoding, computing marginals, computing conditionals): We can use ancestral sampling to compute Monte Carlo approximations. We can apply message passing, but messages may be huge. Only guarantee O(dk 2 ) cost if each node has at most one parent ( tree or forest ).

3 Conditional Sampling in DAGs What about conditional sampling in DAGs? Could be easy or hard depending on what we condition on. For example, easy if we condition on the first variables in the order: Just fix these and run ancestral sampling. Hard to condition on the last variables in the order: Conditioning on descendent makes ancestors dependent.

4 DAG Structure Learning Structure learning is the problem of choosing the graph. Input is data X. Output is a graph G. The easy case is when we re given the ordering of the variables. So the parents of j must be chosen from {1, 2,..., j 1}. Given the ordering, structure learning reduces to feature selection: Select features {x 1, x 2,..., x j 1 } that best predict label x j. We can use any feature selection method to solve these d problems.

5 Example: Structure Learning in Rain Data Given Ordering Structure learning in rain data using L1-regularized logistic regression. For different λ values, assuming chronological ordering.

6 DAG Structure Learning without an Ordering Without an ordering, a common approach is search and score Define a score for a particular graph structure (like BIC). Search through the space of possible DAGs (greedily add/remove/reverse edges). Another common approach is constraint-based methods: Based on performing a sequence of conditional independence tests. Prune edge between x i and x j if you find variables S making them independent, x i x j x S. Assumes faithfulness (all independences are reflected in graph). Otherwise it s weird (a duplicated feature would be disconnected from everything.) Structure learning is NP-hard in general, but finding the optimal tree is poly-time: For symmetric scores, can be done by minimum spanning tree. For asymetric scores, can be by minimum spanning arborescence.

7 Structure Learning on USPS Digits Optimal tree on USPS digits. 1,1 1,2 2,1 1,3 1,4 1,5 1,6 2,2 2,3 2,4 1,7 2,5 2,6 3,1 3,2 3,3 3,4 2,7 3,5 4,1 4,2 4,3 1,8 3,6 1,9 1,10 1,11 4,4 5,1 5,2 5,3 2,8 3,7 4,5 2,9 2,10 2,11 1,12 6,1 6,2 6,3 3,8 4,7 4,6 5,4 3,9 3,10 1,13 3,11 2,12 7,1 7,2 7,3 4,8 5,5 4,9 4,12 4,13 1,15 4,10 1,16 1,14 2,13 3,12 8,1 8,2 8,3 5,8 5,7 5,6 6,4 5,9 5,10 5,11 5,12 5,13 2,15 2,16 2,14 3,13 4,11 9,1 9,2 9,3 6,8 6,7 6,6 6,5 7,4 6,9 6,10 6,11 6,12 6,13 3,16 3,15 3,14 10,1 10,2 10,3 7,8 7,7 7,5 7,6 8,4 7,9 7,10 7,11 7,12 4,15 4,14 11,1 11,2 11,3 8,8 8,7 8,5 8,6 9,4 8,9 8,10 8,11 4,16 5,15 5,14 12,1 12,2 12,3 9,8 9,7 9,6 9,5 10,4 9,9 9,10 5,16 13,1 10,8 10,7 10,6 10,5 11,4 10,9 6,15 13,2 14,1 11,8 11,6 11,7 11,5 6,16 6,14 7,13 13,3 14,2 15,1 12,7 12,8 13,4 12,5 12,6 12,4 7,16 7,15 7,14 8,13 8,12 14,3 15,2 16,1 13,5 8,16 8,15 8,14 9,12 9,11 15,3 16,2 13,6 14,4 9,16 9,15 14,8 9,14 13,8 10,11 9,13 10,10 12,9 16,3 13,7 14,5 10,16 10,15 14,9 15,8 10,14 13,9 11,10 10,12 10,13 11,9 12,10 14,6 15,4 11,16 11,15 14,10 16,8 11,14 13,10 11,11 11,12 11,13 12,11 15,5 14,7 16,4 12,16 12,15 14,11 12,14 12,13 13,11 12,12 15,9 15,6 15,7 13,14 14,12 13,13 13,12 16,9 13,15 13,16 15,10 16,5 16,6 16,7 14,14 14,13 14,15 15,11 16,10 15,13 15,14 15,12 15,15 14,16 16,11 16,13 16,14 16,12 15,16 16,15 16,16

8 20 Newsgroups Data Data containing presence of 100 words from newsgroups posts: car drive files hockey mac league pc win Structure learning should give relationship between words.

9 Structure Learning on News Words Optimal tree on newsgroups data: ftp phone files number disk format windows drive memory system image card dos driver pc program version win car scsi data problem display graphics video software space team won bmw dealer engine honda mac help server launch moon nasa orbit shuttle technology fans games hockey league players puck season oil lunar mars earth satellite solar mission nhl baseball god hit bible christian jesus religion jews government israel children power president rights state war health human law university world aids food insurance medicine fact gun research science msg water patients studies case course evidence question computer cancer disease doctor vitamin

10 Outline 1 Structure Learning 2

11 Directed vs. Undirected Models In some applications we have a natural ordering of the x j. In the rain data, the past affects the future. In some applications we don t have a natural order. E.g., pixels in an image. In these settings we often use undirected graphical models (UGMs). Also known as Markov random fields (MRFs) and originally from statistical physics.

12 Directed vs. Undirected Models Undirected graphical models are based on undirected graphs: They are a classic way to model dependencies in images: Can capture dependencies between neighbours without imposing an ordering.

13 Ising Models from Statistical Physics The Ising model for binary x i is defined by d p(x 1, x 2,..., x d ) exp x i w i + x i x j w ij, i=1 (i,j) E where E is the set of edges in an undirected graph. Called a log-linear model, because log p(x) is linear plus a constant. Consider using x i { 1, 1}: If w i > 0 it encourages x i = 1. If w ij > 0 it encourages neighbours i and j to have the same value. E.g., neighbouring pixels in the image receive the same label ( attractive model) We re modeling dependencies, but haven t assumed an ordering.

14 Pairwise undirected graphical models (UGMs) assume p(x) has the form d p(x) φ j (x j ) φ ij (x i, x j ). j=1 (i,j) E The φ j and φ ij functions are called potential functions: They can be any non-negative function. Ordering doesn t matter: more natural for things like pixels of an image. Ising model is a special case where φ i (x i ) = exp(x i w i ), φ ij (x i, x j ) = exp(x i x j w ij ). Bonus slides generalize Ising to non-binary case.

15 Label Propagation as a UGM Consider modeling the probability of a vector of labels ȳ R t using n t p(ȳ 1, ȳ 2,..., ȳ t ) exp w ij (y i ȳ i ) 2 1 t t w ij (ȳ i ȳ j ) 2. 2 i=1 j=1 i=1 j=1 Decoding in this model is equivalent to the label propagation problem. This is a pairwise UGM: ( ) n φ j (ȳ j ) = exp w ij (y i ȳ j ) 2, φ ij (ȳ i, ȳ j ) = exp ( 12 ) w ij(ȳ i ȳ j ) 2. i=1

16 Conditional Independence in It s easy to check conditional independence in UGMs: A B C if C blocks all paths from any A to any B. Example: A C. A C B. A C B, E. A, B F C A, B F C, E.

17 Multivariate Gaussian and Pairwise Graphical Models Independence in multivariate Gaussian: In Gaussians, marginal independence is determined by covariance: x i x j Σ ij = 0. But how can we determine conditional independence? Multivarate Gaussian is a special case of a pairwise UGM. So we can just use graph separation. In particular, edges of the UGM are (i, j) values where Θ i,j 0. We use the term Gaussian graphical model (GGM) in this context. Or Gaussian Markov random field (GMRF).

18 Digression: Gaussian Graphical Models Multivariate Gaussian can be written as ( p(x) exp 1 ) 2 (x µ)t Σ 1 (x µ) exp 1 2 xt Σ 1 x + x T Σ 1 µ, }{{} v and writing it in summation notation we can see that it s a pairwise UGM: p(x) exp( 1 d d d x i x j Σ 1 ij + x i v i 2 i=1 j=1 i=1 d d ( = exp 1 ) 2 x ix j Σ 1 d ij exp (x i v i ) }{{} i=1 j=1 }{{} i=1 φ i(x i) φ ij(x i,x j)

19 Independence in GGMs So Gaussians are pairwise UGMs with φ ij (x i, x j ) = exp ( 1 2 x ix j Θ ij ), Where Θ ij is element (i, j) of Σ 1. Consider setting Θ ij = 0: For all (x i, x j ) we have φ(x i, x j ) = 1, which is equivalent to not having edge (i, j). So setting Θ ij = 0 is equivalent to removing φ ij (x i, x j ) from the UGM. Gaussian conditional independence is determined by precision matrix sparsity. Diagonal Θ gives disconnected graph: all variables are independent. Full Θ gives fully-connected graph: there are no independences.

20 Independence in GGMs Consider a Gaussian with the following covariance matrix: Σ = Σ ij 0 so all variables are dependent: x 1 x 2, x 1 x 5, and so on. This would show up in graph: you would be able to reach any x i from any x j. The inverse is given by a tri-diagonal matrix: Σ 1 = So conditional independence is described by a Markov chain: p(x 1 x 2, x 3, x 4, x 5 ) = p(x 1 x 2 ).

21 Graphical Lasso Conditional independence in GGMs is described by sparsity in Θ. Setting a Θ ij to 0 removes an edge from the graph. Recall fitting multivariate Gaussian with L1-regularization, argmin Tr(SΘ) log Θ + λ Θ 1, Θ 0 which is called the graphical Lasso because it encourages a sparse graph. Graphical Lasso is a convex approach to structure learning for GGMs. Examples

22 Higher-Order In UGMs, we can also define potentials on higher-order interactions. A three-variable generalization of Ising potentials is: φ ijk (x i, x j, x k ) = w ijk x i x j x k. If w ijk > 0 and x j {0, 1}, encourages you to set all three to 1. If w ijk > 0 and x j { 1, 1}, encourages odd number of positives. In the general case, a UGM just assumes p(x) factorizes over subsets c, p(x 1, x 2,..., x d ) c C φ c (x c ), from among a collection of subsets of C. In this case, graph has edge (i, j) if i and j are together in at least one c. Conditional independences are still given by graph separation.

23 Tractability of UGMs Without using, we write UGM probability as p(x) = 1 φ c (x c ), Z c C where Z is the constant that makes the probabilites sum up to 1. Z = φ c (x c ) or Z = φ c (x c )dx d dx d 1... dx 1 = 1. x 1 x 2 x 1 x 2 x d c C x d c C Whether you can compute Z depends on the choice of the φ c : Gaussian case: O(d 3 ) in general, but O(d) for forests (no loops). Continuous non-gaussian: usually requires numerical integration. Discrete case: #P-hard in general, but O(dk 2 ) for forests (no loops).

24 Summary Structure learning is the problem of learning the graph structure. Hard in general, but easy for trees and L1-regularization gives fast heuristic. Undirected graphical models factorize probability into non-negative potentials. Gaussians are a special case. Log-linear models (like Ising) are a common choice. Simple conditional independence properties. Next time: our first visit to the wild world of approximate inference.

25 General Pairwise UGM For general discrete x i a generalization of Ising models is p(x 1, x 2,..., x d ) = 1 d Z exp w i,xi + i=1 (i,j) E w i,j,xi,x j which can represent any positive pairwise UGM (meaning p(x) > 0 for all x)., Interpretation of weights for this UGM: If w i,1 > w i,2 then we prefer x i = 1 to x i = 2. If w i,j,1,1 > w i,j,2,2 then we prefer (x i = 1, x j = 1) to (x i = 2, x j = 2). As before, we can use parameter tieing: We could use the same w i,xi for all positions i. Ising model corresponds to a particular parameter tieing of the w i,j,xi,x j.

26 Factor Graphs Factor graphs are a way to visualize UGMs that distinguishes different orders. Use circles for variables, squares to represent dependencies. Factor graph if p(x 1, x 2, x 3 ) φ 12 (x 1, x 2 )φ 13 (x 1, x 2, x 3 )φ 23 (x 2, x 3 ): Factor graph if p(x 1, x 2, x 3 ) φ 123 (x 1, x 2, x 3 ):

CS242: Probabilistic Graphical Models Lecture 4B: Learning Tree-Structured and Directed Graphs

CS242: Probabilistic Graphical Models Lecture 4B: Learning Tree-Structured and Directed Graphs CS242: Probabilistic Graphical Models Lecture 4B: Learning Tree-Structured and Directed Graphs Professor Erik Sudderth Brown University Computer Science October 6, 2016 Some figures and materials courtesy

More information

CPSC 540: Machine Learning

CPSC 540: Machine Learning CPSC 540: Machine Learning Undirected Graphical Models Mark Schmidt University of British Columbia Winter 2016 Admin Assignment 3: 2 late days to hand it in today, Thursday is final day. Assignment 4:

More information

CS242: Probabilistic Graphical Models Lecture 4A: MAP Estimation & Graph Structure Learning

CS242: Probabilistic Graphical Models Lecture 4A: MAP Estimation & Graph Structure Learning CS242: Probabilistic Graphical Models Lecture 4A: MAP Estimation & Graph Structure Learning Professor Erik Sudderth Brown University Computer Science October 4, 2016 Some figures and materials courtesy

More information

CPSC 540: Machine Learning

CPSC 540: Machine Learning CPSC 540: Machine Learning More Approximate Inference Mark Schmidt University of British Columbia Winter 2018 Last Time: Approximate Inference We ve been discussing graphical models for density estimation,

More information

Lecture 2: Parameter Learning in Fully Observed Graphical Models. Sam Roweis. Monday July 24, 2006 Machine Learning Summer School, Taiwan

Lecture 2: Parameter Learning in Fully Observed Graphical Models. Sam Roweis. Monday July 24, 2006 Machine Learning Summer School, Taiwan Lecture 2: Parameter Learning in Fully Observed Graphical Models Sam Roweis Monday July 24, 2006 Machine Learning Summer School, Taiwan Likelihood Functions So far we have focused on the (log) probability

More information

The Generative Self-Organizing Map. A Probabilistic Generalization of Kohonen s SOM

The Generative Self-Organizing Map. A Probabilistic Generalization of Kohonen s SOM Submitted to: European Symposium on Artificial Neural Networks 2003 Universiteit van Amsterdam IAS technical report IAS-UVA-02-03 The Generative Self-Organizing Map A Probabilistic Generalization of Kohonen

More information

Self-Organization by Optimizing Free-Energy

Self-Organization by Optimizing Free-Energy Self-Organization by Optimizing Free-Energy J.J. Verbeek, N. Vlassis, B.J.A. Kröse University of Amsterdam, Informatics Institute Kruislaan 403, 1098 SJ Amsterdam, The Netherlands Abstract. We present

More information

CPSC 540: Machine Learning

CPSC 540: Machine Learning CPSC 540: Machine Learning Empirical Bayes, Hierarchical Bayes Mark Schmidt University of British Columbia Winter 2017 Admin Assignment 5: Due April 10. Project description on Piazza. Final details coming

More information

CPSC 540: Machine Learning

CPSC 540: Machine Learning CPSC 540: Machine Learning MCMC and Non-Parametric Bayes Mark Schmidt University of British Columbia Winter 2016 Admin I went through project proposals: Some of you got a message on Piazza. No news is

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear

More information

11 : Gaussian Graphic Models and Ising Models

11 : Gaussian Graphic Models and Ising Models 10-708: Probabilistic Graphical Models 10-708, Spring 2017 11 : Gaussian Graphic Models and Ising Models Lecturer: Bryon Aragam Scribes: Chao-Ming Yen 1 Introduction Different from previous maximum likelihood

More information

Learning Tractable Graphical Models: Latent Trees and Tree Mixtures

Learning Tractable Graphical Models: Latent Trees and Tree Mixtures Learning Tractable Graphical Models: Latent Trees and Tree Mixtures Anima Anandkumar U.C. Irvine Joint work with Furong Huang, U.N. Niranjan, Daniel Hsu and Sham Kakade. High-Dimensional Graphical Modeling

More information

Machine Learning for Data Science (CS4786) Lecture 24

Machine Learning for Data Science (CS4786) Lecture 24 Machine Learning for Data Science (CS4786) Lecture 24 Graphical Models: Approximate Inference Course Webpage : http://www.cs.cornell.edu/courses/cs4786/2016sp/ BELIEF PROPAGATION OR MESSAGE PASSING Each

More information

CPSC 540: Machine Learning

CPSC 540: Machine Learning CPSC 540: Machine Learning Multivariate Gaussians Mark Schmidt University of British Columbia Winter 2019 Last Time: Multivariate Gaussian http://personal.kenyon.edu/hartlaub/mellonproject/bivariate2.html

More information

Introduction to Graphical Models. Srikumar Ramalingam School of Computing University of Utah

Introduction to Graphical Models. Srikumar Ramalingam School of Computing University of Utah Introduction to Graphical Models Srikumar Ramalingam School of Computing University of Utah Reference Christopher M. Bishop, Pattern Recognition and Machine Learning, Jonathan S. Yedidia, William T. Freeman,

More information

CSC 412 (Lecture 4): Undirected Graphical Models

CSC 412 (Lecture 4): Undirected Graphical Models CSC 412 (Lecture 4): Undirected Graphical Models Raquel Urtasun University of Toronto Feb 2, 2016 R Urtasun (UofT) CSC 412 Feb 2, 2016 1 / 37 Today Undirected Graphical Models: Semantics of the graph:

More information

CPSC 540: Machine Learning

CPSC 540: Machine Learning CPSC 540: Machine Learning Mixture Models, Density Estimation, Factor Analysis Mark Schmidt University of British Columbia Winter 2016 Admin Assignment 2: 1 late day to hand it in now. Assignment 3: Posted,

More information

Probabilistic Graphical Models

Probabilistic Graphical Models 2016 Robert Nowak Probabilistic Graphical Models 1 Introduction We have focused mainly on linear models for signals, in particular the subspace model x = Uθ, where U is a n k matrix and θ R k is a vector

More information

Semi-Markov/Graph Cuts

Semi-Markov/Graph Cuts Semi-Markov/Graph Cuts Alireza Shafaei University of British Columbia August, 2015 1 / 30 A Quick Review For a general chain-structured UGM we have: n n p(x 1, x 2,..., x n ) φ i (x i ) φ i,i 1 (x i, x

More information

Chris Bishop s PRML Ch. 8: Graphical Models

Chris Bishop s PRML Ch. 8: Graphical Models Chris Bishop s PRML Ch. 8: Graphical Models January 24, 2008 Introduction Visualize the structure of a probabilistic model Design and motivate new models Insights into the model s properties, in particular

More information

Probabilistic Graphical Models

Probabilistic Graphical Models Probabilistic Graphical Models Brown University CSCI 295-P, Spring 213 Prof. Erik Sudderth Lecture 11: Inference & Learning Overview, Gaussian Graphical Models Some figures courtesy Michael Jordan s draft

More information

Introduction to Graphical Models. Srikumar Ramalingam School of Computing University of Utah

Introduction to Graphical Models. Srikumar Ramalingam School of Computing University of Utah Introduction to Graphical Models Srikumar Ramalingam School of Computing University of Utah Reference Christopher M. Bishop, Pattern Recognition and Machine Learning, Jonathan S. Yedidia, William T. Freeman,

More information

CPSC 540: Machine Learning

CPSC 540: Machine Learning CPSC 540: Machine Learning Mark Schmidt University of British Columbia Winter 2019 Last Time: Monte Carlo Methods If we want to approximate expectations of random functions, E[g(x)] = g(x)p(x) or E[g(x)]

More information

Variational Inference (11/04/13)

Variational Inference (11/04/13) STA561: Probabilistic machine learning Variational Inference (11/04/13) Lecturer: Barbara Engelhardt Scribes: Matt Dickenson, Alireza Samany, Tracy Schifeling 1 Introduction In this lecture we will further

More information

Graphical Models and Kernel Methods

Graphical Models and Kernel Methods Graphical Models and Kernel Methods Jerry Zhu Department of Computer Sciences University of Wisconsin Madison, USA MLSS June 17, 2014 1 / 123 Outline Graphical Models Probabilistic Inference Directed vs.

More information

Review: Directed Models (Bayes Nets)

Review: Directed Models (Bayes Nets) X Review: Directed Models (Bayes Nets) Lecture 3: Undirected Graphical Models Sam Roweis January 2, 24 Semantics: x y z if z d-separates x and y d-separation: z d-separates x from y if along every undirected

More information

BAYESIAN MACHINE LEARNING.

BAYESIAN MACHINE LEARNING. BAYESIAN MACHINE LEARNING frederic.pennerath@centralesupelec.fr What is this Bayesian Machine Learning course about? A course emphasizing the few essential theoretical ingredients Probabilistic generative

More information

Chapter 16. Structured Probabilistic Models for Deep Learning

Chapter 16. Structured Probabilistic Models for Deep Learning Peng et al.: Deep Learning and Practice 1 Chapter 16 Structured Probabilistic Models for Deep Learning Peng et al.: Deep Learning and Practice 2 Structured Probabilistic Models way of using graphs to describe

More information

Lecture 6: Graphical Models

Lecture 6: Graphical Models Lecture 6: Graphical Models Kai-Wei Chang CS @ Uniersity of Virginia kw@kwchang.net Some slides are adapted from Viek Skirmar s course on Structured Prediction 1 So far We discussed sequence labeling tasks:

More information

Probabilistic Graphical Networks: Definitions and Basic Results

Probabilistic Graphical Networks: Definitions and Basic Results This document gives a cursory overview of Probabilistic Graphical Networks. The material has been gleaned from different sources. I make no claim to original authorship of this material. Bayesian Graphical

More information

Based on slides by Richard Zemel

Based on slides by Richard Zemel CSC 412/2506 Winter 2018 Probabilistic Learning and Reasoning Lecture 3: Directed Graphical Models and Latent Variables Based on slides by Richard Zemel Learning outcomes What aspects of a model can we

More information

Bayesian Machine Learning - Lecture 7

Bayesian Machine Learning - Lecture 7 Bayesian Machine Learning - Lecture 7 Guido Sanguinetti Institute for Adaptive and Neural Computation School of Informatics University of Edinburgh gsanguin@inf.ed.ac.uk March 4, 2015 Today s lecture 1

More information

Example: multivariate Gaussian Distribution

Example: multivariate Gaussian Distribution School of omputer Science Probabilistic Graphical Models Representation of undirected GM (continued) Eric Xing Lecture 3, September 16, 2009 Reading: KF-chap4 Eric Xing @ MU, 2005-2009 1 Example: multivariate

More information

Probabilistic Graphical Models (I)

Probabilistic Graphical Models (I) Probabilistic Graphical Models (I) Hongxin Zhang zhx@cad.zju.edu.cn State Key Lab of CAD&CG, ZJU 2015-03-31 Probabilistic Graphical Models Modeling many real-world problems => a large number of random

More information

Undirected Graphical Models: Markov Random Fields

Undirected Graphical Models: Markov Random Fields Undirected Graphical Models: Markov Random Fields 40-956 Advanced Topics in AI: Probabilistic Graphical Models Sharif University of Technology Soleymani Spring 2015 Markov Random Field Structure: undirected

More information

Directed and Undirected Graphical Models

Directed and Undirected Graphical Models Directed and Undirected Davide Bacciu Dipartimento di Informatica Università di Pisa bacciu@di.unipi.it Machine Learning: Neural Networks and Advanced Models (AA2) Last Lecture Refresher Lecture Plan Directed

More information

The Origin of Deep Learning. Lili Mou Jan, 2015

The Origin of Deep Learning. Lili Mou Jan, 2015 The Origin of Deep Learning Lili Mou Jan, 2015 Acknowledgment Most of the materials come from G. E. Hinton s online course. Outline Introduction Preliminary Boltzmann Machines and RBMs Deep Belief Nets

More information

2 : Directed GMs: Bayesian Networks

2 : Directed GMs: Bayesian Networks 10-708: Probabilistic Graphical Models 10-708, Spring 2017 2 : Directed GMs: Bayesian Networks Lecturer: Eric P. Xing Scribes: Jayanth Koushik, Hiroaki Hayashi, Christian Perez Topic: Directed GMs 1 Types

More information

Junction Tree, BP and Variational Methods

Junction Tree, BP and Variational Methods Junction Tree, BP and Variational Methods Adrian Weller MLSALT4 Lecture Feb 21, 2018 With thanks to David Sontag (MIT) and Tony Jebara (Columbia) for use of many slides and illustrations For more information,

More information

Probabilistic Graphical Models

Probabilistic Graphical Models Probabilistic Graphical Models Brown University CSCI 2950-P, Spring 2013 Prof. Erik Sudderth Lecture 12: Gaussian Belief Propagation, State Space Models and Kalman Filters Guest Kalman Filter Lecture by

More information

14 : Theory of Variational Inference: Inner and Outer Approximation

14 : Theory of Variational Inference: Inner and Outer Approximation 10-708: Probabilistic Graphical Models 10-708, Spring 2014 14 : Theory of Variational Inference: Inner and Outer Approximation Lecturer: Eric P. Xing Scribes: Yu-Hsin Kuo, Amos Ng 1 Introduction Last lecture

More information

Introduction to Machine Learning Midterm, Tues April 8

Introduction to Machine Learning Midterm, Tues April 8 Introduction to Machine Learning 10-701 Midterm, Tues April 8 [1 point] Name: Andrew ID: Instructions: You are allowed a (two-sided) sheet of notes. Exam ends at 2:45pm Take a deep breath and don t spend

More information

Intelligent Systems:

Intelligent Systems: Intelligent Systems: Undirected Graphical models (Factor Graphs) (2 lectures) Carsten Rother 15/01/2015 Intelligent Systems: Probabilistic Inference in DGM and UGM Roadmap for next two lectures Definition

More information

Lecture 8: PGM Inference

Lecture 8: PGM Inference 15 September 2014 Intro. to Stats. Machine Learning COMP SCI 4401/7401 Table of Contents I 1 Variable elimination Max-product Sum-product 2 LP Relaxations QP Relaxations 3 Marginal and MAP X1 X2 X3 X4

More information

CPSC 540: Machine Learning

CPSC 540: Machine Learning CPSC 540: Machine Learning Expectation Maximization Mark Schmidt University of British Columbia Winter 2018 Last Time: Learning with MAR Values We discussed learning with missing at random values in data:

More information

14 : Theory of Variational Inference: Inner and Outer Approximation

14 : Theory of Variational Inference: Inner and Outer Approximation 10-708: Probabilistic Graphical Models 10-708, Spring 2017 14 : Theory of Variational Inference: Inner and Outer Approximation Lecturer: Eric P. Xing Scribes: Maria Ryskina, Yen-Chia Hsu 1 Introduction

More information

Inference in Graphical Models Variable Elimination and Message Passing Algorithm

Inference in Graphical Models Variable Elimination and Message Passing Algorithm Inference in Graphical Models Variable Elimination and Message Passing lgorithm Le Song Machine Learning II: dvanced Topics SE 8803ML, Spring 2012 onditional Independence ssumptions Local Markov ssumption

More information

CPSC 540: Machine Learning

CPSC 540: Machine Learning CPSC 540: Machine Learning Mark Schmidt University of British Columbia Winter 2018 Last Time: Monte Carlo Methods If we want to approximate expectations of random functions, E[g(x)] = g(x)p(x) or E[g(x)]

More information

CS Lecture 4. Markov Random Fields

CS Lecture 4. Markov Random Fields CS 6347 Lecture 4 Markov Random Fields Recap Announcements First homework is available on elearning Reminder: Office hours Tuesday from 10am-11am Last Time Bayesian networks Today Markov random fields

More information

Probabilistic Graphical Models

Probabilistic Graphical Models School of Computer Science Probabilistic Graphical Models Gaussian graphical models and Ising models: modeling networks Eric Xing Lecture 0, February 7, 04 Reading: See class website Eric Xing @ CMU, 005-04

More information

Computer Vision Group Prof. Daniel Cremers. 10a. Markov Chain Monte Carlo

Computer Vision Group Prof. Daniel Cremers. 10a. Markov Chain Monte Carlo Group Prof. Daniel Cremers 10a. Markov Chain Monte Carlo Markov Chain Monte Carlo In high-dimensional spaces, rejection sampling and importance sampling are very inefficient An alternative is Markov Chain

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 11 Project

More information

Bayesian Machine Learning

Bayesian Machine Learning Bayesian Machine Learning Andrew Gordon Wilson ORIE 6741 Lecture 4 Occam s Razor, Model Construction, and Directed Graphical Models https://people.orie.cornell.edu/andrew/orie6741 Cornell University September

More information

Directed Graphical Models

Directed Graphical Models Directed Graphical Models Instructor: Alan Ritter Many Slides from Tom Mitchell Graphical Models Key Idea: Conditional independence assumptions useful but Naïve Bayes is extreme! Graphical models express

More information

Probabilistic classification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2016

Probabilistic classification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2016 Probabilistic classification CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2016 Topics Probabilistic approach Bayes decision theory Generative models Gaussian Bayes classifier

More information

Intelligent Systems (AI-2)

Intelligent Systems (AI-2) Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 19 Oct, 24, 2016 Slide Sources Raymond J. Mooney University of Texas at Austin D. Koller, Stanford CS - Probabilistic Graphical Models D. Page,

More information

Bayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2014

Bayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2014 Bayesian Networks: Construction, Inference, Learning and Causal Interpretation Volker Tresp Summer 2014 1 Introduction So far we were mostly concerned with supervised learning: we predicted one or several

More information

Chapter 17: Undirected Graphical Models

Chapter 17: Undirected Graphical Models Chapter 17: Undirected Graphical Models The Elements of Statistical Learning Biaobin Jiang Department of Biological Sciences Purdue University bjiang@purdue.edu October 30, 2014 Biaobin Jiang (Purdue)

More information

A graph contains a set of nodes (vertices) connected by links (edges or arcs)

A graph contains a set of nodes (vertices) connected by links (edges or arcs) BOLTZMANN MACHINES Generative Models Graphical Models A graph contains a set of nodes (vertices) connected by links (edges or arcs) In a probabilistic graphical model, each node represents a random variable,

More information

Alternative Parameterizations of Markov Networks. Sargur Srihari

Alternative Parameterizations of Markov Networks. Sargur Srihari Alternative Parameterizations of Markov Networks Sargur srihari@cedar.buffalo.edu 1 Topics Three types of parameterization 1. Gibbs Parameterization 2. Factor Graphs 3. Log-linear Models with Energy functions

More information

3 : Representation of Undirected GM

3 : Representation of Undirected GM 10-708: Probabilistic Graphical Models 10-708, Spring 2016 3 : Representation of Undirected GM Lecturer: Eric P. Xing Scribes: Longqi Cai, Man-Chia Chang 1 MRF vs BN There are two types of graphical models:

More information

Bayesian Networks Introduction to Machine Learning. Matt Gormley Lecture 24 April 9, 2018

Bayesian Networks Introduction to Machine Learning. Matt Gormley Lecture 24 April 9, 2018 10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Bayesian Networks Matt Gormley Lecture 24 April 9, 2018 1 Homework 7: HMMs Reminders

More information

Logistic Regression: Online, Lazy, Kernelized, Sequential, etc.

Logistic Regression: Online, Lazy, Kernelized, Sequential, etc. Logistic Regression: Online, Lazy, Kernelized, Sequential, etc. Harsha Veeramachaneni Thomson Reuter Research and Development April 1, 2010 Harsha Veeramachaneni (TR R&D) Logistic Regression April 1, 2010

More information

Machine Learning, Fall 2009: Midterm

Machine Learning, Fall 2009: Midterm 10-601 Machine Learning, Fall 009: Midterm Monday, November nd hours 1. Personal info: Name: Andrew account: E-mail address:. You are permitted two pages of notes and a calculator. Please turn off all

More information

Representation of undirected GM. Kayhan Batmanghelich

Representation of undirected GM. Kayhan Batmanghelich Representation of undirected GM Kayhan Batmanghelich Review Review: Directed Graphical Model Represent distribution of the form ny p(x 1,,X n = p(x i (X i i=1 Factorizes in terms of local conditional probabilities

More information

UNDERSTANDING BELIEF PROPOGATION AND ITS GENERALIZATIONS

UNDERSTANDING BELIEF PROPOGATION AND ITS GENERALIZATIONS UNDERSTANDING BELIEF PROPOGATION AND ITS GENERALIZATIONS JONATHAN YEDIDIA, WILLIAM FREEMAN, YAIR WEISS 2001 MERL TECH REPORT Kristin Branson and Ian Fasel June 11, 2003 1. Inference Inference problems

More information

Submodularity in Machine Learning

Submodularity in Machine Learning Saifuddin Syed MLRG Summer 2016 1 / 39 What are submodular functions Outline 1 What are submodular functions Motivation Submodularity and Concavity Examples 2 Properties of submodular functions Submodularity

More information

Bayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2016

Bayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2016 Bayesian Networks: Construction, Inference, Learning and Causal Interpretation Volker Tresp Summer 2016 1 Introduction So far we were mostly concerned with supervised learning: we predicted one or several

More information

Graphical Models - Part I

Graphical Models - Part I Graphical Models - Part I Oliver Schulte - CMPT 726 Bishop PRML Ch. 8, some slides from Russell and Norvig AIMA2e Outline Probabilistic Models Bayesian Networks Markov Random Fields Inference Outline Probabilistic

More information

CPSC 340: Machine Learning and Data Mining

CPSC 340: Machine Learning and Data Mining CPSC 340: Machine Learning and Data Mining Linear Classifiers: predictions Original version of these slides by Mark Schmidt, with modifications by Mike Gelbart. 1 Admin Assignment 4: Due Friday of next

More information

ECE521 Tutorial 11. Topic Review. ECE521 Winter Credits to Alireza Makhzani, Alex Schwing, Rich Zemel and TAs for slides. ECE521 Tutorial 11 / 4

ECE521 Tutorial 11. Topic Review. ECE521 Winter Credits to Alireza Makhzani, Alex Schwing, Rich Zemel and TAs for slides. ECE521 Tutorial 11 / 4 ECE52 Tutorial Topic Review ECE52 Winter 206 Credits to Alireza Makhzani, Alex Schwing, Rich Zemel and TAs for slides ECE52 Tutorial ECE52 Winter 206 Credits to Alireza / 4 Outline K-means, PCA 2 Bayesian

More information

Multivariate Normal Models

Multivariate Normal Models Case Study 3: fmri Prediction Coping with Large Covariances: Latent Factor Models, Graphical Models, Graphical LASSO Machine Learning for Big Data CSE547/STAT548, University of Washington Emily Fox February

More information

Undirected Graphical Models

Undirected Graphical Models Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Introduction 2 Properties Properties 3 Generative vs. Conditional

More information

Walk-Sum Interpretation and Analysis of Gaussian Belief Propagation

Walk-Sum Interpretation and Analysis of Gaussian Belief Propagation Walk-Sum Interpretation and Analysis of Gaussian Belief Propagation Jason K. Johnson, Dmitry M. Malioutov and Alan S. Willsky Department of Electrical Engineering and Computer Science Massachusetts Institute

More information

Intelligent Systems (AI-2)

Intelligent Systems (AI-2) Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 18 Oct, 21, 2015 Slide Sources Raymond J. Mooney University of Texas at Austin D. Koller, Stanford CS - Probabilistic Graphical Models CPSC

More information

Probabilistic Graphical Models

Probabilistic Graphical Models School of Computer Science Probabilistic Graphical Models Variational Inference II: Mean Field Method and Variational Principle Junming Yin Lecture 15, March 7, 2012 X 1 X 1 X 1 X 1 X 2 X 3 X 2 X 2 X 3

More information

Probabilistic and Bayesian Machine Learning

Probabilistic and Bayesian Machine Learning Probabilistic and Bayesian Machine Learning Day 4: Expectation and Belief Propagation Yee Whye Teh ywteh@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit University College London http://www.gatsby.ucl.ac.uk/

More information

Directed Graphical Models or Bayesian Networks

Directed Graphical Models or Bayesian Networks Directed Graphical Models or Bayesian Networks Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Bayesian Networks One of the most exciting recent advancements in statistical AI Compact

More information

Graphical Model Inference with Perfect Graphs

Graphical Model Inference with Perfect Graphs Graphical Model Inference with Perfect Graphs Tony Jebara Columbia University July 25, 2013 joint work with Adrian Weller Graphical models and Markov random fields We depict a graphical model G as a bipartite

More information

Bayesian (conditionally) conjugate inference for discrete data models. Jon Forster (University of Southampton)

Bayesian (conditionally) conjugate inference for discrete data models. Jon Forster (University of Southampton) Bayesian (conditionally) conjugate inference for discrete data models Jon Forster (University of Southampton) with Mark Grigsby (Procter and Gamble?) Emily Webb (Institute of Cancer Research) Table 1:

More information

Message-Passing Algorithms for GMRFs and Non-Linear Optimization

Message-Passing Algorithms for GMRFs and Non-Linear Optimization Message-Passing Algorithms for GMRFs and Non-Linear Optimization Jason Johnson Joint Work with Dmitry Malioutov, Venkat Chandrasekaran and Alan Willsky Stochastic Systems Group, MIT NIPS Workshop: Approximate

More information

Tightness of LP Relaxations for Almost Balanced Models

Tightness of LP Relaxations for Almost Balanced Models Tightness of LP Relaxations for Almost Balanced Models Adrian Weller University of Cambridge AISTATS May 10, 2016 Joint work with Mark Rowland and David Sontag For more information, see http://mlg.eng.cam.ac.uk/adrian/

More information

Bayesian Networks Inference with Probabilistic Graphical Models

Bayesian Networks Inference with Probabilistic Graphical Models 4190.408 2016-Spring Bayesian Networks Inference with Probabilistic Graphical Models Byoung-Tak Zhang intelligence Lab Seoul National University 4190.408 Artificial (2016-Spring) 1 Machine Learning? Learning

More information

Total positivity in Markov structures

Total positivity in Markov structures 1 based on joint work with Shaun Fallat, Kayvan Sadeghi, Caroline Uhler, Nanny Wermuth, and Piotr Zwiernik (arxiv:1510.01290) Faculty of Science Total positivity in Markov structures Steffen Lauritzen

More information

Directed and Undirected Graphical Models

Directed and Undirected Graphical Models Directed and Undirected Graphical Models Adrian Weller MLSALT4 Lecture Feb 26, 2016 With thanks to David Sontag (NYU) and Tony Jebara (Columbia) for use of many slides and illustrations For more information,

More information

Learning Gaussian Graphical Models with Unknown Group Sparsity

Learning Gaussian Graphical Models with Unknown Group Sparsity Learning Gaussian Graphical Models with Unknown Group Sparsity Kevin Murphy Ben Marlin Depts. of Statistics & Computer Science Univ. British Columbia Canada Connections Graphical models Density estimation

More information

The Ising model and Markov chain Monte Carlo

The Ising model and Markov chain Monte Carlo The Ising model and Markov chain Monte Carlo Ramesh Sridharan These notes give a short description of the Ising model for images and an introduction to Metropolis-Hastings and Gibbs Markov Chain Monte

More information

Introduction to Probabilistic Graphical Models

Introduction to Probabilistic Graphical Models Introduction to Probabilistic Graphical Models Sargur Srihari srihari@cedar.buffalo.edu 1 Topics 1. What are probabilistic graphical models (PGMs) 2. Use of PGMs Engineering and AI 3. Directionality in

More information

CPSC 540: Machine Learning

CPSC 540: Machine Learning CPSC 540: Machine Learning Matrix Notation Mark Schmidt University of British Columbia Winter 2017 Admin Auditting/registration forms: Submit them at end of class, pick them up end of next class. I need

More information

Markov Random Fields for Computer Vision (Part 1)

Markov Random Fields for Computer Vision (Part 1) Markov Random Fields for Computer Vision (Part 1) Machine Learning Summer School (MLSS 2011) Stephen Gould stephen.gould@anu.edu.au Australian National University 13 17 June, 2011 Stephen Gould 1/23 Pixel

More information

Active and Semi-supervised Kernel Classification

Active and Semi-supervised Kernel Classification Active and Semi-supervised Kernel Classification Zoubin Ghahramani Gatsby Computational Neuroscience Unit University College London Work done in collaboration with Xiaojin Zhu (CMU), John Lafferty (CMU),

More information

CPSC 540: Machine Learning

CPSC 540: Machine Learning CPSC 540: Machine Learning Proximal-Gradient Mark Schmidt University of British Columbia Winter 2018 Admin Auditting/registration forms: Pick up after class today. Assignment 1: 2 late days to hand in

More information

Intelligent Systems (AI-2)

Intelligent Systems (AI-2) Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 19 Oct, 23, 2015 Slide Sources Raymond J. Mooney University of Texas at Austin D. Koller, Stanford CS - Probabilistic Graphical Models D. Page,

More information

Bayesian Learning in Undirected Graphical Models

Bayesian Learning in Undirected Graphical Models Bayesian Learning in Undirected Graphical Models Zoubin Ghahramani Gatsby Computational Neuroscience Unit University College London, UK http://www.gatsby.ucl.ac.uk/ Work with: Iain Murray and Hyun-Chul

More information

10708 Graphical Models: Homework 2

10708 Graphical Models: Homework 2 10708 Graphical Models: Homework 2 Due Monday, March 18, beginning of class Feburary 27, 2013 Instructions: There are five questions (one for extra credit) on this assignment. There is a problem involves

More information

Stat 521A Lecture 18 1

Stat 521A Lecture 18 1 Stat 521A Lecture 18 1 Outline Cts and discrete variables (14.1) Gaussian networks (14.2) Conditional Gaussian networks (14.3) Non-linear Gaussian networks (14.4) Sampling (14.5) 2 Hybrid networks A hybrid

More information

CS Lecture 19. Exponential Families & Expectation Propagation

CS Lecture 19. Exponential Families & Expectation Propagation CS 6347 Lecture 19 Exponential Families & Expectation Propagation Discrete State Spaces We have been focusing on the case of MRFs over discrete state spaces Probability distributions over discrete spaces

More information

Bayesian Networks to design optimal experiments. Davide De March

Bayesian Networks to design optimal experiments. Davide De March Bayesian Networks to design optimal experiments Davide De March davidedemarch@gmail.com 1 Outline evolutionary experimental design in high-dimensional space and costly experimentation the microwell mixture

More information

UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013

UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013 UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013 Exam policy: This exam allows two one-page, two-sided cheat sheets; No other materials. Time: 2 hours. Be sure to write your name and

More information

p L yi z n m x N n xi

p L yi z n m x N n xi y i z n x n N x i Overview Directed and undirected graphs Conditional independence Exact inference Latent variables and EM Variational inference Books statistical perspective Graphical Models, S. Lauritzen

More information