CS242: Probabilistic Graphical Models Lecture 4B: Learning Tree-Structured and Directed Graphs


1 CS242: Probabilistic Graphical Models Lecture 4B: Learning Tree-Structured and Directed Graphs Professor Erik Sudderth Brown University Computer Science October 6, 2016 Some figures and materials courtesy David Barber, Bayesian Reasoning and Machine Learning

2 A Prototypical Course Project
Ø Find a problem where learning with graphical models seems promising. Typically this needs to be an area where data is easily accessible.
Ø Identify a baseline approach someone has used to analyze related data. Could involve graphical models, or could involve alternative ML methods.
Ø Propose, derive, implement, and validate a new approach requiring: a different graphical model structure than previous work, and/or different inference/learning algorithms than those tested in previous work.
Ø Validate your graphical model via careful experiments. Not only performance metrics for your application, but also confirmation of learning algorithm correctness, and exploration of learned model properties.
Ø Interested in a different project style, perhaps more theoretical? Talk to me.

3 Course Project Timeline
Details online:
Ø Oct. 26: Identify project teams, send a brief description (2-3 paragraphs). Team size? 3 students preferred, 2 is okay, 1 requires special permission.
Ø Nov. 9: Project proposal due (2-3 pages). Requires an in-depth plan and preliminary results, not just a problem area.
Ø Dec. 13: Project presentations (~10 minutes per team). All team members must contribute.
Ø Dec. 20: Project reports due (6-10 pages). Although your results need not be sufficiently novel for publication, the presentation and experimental protocols should be of high quality.

4 CS242: Lecture 4B Outline
Ø Learning Tree-Structured Graphical Models
Ø Learning Directed Graphical Models

5 A Little Information Theory
The entropy is a natural measure of the inherent uncertainty (difficulty of compression) of some random variable:
H(p) = -\sum_{x \in \mathcal{X}} p(x) \log p(x) \geq 0
The relative entropy or Kullback-Leibler (KL) divergence is a non-negative, but asymmetric, distance between a given pair of probability distributions:
D(p \| q) = \sum_{x \in \mathcal{X}} p(x) \log \frac{p(x)}{q(x)} \geq 0
The KL divergence equals zero if and only if p(x) = q(x) for all x.
The mutual information measures dependence between a pair of random variables:
I(p_{xy}) = D(p_{xy} \| p_x p_y) = \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log \frac{p(x, y)}{p(x) p(y)} \geq 0
It is zero if and only if the variables X and Y are independent.
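To make these definitions concrete, here is a minimal numpy sketch (not from the lecture) that computes entropy, KL divergence, and mutual information for discrete distributions stored as probability tables; the joint p_xy used in the example is an arbitrary illustrative choice.

```python
import numpy as np

def entropy(p):
    """Entropy H(p) = -sum_x p(x) log p(x), in nats."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log(p[nz]))

def kl_divergence(p, q):
    """KL divergence D(p || q); assumes q(x) > 0 wherever p(x) > 0."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    nz = p > 0
    return np.sum(p[nz] * np.log(p[nz] / q[nz]))

def mutual_information(p_xy):
    """Mutual information I(X;Y) = D(p_xy || p_x p_y) for a joint table p_xy."""
    p_xy = np.asarray(p_xy, dtype=float)
    p_x = p_xy.sum(axis=1, keepdims=True)
    p_y = p_xy.sum(axis=0, keepdims=True)
    return kl_divergence(p_xy.ravel(), (p_x * p_y).ravel())

# Illustrative example: a weakly dependent pair of binary variables.
p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])
print(entropy(p_xy.sum(axis=1)))   # marginal entropy of X
print(mutual_information(p_xy))    # > 0 since X and Y are dependent
```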

6 Factorizations for Pairwise Markov Trees
(Figure: example tree with nodes x_1, ..., x_5.)
Pick any node s as root, and write the directed factorization out to the leaves:
p(x) = p(x_s) \prod_{t \neq s} p(x_t \mid x_{\pi(t)}) = p(x_s) \prod_{t \neq s} \frac{p(x_t, x_{\pi(t)})}{p(x_{\pi(t)})} = p(x_s) \prod_{t \neq s} \frac{p(x_t, x_{\pi(t)})}{p(x_t) p(x_{\pi(t)})} p(x_t)
Grouping terms gives canonical undirected factorizations:
p(x) = \prod_{s \in V} p(x_s) \prod_{(s,t) \in E} \frac{p(x_s, x_t)}{p(x_s) p(x_t)} = \prod_{s \in V} p(x_s)^{1 - d_s} \prod_{(s,t) \in E} p(x_s, x_t)
where d_s = number of neighbors of node s.

7 Factorizations for Pairwise Markov Trees
(Figure: example tree with nodes x_1, ..., x_5.)
Pairwise MRFs are defined by non-negative potentials:
p(x) = \frac{1}{Z} \prod_{s \in V} \psi_s(x_s) \prod_{(s,t) \in E} \psi_{st}(x_s, x_t)
A canonical parameterization via marginals:
p(x) = \prod_{s \in V} q_s(x_s) \prod_{(s,t) \in E} \frac{q_{st}(x_s, x_t)}{q_s(x_s) q_t(x_t)}
Suppose we choose a self-consistent set of non-negative discrete marginals:
\sum_{x_s} q_s(x_s) = 1, \qquad \sum_{x_t} q_{st}(x_s, x_t) = q_s(x_s) for all x_s.
The canonical parameterization then exactly matches these marginals:
q_s(a) = \sum_{x: x_s = a} p(x), \qquad q_{st}(a, b) = \sum_{x: x_s = a, x_t = b} p(x)
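As a quick sanity check on this parameterization, the following numpy sketch (an illustrative, randomly generated chain x_1 - x_2 - x_3, not an example from the lecture) extracts the marginals of a tree-structured joint and verifies that the canonical form rebuilds the joint exactly.

```python
import numpy as np

# Random chain-structured joint p(x1) p(x2|x1) p(x3|x2) over binary variables.
rng = np.random.default_rng(0)
p1 = rng.dirichlet([1, 1])
p2_given_1 = rng.dirichlet([1, 1], size=2)
p3_given_2 = rng.dirichlet([1, 1], size=2)
joint = np.einsum('a,ab,bc->abc', p1, p2_given_1, p3_given_2)

# Node and edge marginals of the joint.
q1, q2, q3 = joint.sum((1, 2)), joint.sum((0, 2)), joint.sum((0, 1))
q12, q23 = joint.sum(2), joint.sum(0)

# Canonical form: prod_s q_s(x_s) * prod_(s,t) q_st(x_s,x_t) / (q_s(x_s) q_t(x_t)).
rebuilt = np.einsum('a,b,c,ab,bc->abc',
                    q1, q2, q3,
                    q12 / np.outer(q1, q2),
                    q23 / np.outer(q2, q3))
print(np.allclose(rebuilt, joint))  # True: the canonical form recovers the joint
```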

8 Learning with a Fixed Tree Structure
(Figure: example tree with nodes x_1, ..., x_5.)
A canonical parameterization via valid marginals:
p(x) = \prod_{s \in V} q_s(x_s) \prod_{(s,t) \in E} \frac{q_{st}(x_s, x_t)}{q_s(x_s) q_t(x_t)}, \qquad \sum_{x_t} q_{st}(x_s, x_t) = q_s(x_s) for all x_s.
Suppose we have L observations \{x^{(\ell)}\}_{\ell=1}^L with empirical distribution:
\hat{p}_s(a) = \frac{1}{L} \sum_{\ell=1}^L \delta_a(x_s^{(\ell)}), \qquad \hat{p}_{st}(a, b) = \frac{1}{L} \sum_{\ell=1}^L \delta_a(x_s^{(\ell)}) \delta_b(x_t^{(\ell)})
The (normalized) maximum likelihood parameter estimation objective is:
\frac{1}{L} \sum_{\ell=1}^L \log p(x^{(\ell)}) = \frac{1}{L} \sum_{\ell=1}^L \left[ \sum_{s \in V} \log q_s(x_s^{(\ell)}) + \sum_{(s,t) \in E} \log \frac{q_{st}(x_s^{(\ell)}, x_t^{(\ell)})}{q_s(x_s^{(\ell)}) q_t(x_t^{(\ell)})} \right]

9 Learning with a Fixed Tree Structure
(Figure: example tree with nodes x_1, ..., x_5.)
A canonical parameterization via valid marginals:
p(x) = \prod_{s \in V} q_s(x_s) \prod_{(s,t) \in E} \frac{q_{st}(x_s, x_t)}{q_s(x_s) q_t(x_t)}, \qquad \sum_{x_t} q_{st}(x_s, x_t) = q_s(x_s) for all x_s.
This is an exponential family distribution with statistics (features):
\phi_{s;a}(x) = \delta_a(x_s) for all s \in V, a = 1, \ldots, K_s.
\phi_{st;ab}(x) = \delta_a(x_s) \delta_b(x_t) for all (s,t) \in E, a = 1, \ldots, K_s, b = 1, \ldots, K_t.
Maximum likelihood estimates match expected values of the statistics, and thus exactly match the empirical marginal distributions:
q_s(x_s) = \hat{p}_s(x_s) for all s \in V, \qquad q_{st}(x_s, x_t) = \hat{p}_{st}(x_s, x_t) for all (s,t) \in E.

10 Learning with a Fixed Tree Structure
(Figure: example tree with nodes x_1, ..., x_5.)
p(x) = \prod_{s \in V} q_s(x_s) \prod_{(s,t) \in E} \frac{q_{st}(x_s, x_t)}{q_s(x_s) q_t(x_t)}
q_s(a) = \hat{p}_s(a) = \frac{1}{L} \sum_{\ell=1}^L \delta_a(x_s^{(\ell)}), \qquad q_{st}(a, b) = \hat{p}_{st}(a, b) = \frac{1}{L} \sum_{\ell=1}^L \delta_a(x_s^{(\ell)}) \delta_b(x_t^{(\ell)})
The optimal ML model matches empirical marginals for all single variables, and empirical pairwise marginals for all neighboring variables.
For these optimal parameters, the corresponding training log-likelihood is:
\frac{1}{L} \sum_{\ell=1}^L \log p(x^{(\ell)}) = -\sum_{s \in V} \hat{H}_s + \sum_{(s,t) \in E} \hat{I}_{st}
Empirical entropy and empirical mutual information:
\hat{H}_s = -\sum_{x_s} \hat{p}_s(x_s) \log \hat{p}_s(x_s), \qquad \hat{I}_{st} = \sum_{x_s, x_t} \hat{p}_{st}(x_s, x_t) \log \frac{\hat{p}_{st}(x_s, x_t)}{\hat{p}_s(x_s) \hat{p}_t(x_t)}
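The identity above can be checked numerically. The sketch below uses toy data and a fixed chain-structured tree (both invented for illustration), plugs the empirical marginals into the canonical tree factorization, and confirms that the average training log-likelihood equals minus the sum of empirical entropies plus the sum of empirical mutual informations over the edges.

```python
import numpy as np

# Toy binary data from a chain x0 -> x1 -> x2, and a matching fixed tree.
rng = np.random.default_rng(0)
L = 2000
x0 = rng.integers(0, 2, L)
x1 = (x0 + (rng.random(L) < 0.2)) % 2
x2 = (x1 + (rng.random(L) < 0.3)) % 2
X = np.stack([x0, x1, x2], axis=1)
edges = [(0, 1), (1, 2)]

def H(p):
    nz = p > 0
    return -np.sum(p[nz] * np.log(p[nz]))

# Empirical node and edge marginals.
node = [np.bincount(X[:, s], minlength=2) / L for s in range(3)]
edge = {}
for s, t in edges:
    c = np.zeros((2, 2))
    np.add.at(c, (X[:, s], X[:, t]), 1.0)
    edge[(s, t)] = c / L

# Average training log-likelihood under the ML tree model (canonical form).
ll = sum(np.log(node[s][X[:, s]]).mean() for s in range(3))
for s, t in edges:
    ll += np.log(edge[(s, t)][X[:, s], X[:, t]]
                 / (node[s][X[:, s]] * node[t][X[:, t]])).mean()

# Entropy / mutual-information identity from the slide.
identity = -sum(H(node[s]) for s in range(3)) \
         + sum(H(node[s]) + H(node[t]) - H(edge[(s, t)]) for s, t in edges)
print(np.allclose(ll, identity))   # True
```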

11 Learning with a Fixed Tree Structure
(Figure: example tree with nodes x_1, ..., x_5.)
p(x) = \prod_{s \in V} q_s(x_s) \prod_{(s,t) \in E} \frac{q_{st}(x_s, x_t)}{q_s(x_s) q_t(x_t)}
q_s(a) = \hat{p}_s(a) = \frac{1}{L} \sum_{\ell=1}^L \delta_a(x_s^{(\ell)}), \qquad q_{st}(a, b) = \hat{p}_{st}(a, b) = \frac{1}{L} \sum_{\ell=1}^L \delta_a(x_s^{(\ell)}) \delta_b(x_t^{(\ell)})
The optimal ML model matches empirical marginals for all single variables, and empirical pairwise marginals for all neighboring variables.
For these optimal parameters, the corresponding training log-likelihood (in terms of empirical entropy and empirical mutual information) is:
\frac{1}{L} \sum_{\ell=1}^L \log p(x^{(\ell)}) = -\sum_{s \in V} \hat{H}_s + \sum_{(s,t) \in E} \hat{I}_{st}
Data that is more random is harder to predict. The likelihood increases when we link strongly dependent variables.

12 Optimizing Tree Structures
p(x) = \prod_{s \in V} q_s(x_s) \prod_{(s,t) \in E} \frac{q_{st}(x_s, x_t)}{q_s(x_s) q_t(x_t)}
Picking the maximum-likelihood marginals for any tree gives:
\frac{1}{L} \sum_{\ell=1}^L \log p(x^{(\ell)}) = -\sum_{s \in V} \hat{H}_s + \sum_{(s,t) \in E} \hat{I}_{st}
\hat{H}_s = -\sum_{x_s} \hat{p}_s(x_s) \log \hat{p}_s(x_s), \qquad \hat{I}_{st} = \sum_{x_s, x_t} \hat{p}_{st}(x_s, x_t) \log \frac{\hat{p}_{st}(x_s, x_t)}{\hat{p}_s(x_s) \hat{p}_t(x_t)}
We thus need to maximize the following objective over acyclic edge patterns:
f(E) = \sum_{(s,t) \in E} \hat{I}_{st}
Weight each pair of nodes by its empirical mutual information, and find the tree which maximizes the sum of edge weights!
Example greedy algorithm (Kruskal), sketched in code below:
Ø Initialize with the highest-weight edge (most-dependent pair of variables).
Ø Iteratively add the highest-weight edge that is not yet included in the tree and does not introduce any cycles. Stop when all variables are connected.
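A compact Python sketch of this procedure (the Chow-Liu tree) is below: it computes empirical pairwise mutual information from a matrix of discrete observations and greedily assembles the maximum-weight spanning tree, using a union-find structure to reject edges that would create a cycle. The function names and data layout (one row per observation, one column per variable) are my own choices, not from the lecture.

```python
import numpy as np
from itertools import combinations

def empirical_mutual_info(X, s, t):
    """Empirical mutual information between discrete columns s and t of data X (L x N)."""
    xs, xt = X[:, s], X[:, t]
    joint = np.zeros((xs.max() + 1, xt.max() + 1))
    np.add.at(joint, (xs, xt), 1.0)
    joint /= len(X)
    ps, pt = joint.sum(1, keepdims=True), joint.sum(0, keepdims=True)
    nz = joint > 0
    return np.sum(joint[nz] * np.log(joint[nz] / (ps * pt)[nz]))

def chow_liu_tree(X):
    """Return the edge set of the maximum-likelihood tree (Kruskal on MI weights)."""
    N = X.shape[1]
    weights = sorted(((empirical_mutual_info(X, s, t), s, t)
                      for s, t in combinations(range(N), 2)), reverse=True)
    parent = list(range(N))          # union-find to detect cycles
    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u
    edges = []
    for w, s, t in weights:          # add highest-weight acyclic edges first
        rs, rt = find(s), find(t)
        if rs != rt:
            parent[rs] = rt
            edges.append((s, t))
        if len(edges) == N - 1:      # stop once all variables are connected
            break
    return edges
```

Running chow_liu_tree on a binary word-presence matrix like the newsgroups data on the next slide would produce a tree of this kind.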

13 Example: Word Usage in Newsgroups
(Figure: maximum-likelihood tree structure connecting the 100 word-presence variables.)
Schmidt 2010 PhD Thesis: Presence of 100 words across 16,242 postings to 20 newsgroups.

14 Example: Word Usage in Newsgroups
(Figure: graph structure learned with L1 regularization, λ = 256; isolated nodes are not plotted.)
Schmidt 2010 PhD Thesis: Presence of 100 words across 16,242 postings to 20 newsgroups.

15 CS242: Lecture 4B Outline
Ø Learning Tree-Structured Graphical Models
Ø Learning Directed Graphical Models
(Figure: plate-style illustration of L i.i.d. training examples of a directed model.)

16 Learning Directed Graphical Models
p(x \mid \theta) = \prod_{i \in V} p(x_i \mid x_{\pi(i)}, \theta_i)
Intuition: we must learn a good predictive model of each node, given its parent nodes.
The directed factorization allows the likelihood to decompose locally, e.g. for a four-node graph:
p(x \mid \theta) = p(x_1 \mid \theta_1) \, p(x_2 \mid x_1, \theta_2) \, p(x_3 \mid x_1, \theta_3) \, p(x_4 \mid x_2, x_3, \theta_4)
\log p(x \mid \theta) = \log p(x_1 \mid \theta_1) + \log p(x_2 \mid x_1, \theta_2) + \log p(x_3 \mid x_1, \theta_3) + \log p(x_4 \mid x_2, x_3, \theta_4)
We often assume a similarly factorized (meta-independent) prior:
p(\theta) = p(\theta_1) p(\theta_2) p(\theta_3) p(\theta_4), \qquad \log p(\theta) = \log p(\theta_1) + \log p(\theta_2) + \log p(\theta_3) + \log p(\theta_4)
We thus have independent ML or Bayesian learning problems at each node.

17 Learning with Complete Observations
(Figure: directed graphical model for a single training example, L independent, identically distributed training examples, and the equivalent plate notation.)
The directed graph encodes the statistical structure of single training examples:
p(x^{(\ell)} \mid \theta) = \prod_{i=1}^N p(x_i^{(\ell)} \mid x_{\pi(i)}^{(\ell)}, \theta_i)
Given L independent, identically distributed, completely observed samples:
\log p(x \mid \theta) = \sum_{\ell=1}^L \sum_{i=1}^N \log p(x_i^{(\ell)} \mid x_{\pi(i)}^{(\ell)}, \theta_i) = \sum_{i=1}^N \left[ \sum_{\ell=1}^L \log p(x_i^{(\ell)} \mid x_{\pi(i)}^{(\ell)}, \theta_i) \right]

18 Prior Distributions and Tied Parameters
\log p(x \mid \theta) = \sum_{\ell=1}^L \sum_{i=1}^N \log p(x_i^{(\ell)} \mid x_{\pi(i)}^{(\ell)}, \theta_i) = \sum_{i=1}^N \left[ \sum_{\ell=1}^L \log p(x_i^{(\ell)} \mid x_{\pi(i)}^{(\ell)}, \theta_i) \right]
A meta-independent, factorized prior: \log p(\theta) = \sum_{i=1}^N \log p(\theta_i)
The factorized posterior allows independent MAP learning for each node:
\log p(\theta \mid x) = C + \sum_{i=1}^N \left[ \log p(\theta_i) + \sum_{\ell=1}^L \log p(x_i^{(\ell)} \mid x_{\pi(i)}^{(\ell)}, \theta_i) \right]
Learning is also easy when groups of nodes are tied to use identical, shared parameters, where b_i denotes the type or class of variable node i:
\log p(x \mid \theta) = \sum_{i=1}^N \sum_{\ell=1}^L \log p(x_i^{(\ell)} \mid x_{\pi(i)}^{(\ell)}, \theta_{b_i})
\log p(\theta_b \mid x) = C + \log p(\theta_b) + \sum_{i: b_i = b} \sum_{\ell=1}^L \log p(x_i^{(\ell)} \mid x_{\pi(i)}^{(\ell)}, \theta_b)

19 Example: Tied Temporal Parameters
(Figure: three Markov chain sequences of different lengths (Sequence 1, Sequence 2, Sequence 3), with states z and observations x sharing tied parameters \theta_{\mathrm{time}} and \theta_{\mathrm{obs}} across sequences and time steps.)
p(x, z \mid \theta) = \prod_{\ell=1}^L \prod_{t=1}^{T_\ell} p(z_t^{(\ell)} \mid z_{t-1}^{(\ell)}, \theta_{\mathrm{time}}) \, p(x_t^{(\ell)} \mid z_t^{(\ell)}, \theta_{\mathrm{obs}})
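When the state sequences are fully observed, ML estimation of the tied transition parameters simply pools transition counts across all sequences. The sketch below assumes that fully observed setting (the partially observed case is handled by EM at the end of the lecture); the example sequences are hypothetical.

```python
import numpy as np

def fit_tied_transitions(state_seqs, K):
    """ML estimate of one transition matrix shared ("tied") across all sequences.
    state_seqs: list of integer arrays z^(l), possibly of different lengths T_l.
    Assumes every state appears as a transition source at least once."""
    counts = np.zeros((K, K))
    for z in state_seqs:
        np.add.at(counts, (z[:-1], z[1:]), 1.0)   # pool transition counts over sequences
    return counts / counts.sum(axis=1, keepdims=True)

# Three short observed state sequences of different lengths (hypothetical data).
seqs = [np.array([0, 1, 1, 2, 2]), np.array([0, 0, 1]), np.array([1, 2, 2, 0])]
print(fit_tied_transitions(seqs, K=3))
```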

20 Learning (Discrete) Directed Graphical Models
\log p(\theta \mid x) = C + \sum_{i=1}^N \left[ \log p(\theta_i) + \sum_{\ell=1}^L \log p(x_i^{(\ell)} \mid x_{\pi(i)}^{(\ell)}, \theta_i) \right]
For nodes with no parents, the parameters define some categorical distribution:
Ø MAP or ML learning of standard Bernoulli/categorical parameters.
More generally, there are multiple categorical distributions per node, one for every combination of parent variables:
Ø The learning objective decomposes into multiple terms, one for the subset of training data with each parent configuration.
Ø Apply independent MAP or ML learning to each (see the sketch below).
Concerns for nodes with many parents:
Ø Computation: a large number of parameters to estimate.
Ø Sparsity: we may have little (or even no) data for some configurations of the parent variables.
Ø Priors can help, but may still be inadequate.
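As a concrete illustration of this per-parent-configuration learning, the following sketch estimates a conditional probability table for one node with discrete parents, using Dirichlet pseudo-counts as a simple smoothing prior (a MAP-style guard against sparse parent configurations). The helper name fit_cpt and the toy data are my own choices, not from the slides.

```python
import numpy as np

def fit_cpt(child, parents, k_child, k_parents, alpha=1.0):
    """Estimate p(x_i | x_parents) as a table: one categorical per parent configuration.
    child: (L,) int array; parents: (L, P) int array; alpha: Dirichlet pseudo-count."""
    counts = np.full(tuple(k_parents) + (k_child,), alpha)   # prior pseudo-counts
    idx = tuple(parents[:, j] for j in range(parents.shape[1])) + (child,)
    np.add.at(counts, idx, 1.0)                               # add observed counts
    return counts / counts.sum(axis=-1, keepdims=True)        # normalize over the child

# Node with two binary parents: 4 parent configurations, one Bernoulli each.
rng = np.random.default_rng(1)
parents = rng.integers(0, 2, size=(100, 2))
child = (parents.sum(axis=1) + rng.integers(0, 2, size=100)) % 2
cpt = fit_cpt(child, parents, k_child=2, k_parents=[2, 2])
print(cpt.shape)   # (2, 2, 2): p(x_i = c | pa1 = a, pa2 = b)
```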

21 Generalized Linear Models
A general framework for modeling non-Gaussian data with linear prediction, using exponential families:
Ø Choose features of both inputs y and outputs x: \phi(x_i) \in \mathbb{R}^M, \psi(y_i) \in \mathbb{R}^D
Ø Construct a context-specific vector of parameters: \theta_i = W \psi(y_i), W \in \mathbb{R}^{M \times D}
Ø The observation comes from an exponential family: p(x_i \mid y_i, W) = \exp\{ \theta_i^T \phi(x_i) - \Phi(\theta_i) \}
Special cases: linear regression (Gaussian) & logistic regression (discrete).
Application to directed graphical models: y_i = x_{\pi(i)}. Learn to predict each node based on features of its parents.

22 Logistic Regression & Binary Graphical Models
p(x_i \mid x_{\pi(i)}, w) = \mathrm{Ber}(x_i \mid \sigma(w^T \phi(x_{\pi(i)}))), \qquad \sigma(\gamma) = \frac{1}{1 + e^{-\gamma}} = \frac{e^\gamma}{1 + e^\gamma}
Suppose that node i has d binary parent nodes.
An arbitrary conditional distribution can be written using 2^d binary features: indicators for all joint configurations of the parents.
Alternatively, we can use d+1 binary features to independently model the relationship between node i and each of its d parents: \phi(x_{\pi(i)}) = [1, x_{i1}, x_{i2}, \ldots, x_{id}] for i_j \in \pi(i).
Or, we can pick feature sets in between these extremes, for example O(d^2) binary features to capture pairwise relationships among the parents.
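These three feature-set choices can be written down directly. The sketch below (function names and example weights are illustrative, not from the lecture) constructs the d+1 linear features, the O(d^2) pairwise features, and the full 2^d indicator features for a vector of binary parents, and evaluates the resulting Bernoulli CPD.

```python
import numpy as np
from itertools import combinations

def linear_features(pa):
    """d+1 features: bias plus each parent value separately."""
    return np.concatenate(([1.0], pa))

def pairwise_features(pa):
    """O(d^2) features: bias, single parents, and all pairwise products."""
    pairs = [pa[i] * pa[j] for i, j in combinations(range(len(pa)), 2)]
    return np.concatenate(([1.0], pa, pairs))

def full_features(pa):
    """2^d features: one indicator per joint parent configuration (an exact CPT)."""
    d = len(pa)
    idx = int(np.dot(pa, 2 ** np.arange(d)))   # encode the configuration as an integer
    f = np.zeros(2 ** d)
    f[idx] = 1.0
    return f

def bernoulli_cpd(pa, w, features=linear_features):
    """p(x_i = 1 | x_parents) = sigma(w^T phi(x_parents))."""
    return 1.0 / (1.0 + np.exp(-np.dot(w, features(pa))))

pa = np.array([1, 0, 1])                        # three binary parents
print(bernoulli_cpd(pa, w=np.array([-1.0, 2.0, 0.5, 1.0])))
```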

23 Example: Medical Diagnosis
Learning independent distributions for every combination of diseases may be computationally intractable and lead to poor generalization.
Instead, assume restricted parameterizations in which child distributions depend on some features of their parents. Use logistic regression for binary symptoms, or more generally some appropriate generalized linear model.

24 EM for Learning with Partial Observations
(Figure: directed model with hidden states z and observed variables x, replicated over L training examples.)
Ø Initialization: Randomly select starting parameters \theta^{(0)}.
Ø E-Step: Given parameters, infer the posterior of the hidden data:
q^{(t)}(z) = p(z \mid x, \theta^{(t-1)})
Find the posterior distribution of the factor/clique linking each variable to its parents, as in EM for undirected models.
Ø M-Step: Given posterior distributions, learn parameters that fit the data:
\theta^{(t)} = \arg\max_\theta \; \log p(\theta) + \sum_z q^{(t)}(z) \ln p(x, z \mid \theta)
Independently for every node in the graphical model, use expected statistics from the E-step to estimate parameters.
Ø Iteration: Alternate E-step & M-step until convergence.
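As one concrete instance of this recipe, the sketch below runs EM for a small directed model with a single hidden discrete root z and binary observed children (a mixture of Bernoullis): the E-step computes the exact posterior q(z | x) for each example, and the M-step refits each node's parameters from the resulting expected counts. The model choice and function name are illustrative, not the lecture's example.

```python
import numpy as np

def em_naive_bayes(X, K=2, iters=50, seed=0):
    """EM for a directed model with one hidden discrete root z and binary children x_1..x_D.
    E-step: q(z | x) per example; M-step: refit each node's CPT from expected counts."""
    rng = np.random.default_rng(seed)
    L, D = X.shape
    pi = np.full(K, 1.0 / K)                       # p(z)
    theta = rng.uniform(0.25, 0.75, size=(K, D))   # p(x_d = 1 | z)
    for _ in range(iters):
        # E-step: posterior over the hidden root for each training example.
        log_post = np.log(pi) + X @ np.log(theta).T + (1 - X) @ np.log(1 - theta).T
        log_post -= log_post.max(axis=1, keepdims=True)
        q = np.exp(log_post)
        q /= q.sum(axis=1, keepdims=True)
        # M-step: expected-count estimates, independently for each node.
        pi = q.mean(axis=0)
        theta = (q.T @ X + 1e-3) / (q.sum(axis=0)[:, None] + 2e-3)  # small smoothing
    return pi, theta

# Usage on arbitrary synthetic binary data (illustrative only).
X = (np.random.default_rng(1).random((500, 6)) < 0.5).astype(float)
pi, theta = em_naive_bayes(X)
```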
