# 6.867 Machine learning, lecture 23 (Jaakkola)

Save this PDF as:

Size: px
Start display at page:

## Transcription

1 Lecture topics: Markov Random Fields Probabilistic inference Markov Random Fields We will briefly go over undirected graphical models or Markov Random Fields (MRFs) as they will be needed in the context of probabilistic inference discussed below (using the model to calculate various probabilities over the variables). The origin of these models is physics (e.g., spin glass) and they retain some of the terminology from the physics literature. The semantics of MRFs is similar but simpler than Bayesian networks. The graph again represents independence properties between the variables but the properties can be read off from the graph through simple graph separation rather than D-separation criterion. So, for example, x 3 x 4 encodes two independence properties. First, is independent of x 4 given and x 3. In other words, if we remove and x 3 from the graph then and x 4 are no longer connected. The second property is that is independent of x 3 given and x 4. Incidentally, we couldn t define a Bayesian network over the same four variables that would explicate both of these properties (you can capture one while failing the other). So, in terms of their ability to explicate independence properties, MRFs and Bayesian networks are not strict subsets of each other. By Hammersley-Clifford theorem we can specify the form that any joint distribution consistent with an undirected graph has to take. A distribution is consistent with the graph

2 2 if it satisfies all the conditional independence properties we can read from the graph. The result is again in terms of how the distribution has to factor. For the above example, we can write the distribution as a product of (non-negative) potential functions over pairs of variables that specify how the variables depend on each other P (,, x 3, x 4 ) = ψ 2 (, )ψ 3 (, x 3 )ψ 24 (, x 4 )ψ 34 (x 3, x 4 ) () Z where Z is a normalization constant (required since the potential functions can be any nonnegative functions). The distribution is therefore globally normalized. More generally, an undirected graph places no constraints on how any fully connected subset of the variables, variables in a clique, depend on each other. In other words, we are free to associate any potential function with such variables. Without loss of generality we can restrict ourselves to maximal cliques, i.e., not consider separately cliques that are subsets of other cliques. In the above example, the maximal cliques where the pairs of connected variables. Now, in general, the Hammersley-Clifford theorem states that the joint distribution has to factor according to (maximal) cliques in the graph: P (,..., x n ) = ψ c (x c ) (2) Z c C where c C is a (maximal) clique in the graph and x c = {x i } i c denotes the set of variables in the clique. The normalization constant Z could be easily absorbed into one of the potential functions but we will write it explicitly here as a reminder that the model has to be normalized globally (is not automatically normalized as Bayesian networks). Figure below provides an example of a graph with three maximal cliques. c c 2 x 3 x 4 x 5 c 3 C = {c, c 2, c 3 }

3 3 Bayesian networks as undirected models We can always turn a Bayesian network into a MRF via moralization, i.e., connecting all the parents of a common child and dropping the directions on the edges. After the transformation we naturally still have the same probability distribution but may no longer capture all the independence properties explicitly in the graph. For example, in x 3 x 3 P (,, x 3 ) = P ( )P ( )P (x 3, ) P (,, x 3 ) = ψ(,, x 3 ) where, clearly, ψ(,, x 3 ) = P ( )P ( )P (x 3, ) so that the underlying distributions are the same (only the representation changed). The undirected graph is fully connected, however, and the marginal independence of and is no longer visible in the graph. In terms of probabilistic inference, i.e., calculating various probabilities, little is typically lost by turning a Bayesian network first into an undirected model. For example, we would often have some evidence pertaining to the variables, something would be known about x 3 and, as a result, and would become dependent. The advantage from the transformation is that the inference algorithms will run uniformly on both types of models. Let s consider one more example of turning Bayesian networks into MRFs. The figure below gives a simple HMM with the associated probability model and the same for the undirected version after moralization. x 3 x 3 y y 2 y 3 y y 2 y 3 P ( )P ( )P (x 3 ) P (y )P (y 2 )P (y 3 x 3 ) Z ψ 2(, )ψ 23 (, x 3 ) φ (, y )φ 2 (, y 2 )φ 3 (x 3, y 3 ) Each conditional probability on the left can be assigned to any potential function that contains the same set of variables. For example, P ( ) could be included in ψ 2 (, ) or in φ (, y ). The objective is merely to maintain the same distribution when we take the

4 4 product (it doesn t matter how we reorder terms in a product). Here s a possible complete setting of the potential functions: Z = (already normalized as we start with a Bayesian network) (3) ψ 2 (, ) = P ( )P ( ) (4) ψ 23 (, x 3 ) = P (x 3 ) (5) φ (, y ) = P (y ) (6) φ (, y 2 ) = P (y 2 ) (7) φ (x 3, y 3 ) = P (y 3 x 3 ) (8) Probabilistic inference Once we have the graph and the associated distribution (either learned from data or given to us), we would like to make use of this distribution. For example, in HMMs discussed above, we could try to compute P (y 3 y, y 2 ), i.e., predict what we expect to see as the next observation having already seen y and y 2. Note that the sequence of observations in an HMM does not satisfy the Markov property, i.e., y 3 is not independent of y given y. This is easy to see from either Bayesian network or the undirected version via the separation criteria. You can also understand it by noting that y may influence the state sequence, and therefore which value x 3 takes provided that y 2 does not fully constraint to take a specific value. We are also often interested in diagnostic probabilities such as P ( y, y 2, y 3 ), the posterior distribution over the states at t = 2 when we have already observed y, y 2, y 3. Or we may be interested in the most likely hidden state sequence and need to evaluate max-probabilities. All of these are probabilistic inference calculations. If we can compute basic conditional probabilities such as P ( y 2 ), we can also evaluate various other quantities. For example, suppose we have an HMM x 3 y y 2 y 3 and we are interesting known which y s we should observe (query) so as to obtain the most information about. To this end we have to define a value of new information. Consider,

5 5 for example, defining Value({P ( y i )} x2 =,...,m) = Entropy( y i ) (9) = P ( y i ) log P ( y i ) (0) In other words, if given a specific observation y i, we can evaluate the value of the resulting conditional distribution P ( y i ) where y i is known. There are many ways to define the value and, for simplicity, we defined it in terms of the uncertainty about. The value is zero if we know perfectly and negative otherwise. Since we cannot know y i prior to querying its value, we will have to evaluate its expected value assuming our HMM is correct: the expected value of information in response to querying a value for y i is given by [ ] P (y i ) P ( y i ) log P ( y i ) () y i where P (y i ) is a marginal probability computed from the same HMM. We could now use the above criterion to find the observation most helpful in determining the value of. Note that all the probabilities we needed were simple marginal and conditional probabilities. We still need to discuss how to evaluate such probabilities efficiently from a given distribution. Belief propagation Belief propagation is a simple message passing algorithm that generalizes the forwardbackward algorithm. It is exact on undirected graphs that are trees (a graph is a tree if any pair of nodes have a unique path connecting them, i.e., has no loops). In case of more general graphs, we can cluster variables together so as to obtain a tree of clusters, and apply the same algorithm again, now on the level of clusters. We will begin by demonstrating how messages are computed in the belief propagation algorithm. Consider the problem of evaluating the marginal probability of x 3 in an undirected HMM

6 6 m x ( ) m x2 x 3 (x 3 ) x 3 m y ( ) m y2 ( ) m y3 x 3 (x 3 ) y y 2 y 3 ψ Z 2(, )ψ 23 (, x 3 ) φ (, y )φ 2 (, y 2 )φ 3 (x 3, y 3 ) We can perform the required marginalizations (summing over the other variables) in order: first y, then, then y 2, and so on. Each of such operations will affect the variables they interact with. This effect is captured in terms of messages that are shown in red in the above figure. For example, m y ( ) is a message, a function of, that summarizes the effect of marginalizing over y. It is computed as m y ( ) = φ (, y ) (2) y Note that in the absence of any evidence about y, φ(, y ) = P (y ) and we would simply get a constant function of as the message. Suppose, instead, that we had already observed the value of y and denote this value as ŷ. We can incorporate this observation (evidence about y ) into the potential function φ (, y ) by redefining it as φ (, y ) = δ(y, ŷ )P (y ) (3) This way the message would be calculated as before but the value of the message as a function of would certainly change: m y ( ) = φ (, y ) = δ(y, ŷ )P (y ) = P (ŷ ) (4) y y After completing the marginalization over y, we turn to. This marginalization results in a message m x ( ) as relates to. In calculating this message, we will have to take into account the message from y so that m x ( ) = m y ( )ψ 2 (, ) (5)

7 7 More generally, in evaluating such messages, we incorporate (take a product of) all the incoming messages except the one coming from the variable we are marginalizing towards. Thus, will send the following message to x 3 m x2 x 3 (x 3 ) = m x ( )m y2 ( )ψ 23 (, x 3 ) (6) Finally, the probability of x 3 is obtained by collecting all the messages into x 3 P (x 3, D) = m x2 x 3 (x 3 )m y3 x 3 (x 3 ) (7) where D refers to any data or observations incorporated into the potential functions (as illustrated above). The distribution we were after is then P (x 3, D) P (x 3 D) = P (x 3, D) x 3 (8) It might seem that to evaluate similar distributions for all the other variables we would have to perform the message passing operations (marginalizations) in a particular order in each case. This is not necessary, actually. We can simply initialize all the messages to all one functions, and carry out the message passing operations asynchronously. For example, we can pick and calculate its message to based on the other available incoming messages (that may be wrong at this time). The messages will converge to the correct ones provided that the undirected model is a tree, and we repeatedly update each message (i.e., send messages from each variable in all directions). The asynchronous message updates will propagate the necessary information across the graph. This exchange of information between the variables is a bit more efficient to carry out synchronously, however. In other words, we designate a root variable and imagine the edges oriented outwards from this variable (this orientation has nothing to do with Bayesian networks). Then collect all the messages toward the root, starting from the leaves. Once the root has all its incoming messages, the direction is switched and we send (or distribute ) the messages outwards from the root, starting with the root. These two passes suffice to get the correct incoming messages for all the variables.

### CS Lecture 4. Markov Random Fields

CS 6347 Lecture 4 Markov Random Fields Recap Announcements First homework is available on elearning Reminder: Office hours Tuesday from 10am-11am Last Time Bayesian networks Today Markov random fields

### Statistical Approaches to Learning and Discovery

Statistical Approaches to Learning and Discovery Graphical Models Zoubin Ghahramani & Teddy Seidenfeld zoubin@cs.cmu.edu & teddy@stat.cmu.edu CALD / CS / Statistics / Philosophy Carnegie Mellon University

### Probabilistic Graphical Models

Probabilistic Graphical Models Lecture Notes Fall 2009 November, 2009 Byoung-Ta Zhang School of Computer Science and Engineering & Cognitive Science, Brain Science, and Bioinformatics Seoul National University

### Probabilistic Graphical Models

Probabilistic Graphical Models Brown University CSCI 295-P, Spring 213 Prof. Erik Sudderth Lecture 11: Inference & Learning Overview, Gaussian Graphical Models Some figures courtesy Michael Jordan s draft

### Machine Learning Lecture 14

Many slides adapted from B. Schiele, S. Roth, Z. Gharahmani Machine Learning Lecture 14 Undirected Graphical Models & Inference 23.06.2015 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de/ leibe@vision.rwth-aachen.de

### p L yi z n m x N n xi

y i z n x n N x i Overview Directed and undirected graphs Conditional independence Exact inference Latent variables and EM Variational inference Books statistical perspective Graphical Models, S. Lauritzen

### Undirected graphical models

Undirected graphical models Semantics of probabilistic models over undirected graphs Parameters of undirected models Example applications COMP-652 and ECSE-608, February 16, 2017 1 Undirected graphical

### Probabilistic Graphical Models

2016 Robert Nowak Probabilistic Graphical Models 1 Introduction We have focused mainly on linear models for signals, in particular the subspace model x = Uθ, where U is a n k matrix and θ R k is a vector

### Part I. C. M. Bishop PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 8: GRAPHICAL MODELS

Part I C. M. Bishop PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 8: GRAPHICAL MODELS Probabilistic Graphical Models Graphical representation of a probabilistic model Each variable corresponds to a

### Lecture 4 October 18th

Directed and undirected graphical models Fall 2017 Lecture 4 October 18th Lecturer: Guillaume Obozinski Scribe: In this lecture, we will assume that all random variables are discrete, to keep notations

### Machine learning: lecture 20. Tommi S. Jaakkola MIT CSAIL

Machine learning: lecture 20 ommi. Jaakkola MI CAI tommi@csail.mit.edu opics Representation and graphical models examples Bayesian networks examples, specification graphs and independence associated distribution

### Independencies. Undirected Graphical Models 2: Independencies. Independencies (Markov networks) Independencies (Bayesian Networks)

(Bayesian Networks) Undirected Graphical Models 2: Use d-separation to read off independencies in a Bayesian network Takes a bit of effort! 1 2 (Markov networks) Use separation to determine independencies

### Probabilistic Graphical Models

Probabilistic Graphical Models Brown University CSCI 2950-P, Spring 2013 Prof. Erik Sudderth Lecture 12: Gaussian Belief Propagation, State Space Models and Kalman Filters Guest Kalman Filter Lecture by

### Introduction to Graphical Models

Introduction to Graphical Models The 15 th Winter School of Statistical Physics POSCO International Center & POSTECH, Pohang 2018. 1. 9 (Tue.) Yung-Kyun Noh GENERALIZATION FOR PREDICTION 2 Probabilistic

### Conditional Independence and Factorization

Conditional Independence and Factorization Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr

### 4 : Exact Inference: Variable Elimination

10-708: Probabilistic Graphical Models 10-708, Spring 2014 4 : Exact Inference: Variable Elimination Lecturer: Eric P. ing Scribes: Soumya Batra, Pradeep Dasigi, Manzil Zaheer 1 Probabilistic Inference

### Message Passing and Junction Tree Algorithms. Kayhan Batmanghelich

Message Passing and Junction Tree Algorithms Kayhan Batmanghelich 1 Review 2 Review 3 Great Ideas in ML: Message Passing Each soldier receives reports from all branches of tree 3 here 7 here 1 of me 11

### Introduction to Graphical Models. Srikumar Ramalingam School of Computing University of Utah

Introduction to Graphical Models Srikumar Ramalingam School of Computing University of Utah Reference Christopher M. Bishop, Pattern Recognition and Machine Learning, Jonathan S. Yedidia, William T. Freeman,

### Based on slides by Richard Zemel

CSC 412/2506 Winter 2018 Probabilistic Learning and Reasoning Lecture 3: Directed Graphical Models and Latent Variables Based on slides by Richard Zemel Learning outcomes What aspects of a model can we

### Outline. Spring It Introduction Representation. Markov Random Field. Conclusion. Conditional Independence Inference: Variable elimination

Probabilistic Graphical Models COMP 790-90 Seminar Spring 2011 The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL Outline It Introduction ti Representation Bayesian network Conditional Independence Inference:

### Learning Bayesian networks

1 Lecture topics: Learning Bayesian networks from data maximum likelihood, BIC Bayesian, marginal likelihood Learning Bayesian networks There are two problems we have to solve in order to estimate Bayesian

### Generative and Discriminative Approaches to Graphical Models CMSC Topics in AI

Generative and Discriminative Approaches to Graphical Models CMSC 35900 Topics in AI Lecture 2 Yasemin Altun January 26, 2007 Review of Inference on Graphical Models Elimination algorithm finds single

### Variable Elimination: Algorithm

Variable Elimination: Algorithm Sargur srihari@cedar.buffalo.edu 1 Topics 1. Types of Inference Algorithms 2. Variable Elimination: the Basic ideas 3. Variable Elimination Sum-Product VE Algorithm Sum-Product

### Probabilistic Graphical Models: MRFs and CRFs. CSE628: Natural Language Processing Guest Lecturer: Veselin Stoyanov

Probabilistic Graphical Models: MRFs and CRFs CSE628: Natural Language Processing Guest Lecturer: Veselin Stoyanov Why PGMs? PGMs can model joint probabilities of many events. many techniques commonly

### Lecture 4: Probability Propagation in Singly Connected Bayesian Networks

Lecture 4: Probability Propagation in Singly Connected Bayesian Networks Singly Connected Networks So far we have looked at several cases of probability propagation in networks. We now want to formalise

### 4.1 Notation and probability review

Directed and undirected graphical models Fall 2015 Lecture 4 October 21st Lecturer: Simon Lacoste-Julien Scribe: Jaime Roquero, JieYing Wu 4.1 Notation and probability review 4.1.1 Notations Let us recall

### Bayesian Machine Learning

Bayesian Machine Learning Andrew Gordon Wilson ORIE 6741 Lecture 4 Occam s Razor, Model Construction, and Directed Graphical Models https://people.orie.cornell.edu/andrew/orie6741 Cornell University September

### Markov properties for undirected graphs

Graphical Models, Lecture 2, Michaelmas Term 2011 October 12, 2011 Formal definition Fundamental properties Random variables X and Y are conditionally independent given the random variable Z if L(X Y,

### Variable Elimination: Algorithm

Variable Elimination: Algorithm Sargur srihari@cedar.buffalo.edu 1 Topics 1. Types of Inference Algorithms 2. Variable Elimination: the Basic ideas 3. Variable Elimination Sum-Product VE Algorithm Sum-Product

### Overview of Statistical Tools. Statistical Inference. Bayesian Framework. Modeling. Very simple case. Things are usually more complicated

Fall 3 Computer Vision Overview of Statistical Tools Statistical Inference Haibin Ling Observation inference Decision Prior knowledge http://www.dabi.temple.edu/~hbling/teaching/3f_5543/index.html Bayesian

### Probability Propagation in Singly Connected Networks

Lecture 4 Probability Propagation in Singly Connected Networks Intelligent Data Analysis and Probabilistic Inference Lecture 4 Slide 1 Probability Propagation We will now develop a general probability

### Learning in Bayesian Networks

Learning in Bayesian Networks Florian Markowetz Max-Planck-Institute for Molecular Genetics Computational Molecular Biology Berlin Berlin: 20.06.2002 1 Overview 1. Bayesian Networks Stochastic Networks

### Reasoning Under Uncertainty: Belief Network Inference

Reasoning Under Uncertainty: Belief Network Inference CPSC 322 Uncertainty 5 Textbook 10.4 Reasoning Under Uncertainty: Belief Network Inference CPSC 322 Uncertainty 5, Slide 1 Lecture Overview 1 Recap

### A brief introduction to Conditional Random Fields

A brief introduction to Conditional Random Fields Mark Johnson Macquarie University April, 2005, updated October 2010 1 Talk outline Graphical models Maximum likelihood and maximum conditional likelihood

### Markov properties for undirected graphs

Graphical Models, Lecture 2, Michaelmas Term 2009 October 15, 2009 Formal definition Fundamental properties Random variables X and Y are conditionally independent given the random variable Z if L(X Y,

### CS 188: Artificial Intelligence. Bayes Nets

CS 188: Artificial Intelligence Probabilistic Inference: Enumeration, Variable Elimination, Sampling Pieter Abbeel UC Berkeley Many slides over this course adapted from Dan Klein, Stuart Russell, Andrew

### Bayesian Networks: Belief Propagation in Singly Connected Networks

Bayesian Networks: Belief Propagation in Singly Connected Networks Huizhen u janey.yu@cs.helsinki.fi Dept. Computer Science, Univ. of Helsinki Probabilistic Models, Spring, 2010 Huizhen u (U.H.) Bayesian

### Abstract. Three Methods and Their Limitations. N-1 Experiments Suffice to Determine the Causal Relations Among N Variables

N-1 Experiments Suffice to Determine the Causal Relations Among N Variables Frederick Eberhardt Clark Glymour 1 Richard Scheines Carnegie Mellon University Abstract By combining experimental interventions

### A Brief Introduction to Graphical Models. Presenter: Yijuan Lu November 12,2004

A Brief Introduction to Graphical Models Presenter: Yijuan Lu November 12,2004 References Introduction to Graphical Models, Kevin Murphy, Technical Report, May 2001 Learning in Graphical Models, Michael

### Implementing Machine Reasoning using Bayesian Network in Big Data Analytics

Implementing Machine Reasoning using Bayesian Network in Big Data Analytics Steve Cheng, Ph.D. Guest Speaker for EECS 6893 Big Data Analytics Columbia University October 26, 2017 Outline Introduction Probability

### Introduction to Probabilistic Graphical Models

Introduction to Probabilistic Graphical Models Franz Pernkopf, Robert Peharz, Sebastian Tschiatschek Graz University of Technology, Laboratory of Signal Processing and Speech Communication Inffeldgasse

### Lecture 21: Spectral Learning for Graphical Models

10-708: Probabilistic Graphical Models 10-708, Spring 2016 Lecture 21: Spectral Learning for Graphical Models Lecturer: Eric P. Xing Scribes: Maruan Al-Shedivat, Wei-Cheng Chang, Frederick Liu 1 Motivation

### Expectation Propagation in Factor Graphs: A Tutorial

DRAFT: Version 0.1, 28 October 2005. Do not distribute. Expectation Propagation in Factor Graphs: A Tutorial Charles Sutton October 28, 2005 Abstract Expectation propagation is an important variational

### Probability Propagation

Graphical Models, Lectures 9 and 10, Michaelmas Term 2009 November 13, 2009 Characterizing chordal graphs The following are equivalent for any undirected graph G. (i) G is chordal; (ii) G is decomposable;

### Bayesian networks in Mastermind

Bayesian networks in Mastermind Jiří Vomlel http://www.utia.cas.cz/vomlel/ Laboratory for Intelligent Systems Inst. of Inf. Theory and Automation University of Economics Academy of Sciences Ekonomická

### The Origin of Deep Learning. Lili Mou Jan, 2015

The Origin of Deep Learning Lili Mou Jan, 2015 Acknowledgment Most of the materials come from G. E. Hinton s online course. Outline Introduction Preliminary Boltzmann Machines and RBMs Deep Belief Nets

### Variable Elimination (VE) Barak Sternberg

Variable Elimination (VE) Barak Sternberg Basic Ideas in VE Example 1: Let G be a Chain Bayesian Graph: X 1 X 2 X n 1 X n How would one compute P X n = k? Using the CPDs: P X 2 = x = x Val X1 P X 1 = x

### Machine learning: lecture 20. Tommi S. Jaakkola MIT CSAIL

Machine learning: lecture 20 ommi. Jaakkola MI AI tommi@csail.mit.edu Bayesian networks examples, specification graphs and independence associated distribution Outline ommi Jaakkola, MI AI 2 Bayesian networks

### Bayesian network modeling. 1

Bayesian network modeling http://springuniversity.bc3research.org/ 1 Probabilistic vs. deterministic modeling approaches Probabilistic Explanatory power (e.g., r 2 ) Explanation why Based on inductive

### Unsupervised Learning

Unsupervised Learning Bayesian Model Comparison Zoubin Ghahramani zoubin@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit, and MSc in Intelligent Systems, Dept Computer Science University College

### ABSTRACT. Inference by Reparameterization using Neural Population Codes. Rajkumar Vasudeva Raju

ABSTRACT Inference by Reparameterization using Neural Population Codes by Rajkumar Vasudeva Raju Behavioral experiments on humans and animals suggest that the brain performs probabilistic inference to

### Latent Dirichlet Allocation

Latent Dirichlet Allocation 1 Directed Graphical Models William W. Cohen Machine Learning 10-601 2 DGMs: The Burglar Alarm example Node ~ random variable Burglar Earthquake Arcs define form of probability

### Bayes Networks 6.872/HST.950

Bayes Networks 6.872/HST.950 What Probabilistic Models Should We Use? Full joint distribution Completely expressive Hugely data-hungry Exponential computational complexity Naive Bayes (full conditional

### Applying Bayesian networks in the game of Minesweeper

Applying Bayesian networks in the game of Minesweeper Marta Vomlelová Faculty of Mathematics and Physics Charles University in Prague http://kti.mff.cuni.cz/~marta/ Jiří Vomlel Institute of Information

### Introduction to Artificial Intelligence. Unit # 11

Introduction to Artificial Intelligence Unit # 11 1 Course Outline Overview of Artificial Intelligence State Space Representation Search Techniques Machine Learning Logic Probabilistic Reasoning/Bayesian

### STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 7 Approximate

### Machine Learning Techniques for Computer Vision

Machine Learning Techniques for Computer Vision Part 2: Unsupervised Learning Microsoft Research Cambridge x 3 1 0.5 0.2 0 0.5 0.3 0 0.5 1 ECCV 2004, Prague x 2 x 1 Overview of Part 2 Mixture models EM

### Learning Bayesian Networks Does Not Have to Be NP-Hard

Learning Bayesian Networks Does Not Have to Be NP-Hard Norbert Dojer Institute of Informatics, Warsaw University, Banacha, 0-097 Warszawa, Poland dojer@mimuw.edu.pl Abstract. We propose an algorithm for

### Lecture 1: Probabilistic Graphical Models. Sam Roweis. Monday July 24, 2006 Machine Learning Summer School, Taiwan

Lecture 1: Probabilistic Graphical Models Sam Roweis Monday July 24, 2006 Machine Learning Summer School, Taiwan Building Intelligent Computers We want intelligent, adaptive, robust behaviour in computers.

### STAT 598L Probabilistic Graphical Models. Instructor: Sergey Kirshner. Bayesian Networks

STAT 598L Probabilistic Graphical Models Instructor: Sergey Kirshner Bayesian Networks Representing Joint Probability Distributions 2 n -1 free parameters Reducing Number of Parameters: Conditional Independence

### A Graphical Model for Simultaneous Partitioning and Labeling

A Graphical Model for Simultaneous Partitioning and Labeling Philip J Cowans Cavendish Laboratory, University of Cambridge, Cambridge, CB3 0HE, United Kingdom pjc51@camacuk Martin Szummer Microsoft Research

### A Tutorial on Bayesian Belief Networks

A Tutorial on Bayesian Belief Networks Mark L Krieg Surveillance Systems Division Electronics and Surveillance Research Laboratory DSTO TN 0403 ABSTRACT This tutorial provides an overview of Bayesian belief

### Introduction to Probabilistic Graphical Models

Introduction to Probabilistic Graphical Models Kyu-Baek Hwang and Byoung-Tak Zhang Biointelligence Lab School of Computer Science and Engineering Seoul National University Seoul 151-742 Korea E-mail: kbhwang@bi.snu.ac.kr

### Bayesian Learning in Undirected Graphical Models

Bayesian Learning in Undirected Graphical Models Zoubin Ghahramani Gatsby Computational Neuroscience Unit University College London, UK http://www.gatsby.ucl.ac.uk/ and Center for Automated Learning and

### Factor Analysis (10/2/13)

STA561: Probabilistic machine learning Factor Analysis (10/2/13) Lecturer: Barbara Engelhardt Scribes: Li Zhu, Fan Li, Ni Guan Factor Analysis Factor analysis is related to the mixture models we have studied.

### Bayesian Networks: Representation, Variable Elimination

Bayesian Networks: Representation, Variable Elimination CS 6375: Machine Learning Class Notes Instructor: Vibhav Gogate The University of Texas at Dallas We can view a Bayesian network as a compact representation

### Sampling Algorithms for Probabilistic Graphical models

Sampling Algorithms for Probabilistic Graphical models Vibhav Gogate University of Washington References: Chapter 12 of Probabilistic Graphical models: Principles and Techniques by Daphne Koller and Nir

### Cours 7 12th November 2014

Sum Product Algorithm and Hidden Markov Model 2014/2015 Cours 7 12th November 2014 Enseignant: Francis Bach Scribe: Pauline Luc, Mathieu Andreux 7.1 Sum Product Algorithm 7.1.1 Motivations Inference, along

### Introduction To Graphical Models

Peter Gehler Introduction to Graphical Models Introduction To Graphical Models Peter V. Gehler Max Planck Institute for Intelligent Systems, Tübingen, Germany ENS/INRIA Summer School, Paris, July 2013

### Dual Decomposition for Inference

Dual Decomposition for Inference Yunshu Liu ASPITRG Research Group 2014-05-06 References: [1]. D. Sontag, A. Globerson and T. Jaakkola, Introduction to Dual Decomposition for Inference, Optimization for

### An Introduction to Reversible Jump MCMC for Bayesian Networks, with Application

An Introduction to Reversible Jump MCMC for Bayesian Networks, with Application, CleverSet, Inc. STARMAP/DAMARS Conference Page 1 The research described in this presentation has been funded by the U.S.

### Introduction to Bayesian Networks

Introduction to Bayesian Networks Anders Ringgaard Kristensen Slide 1 Outline Causal networks Bayesian Networks Evidence Conditional Independence and d-separation Compilation The moral graph The triangulated

### Bayesian Networks with Imprecise Probabilities: Theory and Application to Classification

Bayesian Networks with Imprecise Probabilities: Theory and Application to Classification G. Corani 1, A. Antonucci 1, and M. Zaffalon 1 IDSIA, Manno, Switzerland {giorgio,alessandro,zaffalon}@idsia.ch

### Probabilistic Reasoning: Graphical Models

Probabilistic Reasoning: Graphical Models Christian Borgelt Intelligent Data Analysis and Graphical Models Research Unit European Center for Soft Computing c/ Gonzalo Gutiérrez Quirós s/n, 33600 Mieres

### CS242: Probabilistic Graphical Models Lecture 4B: Learning Tree-Structured and Directed Graphs

CS242: Probabilistic Graphical Models Lecture 4B: Learning Tree-Structured and Directed Graphs Professor Erik Sudderth Brown University Computer Science October 6, 2016 Some figures and materials courtesy

### Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo. Sampling Methods. Oliver Schulte - CMPT 419/726. Bishop PRML Ch.

Sampling Methods Oliver Schulte - CMP 419/726 Bishop PRML Ch. 11 Recall Inference or General Graphs Junction tree algorithm is an exact inference method for arbitrary graphs A particular tree structure

### A Brief Review of Probability, Bayesian Statistics, and Information Theory

A Brief Review of Probability, Bayesian Statistics, and Information Theory Brendan Frey Electrical and Computer Engineering University of Toronto frey@psi.toronto.edu http://www.psi.toronto.edu A system

### Bayesian networks. Chapter Chapter

Bayesian networks Chapter 14.1 3 Chapter 14.1 3 1 Outline Syntax Semantics Parameterized distributions Chapter 14.1 3 2 Bayesian networks A simple, graphical notation for conditional independence assertions

### Sparse Forward-Backward for Fast Training of Conditional Random Fields

Sparse Forward-Backward for Fast Training of Conditional Random Fields Charles Sutton, Chris Pal and Andrew McCallum University of Massachusetts Amherst Dept. Computer Science Amherst, MA 01003 {casutton,

### Factor Graphs and Message Passing Algorithms Part 1: Introduction

Factor Graphs and Message Passing Algorithms Part 1: Introduction Hans-Andrea Loeliger December 2007 1 The Two Basic Problems 1. Marginalization: Compute f k (x k ) f(x 1,..., x n ) x 1,..., x n except

### Reasoning Under Uncertainty: Conditional Probability

Reasoning Under Uncertainty: Conditional Probability CPSC 322 Uncertainty 2 Textbook 6.1 Reasoning Under Uncertainty: Conditional Probability CPSC 322 Uncertainty 2, Slide 1 Lecture Overview 1 Recap 2

### Kalman filtering and friends: Inference in time series models. Herke van Hoof slides mostly by Michael Rubinstein

Kalman filtering and friends: Inference in time series models Herke van Hoof slides mostly by Michael Rubinstein Problem overview Goal Estimate most probable state at time k using measurement up to time

### Data Compression Techniques

Data Compression Techniques Part 2: Text Compression Lecture 5: Context-Based Compression Juha Kärkkäinen 14.11.2017 1 / 19 Text Compression We will now look at techniques for text compression. These techniques

### VIBES: A Variational Inference Engine for Bayesian Networks

VIBES: A Variational Inference Engine for Bayesian Networks Christopher M. Bishop Microsoft Research Cambridge, CB3 0FB, U.K. research.microsoft.com/ cmbishop David Spiegelhalter MRC Biostatistics Unit

### Bayes-Ball: The Rational Pastime (for Determining Irrelevance and Requisite Information in Belief Networks and Influence Diagrams)

Bayes-Ball: The Rational Pastime (for Determining Irrelevance and Requisite Information in Belief Networks and Influence Diagrams) Ross D. Shachter Engineering-Economic Systems and Operations Research

### Causality in Econometrics (3)

Graphical Causal Models References Causality in Econometrics (3) Alessio Moneta Max Planck Institute of Economics Jena moneta@econ.mpg.de 26 April 2011 GSBC Lecture Friedrich-Schiller-Universität Jena

### Hidden Markov models

Hidden Markov models Charles Elkan November 26, 2012 Important: These lecture notes are based on notes written by Lawrence Saul. Also, these typeset notes lack illustrations. See the classroom lectures

### A New Approach for Hybrid Bayesian Networks Using Full Densities

A New Approach for Hybrid Bayesian Networks Using Full Densities Oliver C. Schrempf Institute of Computer Design and Fault Tolerance Universität Karlsruhe e-mail: schrempf@ira.uka.de Uwe D. Hanebeck Institute

### Graphical Models. Outline. HMM in short. HMMs. What about continuous HMMs? p(o t q t ) ML 701. Anna Goldenberg ... t=1. !

Outline Graphical Models ML 701 nna Goldenberg! ynamic Models! Gaussian Linear Models! Kalman Filter! N! Undirected Models! Unification! Summary HMMs HMM in short! is a ayes Net hidden states! satisfies

### An Embedded Implementation of Bayesian Network Robot Programming Methods

1 An Embedded Implementation of Bayesian Network Robot Programming Methods By Mark A. Post Lecturer, Space Mechatronic Systems Technology Laboratory. Department of Design, Manufacture and Engineering Management.

### Active learning in sequence labeling

Active learning in sequence labeling Tomáš Šabata 11. 5. 2017 Czech Technical University in Prague Faculty of Information technology Department of Theoretical Computer Science Table of contents 1. Introduction

### An Introduction to Probabilistic Graphical Models

An Introduction to Probabilistic Graphical Models Cédric Archambeau Xerox Research Centre Europe cedric.archambeau@xrce.xerox.com Pascal Bootcamp Marseille, France, July 2010 Reference material D. Koller

### Introduction: MLE, MAP, Bayesian reasoning (28/8/13)

STA561: Probabilistic machine learning Introduction: MLE, MAP, Bayesian reasoning (28/8/13) Lecturer: Barbara Engelhardt Scribes: K. Ulrich, J. Subramanian, N. Raval, J. O Hollaren 1 Classifiers In this

### CPSC 540: Machine Learning

CPSC 540: Machine Learning MCMC and Non-Parametric Bayes Mark Schmidt University of British Columbia Winter 2016 Admin I went through project proposals: Some of you got a message on Piazza. No news is

### Introduction to Graphical Models

Introduction to Graphical Models STA 345: Multivariate Analysis Department of Statistical Science Duke University, Durham, NC, USA Robert L. Wolpert 1 Conditional Dependence Two real-valued or vector-valued

### ESTIMATION OF SWITCHING ACTIVITY IN SEQUENTIAL CIRCUITS USING DYNAMIC BAYESIAN NETWORKS KARTHIKEYAN LINGASUBRAMANIAN

ESTIMATION OF SWITCHING ACTIVITY IN SEQUENTIAL CIRCUITS USING DYNAMIC BAYESIAN NETWORKS by KARTHIKEYAN LINGASUBRAMANIAN A thesis submitted in partial fulfillment of the requirements for the degree of Master

### Introduction to graphical models: Lecture III

Introduction to graphical models: Lecture III Martin Wainwright UC Berkeley Departments of Statistics, and EECS Martin Wainwright (UC Berkeley) Some introductory lectures January 2013 1 / 25 Introduction

### Dynamic Probabilistic Models for Latent Feature Propagation in Social Networks

Dynamic Probabilistic Models for Latent Feature Propagation in Social Networks Creighton Heaukulani and Zoubin Ghahramani University of Cambridge TU Denmark, June 2013 1 A Network Dynamic network data