Intelligent Systems: Undirected Graphical Models (Factor Graphs)
1 Intelligent Systems: Undirected Graphical models (Factor Graphs) (2 lectures) Carsten Rother 15/01/2015 Intelligent Systems: Probabilistic Inference in DGM and UGM
2 Roadmap for next two lectures:
- Definition and Visualization of Factor Graphs
- Converting Directed Graphical Models to Factor Graphs
- Probabilistic Programming
- Queries and Making Decisions
- Binary-Valued Factor Graphs: Models and Optimization (ICM, Graph Cut)
- Multi-Valued Factor Graphs: Models and Optimization (Alpha Expansion)
3 Reminder: Structured models: when to use what representation?
- Directed graphical model: the unknown variables have different meanings. Example: MaryCalls, JohnCalls (J), AlarmOn (A), BurglarInHouse (B).
- Factor graphs: the unknown variables all have the same meaning. Examples: pixels in an image, nuclei in C. elegans (a worm).
- Undirected graphical models are used instead of factor graphs when we are interested in studying conditional independence (not relevant for our context).
4 Reminder: Machine learning: structured versus unstructured models.
Structured output prediction: f/p: Z^m → X^n (for example X = R^n or X = N^n). Important: the elements in X do not make independent decisions.
Definition (not formal): the output consists of several parts, and not only the parts themselves contain information, but also the way in which the parts belong together.
Example: image labelling (computer vision). K has a fixed vocabulary, e.g. K = {Wall, Picture, Person, Clutter, ...}. Input: image (Z^m). Output: labeling (K^m). Important: the labeling of neighboring pixels is highly correlated.
5 Reminder: Machine learning: structured versus unstructured models.
Structured output prediction: f/p: Z^m → X^n (for example X = R^n or X = N^n). Important: the elements in X do not make independent decisions.
Definition (not formal): the output consists of several parts, and not only the parts themselves contain information, but also the way in which the parts belong together.
Example: text processing. "The boy went home". Input: text (Z^m). Output: X^m (the parse tree of a sentence).
6 Factor graph model: example.
A factor graph defines a distribution as:
p(x) = (1/f) ∏_{F ∈ 𝓕} ψ_F(x_N(F)), where f = Σ_x ∏_{F ∈ 𝓕} ψ_F(x_N(F))
- f: partition function, so that the distribution is normalized
- F: a factor; 𝓕: the set of all factors
- N(F): the neighbourhood of a factor
- ψ_F: a function (not a distribution) depending on x_N(F) (ψ_F: K^|N(F)| → R, where x_i ∈ K)
Example: p(x1, x2) = (1/f) ψ1(x1, x2) ψ2(x2), with x_i ∈ {0, 1}, K = 2.
x_N(1) = {x1, x2}: ψ1(0,0) = 1; ψ1(0,1) = 0; ψ1(1,0) = 1; ψ1(1,1) = 2.
x_N(2) = {x2}: ψ2(0) = 1; ψ2(1) = 0.
f = 1 + 0 + 1 + 0 = 2. Check yourself that Σ_x p(x) = 1.
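The slide's toy model is small enough to check by brute force. A minimal sketch (function and variable names are mine) that recomputes the partition function and verifies normalization:

```python
from itertools import product

# Factor tables from the slide: psi1(x1, x2) and psi2(x2), with x_i in {0, 1}.
psi1 = {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 1.0, (1, 1): 2.0}
psi2 = {0: 1.0, 1: 0.0}

def unnormalized(x1, x2):
    """Product of all factors at one joint assignment."""
    return psi1[(x1, x2)] * psi2[x2]

# Partition function f: sum of the factor product over all assignments.
f = sum(unnormalized(x1, x2) for x1, x2 in product([0, 1], repeat=2))

def p(x1, x2):
    return unnormalized(x1, x2) / f

print(f)  # 2.0, as on the slide
print(p(0, 0) + p(0, 1) + p(1, 0) + p(1, 1))  # 1.0
```

Enumerating all K^n assignments is only feasible for tiny models; the point here is just to make the definition of f concrete.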
7 Factor graph model: visualization.
For visualization we use a graph G = (V, F, E), where V and F are the sets of variable and factor nodes and E is the set of edges.
Same example as before: p(x1, x2) = (1/f) ψ1(x1, x2) ψ2(x2).
In the drawing, a circle visualizes a variable node, a square visualizes a factor node, and an edge means that the variable is in that factor.
8 Factor graph model: visualization.
Example: p(x1, x2, x3, x4, x5) = (1/f) ψ(x1, x2, x4) ψ(x2, x3) ψ(x3, x4) ψ(x4, x5) ψ(x4), where the ψ's are specified in some way.
(Figure: variable nodes x1, ..., x5 connected to one factor node per ψ.)
9 Probabilities and energies.
p(x) = (1/f) ∏_{F ∈ 𝓕} ψ_F(x_N(F)) = (1/f) ∏_{F ∈ 𝓕} exp{−θ_F(x_N(F))} = (1/f) exp{−Σ_{F ∈ 𝓕} θ_F(x_N(F))} = (1/f) exp{−E(x)}
The energy E(x) is just a sum of factors: E(x) = Σ_{F ∈ 𝓕} θ_F(x_N(F)).
The most likely solution x* is reached by minimizing the energy: x* = argmax_x p(x) = argmin_x E(x).
Notes:
1) If x* is a minimizer of f(x) then it is also a minimizer of log f(x) (note that x1 ≤ x2 means log x1 ≤ log x2, since log is monotone).
2) We have log p(x) = −log f − E(x) = constant − E(x).
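The equivalence between maximizing p(x) and minimizing E(x) can be checked numerically. A sketch under the assumption of strictly positive factors, so that θ_F = −log ψ_F is finite (the toy factor values are mine):

```python
import math
from itertools import product

# Strictly positive toy factors over two binary variables (values arbitrary).
psi1 = {(0, 0): 1.0, (0, 1): 0.5, (1, 0): 2.0, (1, 1): 3.0}
psi2 = {0: 1.5, 1: 0.5}

def prob_unnorm(x):
    """Unnormalized p(x): product of factors (normalization does not move the argmax)."""
    return psi1[x] * psi2[x[1]]

def energy(x):
    """E(x) = sum of theta_F = sum of -log psi_F."""
    return -math.log(psi1[x]) - math.log(psi2[x[1]])

states = list(product([0, 1], repeat=2))
map_by_prob = max(states, key=prob_unnorm)
map_by_energy = min(states, key=energy)
print(map_by_prob, map_by_energy)  # (1, 0) (1, 0): same state either way
```

The factor 1/f and the minus sign inside exp are exactly why argmax of the probability and argmin of the energy coincide.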
10 Names.
The probability distribution p(x) = (1/f) exp{−E(x)} with energy E(x) = Σ_{F ∈ 𝓕} θ_F(x_N(F)) is a so-called Gibbs distribution, with f = Σ_x exp{−E(x)}.
We define the order of a factor graph as the arity (number of variables) of its largest factor. Example of an order-3 model:
E(x) = θ(x1, x2, x4) + θ(x2, x3) + θ(x3, x4) + θ(x5, x4) + θ(x4)
(arities 3, 2, 2, 2 and 1)
A different name for a factor graph / undirected graphical model is Markov random field. This is an extension of Markov chains to fields. The name "Markov" stands for the Markov property, which essentially means that the order of the factors is small.
11 Examples by order:
- 4-connected pairwise MRF: E(x) = Σ_{i,j ∈ N4} θ_ij(x_i, x_j). Order 2 (pairwise energy).
- Higher(8)-connected pairwise MRF: E(x) = Σ_{i,j ∈ N8} θ_ij(x_i, x_j). Order 2 (pairwise energy).
- Higher-order RF: E(x) = Σ_{i,j ∈ N4} θ_ij(x_i, x_j) + θ(x1, ..., xn). Order n (higher-order energy).
12 Roadmap for next two lectures (repeated): next up is Converting Directed Graphical Models to Factor Graphs.
13 Converting a directed graphical model to a factor graph: a simple case.
Directed model (chain x1 → x2 → x3): p(x1, x2, x3) = p(x3|x2) p(x2|x1) p(x1)
Factor graph: p(x1, x2, x3) = (1/f) ψ(x3, x2) ψ(x2, x1) ψ(x1)
where f = 1, ψ(x3, x2) = p(x3|x2), ψ(x2, x1) = p(x2|x1), ψ(x1) = p(x1).
14 Converting a directed graphical model to a factor graph: a more complex case.
Directed model: p(x1, x2, x3, x4) = p(x1|x2, x3) p(x2) p(x3|x4) p(x4)
Factor graph: p(x1, x2, x3, x4) = (1/f) ψ(x1, x2, x3) ψ(x2) ψ(x3, x4) ψ(x4)
where f = 1, ψ(x1, x2, x3) = p(x1|x2, x3), ψ(x2) = p(x2), ψ(x3, x4) = p(x3|x4), ψ(x4) = p(x4).
15 Converting a directed graphical model to a factor graph.
- Take each conditional probability and convert it to a factor (without conditioning), i.e. replace the conditioning bars with commas.
- Set the normalization constant f = 1.
- Visualization: all parents of a node form a new factor together with that node (this step is called moralization).
Comment: the other direction is more complicated, since the factors ψ have to be converted correctly into individual probabilities such that the overall joint distribution stays the same.
Our example:
Directed GM: p(x1, x2, x3, x4) = p(x1|x2, x3) p(x2) p(x3|x4) p(x4)
Factor graph: p(x1, x2, x3, x4) = (1/f) ψ(x1, x2, x3) ψ(x2) ψ(x3, x4) ψ(x4)
where f = 1, ψ(x1, x2, x3) = p(x1|x2, x3), ψ(x2) = p(x2), ψ(x3, x4) = p(x3|x4), ψ(x4) = p(x4).
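The recipe can be sketched on the slide's four-variable example. The CPT numbers below are hypothetical, chosen only so each conditional normalizes; because every factor is exactly a CPT, the product of factors is already a distribution and f = 1:

```python
from itertools import product

# Hypothetical binary CPTs for the DGM p(x1|x2,x3) p(x2) p(x3|x4) p(x4).
p_x4 = {0: 0.6, 1: 0.4}                                        # p(x4)
p_x3 = {(0, 0): 0.7, (1, 0): 0.3, (0, 1): 0.2, (1, 1): 0.8}    # p(x3|x4), key (x3, x4)
p_x2 = {0: 0.5, 1: 0.5}                                        # p(x2)
p_x1 = {(0, 0): 0.1, (0, 1): 0.9, (1, 0): 0.8, (1, 1): 0.5}    # p(x1=1|x2,x3), key (x2, x3)

def psi_x1(x1, x2, x3):
    """Factor obtained from p(x1|x2,x3) by dropping the conditioning bar."""
    q = p_x1[(x2, x3)]
    return q if x1 == 1 else 1.0 - q

def joint(x1, x2, x3, x4):
    # Product of the four factors: the converted factor-graph model.
    return psi_x1(x1, x2, x3) * p_x2[x2] * p_x3[(x3, x4)] * p_x4[x4]

# The product of CPTs sums to one over all assignments, so f = 1 as claimed.
f = sum(joint(*x) for x in product([0, 1], repeat=4))
print(round(f, 10))  # 1.0
```

This is exactly why the slide can set f = 1: each conditional already normalizes over its left-hand variable.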
16 Roadmap for next two lectures (repeated): next up is Probabilistic Programming.
17 Probabilistic programming languages.
A programming language for machine learning tasks; in particular for modelling, learning and making predictions in directed graphical models (DGM), undirected graphical models (UGM), and factor graphs (FG).
Comment: DGMs and UGMs are converted to factor graphs; all operations are run on factor graphs.
The basic idea is to associate with each variable a distribution:
Normally (C++): Bool coin = 0;
Probabilistic program: Bool coin = Bernoulli(0.5); (Bernoulli is a distribution with 2 states.)
18 An example: two coins.
Example: you draw two fair coins. What is the chance that both are heads?
Random variables: coin1 (x1) and coin2 (x2), and an event z about the state of both variables.
We know: coin1 (x1) and coin2 (x2) are independent, and each coin has equal probability of being heads (1) or tails (0).
New random variable z which is true if and only if both coins are heads: z = x1 & x2.
19 An example: two coins.
x_i ∈ {0, 1}, with p(x_i = 1) = p(x_i = 0) = 0.5. Conditional table for z:
x1 x2 | P(z = 1 | x1, x2) | P(z = 0 | x1, x2)
0  0  | 0 | 1
0  1  | 0 | 1
1  0  | 0 | 1
1  1  | 1 | 0
Joint: p(x1, x2, z) = p(z|x1, x2) p(x1) p(x2).
Compute the marginal: p(z) = Σ_{x1, x2} p(z, x1, x2) = Σ_{x1, x2} p(z|x1, x2) p(x1) p(x2).
p(z = 1) = 0.5 · 0.5 = 0.25; p(z = 0) = 1 − 0.25 = 0.75.
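The marginal computation on this slide can be replayed by enumeration (names are mine):

```python
from itertools import product

p_coin = {0: 0.5, 1: 0.5}  # each fair coin: p(x_i)

def p_z_given(z, x1, x2):
    """z = 1 iff both coins are heads, i.e. z is a deterministic function x1 & x2."""
    return 1.0 if z == (x1 & x2) else 0.0

def marginal_z(z):
    # p(z) = sum over x1, x2 of p(z | x1, x2) p(x1) p(x2)
    return sum(p_z_given(z, x1, x2) * p_coin[x1] * p_coin[x2]
               for x1, x2 in product([0, 1], repeat=2))

print(marginal_z(1))  # 0.25
print(marginal_z(0))  # 0.75
```

This is the same calculation an engine like Infer.NET performs internally, just written out by hand for a two-variable model.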
20 An example: two coins in Infer.NET.
(Screenshots: the Infer.NET program, running it, and adding evidence to the program.)
21 Roadmap for next two lectures (repeated): next up is Queries and Making Decisions.
22 What to infer?
Same as in directed graphical models:
- MAP inference (maximum a posteriori state): x* = argmax_x p(x) = argmin_x E(x)
- Probabilistic inference, the so-called marginals: p(x_i = k) = Σ_{x with x_i = k} p(x1, ..., x_i = k, ..., xn)
The marginals can be used to make a maximum-marginal decision: x_i* = argmax_{x_i} p(x_i).
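MAP and maximum-marginal decisions are different queries and can genuinely disagree. A sketch with a hand-picked joint over two binary variables (the numbers are mine, chosen so the two answers differ):

```python
# Joint distribution p(x1, x2); note (0,0) is the single most likely state,
# but each variable individually is more likely to take a different value.
p = {(0, 0): 0.35, (0, 1): 0.0, (1, 0): 0.32, (1, 1): 0.33}

map_state = max(p, key=p.get)  # argmax_x p(x)

def marginal(i, k):
    """p(x_i = k): sum the joint over all states with x_i = k."""
    return sum(v for x, v in p.items() if x[i] == k)

# Per-variable argmax of the marginals.
maxmarg_state = tuple(max([0, 1], key=lambda k: marginal(i, k)) for i in range(2))

print(map_state)      # (0, 0)
print(maxmarg_state)  # (1, 0)
```

Here p(x1 = 1) = 0.65 and p(x2 = 0) = 0.67, so the max-marginal decision is (1, 0) even though the single most probable joint state is (0, 0).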
23 MAP versus marginals, visually.
(Figure: input image; ground-truth labeling; MAP solution x*, where each pixel has a 0/1 label; marginals p(x_i), where each pixel has a probability between 0 and 1.)
24 MAP versus marginals: making decisions.
(Figure: p(x|z) plotted over the space of all solutions x, sorted by pixel difference, for an input image z.) Which solution x would you choose?
25 Reminder: how to make a decision.
Assume the model p(x|z) is known. Question: what solution x should we give out?
Answer: choose the x* which minimizes the Bayesian risk: x* = argmin_x̂ Σ_x p(x|z) C(x, x̂).
C(x1, x2) is called the loss function (or cost function) for comparing two results x1 and x2.
26 MAP versus marginals: making decisions.
(Figure: the same plot of p(x|z) over the solution space.) Which one is the MAP solution? The MAP solution (red) is the globally optimal solution x* = argmax_x p(x|z) = argmin_x E(x, z).
27 Reminder: the cost function behind MAP.
The cost function for MAP: C(x, x̂) = 0 if x = x̂, otherwise 1.
x* = argmin_x̂ Σ_x p(x|z) C(x, x̂) = argmin_x̂ (1 − p(x̂|z)) = argmax_x̂ p(x̂|z)
So MAP estimation optimizes a global 0-1 loss.
28 The cost function behind max-marginals.
Probabilistic inference gives marginals. We can take the max-marginal solution: x_i* = argmax_{x_i} p(x_i), where p(x_i = k) = Σ_{x with x_i = k} p(x1, ..., x_i = k, ..., xn).
This represents the decision with minimum Bayesian risk x* = argmin_x̂ Σ_x p(x|z) C(x, x̂), where C(x, x̂) = Σ_i (x_i − x̂_i)² (proof not done).
For x_i ∈ {0, 1} this counts the number of differently labeled pixels.
Example: three labelings x¹, x², x³ with C(x¹, x²) = 10, C(x², x³) = 10, C(x¹, x³) = 20 (numbers guessed).
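The claim that the per-variable max-marginal decision minimizes the expected Hamming loss can be checked by enumeration on a toy joint (the numbers are mine):

```python
from itertools import product

# Toy joint over two binary variables; MAP state is (0, 0).
p = {(0, 0): 0.4, (0, 1): 0.0, (1, 0): 0.3, (1, 1): 0.3}

def hamming(x, y):
    """C(x, y): number of differently labeled variables."""
    return sum(a != b for a, b in zip(x, y))

def risk(x_hat):
    # Bayesian risk: sum_x p(x) C(x, x_hat) with the Hamming cost.
    return sum(v * hamming(x, x_hat) for x, v in p.items())

states = list(product([0, 1], repeat=2))
best = min(states, key=risk)  # minimum-risk decision by exhaustive search

def marginal(i, k):
    return sum(v for x, v in p.items() if x[i] == k)

maxmarg = tuple(max([0, 1], key=lambda k: marginal(i, k)) for i in range(2))
print(best, maxmarg)  # (1, 0) (1, 0): the max-marginal decision minimizes the risk
```

Note that the MAP state (0, 0) has a strictly larger Hamming risk here, which is exactly the distinction this slide and the previous one are drawing.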
29 MAP versus marginals: making decisions.
(Figure: four solutions with probabilities p = 0.1, 0.11, 0.1, 0.2 and costs C = 1, 1, 100 between neighbouring solutions; all numbers arbitrarily chosen.) Which one is the max-marginal solution?
If x̂ is red, the risk (summing only over the 4 solutions shown) is 20.2; if x̂ is blue, the risk is 31. Hence red is the max-marginal solution.
30 This lecture: MAP inference in order-2 models.
Gibbs distribution: p(x) = (1/f) exp{−E(x)}
E(x) = Σ_i θ_i(x_i) + Σ_{i,j} θ_ij(x_i, x_j) + Σ_{i,j,k} θ_ijk(x_i, x_j, x_k) + ...
(unary terms, pairwise terms, higher-order terms)
We only look at energies with unary and pairwise factors.
MAP inference: x* = argmax_x p(x) = argmin_x E(x).
Label space: binary x_i ∈ {0, 1} or multi-label x_i ∈ {0, ..., K}.
31 Roadmap for next two lectures (repeated): next up is Binary-Valued Factor Graphs: Models and Optimization (ICM, Graph Cut).
32 Image segmentation.
(Figure: input image with user brush strokes (blue = background, red = foreground) and the desired binary output labeling.)
We will use the following energy:
E(x) = Σ_i θ_i(x_i) + Σ_{i,j ∈ N4} θ_ij(x_i, x_j)
with a binary label x_i ∈ {0, 1} per pixel, a unary term θ_i, a pairwise term θ_ij, and N4 the set of all neighboring pixel pairs.
33 Image segmentation: energy.
Goal: formulate E(x) such that good segmentations get a low energy (e.g. E(x) = 0.01, 0.05, 0.05) and bad ones a high energy (E(x) = 10); the numbers are illustrative only. The MAP solution is x* = argmin_x E(x).
34 Unary term.
(Figure: user-labelled pixels plotted in red/green colour space (cross = foreground, dot = background) with a Gaussian mixture model fit; the foreground model is blue, the background model is red.)
35 Unary term.
For a new query image z:
θ_i(x_i = 0) = −log P_red(z_i | x_i = 0)
θ_i(x_i = 1) = −log P_blue(z_i | x_i = 1)
(Figure: θ_i(x_i = 0), where dark means likely background, and θ_i(x_i = 1), where dark means likely foreground.)
Optimum with unary terms only: x* = argmin_x E(x) with E(x) = Σ_i θ_i(x_i).
36 Pairwise term.
We choose a so-called Ising prior: θ_ij(x_i, x_j) = |x_i − x_j|, which gives the energy E(x) = Σ_{i,j ∈ N4} θ_ij(x_i, x_j).
Questions: which labelling has the lowest energy? Which labelling has the highest energy?
(Figure: the two constant labelings have the lowest energy, a labeling with one coherent region has intermediate energy, and a noisy, fragmented labeling has very high energy.)
This models the assumption that the object is spatially coherent.
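The Ising prior's preference for coherent labelings can be made concrete on a small grid. A sketch (the grid size and helper names are mine); a constant labeling cuts no N4 edge, while a checkerboard cuts every one:

```python
from itertools import product

def n4_edges(h, w):
    """All 4-connected neighbor pairs on an h-by-w pixel grid."""
    edges = []
    for i, j in product(range(h), range(w)):
        if i + 1 < h:
            edges.append(((i, j), (i + 1, j)))
        if j + 1 < w:
            edges.append(((i, j), (i, j + 1)))
    return edges

def ising_energy(x, edges):
    # E(x) = sum over N4 pairs of |x_i - x_j|: counts label discontinuities.
    return sum(abs(x[a] - x[b]) for a, b in edges)

h = w = 4
edges = n4_edges(h, w)
flat = {(i, j): 0 for i, j in product(range(h), range(w))}                # constant labeling
checker = {(i, j): (i + j) % 2 for i, j in product(range(h), range(w))}   # checkerboard

print(ising_energy(flat, edges))     # 0: lowest possible
print(ising_energy(checker, edges))  # 24: every one of the 24 N4 edges is cut
```

So the prior alone prefers trivially constant labelings; the unary terms are what pull the solution away from them.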
37 Adding unary and pairwise terms.
Energy: E(x) = Σ_i θ_i(x_i) + ω Σ_{i,j ∈ N4} |x_i − x_j|
(Figure: results for ω = 0, 10, 40, 200; larger ω gives smoother segmentations.)
Question: what happens when ω increases further?
Question (done in the exercise): can the global optimum be computed with graph cut? Please prove it.
38 Is it the best we can do?
(Figure: zoom-ins on a 4-connected segmentation result.)
39 Is it the best we can do?
Given E(x) = Σ_i θ_i(x_i) + Σ_{i,j ∈ N4} |x_i − x_j|, which segmentation has the higher energy?
Answers: 1) It depends on the unary costs. 2) The pairwise cost is the same in both cases (16 edges of N4 are cut).
40 From a 4-connected to an 8-connected factor graph.
(Figure: path lengths under the Euclidean, 4-connected, and 8-connected metrics.)
A larger connectivity can model the true Euclidean length (other metrics are also possible) [Boykov et al. 03; 05].
41 Going to 8-connectivity.
(Figure: zoom-in comparing the 4-connected and 8-connected Euclidean MRF segmentations.) Is it the best we can do?
42 Adapting the pairwise term.
E(x) = Σ_i θ_i(x_i) + Σ_{i,j ∈ N4} θ_ij(x_i, x_j), with
θ_ij(x_i, x_j) = |x_i − x_j| · exp(−β ||z_i − z_j||²), where β is a constant.
What is this term doing?
(Figure: standard 4-connected versus edge-dependent 4-connected results.)
Question (done in the exercise): can the global optimum be computed with graph cut? Please prove it.
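What the exp(−β ||z_i − z_j||²) factor does: it scales down the smoothness penalty wherever the image itself has a strong edge, so label discontinuities become cheap exactly where object boundaries are likely. A sketch (β and the intensity values are arbitrary):

```python
import math

def contrast_weight(z_i, z_j, beta=0.05):
    """exp(-beta * (z_i - z_j)^2): edge-dependent strength between two pixel intensities."""
    return math.exp(-beta * (z_i - z_j) ** 2)

def theta_ij(x_i, x_j, z_i, z_j, beta=0.05):
    # |x_i - x_j| * exp(-beta ||z_i - z_j||^2): the penalty for a label change
    # stays near 1 in flat image regions but almost vanishes across strong edges.
    return abs(x_i - x_j) * contrast_weight(z_i, z_j, beta)

smooth_cut = theta_ij(0, 1, 100, 102)  # label change in a flat region: near-full penalty
edge_cut = theta_ij(0, 1, 100, 180)    # label change across a strong edge: tiny penalty
print(smooth_cut > edge_cut)  # True
```

Equal labels always cost 0; the contrast term only modulates how expensive it is to place a segmentation boundary at a given pixel pair.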
43 A probabilistic view.
1. Just look at the conditional distribution. Gibbs distribution:
p(x|z) = (1/f) exp{−E(x, z)}, with f = Σ_x exp{−E(x, z)}
E(x, z) = Σ_i θ_i(x_i, z_i) + Σ_{i,j ∈ N4} θ_ij(x_i, x_j)
θ_i(x_i = 0, z_i) = −log p_red(z_i | x_i = 0); θ_i(x_i = 1, z_i) = −log p_blue(z_i | x_i = 1); θ_ij(x_i, x_j) = |x_i − x_j|
2. Factorize the conditional distribution:
p(x|z) = p(z|x) p(x) / p(z), where p(z) is a constant factor.
p(x) = (1/f1) exp{−Σ_{i,j ∈ N4} |x_i − x_j|}
p(z|x) = (1/f2) ∏_i p(z_i | x_i) = (1/f2) ∏_i (p_blue(z_i | x_i = 1) x_i + p_red(z_i | x_i = 0)(1 − x_i))
Check yourself: p(x|z) = (1/f) p(z|x) p(x) = (1/f) exp{−E(x, z)}.
44 ICM - iterated conditional modes.
Energy: E(x) = Σ_i θ_i(x_i) + Σ_{i,j ∈ N4} θ_ij(x_i, x_j)
Idea: fix all variables but one and optimize over that one.
Insight: this optimization has an implicit energy that depends only on a few factors. Example (x1 with neighbours x2, ..., x5):
E(x1 | x_{\1}) = θ_1(x1) + θ_12(x1, x2) + θ_13(x1, x3) + θ_14(x1, x4) + θ_15(x1, x5)
where x_{\1} means that all labels but x1 are fixed.
45 ICM - iterated conditional modes.
Energy: E(x) = Σ_i θ_i(x_i) + Σ_{i,j ∈ N4} θ_ij(x_i, x_j)
Algorithm:
1. Initialize x = 0.
2. For i = 1, ..., n:
3. Update x_i = argmin_{x_i} E(x_i | x_{\i}).
4. Go back to step 2 if E(x) has changed with respect to the previous iteration; otherwise stop.
Problems: ICM can get stuck in local minima, and the result depends on the initialization.
(Figure: ICM result versus the global optimum computed with graph cut.)
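The four steps above can be sketched directly; the grid, weight, and unary costs below are made up for illustration. Each coordinate update only evaluates the factors containing x_i, and sweeps repeat until the labeling (and hence the energy) stops changing:

```python
from itertools import product

def energy(x, unary, edges, w):
    # E(x) = sum_i theta_i(x_i) + w * sum_{(i,j) in N4} |x_i - x_j|
    return (sum(unary[i][x[i]] for i in x)
            + w * sum(abs(x[a] - x[b]) for a, b in edges))

def icm(unary, edges, w):
    x = {i: 0 for i in unary}                      # step 1: initialize x = 0
    nbrs = {i: [] for i in unary}
    for a, b in edges:
        nbrs[a].append(b)
        nbrs[b].append(a)
    changed = True
    while changed:                                 # step 4: repeat while E changes
        changed = False
        for i in x:                                # steps 2-3: sweep all variables
            # Implicit conditional energy: only factors touching x_i matter.
            best = min((0, 1), key=lambda lab: unary[i][lab]
                       + w * sum(abs(lab - x[j]) for j in nbrs[i]))
            if best != x[i]:
                x[i], changed = best, True
    return x

# A 3x3 toy "image": the top-left 2x2 block prefers label 1, the rest label 0.
h = w_grid = 3
pixels = list(product(range(h), range(w_grid)))
edges = ([((i, j), (i + 1, j)) for i, j in pixels if i + 1 < h]
         + [((i, j), (i, j + 1)) for i, j in pixels if j + 1 < w_grid])
unary = {p: ((1.0, 0.0) if p[0] < 2 and p[1] < 2 else (0.0, 1.0)) for p in pixels}

x = icm(unary, edges, w=0.2)
print(sorted(p for p in pixels if x[p] == 1))  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

On this easy instance ICM recovers the coherent block, but with stronger pairwise weights or a different initialization it can converge to a worse local minimum, which is exactly the limitation the slide points out.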
46 ICM - parallelization.
Normal procedure: update the variables sequentially, one per step.
Parallel procedure: variables that share no factor can be updated simultaneously; on a 4-connected grid this gives a checkerboard schedule that alternates between the two pixel colors.
The schedule is a more complex task in graphs which are not 4-connected.
47 Roadmap for next two lectures (repeated): next up is Multi-Valued Factor Graphs: Models and Optimization (Alpha Expansion).
LOREM I P S U M Royal Institute of Technology MACHINE LEARNING 2 UGM,HMMS Lecture 7 THIS LECTURE DGM semantics UGM De-noising HMMs Applications (interesting probabilities) DP for generation probability
More informationComputer Vision Group Prof. Daniel Cremers. 11. Sampling Methods: Markov Chain Monte Carlo
Group Prof. Daniel Cremers 11. Sampling Methods: Markov Chain Monte Carlo Markov Chain Monte Carlo In high-dimensional spaces, rejection sampling and importance sampling are very inefficient An alternative
More informationA graph contains a set of nodes (vertices) connected by links (edges or arcs)
BOLTZMANN MACHINES Generative Models Graphical Models A graph contains a set of nodes (vertices) connected by links (edges or arcs) In a probabilistic graphical model, each node represents a random variable,
More informationAlternative Parameterizations of Markov Networks. Sargur Srihari
Alternative Parameterizations of Markov Networks Sargur srihari@cedar.buffalo.edu 1 Topics Three types of parameterization 1. Gibbs Parameterization 2. Factor Graphs 3. Log-linear Models with Energy functions
More informationMachine Learning for Data Science (CS4786) Lecture 24
Machine Learning for Data Science (CS4786) Lecture 24 Graphical Models: Approximate Inference Course Webpage : http://www.cs.cornell.edu/courses/cs4786/2016sp/ BELIEF PROPAGATION OR MESSAGE PASSING Each
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 7 Approximate
More information14 : Theory of Variational Inference: Inner and Outer Approximation
10-708: Probabilistic Graphical Models 10-708, Spring 2017 14 : Theory of Variational Inference: Inner and Outer Approximation Lecturer: Eric P. Xing Scribes: Maria Ryskina, Yen-Chia Hsu 1 Introduction
More informationp L yi z n m x N n xi
y i z n x n N x i Overview Directed and undirected graphs Conditional independence Exact inference Latent variables and EM Variational inference Books statistical perspective Graphical Models, S. Lauritzen
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear
More informationA brief introduction to Conditional Random Fields
A brief introduction to Conditional Random Fields Mark Johnson Macquarie University April, 2005, updated October 2010 1 Talk outline Graphical models Maximum likelihood and maximum conditional likelihood
More informationBayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2016
Bayesian Networks: Construction, Inference, Learning and Causal Interpretation Volker Tresp Summer 2016 1 Introduction So far we were mostly concerned with supervised learning: we predicted one or several
More information13: Variational inference II
10-708: Probabilistic Graphical Models, Spring 2015 13: Variational inference II Lecturer: Eric P. Xing Scribes: Ronghuo Zheng, Zhiting Hu, Yuntian Deng 1 Introduction We started to talk about variational
More informationHidden Markov Models in Language Processing
Hidden Markov Models in Language Processing Dustin Hillard Lecture notes courtesy of Prof. Mari Ostendorf Outline Review of Markov models What is an HMM? Examples General idea of hidden variables: implications
More information6.867 Machine learning, lecture 23 (Jaakkola)
Lecture topics: Markov Random Fields Probabilistic inference Markov Random Fields We will briefly go over undirected graphical models or Markov Random Fields (MRFs) as they will be needed in the context
More informationJoint Optimization of Segmentation and Appearance Models
Joint Optimization of Segmentation and Appearance Models David Mandle, Sameep Tandon April 29, 2013 David Mandle, Sameep Tandon (Stanford) April 29, 2013 1 / 19 Overview 1 Recap: Image Segmentation 2 Optimization
More informationLanguage as a Stochastic Process
CS769 Spring 2010 Advanced Natural Language Processing Language as a Stochastic Process Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu 1 Basic Statistics for NLP Pick an arbitrary letter x at random from any
More information11 The Max-Product Algorithm
Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.438 Algorithms for Inference Fall 2014 11 The Max-Product Algorithm In the previous lecture, we introduced
More informationClustering K-means. Clustering images. Machine Learning CSE546 Carlos Guestrin University of Washington. November 4, 2014.
Clustering K-means Machine Learning CSE546 Carlos Guestrin University of Washington November 4, 2014 1 Clustering images Set of Images [Goldberger et al.] 2 1 K-means Randomly initialize k centers µ (0)
More informationProbabilistic Graphical Models
Probabilistic Graphical Models David Sontag New York University Lecture 6, March 7, 2013 David Sontag (NYU) Graphical Models Lecture 6, March 7, 2013 1 / 25 Today s lecture 1 Dual decomposition 2 MAP inference
More informationIntelligent Systems (AI-2)
Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 18 Oct, 21, 2015 Slide Sources Raymond J. Mooney University of Texas at Austin D. Koller, Stanford CS - Probabilistic Graphical Models CPSC
More informationMassachusetts Institute of Technology Department of Electrical Engineering and Computer Science Algorithms For Inference Fall 2014
Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.438 Algorithms For Inference Fall 2014 Problem Set 3 Issued: Thursday, September 25, 2014 Due: Thursday,
More informationMAP Examples. Sargur Srihari
MAP Examples Sargur srihari@cedar.buffalo.edu 1 Potts Model CRF for OCR Topics Image segmentation based on energy minimization 2 Examples of MAP Many interesting examples of MAP inference are instances
More informationEfficient Inference in Fully Connected CRFs with Gaussian Edge Potentials
Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials by Phillip Krahenbuhl and Vladlen Koltun Presented by Adam Stambler Multi-class image segmentation Assign a class label to each
More informationInference in Graphical Models Variable Elimination and Message Passing Algorithm
Inference in Graphical Models Variable Elimination and Message Passing lgorithm Le Song Machine Learning II: dvanced Topics SE 8803ML, Spring 2012 onditional Independence ssumptions Local Markov ssumption
More informationMarkov Random Fields and Bayesian Image Analysis. Wei Liu Advisor: Tom Fletcher
Markov Random Fields and Bayesian Image Analysis Wei Liu Advisor: Tom Fletcher 1 Markov Random Field: Application Overview Awate and Whitaker 2006 2 Markov Random Field: Application Overview 3 Markov Random
More informationCPSC 540: Machine Learning
CPSC 540: Machine Learning Undirected Graphical Models Mark Schmidt University of British Columbia Winter 2016 Admin Assignment 3: 2 late days to hand it in today, Thursday is final day. Assignment 4:
More informationIntelligent Systems (AI-2)
Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 19 Oct, 23, 2015 Slide Sources Raymond J. Mooney University of Texas at Austin D. Koller, Stanford CS - Probabilistic Graphical Models D. Page,
More informationUniversity of Washington Department of Electrical Engineering EE512 Spring, 2006 Graphical Models
University of Washington Department of Electrical Engineering EE512 Spring, 2006 Graphical Models Jeff A. Bilmes Lecture 1 Slides March 28 th, 2006 Lec 1: March 28th, 2006 EE512
More informationProbablistic Graphical Models, Spring 2007 Homework 4 Due at the beginning of class on 11/26/07
Probablistic Graphical odels, Spring 2007 Homework 4 Due at the beginning of class on 11/26/07 Instructions There are four questions in this homework. The last question involves some programming which
More informationBayesian Networks Inference with Probabilistic Graphical Models
4190.408 2016-Spring Bayesian Networks Inference with Probabilistic Graphical Models Byoung-Tak Zhang intelligence Lab Seoul National University 4190.408 Artificial (2016-Spring) 1 Machine Learning? Learning
More informationMarkov Random Fields
Markov Random Fields Umamahesh Srinivas ipal Group Meeting February 25, 2011 Outline 1 Basic graph-theoretic concepts 2 Markov chain 3 Markov random field (MRF) 4 Gauss-Markov random field (GMRF), and
More information{ p if x = 1 1 p if x = 0
Discrete random variables Probability mass function Given a discrete random variable X taking values in X = {v 1,..., v m }, its probability mass function P : X [0, 1] is defined as: P (v i ) = Pr[X =
More informationMachine Learning Lecture 14
Many slides adapted from B. Schiele, S. Roth, Z. Gharahmani Machine Learning Lecture 14 Undirected Graphical Models & Inference 23.06.2015 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de/ leibe@vision.rwth-aachen.de
More informationCOMP90051 Statistical Machine Learning
COMP90051 Statistical Machine Learning Semester 2, 2017 Lecturer: Trevor Cohn 2. Statistical Schools Adapted from slides by Ben Rubinstein Statistical Schools of Thought Remainder of lecture is to provide
More informationOverview of Statistical Tools. Statistical Inference. Bayesian Framework. Modeling. Very simple case. Things are usually more complicated
Fall 3 Computer Vision Overview of Statistical Tools Statistical Inference Haibin Ling Observation inference Decision Prior knowledge http://www.dabi.temple.edu/~hbling/teaching/3f_5543/index.html Bayesian
More information14 : Theory of Variational Inference: Inner and Outer Approximation
10-708: Probabilistic Graphical Models 10-708, Spring 2014 14 : Theory of Variational Inference: Inner and Outer Approximation Lecturer: Eric P. Xing Scribes: Yu-Hsin Kuo, Amos Ng 1 Introduction Last lecture
More informationConditional Independence and Factorization
Conditional Independence and Factorization Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr
More informationCOMS 4771 Probabilistic Reasoning via Graphical Models. Nakul Verma
COMS 4771 Probabilistic Reasoning via Graphical Models Nakul Verma Last time Dimensionality Reduction Linear vs non-linear Dimensionality Reduction Principal Component Analysis (PCA) Non-linear methods
More information