Intelligent Systems: Undirected Graphical Models (Factor Graphs) (2 lectures)
Carsten Rother, 15/01/2015
Intelligent Systems: Probabilistic Inference in DGM and UGM

Roadmap for the next two lectures:
- Definition and visualization of factor graphs
- Converting directed graphical models to factor graphs
- Probabilistic programming
- Queries and making decisions
- Binary-valued factor graphs: models and optimization (ICM, graph cut)
- Multi-valued factor graphs: models and optimization (alpha expansion)

Reminder: Structured models - when to use which representation?
- Directed graphical model: the unknown variables have different meanings. Example: MaryCalls (M), JohnCalls (J), AlarmOn (A), BurglarInHouse (B).
- Factor graphs: the unknown variables all have the same meaning. Examples: pixels in an image, nuclei in C. elegans (a worm).
- Undirected graphical models are used instead of factor graphs when we are interested in studying conditional independence (not relevant for our context).

Reminder: Machine Learning - structured versus unstructured models
Structured output prediction: f/p: Z^m -> X^n (for example X = R^n or X = N^n).
Important: the elements of X^n do not make independent decisions.
Definition (not formal): the output consists of several parts, and not only the parts themselves contain information, but also the way in which the parts belong together.
Example: image labelling (computer vision). K is a fixed vocabulary, e.g. K = {Wall, Picture, Person, Clutter, ...}.
Input: image (Z^m); output: labeling (K^m).
Important: the labeling of neighboring pixels is highly correlated.

Reminder: Machine Learning - structured versus unstructured models
Structured output prediction: f/p: Z^m -> X^n (for example X = R^n or X = N^n).
Important: the elements of X^n do not make independent decisions.
Definition (not formal): the output consists of several parts, and not only the parts themselves contain information, but also the way in which the parts belong together.
Example: text processing. Input: text (Z^m), e.g. "The boy went home"; output: the parse tree of the sentence (X^n).

Factor graph model - Example
A factor graph defines a distribution as
  p(x) = 1/f ∏_F ψ_F(x_N(F)),   with f = Σ_x ∏_F ψ_F(x_N(F))
where
- f: partition function, so that the distribution is normalized
- F: a factor; the sums and products run over the set of all factors
- N(F): the neighbourhood of a factor, i.e. the variables it depends on
- ψ_F: a function (not a distribution) depending on x_N(F), i.e. ψ_F: K^|N(F)| -> R, where x_i ∈ K
Example: p(x_1, x_2) = 1/f ψ_1(x_1, x_2) ψ_2(x_2), with x_i ∈ {0,1} (K = 2 labels)
  x_N(1) = {x_1, x_2}: ψ_1(0,0) = 1; ψ_1(0,1) = 0; ψ_1(1,0) = 1; ψ_1(1,1) = 2
  x_N(2) = {x_2}: ψ_2(0) = 1; ψ_2(1) = 0
  f = 1·1 + 0·0 + 1·1 + 2·0 = 2
Check yourself that Σ_x p(x) = 1.
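
To make the normalization concrete, here is a minimal sketch (plain Python, brute-force enumeration; the factor tables are exactly the ones from the slide) that computes the partition function f and the resulting probabilities:

```python
from itertools import product

# Factor tables from the example: psi1(x1, x2) and psi2(x2), with x_i in {0, 1}
psi1 = {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 1.0, (1, 1): 2.0}
psi2 = {0: 1.0, 1: 0.0}

def unnormalized(x1, x2):
    # Product of all factors evaluated at the state (x1, x2)
    return psi1[(x1, x2)] * psi2[x2]

# Partition function f: sum of the factor product over all states
f = sum(unnormalized(x1, x2) for x1, x2 in product([0, 1], repeat=2))
print("f =", f)  # 1*1 + 0*0 + 1*1 + 2*0 = 2

# Normalized distribution p(x) = (1/f) * prod_F psi_F(x_N(F))
p = {(x1, x2): unnormalized(x1, x2) / f for x1, x2 in product([0, 1], repeat=2)}
print(p)                # e.g. p[(1, 0)] = 0.5
print(sum(p.values()))  # sums to 1.0, as the slide asks you to check
```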

Factor graph model - Visualization
For visualization we use an undirected graph G = (V, F, E), where V (variable nodes) and F (factor nodes) are the two sets of nodes and E is the set of edges:
- a circle visualizes a variable node,
- a filled square visualizes a factor node,
- an edge between a factor node and a variable node means that this variable appears in that factor.
Example (the same model as on the previous slide): p(x_1, x_2) = 1/f ψ_1(x_1, x_2) ψ_2(x_2) is visualized with two variable nodes x_1, x_2, a factor node for ψ_1 connected to both x_1 and x_2, and a factor node for ψ_2 connected to x_2 only.

Factor graph model - Visualization
Example: p(x_1, x_2, x_3, x_4, x_5) = 1/f ψ(x_1, x_2, x_4) ψ(x_2, x_3) ψ(x_3, x_4) ψ(x_4, x_5) ψ(x_4), where the ψ's are specified in some way.
[The slide visualizes this with five variable nodes x_1, ..., x_5 and one factor node per factor, each connected to the variables that factor depends on.]

Probabilities and energies
p(x) = 1/f ∏_F ψ_F(x_N(F)) = 1/f ∏_F exp{-θ_F(x_N(F))} = 1/f exp{-Σ_F θ_F(x_N(F))} = 1/f exp{-E(x)}
The energy E(x) is just a sum of factors: E(x) = Σ_F θ_F(x_N(F)), i.e. θ_F = -log ψ_F.
The most likely solution x* is reached by minimizing the energy:
x* = argmax_x p(x) = argmin_x E(x)
Notes:
1) If x* is a minimizer of a positive function g(x) then it is also a minimizer of log g(x) (since x_1 ≤ x_2 implies log x_1 ≤ log x_2).
2) log p(x) = -log f - E(x) = constant - E(x).
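
A minimal sketch (plain Python, brute-force; the factor tables have the same structure as the slide's example but the values are assumptions, chosen strictly positive so that θ = -log ψ is finite) illustrating that maximizing p(x) and minimizing E(x) pick the same state:

```python
import math
from itertools import product

# Hypothetical factor tables (values assumed, all > 0)
psi1 = {(0, 0): 0.8, (0, 1): 0.5, (1, 0): 1.0, (1, 1): 2.0}
psi2 = {0: 1.0, 1: 0.25}

def energy(x1, x2):
    # E(x) = theta_1(x1, x2) + theta_2(x2) with theta_F = -log(psi_F)
    return -math.log(psi1[(x1, x2)]) - math.log(psi2[x2])

def unnorm_p(x1, x2):
    # Unnormalized probability exp(-E(x)) = psi_1 * psi_2
    return math.exp(-energy(x1, x2))

states = list(product([0, 1], repeat=2))
f = sum(unnorm_p(*x) for x in states)

map_by_prob = max(states, key=lambda x: unnorm_p(*x) / f)   # argmax_x p(x)
map_by_energy = min(states, key=lambda x: energy(*x))       # argmin_x E(x)
print(map_by_prob, map_by_energy)   # the two criteria pick the same state
assert map_by_prob == map_by_energy
```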

Names
The probability distribution p(x) = 1/f exp{-E(x)} with energy E(x) = Σ_F θ_F(x_N(F)) is a so-called Gibbs distribution, with f = Σ_x exp{-E(x)}.
We define the order of a factor graph as the arity (number of variables) of its largest factor. Example of an order-3 model:
E(x) = θ(x_1, x_2, x_4) + θ(x_2, x_3) + θ(x_3, x_4) + θ(x_5, x_4) + θ(x_4)
(one factor of arity 3, three factors of arity 2, one factor of arity 1)
A different name for a factor graph / undirected graphical model is Markov Random Field. This is an extension of Markov chains to fields; the name "Markov" stands for the Markov property, which essentially means that the order of each factor is small.

Examples: Order
- 4-connected, pairwise MRF (order 2, pairwise energy): E(x) = Σ_{i,j ∈ N_4} θ_ij(x_i, x_j)
- Higher (8)-connected, pairwise MRF (order 2, pairwise energy): E(x) = Σ_{i,j ∈ N_8} θ_ij(x_i, x_j)
- Higher-order RF (order n, higher-order energy): E(x) = Σ_{i,j ∈ N_4} θ_ij(x_i, x_j) + θ(x_1, ..., x_n)
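
As a concrete illustration of the neighbourhood sets N_4 and N_8, here is a minimal sketch (plain Python; the grid size, the labeling, and the Ising-style pairwise cost are assumptions for illustration) that enumerates neighbouring pixel pairs and evaluates a pairwise energy over them:

```python
# Minimal sketch: 4- and 8-connected neighbourhood sets on an H x W pixel grid.
H, W = 3, 3

def neighbors(h, w, connectivity=4):
    """Yield index pairs (i, j) of neighbouring pixels, each pair counted once."""
    offsets = [(0, 1), (1, 0)]                       # right, down  -> N_4
    if connectivity == 8:
        offsets += [(1, 1), (1, -1)]                 # two diagonals -> N_8
    for r in range(h):
        for c in range(w):
            for dr, dc in offsets:
                rr, cc = r + dr, c + dc
                if 0 <= rr < h and 0 <= cc < w:
                    yield (r * w + c, rr * w + cc)

def pairwise_energy(x, pairs, theta_ij):
    """E_pair(x) = sum over neighbouring pairs of theta_ij(x_i, x_j)."""
    return sum(theta_ij(x[i], x[j]) for i, j in pairs)

x = [0, 0, 1, 0, 1, 1, 1, 1, 1]                      # an arbitrary binary labeling
ising = lambda a, b: float(a != b)                   # Ising-style pairwise cost
print(pairwise_energy(x, list(neighbors(H, W, 4)), ising))
print(pairwise_energy(x, list(neighbors(H, W, 8)), ising))
```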

Roadmap for the next two lectures:
- Definition and visualization of factor graphs
- Converting directed graphical models to factor graphs
- Probabilistic programming
- Queries and making decisions
- Binary-valued factor graphs: models and optimization (ICM, graph cut)
- Multi-valued factor graphs: models and optimization (alpha expansion)

Converting a directed graphical model to a factor graph - a simple case (a chain x_1 -> x_2 -> x_3):
Directed model: p(x_1, x_2, x_3) = p(x_3 | x_2) p(x_2 | x_1) p(x_1)
Factor graph:   p(x_1, x_2, x_3) = 1/f ψ(x_3, x_2) ψ(x_2, x_1) ψ(x_1)
where f = 1, ψ(x_3, x_2) = p(x_3 | x_2), ψ(x_2, x_1) = p(x_2 | x_1), ψ(x_1) = p(x_1).
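
A minimal sketch (plain Python, binary variables; the conditional-probability tables are made-up numbers) showing that taking each conditional probability as a factor yields an already-normalized factor graph, i.e. f = 1:

```python
from itertools import product

# Assumed CPTs for the chain x1 -> x2 -> x3 (binary variables)
p_x1 = {0: 0.6, 1: 0.4}
p_x2_given_x1 = {(0, 0): 0.7, (1, 0): 0.3, (0, 1): 0.2, (1, 1): 0.8}  # key: (x2, x1)
p_x3_given_x2 = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.5, (1, 1): 0.5}  # key: (x3, x2)

# Conversion: each conditional probability becomes a factor (conditioning bar -> comma)
psi_1  = lambda x1: p_x1[x1]
psi_21 = lambda x2, x1: p_x2_given_x1[(x2, x1)]
psi_32 = lambda x3, x2: p_x3_given_x2[(x3, x2)]

# Partition function of the resulting factor graph
f = sum(psi_32(x3, x2) * psi_21(x2, x1) * psi_1(x1)
        for x1, x2, x3 in product([0, 1], repeat=3))
print(f)  # 1.0 (up to floating point), so no renormalization is needed
```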

Converting a directed graphical model to a factor graph - a more complex case:
Directed model: p(x_1, x_2, x_3, x_4) = p(x_1 | x_2, x_3) p(x_2) p(x_3 | x_4) p(x_4)
Factor graph:   p(x_1, x_2, x_3, x_4) = 1/f ψ(x_1, x_2, x_3) ψ(x_2) ψ(x_3, x_4) ψ(x_4)
where f = 1, ψ(x_1, x_2, x_3) = p(x_1 | x_2, x_3), ψ(x_2) = p(x_2), ψ(x_3, x_4) = p(x_3 | x_4), ψ(x_4) = p(x_4).
Note that x_1 and its two parents x_2, x_3 end up together in one factor.

Converting a directed graphical model to a factor graph - the general recipe:
- Take each conditional probability and convert it to a factor (without conditioning), i.e. replace the conditioning bar by a comma.
- Set the normalization constant f = 1.
- Visualization: each node together with all of its parents forms a new factor (this step is called moralization).
Comment: the other direction is more complicated, since the factors ψ have to be converted correctly into individual probabilities such that the overall joint distribution stays the same.
Our example:
Directed GM:  p(x_1, x_2, x_3, x_4) = p(x_1 | x_2, x_3) p(x_2) p(x_3 | x_4) p(x_4)
Factor graph: p(x_1, x_2, x_3, x_4) = 1/f ψ(x_1, x_2, x_3) ψ(x_2) ψ(x_3, x_4) ψ(x_4)
with f = 1, ψ(x_1, x_2, x_3) = p(x_1 | x_2, x_3), ψ(x_2) = p(x_2), ψ(x_3, x_4) = p(x_3 | x_4), ψ(x_4) = p(x_4).

Roadmap for the next two lectures:
- Definition and visualization of factor graphs
- Converting directed graphical models to factor graphs
- Probabilistic programming
- Queries and making decisions
- Binary-valued factor graphs: models and optimization (ICM, graph cut)
- Multi-valued factor graphs: models and optimization (alpha expansion)

Probabilistic programming languages
A programming language for machine learning tasks, in particular for modelling, learning and making predictions in directed graphical models (DGM), undirected graphical models (UGM), and factor graphs (FG).
Comment: DGM and UGM are converted to factor graphs; all operations are run on factor graphs.
The basic idea is to associate a distribution with each variable (see http://probabilistic-programming.org/wiki/home):
  Bool coin = 0;               % normal C++: the variable holds a fixed value
  Bool coin = Bernoulli(0.5);  % probabilistic program: Bernoulli is a distribution with 2 states

An example: Two coins
You toss two fair coins. What is the chance that both are heads?
- Random variables: coin1 (x_1) and coin2 (x_2), plus an event z about the state of both variables.
- We know: coin1 (x_1) and coin2 (x_2) are independent, and each coin has equal probability of being heads (1) or tails (0).
- New random variable z which is true if and only if both coins are heads: z = x_1 & x_2.

An example: Two coins
x_i ∈ {0,1} with p(x_i = 1) = p(x_i = 0) = 0.5, and z = x_1 & x_2:

  x_1  x_2 | P(z = 1 | x_1, x_2)  P(z = 0 | x_1, x_2)
   0    0  |          0                    1
   0    1  |          0                    1
   1    0  |          0                    1
   1    1  |          1                    0

Joint: p(x_1, x_2, z) = p(z | x_1, x_2) p(x_1) p(x_2)
Marginal: p(z) = Σ_{x_1, x_2} p(z, x_1, x_2) = Σ_{x_1, x_2} p(z | x_1, x_2) p(x_1) p(x_2)
p(z = 1) = 1 · 0.5 · 0.5 = 0.25
p(z = 0) = 1 · 0.5 · 0.5 + 1 · 0.5 · 0.5 + 1 · 0.5 · 0.5 = 0.75
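
A minimal sketch (plain Python, brute-force enumeration) of exactly this computation: build the joint p(x_1, x_2, z) = p(z | x_1, x_2) p(x_1) p(x_2) with z = x_1 & x_2 and marginalize out the two coins:

```python
from itertools import product

p_coin = {0: 0.5, 1: 0.5}                  # each fair coin: heads (1) or tails (0)

def p_z_given(z, x1, x2):
    # z is deterministically x1 AND x2
    return 1.0 if z == (x1 & x2) else 0.0

# Marginal p(z) = sum over x1, x2 of p(z | x1, x2) p(x1) p(x2)
p_z = {z: sum(p_z_given(z, x1, x2) * p_coin[x1] * p_coin[x2]
              for x1, x2 in product([0, 1], repeat=2))
       for z in [0, 1]}
print(p_z)  # {0: 0.75, 1: 0.25}
```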

An example: Two coins in Infer.net
[The slide shows screenshots of the Infer.net program for this model, the result of running it, and the program with added evidence.]

Roadmap for the next two lectures:
- Definition and visualization of factor graphs
- Converting directed graphical models to factor graphs
- Probabilistic programming
- Queries and making decisions
- Binary-valued factor graphs: models and optimization (ICM, graph cut)
- Multi-valued factor graphs: models and optimization (alpha expansion)

What to infer? The same as in directed graphical models:
- MAP inference (maximum a posteriori state): x* = argmax_x p(x) = argmin_x E(x)
- Probabilistic inference, the so-called marginals: p(x_i = k) = Σ_{x with x_i = k} p(x_1, ..., x_i = k, ..., x_n).
  This can be used to make a maximum-marginal decision: x_i* = argmax_{x_i} p(x_i)
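
A minimal sketch (plain Python, brute-force enumeration over all states, so only feasible for tiny models; the three-variable energy is a made-up example) contrasting the two queries, the MAP state versus the per-variable max-marginal decision:

```python
import math
from collections import defaultdict
from itertools import product

# Hypothetical pairwise energy on three binary variables (values assumed)
def energy(x):
    x1, x2, x3 = x
    return 0.2 * x1 + 0.6 * (1 - x2) + 1.0 * (x1 != x2) + 1.0 * (x2 != x3)

states = list(product([0, 1], repeat=3))
unnorm = {x: math.exp(-energy(x)) for x in states}
f = sum(unnorm.values())
p = {x: v / f for x, v in unnorm.items()}

# MAP inference: the single most probable joint state
x_map = max(states, key=lambda x: p[x])

# Marginals p(x_i = k): sum over all states with x_i = k
marginals = [defaultdict(float) for _ in range(3)]
for x, px in p.items():
    for i, xi in enumerate(x):
        marginals[i][xi] += px

# Max-marginal decision: pick the best label independently per variable
x_maxmarg = tuple(max(m, key=m.get) for m in marginals)
print(x_map, x_maxmarg)  # the two decisions may or may not coincide
```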

MAP versus marginal - visually
[The slide shows the input image, the ground-truth labeling, the MAP solution x* (each pixel has a 0/1 label), and the marginals p(x_i) (each pixel has a probability between 0 and 1).]

MAP versus marginal - making decisions
[The slide shows a plot of p(x | z) over the space of all solutions x (sorted by pixel difference), for an input image z.]
Which solution x would you choose?

Reminder: How to make a decision
Assume the model p(x | z) is known. Question: which solution x should we output?
Answer: choose the x* that minimizes the Bayesian risk:
x* = argmin_{x'} Σ_x p(x | z) C(x', x)
where C(x_1, x_2) is called the loss function (or cost function) for comparing two results x_1, x_2.
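
A minimal sketch (plain Python; the posterior values and the loss between candidate solutions are made-up toy numbers, and, as on the later slide, the sum is restricted to a handful of candidate solutions) of picking the decision that minimizes the Bayesian risk for a given loss C:

```python
# Toy posterior over four candidate solutions (numbers assumed for illustration)
posterior = {"A": 0.10, "B": 0.11, "C": 0.10, "D": 0.20}

def toy_loss(x_hat, x):
    # Assumed loss: 0 if identical, larger the "further apart" the solutions are
    order = {"A": 0, "B": 1, "C": 2, "D": 3}
    return abs(order[x_hat] - order[x])

def bayes_risk(x_hat, posterior, loss):
    # Risk(x_hat) = sum_x p(x | z) * C(x_hat, x)
    return sum(p * loss(x_hat, x) for x, p in posterior.items())

best = min(posterior, key=lambda x_hat: bayes_risk(x_hat, posterior, toy_loss))
print(best, bayes_risk(best, posterior, toy_loss))
```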

MAP versus marginal - making decisions
[The same plot of p(x | z) over the space of all solutions.] Which one is the MAP solution?
The MAP solution (marked in red) is the globally optimal one: x* = argmax_x p(x | z) = argmin_x E(x, z).

Reminder: The cost function behind MAP
The cost function for MAP is the global 0-1 loss: C(x', x) = 0 if x' = x, and 1 otherwise.
x* = argmin_{x'} Σ_x p(x | z) C(x', x) = argmin_{x'} (1 - p(x = x' | z)) = argmax_{x'} p(x' | z)
So MAP estimation optimizes a global 0-1 loss.

The cost function behind the max-marginal decision
Probabilistic inference gives the marginals. We can take the max-marginal solution:
x_i* = argmax_{x_i} p(x_i), where p(x_i = k) = Σ_{x with x_i = k} p(x_1, ..., x_i = k, ..., x_n).
This represents the decision with minimum Bayesian risk
x* = argmin_{x'} Σ_x p(x | z) C(x', x) with C(x', x) = Σ_i (x'_i - x_i)²
(proof not done here). For x_i ∈ {0,1} this loss counts the number of differently labeled pixels.
Example (numbers guessed): three labelings x^1, x^2, x^3 with C(x^1, x^2) = 10, C(x^2, x^3) = 10, C(x^1, x^3) = 20.

MAP versus marginal - making decisions
[The slide shows the same plot with four solutions of (arbitrarily chosen) probabilities p = 0.1, 0.11, 0.1 and 0.2, and costs C = 1, C = 1 and C = 100 between neighbouring solutions.]
Which one is the max-marginal solution?
If x' is the red solution, the risk (summing only over the 4 solutions) is 0.1·1 + 0.1·1 + 0.2·100 = 20.2.
If x' is the blue solution, the risk is 0.11·100 + 0.1·100 + 0.1·100 = 11 + 10 + 10 = 31.
Hence red is the max-marginal (minimum-risk) solution.

This lecture: MAP inference in order-2 models
Gibbs distribution: p(x) = 1/f exp{-E(x)} with
E(x) = Σ_i θ_i(x_i) + Σ_{i,j} θ_ij(x_i, x_j) + Σ_{i,j,k} θ_ijk(x_i, x_j, x_k) + ...
       (unary terms)  (pairwise terms)       (higher-order terms)
We only look at energies with unary and pairwise factors.
MAP inference: x* = argmax_x p(x) = argmin_x E(x)
Label space: binary x_i ∈ {0,1} or multi-label x_i ∈ {0, ..., K}.

Roadmap for the next two lectures:
- Definition and visualization of factor graphs
- Converting directed graphical models to factor graphs
- Probabilistic programming
- Queries and making decisions
- Binary-valued factor graphs: models and optimization (ICM, graph cut)
- Multi-valued factor graphs: models and optimization (alpha expansion)

Image segmentation
[The slide shows an input image with user brush strokes (blue = background, red = foreground) and the desired binary output labeling.]
We will use the following energy over binary labels x_i ∈ {0,1}:
E(x) = Σ_i θ_i(x_i) + Σ_{i,j ∈ N_4} θ_ij(x_i, x_j)
with unary terms θ_i and pairwise terms θ_ij, where N_4 is the set of all (4-connected) neighboring pixel pairs.
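
A minimal sketch (plain Python / NumPy; the image size, the unary costs, and the weight ω are assumptions) of evaluating this segmentation energy, with an Ising pairwise term as introduced a few slides below:

```python
import numpy as np

# Assumed toy setup: a small H x W image with per-pixel unary costs
# theta_unary[r, c, k] = cost of giving pixel (r, c) the label k in {0, 1},
# and an Ising pairwise term omega * [x_i != x_j] over 4-connected neighbours.
H, W = 4, 4
rng = np.random.default_rng(0)
theta_unary = rng.uniform(0.0, 1.0, size=(H, W, 2))   # made-up unary costs
omega = 0.5                                            # assumed pairwise weight

def energy(x, theta_unary, omega):
    """E(x) = sum_i theta_i(x_i) + omega * sum_{(i,j) in N4} [x_i != x_j]."""
    unary = theta_unary[np.arange(H)[:, None], np.arange(W)[None, :], x].sum()
    pairwise = np.sum(x[:, 1:] != x[:, :-1]) + np.sum(x[1:, :] != x[:-1, :])
    return unary + omega * pairwise

x = (theta_unary[..., 1] < theta_unary[..., 0]).astype(int)  # unary-only guess
print(energy(x, theta_unary, omega))
```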

Image segmentation: Energy
Goal: formulate E(x) such that good labelings get a low energy and bad ones a high energy. [The slide shows four example labelings with E(x) = 0.01, 0.05, 0.05 and 10; the numbers may not represent real values.]
MAP solution: x* = argmin_x E(x)

Unary term
[The slide shows the user-labelled pixels in red-green colour space (crosses = foreground, dots = background) and a Gaussian Mixture Model fit to each set: the foreground model is drawn in blue, the background model in red.]

Unary term
For a new query image z, the unary costs are the negative log-likelihoods under the two colour models:
θ_i(x_i = 0) = -log P_red(z_i | x_i = 0)
θ_i(x_i = 1) = -log P_blue(z_i | x_i = 1)
[The slide visualizes θ_i(x_i = 0) (dark means likely background), θ_i(x_i = 1) (dark means likely foreground), and the optimum with unary terms only: x* = argmin_x E(x) with E(x) = Σ_i θ_i(x_i).]
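
A minimal sketch (plain Python / NumPy; the per-pixel foreground/background likelihoods are random stand-ins for the GMM evaluations, which are not reproduced here) of turning colour-model likelihoods into unary costs and taking the unary-only optimum:

```python
import numpy as np

# Stand-in likelihoods P(z_i | x_i = 0) and P(z_i | x_i = 1) for each pixel of a
# small image; in the lecture these come from the background / foreground GMMs.
H, W = 4, 4
rng = np.random.default_rng(1)
p_bg = rng.uniform(0.05, 1.0, size=(H, W))   # assumed background likelihoods
p_fg = rng.uniform(0.05, 1.0, size=(H, W))   # assumed foreground likelihoods

# Unary costs: negative log-likelihood of each label
theta_unary = np.stack([-np.log(p_bg), -np.log(p_fg)], axis=-1)   # shape (H, W, 2)

# Optimum with unary terms only: pick per pixel the label with the lower cost
x_unary_only = np.argmin(theta_unary, axis=-1)
print(x_unary_only)
```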

Pairwise term
We choose a so-called Ising prior: θ_ij(x_i, x_j) = |x_i - x_j|, which gives the energy E(x) = Σ_{i,j ∈ N_4} θ_ij(x_i, x_j).
Questions: which labelling has the lowest energy? Which labelling has the highest energy?
[The slide shows example labelings: the constant labelings have the lowest energy, a labeling with one coherent region has intermediate energy, and a very fragmented labeling has a very high energy.]
This models the assumption that the object is spatially coherent.

Adding unary and pairwise term
Energy: E(x) = Σ_i θ_i(x_i) + ω Σ_{i,j ∈ N_4} |x_i - x_j|
[The slide shows segmentation results for ω = 0, 10, 40 and 200.]
Question: what happens when ω increases further?
Question (done in the exercise): can the global optimum be computed with graph cut? Please prove it.

Is it the best we can do?
[The slide shows the 4-connected segmentation result together with zoomed-in crops of the image and of the segmentation boundary.]

Is it the best we can do?
Given E(x) = Σ_i θ_i(x_i) + Σ_{i,j ∈ N_4} |x_i - x_j|, which of the two segmentations below has the higher energy?

  0 0 0 0 0 0      0 0 0 0 0 0
  0 1 1 1 1 0      0 0 1 1 0 0
  0 1 1 1 1 0      0 1 1 1 1 0
  0 1 1 1 1 0      0 1 1 1 1 0
  0 1 1 1 1 0      0 0 1 1 0 0
  0 0 0 0 0 0      0 0 0 0 0 0

Answers:
1) It depends on the unary costs.
2) The pairwise cost is the same in both cases (16 N_4 edges are cut).

From 4-connected to 8-connected factor graph
[The slide shows two example paths on a 4-connected and an 8-connected grid.] Length of the paths:

            path 1   path 2
  Euclidean  5.65     8
  4-conn.    6.28     6.28
  8-conn.    5.08     6.75

Larger connectivity can model the true Euclidean length (other metrics are also possible) [Boykov et al. 03; 05].

Going to 8-connectivity
[The slide shows zoomed-in segmentation results: 4-connected Euclidean versus 8-connected Euclidean (MRF).] Is it the best we can do?

Adapt the pairwise term
E(x) = Σ_i θ_i(x_i) + Σ_{i,j ∈ N_4} θ_ij(x_i, x_j) with θ_ij(x_i, x_j) = |x_i - x_j| exp(-β ‖z_i - z_j‖²)
where β is a constant and ‖z_i - z_j‖² measures the colour difference of the neighbouring pixels. What is this term doing?
[The slide compares the standard 4-connected result with the edge-dependent 4-connected result.]
Question (done in the exercise): can the global optimum be computed with graph cut? Please prove it.
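
A minimal sketch (plain Python / NumPy; the toy image and the value of β are assumptions) of computing the edge-dependent pairwise weights w_ij = exp(-β ‖z_i - z_j‖²) for 4-connected neighbours and evaluating the resulting contrast-sensitive Ising term:

```python
import numpy as np

# Assumed toy RGB image and contrast parameter beta
H, W = 4, 4
rng = np.random.default_rng(2)
z = rng.uniform(0.0, 1.0, size=(H, W, 3))
beta = 2.0

# Squared colour differences of horizontally / vertically neighbouring pixels
diff_h = np.sum((z[:, 1:] - z[:, :-1]) ** 2, axis=-1)   # shape (H, W-1)
diff_v = np.sum((z[1:, :] - z[:-1, :]) ** 2, axis=-1)   # shape (H-1, W)

# Edge-dependent weights: close to 1 across uniform regions, close to 0 across
# strong colour edges, so label changes become cheap exactly at image edges
w_h = np.exp(-beta * diff_h)
w_v = np.exp(-beta * diff_v)

def pairwise_energy(x):
    """Contrast-sensitive Ising term: sum of w_ij * |x_i - x_j| over N_4."""
    return np.sum(w_h * np.abs(x[:, 1:] - x[:, :-1])) + \
           np.sum(w_v * np.abs(x[1:, :] - x[:-1, :]))

x = rng.integers(0, 2, size=(H, W))
print(pairwise_energy(x))
```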

A probabilistic view
1. Just look at the conditional distribution (a Gibbs distribution):
p(x | z) = 1/f exp{-E(x, z)} with f = Σ_x exp{-E(x, z)} and
E(x, z) = Σ_i θ_i(x_i, z_i) + Σ_{i,j ∈ N_4} θ_ij(x_i, x_j), where
θ_i(x_i = 0, z_i) = -log p_red(z_i | x_i = 0), θ_i(x_i = 1, z_i) = -log p_blue(z_i | x_i = 1), θ_ij(x_i, x_j) = |x_i - x_j|
2. Factorize the conditional distribution:
p(x | z) = 1/p(z) p(z | x) p(x), where p(z) is a constant factor
p(x) = 1/f_1 exp{-Σ_{i,j ∈ N_4} |x_i - x_j|}   (note exp(0) = 1 and exp(-1) ≈ 0.36)
p(z | x) = 1/f_2 ∏_i ( p_blue(z_i | x_i = 1) x_i + p_red(z_i | x_i = 0) (1 - x_i) )
Check yourself: p(x | z) = 1/f p(z | x) p(x) = 1/f exp{-E(x, z)}.

ICM - Iterated Conditional Modes
Energy: E(x) = Σ_i θ_i(x_i) + Σ_{i,j ∈ N_4} θ_ij(x_i, x_j)
[The slide shows a variable x_1 with its four neighbours x_2, x_3, x_4, x_5.]
Idea: fix all variables but one and optimize over this single variable.
Insight: this optimization has an implicit energy that depends only on a few factors. Example:
E(x_1 | x_\1) = Σ_i θ_i(x_i) + Σ_{i,j ∈ N_4} θ_ij(x_i, x_j)
            = θ_1(x_1) + θ_12(x_1, x_2) + θ_13(x_1, x_3) + θ_14(x_1, x_4) + θ_15(x_1, x_5) + terms not depending on x_1
where x_\1 means that all labels except x_1 are fixed.

ICM - Iterated Conditional Modes
Energy: E(x) = Σ_i θ_i(x_i) + Σ_{i,j ∈ N_4} θ_ij(x_i, x_j)
Algorithm:
1. Initialize, e.g. x = 0
2. For i = 1 ... n:
3.   update x_i = argmin_{x_i} E(x_i | x_\i)
4. Go to step 2 until E(x) no longer changes compared to the previous iteration
Problems: can get stuck in local minima; the result depends on the initialization.
[The slide compares an ICM result with the global optimum obtained with graph cut.]
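
A minimal sketch of ICM (plain Python / NumPy; binary labels, unary costs plus a 4-connected Ising pairwise term with an assumed weight ω), sweeping over the pixels until nothing changes any more:

```python
import numpy as np

def icm(theta_unary, omega, max_sweeps=50):
    """ICM for E(x) = sum_i theta_i(x_i) + omega * sum_{N4} [x_i != x_j], x_i in {0,1}."""
    H, W, _ = theta_unary.shape
    x = np.zeros((H, W), dtype=int)                       # step 1: fix x = 0
    for _ in range(max_sweeps):
        changed = False
        for r in range(H):
            for c in range(W):
                # Conditional energy of x_rc: its unary term plus the Ising
                # terms to its (currently fixed) 4-connected neighbours
                nbrs = [x[rr, cc] for rr, cc in ((r-1, c), (r+1, c), (r, c-1), (r, c+1))
                        if 0 <= rr < H and 0 <= cc < W]
                costs = [theta_unary[r, c, k] + omega * sum(k != n for n in nbrs)
                         for k in (0, 1)]
                best = int(np.argmin(costs))
                if best != x[r, c]:
                    x[r, c] = best
                    changed = True
        if not changed:                                    # energy no longer changes
            break
    return x

rng = np.random.default_rng(3)
theta = rng.uniform(0.0, 1.0, size=(5, 5, 2))              # assumed unary costs
print(icm(theta, omega=0.5))
```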

ICM - parallelization
[The slide contrasts the normal (sequential) procedure with a parallel procedure, each shown as steps 1 to 4; in the parallel version, variables that do not share a factor can be updated simultaneously.]
Finding such a schedule is a more complex task in graphs which are not 4-connected.
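
As an illustration of one possible parallel update schedule on a 4-connected grid, here is a minimal sketch (plain Python / NumPy; the grid size is an assumption) of a checkerboard split. Pixels of the same colour share no 4-connected edge, so within one step they all see only fixed neighbours and can be updated in parallel; this is one schedule consistent with the parallel procedure on the slide, not necessarily the exact one shown:

```python
import numpy as np

H, W = 5, 5
rows, cols = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
black = (rows + cols) % 2 == 0          # one half of the checkerboard
white = ~black                          # the other half

# Step 1: update all "black" pixels in parallel (their neighbours are all white,
# hence fixed); step 2: update all "white" pixels; then repeat.
print(black.astype(int))
```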

Roadmap for the next two lectures:
- Definition and visualization of factor graphs
- Converting directed graphical models to factor graphs
- Probabilistic programming
- Queries and making decisions
- Binary-valued factor graphs: models and optimization (ICM, graph cut)
- Multi-valued factor graphs: models and optimization (alpha expansion)