Graphical Models: Another Approach to Generalize the Viterbi Algorithm


Exact Marginalization: Another Approach to Generalize the Viterbi Algorithm
Oberseminar Bioinformatik, 20 May 2010
Institut für Mikrobiologie und Genetik, Universität Göttingen
mario@gobics.de

Undirected Graphical Model

An undirected graphical model (Markov random field) describes the joint probability distribution of a set of random variables $x = (x_v)_{v \in V}$. The conditional independence properties of the random variables are given by an undirected graph $G = (V, E)$:

$p(x_v \mid x_{V \setminus \{v\}}) = p(x_v \mid x_{N(v)})$  for all $v \in V$,

where $N(v) = \{w \in V : \{v, w\} \in E\}$ is the set of neighbours of $v$ in the graph.

Example (HMM / linear chain CRF): for a chain $x_1, \dots, x_{i-1}, x_i, x_{i+1}, \dots, x_n$,

$P(x_i \mid x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_n) = P(x_i \mid x_{i-1}, x_{i+1})$.

Undirected Graphical Model: Factorization (Hammersley and Clifford, 1971)

All undirected graphical models with $P(x) > 0$ are of this form:

$P(x) = \frac{1}{Z} \prod_{C \in \mathcal{C}} \psi_C(x_C)$

$\psi_C > 0$: potential functions
$\mathcal{C}$: subsets of $V$ (maximal cliques or subsets thereof)
$x_C = (x_v)_{v \in C}$: the variables associated with the subset $C$
$Z$: implicitly defined normalization constant

Example (HMM). Conditional on the observation sequence $y$:

$P(x \mid y) = \frac{1}{P(y)} \prod_{i=1}^{n} P(y_i \mid x_i)\, P(x_i \mid x_{i-1})$

$Z = P(y)$, $\mathcal{C} = V \cup E$
$\psi_i = P(y_i \mid x_i)$ (emission probabilities)
$\psi_{\{i-1,i\}} = P(x_i \mid x_{i-1})$ (transition probabilities)
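To make the factorization concrete, here is a minimal Python sketch with invented emission and transition numbers (not from the talk) that writes a tiny HMM's conditional distribution as a product of node and edge potentials and normalizes it by brute force:

```python
# A minimal sketch (invented numbers): the conditional distribution P(x | y) of a small
# HMM written as a product of node potentials psi_i(x_i) = P(y_i | x_i) and edge
# potentials psi_{i-1,i}(x_{i-1}, x_i) = P(x_i | x_{i-1}), normalized by brute force.
import itertools

states = [0, 1]
emit = {0: 0.9, 1: 0.2}             # P(y_i = observed symbol | x_i), kept fixed for all i
trans = {(0, 0): 0.8, (0, 1): 0.2,  # P(x_i | x_{i-1})
         (1, 0): 0.3, (1, 1): 0.7}

def unnormalized(x):
    """Product of all clique potentials for a labeling x (flat prior on x_1)."""
    p = emit[x[0]]
    for a, b in zip(x, x[1:]):
        p *= trans[(a, b)] * emit[b]
    return p

Z = sum(unnormalized(x) for x in itertools.product(states, repeat=4))
print(unnormalized((0, 0, 1, 1)) / Z)   # P(x | y) for one particular labeling
```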

CRFs in Particular

Conditional Random Field: the distribution conditional on the observation $y$ is an undirected graphical model,

$P(x \mid y) = \frac{1}{Z(y)} \prod_{C \in \mathcal{C}} \psi_C(x_C, y)$

Exponential form. In practice, the potentials have the exponential form

$\psi_C(x_C, y) = \exp\!\left(\sum_{i=1}^{k} w_i f_i(x_C, y, C)\right)$,

where the feature functions $f_i$ are often binary features.

Remark: HMMs are still of this form.
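As an illustration of the exponential form, here is a small Python sketch; the feature functions, weights, and the DNA-style observation are invented for the example and are not the features used in the talk:

```python
import math

# Hypothetical binary feature functions f_i(x_C, y, C) and weights w_i; the clique
# potential is psi_C(x_C, y) = exp(sum_i w_i * f_i(x_C, y, C)).
def features(x_C, y, C):
    return [
        1.0 if x_C[0] == x_C[1] else 0.0,                     # labels in the clique agree
        1.0 if y[C[0]] == "G" and x_C[0] == 1 else 0.0,       # label 1 sits on an observed 'G'
    ]

weights = [0.7, 1.3]

def psi(x_C, y, C):
    return math.exp(sum(w * f for w, f in zip(weights, features(x_C, y, C))))

print(psi((1, 1), "GATC", (0, 1)))   # exp(0.7 + 1.3)
```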

(Conditional) Random Fields: Quantities of Interest

1. Most likely labeling $\arg\max_x P(x \mid y)$ (or an approximation). Depending on the application this corresponds to the most likely gene structure, the most likely alignment, a denoised image, or the best party plan.
2. Posterior probabilities / marginals: $P(x_W \mid y)$ or $P(x_W)$ for $W$ a vertex or clique.
3. The normalization constant $Z$, e.g. to actually compute $P(x)$ for a given $x$ (classification).

Reminder: Had Developed a Generalized Viterbi Algorithm

Own algorithm: finds the most likely element (labeling) $\hat{x} \in \arg\max_x P(x)$ when $\mathcal{C}$ consists of vertices and edges.

(Figure: small example graph with vertices H1, B1, B2, H2, H3, B3.)

Implementation: Java program GeneralViterbi, written by Moritz Maneke in his Bachelor thesis.

Application: Comparative Gene Finding

Danesh Morady Garavand (lab rotation) used GeneralViterbi as a new approach to comparative gene finding: find genes simultaneously in two or more orthologous DNA sequences.

Example (two aligned genomes):
horizontal edges: neighboring bases in the same species
vertical edges: aligned bases in different species

Application: Comparative Gene Finding
Comparative Model

used a simple label space with 7 labels {N, E0, E1, E2, I0, I1, I2}
reused the parameters of AUGUSTUS (a linear chain CRF) for all node features and horizontal edge features (emission probabilities, transition probabilities); without vertical edges this gives two independent ab initio predictions
used very simple vertical edge features (a sketch of the graph construction follows below):

$f(x_{\{u,v\}}, y, \{u, v\}) = \begin{cases} 1, & \text{if } x_u = x_v \\ 0, & \text{otherwise} \end{cases}$

i.e. equal labels on aligned bases are rewarded.
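The following Python sketch shows one way such a comparative graph could be assembled; the alignment, sequence length, and node naming are invented for illustration and are not taken from the actual implementation:

```python
# Hypothetical construction of the comparative graph for two aligned sequences.
# Nodes are (species, position); horizontal edges join neighbouring positions within a
# species, vertical edges join aligned positions between the two species.
n = 5
nodes = [(s, i) for s in (0, 1) for i in range(n)]
horizontal = [((s, i), (s, i + 1)) for s in (0, 1) for i in range(n - 1)]
aligned_pairs = [(i, i) for i in range(n)]                 # made-up alignment: i <-> i
vertical = [((0, i), (1, j)) for i, j in aligned_pairs]

def vertical_edge_feature(x_u, x_v):
    """Binary feature rewarding equal labels on aligned bases."""
    return 1.0 if x_u == x_v else 0.0

print(len(horizontal), len(vertical))   # 8 horizontal edges, 5 vertical edges
```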

Application: Comparative Gene Finding
Results/Conclusions

test on sequence pairs of length 5000
accuracy improvement of "with vertical edges" over "without vertical edges": 8 percentage points on the exon level
extension to a production setting is conceptually easy, e.g. 12 Drosophila species, evidence integration
But: efficiency drops exponentially with the width of the graph

Application: Protein-Protein Interaction

(Figure: residue contact graph of a protein with numbered residues; edges connect backbone neighbours and residues that are close in space; interface residues are highlighted.)

Find a labeling of the vertices in {interface, not interface} based on vertex and edge features.

Limits of GeneralViterbi

Problems:
GeneralViterbi is too inefficient on very complex graphs (a problem-inherent limitation: the problem is NP-hard)
one could also use marginals like $P(x_v = l \mid y)$, e.g. for training CRFs
But: my approach is not directly transferable to marginalization. The forward algorithm works, but it is not clear what the corresponding backward variables would be.

Eliminate the Limits: Catch 2 Birds with 1 Stone

1. can do maximization and marginalization with a very similar algorithm
2. can tackle complex graphs by running an approximate variant of the same algorithm

Maybe a 3rd bird: can use higher-order clique features in PPI, e.g. patch features: "most interfaces consist of [...] buried cores surrounded by partially accessible rims"; the "amino acid composition of the core differs considerably from the rim" (Shoemaker and Panchenko, 2007).

Reminder: Definition (Tree Decomposition)

A tree decomposition of an undirected graph $G = (V, E)$ is a tree $(T, M)$ whose vertices $A \in T$ (bags) are subsets of the vertices of $G$, such that
$\bigcup_{A \in T} A = V$
for every edge $\{u, v\} \in E$ there is an $A \in T$ with $u, v \in A$
for all $A, B \in T$ and every $C$ on the unique path from $A$ to $B$: $A \cap B \subseteq C$ (running intersection property)

(Example figure; copyright by Hein Röhrig.)

Tree Width of a Graph

Definition (tree width):
the width of a tree decomposition is (size of the largest bag) - 1
the tree width of a graph is the minimal width taken over all its tree decompositions
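To make the definition and the width concrete, here is a self-contained Python sketch that checks the three properties for the small example used later (bags abcd, cdef, dgh); the underlying graph edges are chosen only so that the example is valid:

```python
# Minimal sketch: verify the three tree-decomposition properties for a small example.
from collections import deque

V = {"a", "b", "c", "d", "e", "f", "g", "h"}
E = {frozenset(e) for e in [("a","b"), ("b","c"), ("c","d"), ("c","e"),
                            ("e","f"), ("d","g"), ("d","h")]}
bags = {"X": {"a","b","c","d"}, "Y": {"c","d","e","f"}, "Z": {"d","g","h"}}
tree = {("X", "Y"), ("Y", "Z")}   # edges of the decomposition tree

# (1) every graph vertex appears in some bag
assert set().union(*bags.values()) == V
# (2) every graph edge is contained in some bag
assert all(any(e <= B for B in bags.values()) for e in E)

# (3) running intersection: for each vertex, the bags containing it form a connected subtree
def connected(names):
    if not names:
        return True
    seen, todo = set(), deque([next(iter(names))])
    while todo:
        x = todo.popleft()
        seen.add(x)
        for u, v in tree:
            for a, b in ((u, v), (v, u)):
                if a == x and b in names and b not in seen:
                    todo.append(b)
    return seen == set(names)

assert all(connected({n for n, B in bags.items() if v in B}) for v in V)
print("valid tree decomposition; width =", max(len(B) for B in bags.values()) - 1)
```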

Computing Narrow Tree Decompositions

Theorem (Hans Bodlaender, 1992). For each $k$ there is a linear-time algorithm that tests whether a graph $G = (V, E)$ has tree width at most $k$ and, if so, computes a tree decomposition $(T, M)$ of width at most $k$.

Practical solutions:
the exact-$k$ algorithm is complicated, but a linear-time implementation is available: diploma thesis (Diplomarbeit) of Hein Röhrig, 1998, GNU licence, http://hein.roehrig.name/dipl/
junction tree: one can also add edges to $E$ ("triangulation") so that the resulting graph has a tree of cliques that is a tree decomposition of $G$
one may also choose to remove edges

Overview: Setting

Have: an undirected graphical model with potentials $\psi_C(x_C)$, and a tree decomposition of its graph.
Want: the marginal distributions $P(x_W)$ for the bags $W$ of the tree decomposition.

Remark: this will give us the posterior probabilities of vertices and edges needed for CRF training with IIS: $P(x_v \mid y)$, $P(x_{\{v,w\}} \mid y)$.

Idea:
the algorithm will maintain a distribution $\phi_W$ for each bag (a "potential")
it will iteratively and locally update the potentials by passing messages between neighboring bags
after a fixed number of steps the potentials are the true marginals

Separators

For each edge $\{X, Y\}$ of the tree decomposition we add a separator node $S$ between $X$ and $Y$ containing the vertices in the intersection: $S = X \cap Y$.

Example: bags abcd, cdef, dgh with separators cd (between abcd and cdef) and d (between cdef and dgh).

Why? Local consistency: when the potentials are the true marginals of $P(x)$, they agree on their common vertices. In particular, $\phi_X$ and $\phi_Y$ must eventually agree on $S$.

Local Consistency

Consistency on a separator: we need

$\sum_{X \setminus S} \phi_X = \phi_S = \sum_{Y \setminus S} \phi_Y$

Example (bags $X$ = abcd, $Y$ = cdef, separator $S$ = cd):

$\sum_{a,b} \phi_X(a, b, c, d) = \phi_S(c, d) = \sum_{e,f} \phi_Y(c, d, e, f)$
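A small numeric illustration in Python (the joint table is random and purely illustrative): if the bag and separator potentials are true marginals of one joint distribution, the consistency condition holds automatically.

```python
# Numeric illustration of local consistency: take any joint p over (c, d, e, f), set
# phi_X, phi_S, phi_Y to its true marginals with X = {c,d,e}, S = {c,d}, Y = {c,d,f},
# and check sum_{X\S} phi_X = phi_S = sum_{Y\S} phi_Y.
from itertools import product
import random

random.seed(0)
p = {cdef: random.random() for cdef in product((0, 1), repeat=4)}   # unnormalized joint

phi_X = {(c, d, e): sum(p[(c, d, e, f)] for f in (0, 1))
         for c, d, e in product((0, 1), repeat=3)}
phi_Y = {(c, d, f): sum(p[(c, d, e, f)] for e in (0, 1))
         for c, d, f in product((0, 1), repeat=3)}
phi_S = {(c, d): sum(p[(c, d, e, f)] for e, f in product((0, 1), repeat=2))
         for c, d in product((0, 1), repeat=2)}

for c, d in product((0, 1), repeat=2):
    left = sum(phi_X[(c, d, e)] for e in (0, 1))
    right = sum(phi_Y[(c, d, f)] for f in (0, 1))
    assert abs(left - phi_S[(c, d)]) < 1e-12 and abs(right - phi_S[(c, d)]) < 1e-12
```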

Global Consistency

For any bags $X, Y$ with intersection $I$ we must have

$\sum_{X \setminus I} \phi_X = \sum_{Y \setminus I} \phi_Y$

running intersection property & local consistency $\Rightarrow$ global consistency

Messages

Example: bags $X$ and $Y$ with separator $S$, carrying potentials $\phi_X$, $\phi_S$, $\phi_Y$.

Message from bag $X$ to bag $Y$: update $\phi_S$ and $\phi_Y$ according to

$\phi_S^{new} = \sum_{X \setminus S} \phi_X$  (1)
$\phi_Y \leftarrow \phi_Y \cdot \frac{\phi_S^{new}}{\phi_S}$  (2)
$\phi_S \leftarrow \phi_S^{new}$  (3)
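A compact Python sketch of this update, storing potentials as dictionaries from assignment tuples to values (the data layout is my own choice, not the talk's):

```python
# One message pass X -> Y over separator S. Potentials are dicts mapping assignment
# tuples to nonnegative numbers; variable orderings are given as lists of names.
def marginalize(phi, phi_vars, keep_vars):
    """Sum phi down to the variables in keep_vars (step (1))."""
    out = {}
    for assign, val in phi.items():
        key = tuple(assign[phi_vars.index(v)] for v in keep_vars)
        out[key] = out.get(key, 0.0) + val
    return out

def send_message(phi_X, X_vars, phi_S, S_vars, phi_Y, Y_vars):
    phi_S_new = marginalize(phi_X, X_vars, S_vars)                 # (1)
    for assign in phi_Y:                                           # (2): rescale phi_Y
        key = tuple(assign[Y_vars.index(v)] for v in S_vars)
        phi_Y[assign] *= (phi_S_new[key] / phi_S[key]) if phi_S[key] else 0.0
    return phi_S_new, phi_Y                                        # (3): caller sets phi_S = phi_S_new
```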

Two Passes: Leaves to Root to Leaves

1. initialize the potentials: $\phi_A = \psi_A$ on the bags, $\phi \equiv 1$ on the separators
2. choose a root bag arbitrarily
3. pass messages from the leaves to the root (depth-first search order)
4. pass messages from the root to the leaves (reverse order)
5. the potentials are now exact marginals; $Z$ can be obtained by summing any potential

Example: forward pass (leaves to root), then backward pass (root to leaves). We have local consistency between $X$ and $Y$ after the backward pass!

Party Planning Example

Binary classification of the variables S, O, D, M, H (Stephan, Oliver, Dana, Marc, Hamed): 1 = invited, 0 = not invited. Graph edges: S-O, O-D, M-O, D-H.

$P = \frac{1}{Z}\, \psi_S \psi_O \psi_D \psi_M \psi_H\, \psi_{SO} \psi_{OD} \psi_{MO} \psi_{DH}$

For every vertex $v$: $\psi_v(0) = 1$, $\psi_v(1) = 2$. Edge potentials:

ψ_SO (S, O): (0,0) → 2, (0,1) → 1, (1,0) → 1, (1,1) → 2
ψ_OD (O, D): (0,0) → 1, (0,1) → 1, (1,0) → 2, (1,1) → 2
ψ_DH (D, H): (0,0) → 3, (0,1) → 3, (1,0) → 3, (1,1) → 1
ψ_MO (M, O): (0,0) → 1, (0,1) → 1, (1,0) → 1, (1,1) → 0

Party Planning Example: Brute Force

Unnormalized distribution (one row per assignment of O, D, M, S, H):

O D M S H | value
0 0 0 0 0 | 6
0 0 0 0 1 | 12
0 0 0 1 0 | 6
0 0 0 1 1 | 12
0 0 1 0 0 | 12
0 0 1 0 1 | 24
0 0 1 1 0 | 12
0 0 1 1 1 | 24
0 1 0 0 0 | 12
0 1 0 0 1 | 8
0 1 0 1 0 | 12
0 1 0 1 1 | 8
0 1 1 0 0 | 24
0 1 1 0 1 | 16
0 1 1 1 0 | 24
0 1 1 1 1 | 16
1 0 0 0 0 | 12
1 0 0 0 1 | 24
1 0 0 1 0 | 48
1 0 0 1 1 | 96
1 0 1 0 0 | 0
1 0 1 0 1 | 0
1 0 1 1 0 | 0
1 0 1 1 1 | 0
1 1 0 0 0 | 24
1 1 0 0 1 | 16
1 1 0 1 0 | 96
1 1 0 1 1 | 64
1 1 1 0 0 | 0
1 1 1 0 1 | 0
1 1 1 1 0 | 0
1 1 1 1 1 | 0

sum (= Z): 608

Marginals:
P(S = 1) = 418/608
P(O = 1) = 380/608
P(D = 1) = 320/608
P(M = 1) = 152/608
P(H = 1) = 320/608
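For completeness, a short Python snippet (my own re-implementation of this brute-force check, not the talk's code) that reproduces Z = 608 and the marginal for S:

```python
# Brute-force check of the party example: enumerate all 2^5 assignments and sum the
# unnormalized products (vertex potentials 1/2, edge potentials from the tables above).
from itertools import product

psi_SO = {(0, 0): 2, (0, 1): 1, (1, 0): 1, (1, 1): 2}
psi_OD = {(0, 0): 1, (0, 1): 1, (1, 0): 2, (1, 1): 2}
psi_DH = {(0, 0): 3, (0, 1): 3, (1, 0): 3, (1, 1): 1}
psi_MO = {(0, 0): 1, (0, 1): 1, (1, 0): 1, (1, 1): 0}
psi_v = {0: 1, 1: 2}

def weight(s, o, d, m, h):
    return (psi_v[s] * psi_v[o] * psi_v[d] * psi_v[m] * psi_v[h]
            * psi_SO[(s, o)] * psi_OD[(o, d)] * psi_DH[(d, h)] * psi_MO[(m, o)])

Z = sum(weight(*x) for x in product((0, 1), repeat=5))
pS1 = sum(weight(*x) for x in product((0, 1), repeat=5) if x[0] == 1)
print(Z, pS1)   # 608 and 418, i.e. P(S = 1) = 418/608
```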

Party Planning Example: Tree Decomposition

Bags and separators: SO --[O]-- OD --[D]-- DH, and MO --[O]-- OD.

$P = \frac{1}{Z}\, \tilde\psi_{SO} \tilde\psi_{OD} \tilde\psi_{MO} \tilde\psi_{DH}$

Define bag potentials ψ̃ to achieve equivalence: (arbitrarily) assign the original potentials ψ to bags.

ψ̃_SO (S, O): (0,0) → 2, (0,1) → 1, (1,0) → 2, (1,1) → 4
ψ̃_OD (O, D): (0,0) → 1, (0,1) → 2, (1,0) → 4, (1,1) → 8
ψ̃_DH (D, H): (0,0) → 3, (0,1) → 6, (1,0) → 3, (1,1) → 2
ψ̃_MO (M, O): (0,0) → 1, (0,1) → 1, (1,0) → 2, (1,1) → 0

Party Planning Example: Message Passing

1. Initialize the potentials: each bag carries its potential ψ̃ from the previous slide, i.e. φ_SO = ψ̃_SO, φ_OD = ψ̃_OD, φ_DH = ψ̃_DH, φ_MO = ψ̃_MO; both separator potentials are set to unity, φ_O ≡ 1 (on the SO-OD and MO-OD separators) and φ_D ≡ 1.

2. Choose the root: here DH.

3. Pass messages from the leaves to the root:
SO → OD: the separator becomes φ_O = (4, 5) (for O = 0, 1), and φ_OD becomes (O,D): (0,0) → 4, (0,1) → 8, (1,0) → 20, (1,1) → 40.
MO → OD: the MO-OD separator becomes φ_O = (3, 1), and φ_OD becomes (0,0) → 12, (0,1) → 24, (1,0) → 20, (1,1) → 40.
OD → DH: the separator becomes φ_D = (32, 64), and φ_DH becomes (D,H): (0,0) → 96, (0,1) → 192, (1,0) → 192, (1,1) → 128.

At the root, summing φ_DH gives Z = 96 + 192 + 192 + 128 = 608.

4. Pass messages from the root back to the leaves:
DH → OD: φ_D becomes (288, 320), and φ_OD becomes (0,0) → 108, (0,1) → 120, (1,0) → 180, (1,1) → 200.
OD → MO: the MO-OD separator becomes φ_O = (228, 380), and φ_MO becomes (M,O): (0,0) → 76, (0,1) → 380, (1,0) → 152, (1,1) → 0.
OD → SO: the SO-OD separator becomes φ_O = (228, 380), and φ_SO becomes (S,O): (0,0) → 114, (0,1) → 76, (1,0) → 114, (1,1) → 304.

Every bag and separator potential now sums to Z = 608 and is the corresponding unnormalized marginal.

Results

We get Z = 608, P(S = 1) = 418/608, P(O = 1) = 380/608, P(D = 1) = 320/608, P(M = 1) = 152/608, P(H = 1) = 320/608, just as in the brute-force results.
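The same numbers can be reproduced with a small self-contained Python sketch of the two-pass algorithm on this particular tree (my own illustrative implementation, not the talk's code):

```python
# Two-pass sum-product on the party tree SO --[O]-- OD --[D]-- DH, MO --[O]-- OD,
# with DH as the root; reproduces Z = 608 and the bag marginals from the walkthrough.
def marginalize(phi, vs, keep):
    out = {}
    for a, val in phi.items():
        k = tuple(a[vs.index(v)] for v in keep)
        out[k] = out.get(k, 0.0) + val
    return out

def message(src, sep, dst, bags, seps):
    phi_src, vs_src = bags[src]
    phi_dst, vs_dst = bags[dst]
    phi_sep, vs_sep = seps[sep]
    new_sep = marginalize(phi_src, vs_src, vs_sep)              # step (1)
    for a in phi_dst:                                           # step (2)
        k = tuple(a[vs_dst.index(v)] for v in vs_sep)
        phi_dst[a] *= (new_sep[k] / phi_sep[k]) if phi_sep[k] else 0.0
    seps[sep] = (new_sep, vs_sep)                               # step (3)

# bag potentials psi~ exactly as assigned above
SO = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 2.0, (1, 1): 4.0}
OD = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 4.0, (1, 1): 8.0}
DH = {(0, 0): 3.0, (0, 1): 6.0, (1, 0): 3.0, (1, 1): 2.0}
MO = {(0, 0): 1.0, (0, 1): 1.0, (1, 0): 2.0, (1, 1): 0.0}
bags = {"SO": (SO, ["S", "O"]), "OD": (OD, ["O", "D"]),
        "DH": (DH, ["D", "H"]), "MO": (MO, ["M", "O"])}
seps = {"O1": ({(0,): 1.0, (1,): 1.0}, ["O"]),   # separator between SO and OD
        "O2": ({(0,): 1.0, (1,): 1.0}, ["O"]),   # separator between MO and OD
        "D":  ({(0,): 1.0, (1,): 1.0}, ["D"])}

# forward pass (leaves -> root DH), then backward pass (root -> leaves)
schedule = [("SO", "O1", "OD"), ("MO", "O2", "OD"), ("OD", "D", "DH"),
            ("DH", "D", "OD"), ("OD", "O2", "MO"), ("OD", "O1", "SO")]
for src, sep, dst in schedule:
    message(src, sep, dst, bags, seps)

Z = sum(DH.values())
print(Z)                                                    # 608.0
print(sum(v for (s, o), v in SO.items() if s == 1) / Z)     # P(S = 1) = 418/608
```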

Maximization

$\arg\max_x P(x)$ (Viterbi) can be computed with a conceptually very similar algorithm, in the same running time.
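One plausible way to adapt the sketches above (my reading of "conceptually very similar", not code from the talk): replace the sum in the marginalization step by a maximum and keep back-pointers, e.g.:

```python
def max_marginalize(phi, phi_vars, keep_vars):
    """Max-product analogue of marginalize(): take a maximum instead of a sum over the
    eliminated variables and remember the maximizing assignment (a back-pointer that
    can later be followed to recover arg max_x P(x))."""
    out = {}
    for assign, val in phi.items():
        key = tuple(assign[phi_vars.index(v)] for v in keep_vars)
        if key not in out or val > out[key][0]:
            out[key] = (val, assign)
    return out   # maps separator assignment -> (max value, maximizing bag assignment)
```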

Approximation

Rough procedure:
1. use a graph of bags (that need not be a tree decomposition)
2. pass messages according to some schedule
3. stop passing messages at will

Literature

Christopher M. Bishop: Pattern Recognition and Machine Learning, 2006 (book; chapter available online at http://research.microsoft.com/~cmbishop/prml)
The Junction Tree Algorithm, lecture notes of Chris Williams, School of Informatics, University of Edinburgh, 2009
Graphical models, message-passing algorithms, and variational methods: Part I, Martin Wainwright, www.eecs.berkeley.edu/~wainwrig/