COMPSCI 276 Fall 2007

Exact Inference Algorithms for Probabilistic Reasoning. COMPSCI 276, Fall 2007.

Belief Updating. Example network: Smoking, Lung Cancer, Bronchitis, X-ray, Dyspnoea. Query: P(lung cancer = yes | smoking = no, dyspnoea = yes) = ?

Probabilistic Inference Tasks.
Belief updating: BEL(X_i) = P(X_i = x_i | evidence).
Finding the most probable explanation (MPE): x* = argmax_x P(x, e).
Finding the maximum a-posteriori hypothesis (MAP): (a_1*, ..., a_k*) = argmax_a Σ_{X/A} P(x, e), where A ⊆ X are the hypothesis variables.
Finding the maximum-expected-utility (MEU) decision: (d_1*, ..., d_k*) = argmax_d Σ_{X/D} P(x, e) U(x), where D ⊆ X are the decision variables and U(x) is the utility function.

Belief updating is NP-hard: every SAT formula can be mapped to a Bayesian network query. Example: is (u ∨ ¬v ∨ w) ∧ (¬u ∨ ¬w ∨ y) satisfiable?

Motivation. Given a chain (e.g., A → B → C → D), how does inference work? How can we compute P(D)? P(D = 0)? P(X | D = 0) for an upstream variable X? Brute-force summation over the joint is O(k^4).
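To make the chain example concrete, here is a minimal sketch (not from the slides; the CPT values are random placeholders) contrasting brute-force enumeration with eliminating variables one at a time along A → B → C → D:

```python
import numpy as np

k = 2  # domain size of every variable
rng = np.random.default_rng(0)

def random_cpt(*shape):
    """Random CPT; the last axis is the child variable, so rows sum to 1."""
    t = rng.random(shape)
    return t / t.sum(axis=-1, keepdims=True)

p_a = random_cpt(k)          # P(A)
p_b_a = random_cpt(k, k)     # P(B|A), indexed [a, b]
p_c_b = random_cpt(k, k)     # P(C|B), indexed [b, c]
p_d_c = random_cpt(k, k)     # P(D|C), indexed [c, d]

# Brute force: enumerate all joint assignments -> O(k^4) work.
p_d_brute = np.zeros(k)
for a in range(k):
    for b in range(k):
        for c in range(k):
            for d in range(k):
                p_d_brute[d] += p_a[a] * p_b_a[a, b] * p_c_b[b, c] * p_d_c[c, d]

# Variable elimination along the chain: sum out A, then B, then C -> O(k^2) per step.
h_b = p_a @ p_b_a            # h(B) = sum_a P(a) P(b|a)
h_c = h_b @ p_c_b            # h(C) = sum_b h(b) P(c|b)
p_d_ve = h_c @ p_d_c         # P(D) = sum_c h(c) P(d|c)

assert np.allclose(p_d_brute, p_d_ve)
print("P(D) =", p_d_ve)
```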

Belief updating: P(X | evidence) = ?
P(a | e = 0) ∝ P(a, e = 0) = Σ_{e=0, d, c, b} P(a) P(b|a) P(c|a) P(d|a,b) P(e|b,c).
Variable elimination pushes the sums in (shown alongside the moral graph):
P(a, e = 0) = P(a) Σ_d Σ_c P(c|a) Σ_b P(b|a) P(d|a,b) P(e|b,c) = P(a) Σ_d Σ_c P(c|a) h^B(a, d, c, e = 0).

Bucket elimination — algorithm elim-bel (Dechter 1996). Buckets along the ordering A, E, D, C, B (B is eliminated first, A is the query); Σ_b is the elimination operator:
bucket B:  P(b|a), P(d|a,b), P(e|b,c)   →  h^B(a, d, c, e)
bucket C:  P(c|a), h^B(a, d, c, e)      →  h^C(a, d, e)
bucket D:  h^C(a, d, e)                 →  h^D(a, e)
bucket E:  e = 0, h^D(a, e)             →  h^E(a)
bucket A:  P(a), h^E(a)                 →  P(a, e = 0)
W* = 4: the induced width (max clique size of the induced graph) along this ordering.
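A rough sketch of elim-bel on this example, with buckets processed along the ordering A, E, D, C, B. Only the structure (scopes and ordering) follows the slide; the CPT entries are randomly generated and evidence handling is omitted for brevity:

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
k = 2  # all variables binary

def cpt(child, parents):
    """Random CPT P(child|parents) as (scope, table); scope order = parents + [child]."""
    shape = (k,) * (len(parents) + 1)
    t = rng.random(shape)
    t /= t.sum(axis=-1, keepdims=True)
    return (parents + [child], t)

def multiply_and_sum_out(factors, var):
    """Combine a bucket's factors and sum out `var`, returning a new factor."""
    scope = sorted({x for s, _ in factors for x in s})
    out_scope = [x for x in scope if x != var]
    table = np.zeros((k,) * len(out_scope))
    for assign in itertools.product(range(k), repeat=len(scope)):
        env = dict(zip(scope, assign))
        prod = 1.0
        for s, t in factors:
            prod *= t[tuple(env[x] for x in s)]
        table[tuple(env[x] for x in out_scope)] += prod
    return (out_scope, table)

# Network from the slide: A -> B, A -> C, (A,B) -> D, (B,C) -> E.
factors = [cpt('A', []), cpt('B', ['A']), cpt('C', ['A']),
           cpt('D', ['A', 'B']), cpt('E', ['B', 'C'])]

# Eliminate B, then C, then D, then E; A (the query variable) is kept.
for var in ['B', 'C', 'D', 'E']:
    bucket = [f for f in factors if var in f[0]]
    rest = [f for f in factors if var not in f[0]]
    factors = rest + [multiply_and_sum_out(bucket, var)]

# What remains are functions of A only: combine them to get P(A) (no evidence here).
p_a = np.ones(k)
for s, t in factors:
    p_a *= t
print("P(A) =", p_a / p_a.sum())
```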

Complexity of elimination: O(n · exp(w*(d))), where w*(d) is the induced width of the moral graph along the ordering d. The effect of the ordering: for the example network, w*(d_1) = 4 while w*(d_2) = 2 (the induced graphs for the two orderings are shown on the slide).

Finding a small induced width is NP-complete (a tree has induced width 1). Greedy ordering heuristics: min-width, min-induced-width, max-cardinality, and min-fill (often considered the best). See also anytime min-width (Gogate and Dechter).
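As an illustration (my own sketch, not part of the slides), a greedy min-fill ordering heuristic: repeatedly eliminate the node whose removal adds the fewest fill edges, connect its neighbors, and record the induced width along the way:

```python
def greedy_min_fill(adjacency):
    """adjacency: dict node -> set of neighbors (moral graph). Returns (ordering, induced_width)."""
    adj = {v: set(nbrs) for v, nbrs in adjacency.items()}
    order, width = [], 0
    while adj:
        # Count the fill edges each candidate would add among its current neighbors.
        def fill(v):
            nbrs = list(adj[v])
            return sum(1 for i in range(len(nbrs)) for j in range(i + 1, len(nbrs))
                       if nbrs[j] not in adj[nbrs[i]])
        v = min(adj, key=fill)
        width = max(width, len(adj[v]))            # cluster size minus one at elimination time
        for a in adj[v]:                           # connect v's neighbors (fill edges)
            for b in adj[v]:
                if a != b:
                    adj[a].add(b)
        for a in adj[v]:                           # then remove v from the graph
            adj[a].discard(v)
        del adj[v]
        order.append(v)
    # Elimination processes the ordering backwards, so reverse the elimination sequence.
    return list(reversed(order)), width

# Moral graph of the slide's example (A->B, A->C, (A,B)->D, (B,C)->E).
moral = {'A': {'B', 'C', 'D'}, 'B': {'A', 'C', 'D', 'E'},
         'C': {'A', 'B', 'E'}, 'D': {'A', 'B'}, 'E': {'B', 'C'}}
print(greedy_min_fill(moral))
```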

Different induced graphs: the same graph over A, B, C, D, E, F yields different induced graphs under different orderings (panels a–d; figure not transcribed).

The impact of observations (four graph panels a–d over variables including D, F, G; figure not transcribed).

Use the ancestral graph only: only the ancestors of the query and evidence variables are relevant to the computation.

From bucket elimination to bucket-tree elimination.
Bucket G: P(G|F)
Bucket F: P(F|B,C), λ_G(F)
Bucket D: P(D|A,B)
Bucket C: P(C|A), λ_F(B,C)
Bucket B: P(B|A), λ_D(A,B), λ_C(A,B)
Bucket A: P(A), λ_B(A)
Each bucket becomes a node of a bucket tree; a bucket sends its λ message to the bucket of the highest-ordered variable remaining in the message's scope.

Bucket-tree propagation. For a node u with neighbors x_1, ..., x_n and v:
cluster(u) = ψ(u) ∪ { h(x_1, u), h(x_2, u), ..., h(x_n, u), h(v, u) }.
Compute the message:  h(u, v) = Σ_{elim(u,v)}  Π_{f ∈ cluster(u) − {h(v,u)}}  f.
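A small sketch of this message computation, with factors represented as (variable list, table) pairs and np.einsum doing the combine-and-marginalize step. The cluster contents and numbers below are illustrative, not taken from the slides:

```python
import numpy as np

def message(cluster_factors, sep_vars):
    """h(u,v): combine all functions in cluster(u), excluding the message from v,
    and marginalize onto the separator sep(u,v). Each factor is (vars, table)."""
    all_vars = sorted({x for vs, _ in cluster_factors for x in vs})
    axes = {x: chr(ord('a') + i) for i, x in enumerate(all_vars)}  # toy: few variables only
    spec = ",".join("".join(axes[x] for x in vs) for vs, _ in cluster_factors)
    spec += "->" + "".join(axes[x] for x in sep_vars)   # sum out everything not in the separator
    return np.einsum(spec, *[t for _, t in cluster_factors])

# Toy cluster u = {A,B,C} holding psi(u) = {P(A), P(B|A)} plus an incoming message over (B,C).
k = 2
p_a = np.array([0.6, 0.4])
p_b_a = np.array([[0.7, 0.3], [0.2, 0.8]])          # indexed [a, b]
h_in = np.ones((k, k))                               # incoming message over (B, C)
cluster_u = [(['A'], p_a), (['A', 'B'], p_b_a), (['B', 'C'], h_in)]

h_uv = message(cluster_u, sep_vars=['B', 'C'])       # eliminate elim(u,v) = {A}
print(h_uv)
```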

BTE: full execution. Each bucket now also receives a Π message from its parent in the bucket tree (scoped on the separator), in addition to the λ messages from its children:
Bucket G: P(G|F), Π_F(F)
Bucket F: P(F|B,C), λ_G(F), Π_C(B,C)
Bucket D: P(D|A,B), Π_B(A,B)
Bucket C: P(C|A), λ_F(B,C), Π_B(A,B)
Bucket B: P(B|A), λ_D(A,B), λ_C(A,B), Π_A(A)
Bucket A: P(A), λ_B(A)
With both passes done, every bucket holds the functions needed to compute beliefs over its variables.

From buckets to super-buckets to clusters: adjacent buckets are merged into larger clusters over the variables A–G, trading larger cluster scopes for a simpler tree (the slide's sequence of diagrams is not cleanly transcribed).

Cluster-tree decomposition and elimination (CTE). A cluster-tree decomposition is a set of subsets of variables (clusters) connected by a tree structure such that:
1. Every function (CPT) has at least one cluster that contains its scope; the function is assigned to one such cluster.
2. The cluster tree obeys the running intersection property.
Proposition: if T is a cluster-tree decomposition, then any tree obtained by merging adjacent clusters (their variable sets and their functions) is also a tree decomposition.
Join-tree: a tree decomposition in which all clusters are maximal.

Tree decomposition for belief updating. Example network over A, B, C, D, E, F, G with CPTs p(a), p(b|a), p(c|a,b), p(d|b), p(e|b,f), p(f|c,d), p(g|e,f).

Tree decomposition (example). Clusters and their assigned functions:
cluster 1 {A,B,C}:   p(a), p(b|a), p(c|a,b)
cluster 2 {B,C,D,F}: p(d|b), p(f|c,d)
cluster 3 {B,E,F}:   p(e|b,f)
cluster 4 {E,F,G}:   p(g|e,f)

Tree decompositions, more formally. A tree decomposition for a belief network N = <X, D, G, P> is a triple <T, χ, ψ>, where T = (V, E) is a tree and χ and ψ are labeling functions associating with each vertex v ∈ V two sets, χ(v) ⊆ X and ψ(v) ⊆ P, satisfying:
1. For each function p_i ∈ P there is exactly one vertex v such that p_i ∈ ψ(v) and scope(p_i) ⊆ χ(v).
2. For each variable X_i ∈ X, the set {v ∈ V | X_i ∈ χ(v)} forms a connected subtree (running intersection property).
(The slide shows the belief network and its tree decomposition with clusters {A,B,C}, {B,C,D,F}, {B,E,F}, {E,F,G}.)
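A small checker for the two conditions, using the decomposition from the slide (the code itself is an illustrative sketch, not from the lecture):

```python
from collections import defaultdict

# Tree decomposition from the slide: clusters, tree edges, and assigned CPT scopes.
chi = {1: {'A', 'B', 'C'}, 2: {'B', 'C', 'D', 'F'}, 3: {'B', 'E', 'F'}, 4: {'E', 'F', 'G'}}
edges = [(1, 2), (2, 3), (3, 4)]
psi = {1: [{'A'}, {'A', 'B'}, {'A', 'B', 'C'}],          # p(a), p(b|a), p(c|a,b)
       2: [{'B', 'D'}, {'C', 'D', 'F'}],                 # p(d|b), p(f|c,d)
       3: [{'B', 'E', 'F'}],                             # p(e|b,f)
       4: [{'E', 'F', 'G'}]}                             # p(g|e,f)

# Condition 1: each function's scope is contained in the cluster it is assigned to.
assert all(scope <= chi[v] for v, scopes in psi.items() for scope in scopes)

# Condition 2 (running intersection): clusters containing each variable form a connected subtree.
neighbors = defaultdict(set)
for u, v in edges:
    neighbors[u].add(v), neighbors[v].add(u)

def connected(nodes):
    nodes = set(nodes)
    if not nodes:
        return True
    seen, stack = set(), [next(iter(nodes))]
    while stack:
        u = stack.pop()
        if u in seen:
            continue
        seen.add(u)
        stack.extend(neighbors[u] & nodes)
    return seen == nodes

variables = set().union(*chi.values())
assert all(connected([v for v in chi if x in chi[v]]) for x in variables)
print("valid tree decomposition")
```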

Same message passing as in the bucket tree: cluster(u) = ψ(u) ∪ { h(x_1, u), ..., h(x_n, u), h(v, u) }, and the message is h(u, v) = Σ_{elim(u,v)} Π_{f ∈ cluster(u) − {h(v,u)}} f.

Cluster Tree Elimination (example).
cluster 1 {A,B,C}: p(a), p(b|a), p(c|a,b); message to cluster 2: h_{1,2}(b, c) = Σ_a p(a) p(b|a) p(c|a,b).
cluster 2 {B,C,D,F}: p(d|b), p(f|c,d), h_{1,2}(b,c); message to cluster 3: h_{2,3}(b, f) = Σ_{c,d} p(d|b) h_{1,2}(b, c) p(f|c,d), with sep(2,3) = {B,F} and elim(2,3) = {C,D}.
cluster 3 {B,E,F}: p(e|b,f).
cluster 4 {E,F,G}: p(g|e,f).
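The first two messages of this example, written with np.einsum over randomly generated CPTs whose scopes match the slide's network (the numbers are placeholders):

```python
import numpy as np

rng = np.random.default_rng(2)
k = 2

def cpt(n_parents):
    t = rng.random((k,) * (n_parents + 1))
    return t / t.sum(axis=-1, keepdims=True)

p_a = cpt(0)                     # P(A)
p_b_a = cpt(1)                   # P(B|A),   indexed [a, b]
p_c_ab = cpt(2)                  # P(C|A,B), indexed [a, b, c]
p_d_b = cpt(1)                   # P(D|B),   indexed [b, d]
p_f_cd = cpt(2)                  # P(F|C,D), indexed [c, d, f]

# Cluster 1 {A,B,C} -> cluster 2 {B,C,D,F}:  h_{1,2}(b,c) = sum_a P(a) P(b|a) P(c|a,b)
h_12 = np.einsum('a,ab,abc->bc', p_a, p_b_a, p_c_ab)

# Cluster 2 -> cluster 3 {B,E,F}: sep(2,3) = {B,F}, elim(2,3) = {C,D}
# h_{2,3}(b,f) = sum_{c,d} P(d|b) P(f|c,d) h_{1,2}(b,c)
h_23 = np.einsum('bd,cdf,bc->bf', p_d_b, p_f_cd, h_12)

print(h_12, h_23, sep="\n")
```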

CTE: full execution (all messages, in both directions):
h_{1,2}(b, c) = Σ_a p(a) p(b|a) p(c|a,b)
h_{2,1}(b, c) = Σ_{d,f} p(d|b) p(f|c,d) h_{3,2}(b, f)
h_{2,3}(b, f) = Σ_{c,d} p(d|b) p(f|c,d) h_{1,2}(b, c)
h_{3,2}(b, f) = Σ_e p(e|b,f) h_{4,3}(e, f)
h_{3,4}(e, f) = Σ_b p(e|b,f) h_{2,3}(b, f)
h_{4,3}(e, f) = p(G = g_e | e, f)   (G is the evidence variable)
Time: O(exp(w+1)). Space: O(exp(sep)). For each cluster, P(X | e) can then be computed.

Tree-Width & Separator.

Finding tree decompositions: good join-trees can be generated via triangulation, using the same induced-width ordering heuristics to bound the tree-width.

CTE properties.
Correctness and completeness: algorithm CTE is correct, i.e., it computes the exact joint probability of a single variable and the evidence.
Time complexity: O(deg · (n + N) · d^(w*+1)).
Space complexity: O(N · d^sep), where
deg = the maximum degree of a node in the tree,
n = number of variables (= number of CPTs),
N = number of nodes in the tree decomposition,
d = the maximum domain size of a variable,
w* = the induced width,
sep = the separator size.

Inference on trees is easy and distributed. On a tree (root X with children Y and Z; T, R below Y; L, M below Z) messages are passed along the edges, e.g. m_{ZX}(X) = Σ_Z P(Z|X) m_{LZ}(Z) m_{MZ}(Z), and beliefs combine the incoming messages, e.g. BEL(X) ∝ P(X) m_{YX}(X) m_{ZX}(X). Belief updating = sum-product. Inference is time and space linear on trees.

Pearl's Belief Propagation. A polytree is a tree with larger families (its underlying undirected graph is a tree). Polytree decomposition: e.g., U_1, U_2, U_3 are parents of X_1, each U_i has another child Z_i, and X_1 has a child Y_1, with CPTs P(u_1), P(u_2), P(u_3), P(z_1|u_1), P(z_2|u_2), P(z_3|u_3), P(x_1|u_1,u_2,u_3), P(y_1|x_1); λ messages such as λ_{Z_1}(u_1), λ_{Z_2}(u_2), λ_{Z_3}(u_3) flow from children to parents. Running CTE = running Pearl's BP over the dual graph. Dual graph: nodes are CPTs; arcs connect CPTs with non-empty scope intersection. Time and space: linear propagation.

Iterative Belief Propagation (IBP), a.k.a. loopy belief propagation. Belief propagation is exact for polytrees; IBP applies BP iteratively to cyclic networks. One step: update BEL(U_1) from the messages λ_{X_1}(u_1), λ_{X_2}(u_1), π_{U_2}(x_1), π_{U_3}(x_1). No guarantees of convergence, but it works well for many coding networks.
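A compact loopy-BP sketch on a small pairwise model (a three-node cycle). This is my own illustration in the undirected/factor style rather than the slide's π/λ formulation; messages are simply iterated a fixed number of times, with no convergence guarantee:

```python
import numpy as np

k = 2
rng = np.random.default_rng(3)
nodes = [0, 1, 2]
edges = [(0, 1), (1, 2), (2, 0)]                      # a cycle, so BP is "loopy"
unary = {i: rng.random(k) for i in nodes}             # phi_i(x_i)
pair = {e: rng.random((k, k)) for e in edges}         # psi_ij(x_i, x_j)

def psi(i, j):
    """Pairwise potential indexed [x_i, x_j], regardless of stored edge direction."""
    return pair[(i, j)] if (i, j) in pair else pair[(j, i)].T

neighbors = {i: [j for j in nodes for e in edges if set(e) == {i, j}] for i in nodes}
msgs = {(i, j): np.ones(k) / k for i in nodes for j in neighbors[i]}

for _ in range(50):                                   # iterate all messages in parallel
    new = {}
    for (i, j) in msgs:
        incoming = np.prod([msgs[(nb, i)] for nb in neighbors[i] if nb != j], axis=0)
        m = psi(i, j).T @ (unary[i] * incoming)       # sum_{x_i} phi_i psi_ij (prod of other messages)
        new[(i, j)] = m / m.sum()
    msgs = new

beliefs = {i: unary[i] * np.prod([msgs[(nb, i)] for nb in neighbors[i]], axis=0) for i in nodes}
print({i: b / b.sum() for i, b in beliefs.items()})
```

On a tree this fixed point gives exact marginals; on the cycle it only approximates them, which is the trade-off the slide describes.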

Finding the MPE — algorithm elim-mpe (Dechter 1996). MPE = max_x P(x); Σ is replaced by max:
MPE = max_{e=0, d, c, b, a} P(a) P(c|a) P(b|a) P(d|a,b) P(e|b,c).
bucket B:  P(b|a), P(d|a,b), P(e|b,c)   →  h^B(a, d, c, e)   (max_b is the elimination operator)
bucket C:  P(c|a), h^B(a, d, c, e)      →  h^C(a, d, e)
bucket D:  h^C(a, d, e)                 →  h^D(a, e)
bucket E:  e = 0, h^D(a, e)             →  h^E(a)
bucket A:  P(a), h^E(a)                 →  MPE value
W* = 4: the induced width (max clique size).
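The same bucket scheme with Σ replaced by max, sketched on the slide's network structure with invented CPT values:

```python
import numpy as np

rng = np.random.default_rng(4)
k = 2

def cpt(n_parents):
    t = rng.random((k,) * (n_parents + 1))
    return t / t.sum(axis=-1, keepdims=True)

p_a = cpt(0)            # P(A)       indexed [a]
p_b_a = cpt(1)          # P(B|A)     indexed [a, b]
p_c_a = cpt(1)          # P(C|A)     indexed [a, c]
p_d_ab = cpt(2)         # P(D|A,B)   indexed [a, b, d]
p_e_bc = cpt(2)         # P(E|B,C)   indexed [b, c, e]

# bucket B: P(b|a) P(d|a,b) P(e|b,c)  ->  h_B(a, c, d, e) = max_b (...)
prod_b = (p_b_a[:, :, None, None, None] * p_d_ab[:, :, None, :, None]
          * p_e_bc[None, :, :, None, :])                  # axes: a, b, c, d, e
h_b = prod_b.max(axis=1)                                  # max over b -> (a, c, d, e)

# bucket C: P(c|a) h_B  ->  h_C(a, d, e) = max_c (...)
h_c = (p_c_a[:, :, None, None] * h_b).max(axis=1)

# bucket D: h_D(a, e) = max_d h_C
h_d = h_c.max(axis=1)

# bucket E: evidence e = 0
h_e = h_d[:, 0]

# bucket A: MPE value = max_a P(a) h_E(a)
print("MPE value:", (p_a * h_e).max())
```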

Generating the MPE tuple (process the buckets in reverse order of elimination):
1. a' = argmax_a P(a) h^E(a)                       [bucket A: P(a), h^E(a)]
2. e' = 0                                          [bucket E: e = 0, h^D(a, e)]
3. d' = argmax_d h^C(a', d, e')                    [bucket D: h^C(a, d, e)]
4. c' = argmax_c P(c|a') h^B(a', d', c, e')        [bucket C: P(c|a), h^B(a, d, c, e)]
5. b' = argmax_b P(b|a') P(d'|b, a') P(e'|b, c')   [bucket B: P(b|a), P(d|b,a), P(e|b,c)]
Return (a', b', c', d', e').
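A tiny self-contained illustration of the backward argmax pass on a two-variable network P(A) P(B|A): the forward (elimination) pass records the maximizing b for every a, and the backward pass decodes a* and then b*:

```python
import numpy as np

p_a = np.array([0.3, 0.7])                       # P(A)
p_b_a = np.array([[0.9, 0.1], [0.4, 0.6]])       # P(B|A), indexed [a, b]

# Forward pass: bucket B produces h_B(a) = max_b P(b|a) and remembers the maximizer.
h_b = p_b_a.max(axis=1)
argmax_b = p_b_a.argmax(axis=1)                  # best b for every value of a

# Backward pass, processing the buckets in reverse order:
a_star = int((p_a * h_b).argmax())               # 1. a* = argmax_a P(a) h_B(a)
b_star = int(argmax_b[a_star])                   # 2. b* = argmax_b P(b|a*)
print("MPE tuple:", (a_star, b_star), "value:", p_a[a_star] * p_b_a[a_star, b_star])
```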

Conditioning generates the probability tree:
P(a, e = 0) = Σ_{b, c, d} P(a) P(b|a) P(c|a) P(d|a,b) P(e = 0|b,c),
evaluated by branching on the values of each variable in turn. Complexity of conditioning: exponential time, linear space.

Conditioning + elimination: condition on some of the variables and eliminate the rest, e.g.
P(a, e = 0) = Σ_b [ P(a) P(b|a) Σ_{c, d} P(c|a) P(d|a,b) P(e = 0|b,c) ].
Idea: condition until the induced width w* of the remaining subproblem gets small.

Loop-cutset decomposition: condition until you get a polytree. For example, conditioning on a cutset variable A with values 0 and 1,
P(B, F = 0) = P(B, A = 0, F = 0) + P(B, A = 1, F = 0),
and each conditioned network is a polytree. The loop-cutset method is time exponential in the loop-cutset size and linear in space.
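A sketch of cutset conditioning on a small loopy network (the diamond A → B, A → C, (B,C) → D, with invented CPTs): conditioning on the cutset {A} leaves a polytree over B, C, D, which is solved for each value of A, and the results are aggregated:

```python
import numpy as np

rng = np.random.default_rng(5)
k = 2

def cpt(n_parents):
    t = rng.random((k,) * (n_parents + 1))
    return t / t.sum(axis=-1, keepdims=True)

p_a = cpt(0)        # P(A)
p_b_a = cpt(1)      # P(B|A)     indexed [a, b]
p_c_a = cpt(1)      # P(C|A)     indexed [a, c]
p_d_bc = cpt(2)     # P(D|B,C)   indexed [b, c, d]

# The diamond has a loop; conditioning on the cutset {A} leaves a polytree over B, C, D.
p_d = np.zeros(k)
for a in range(k):
    # Exact inference in the conditioned (polytree) network: a small sum-product.
    p_d_given_a = np.einsum('b,c,bcd->d', p_b_a[a], p_c_a[a], p_d_bc)
    p_d += p_a[a] * p_d_given_a       # aggregate over cutset assignments
print("P(D) =", p_d)
```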

W-cutset algorithms (elim-cond-bel):
1. Identify a w-cutset C_w of the network.
2. For each assignment to the cutset, solve the conditioned subproblem by CTE.
3. Aggregate the solutions over all cutset assignments.
Time complexity: exp(|C_w| + w). Space complexity: exp(w).

All algorithms generalize to any graphical model through general operations of combination and marginalization: general BE, BTE, CTE, and BP. They are applicable to Markov networks, to constraint optimization, to counting the number of solutions of a SAT formula, etc.
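A toy illustration of this generalization (not from the slides): one routine, parameterized by a combination and a marginalization operator, performs belief updating, MPE, solution counting, and consistency checking over the same pair of factors (all tables are made up):

```python
import numpy as np

def solve(f1, f2, combine, marginalize):
    """Two factors f1(X1,X2), f2(X2,X3): combine them and marginalize out all variables.
    The pair (combine, marginalize) selects the reasoning task."""
    joint = combine(f1[:, :, None], f2[None, :, :])        # axes: x1, x2, x3
    return marginalize(joint)

p1 = np.array([[0.1, 0.4], [0.3, 0.2]])    # a probability-like factor over (X1, X2)
p2 = np.array([[0.5, 0.5], [0.9, 0.1]])    # a factor over (X2, X3)
c1 = np.array([[1, 0], [1, 1]])            # a 0/1 constraint over (X1, X2)
c2 = np.array([[0, 1], [1, 1]])            # a 0/1 constraint over (X2, X3)

print("sum-product (total mass):", solve(p1, p2, np.multiply, np.sum))
print("max-product (MPE value): ", solve(p1, p2, np.multiply, np.max))
print("#solutions:              ", solve(c1, c2, np.multiply, np.sum))
print("consistent?              ", bool(solve(c1, c2, np.minimum, np.max)))  # projection-join
```

Swapping the operator pair is exactly what turns BE into elim-mpe, a solution counter, or a consistency checker.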

Tree-solving with different operator pairs: belief updating (sum-product), CSP consistency (projection-join), MPE (max-product), #CSP (sum-product counting). The same message passing on the tree (messages m_{XY}(X), m_{YX}(X), ..., as before) solves each task. Inference is time and space linear on trees.

Graphical Models. A graphical model is a triple (X, D, F):
X = {X_1, ..., X_n}  variables
D = {D_1, ..., D_n}  domains
F = {f_1, ..., f_r}  functions (constraints, CPTs, CNF clauses)
Operators: combination and elimination (projection).
Example of a conditional probability table (CPT) P(F | A, B):
A B F  P(F|A,B)
0 0 0  0.14
0 0 1  0.96
0 1 0  0.40
0 1 1  0.60
1 0 0  0.35
1 0 1  0.65
1 1 0  0.72
1 1 1  0.68
Example of a relation over (A, B, F): {(red, green, blue), (blue, red, red), (blue, blue, green), (green, red, blue)}.
The primal (interaction) graph connects variables that appear together in a function's scope.
Tasks:
Belief updating:  Σ_{X−Y} Π_j P_j
MPE:              max_X Π_j P_j
CSP:              ⋈_j R_j
Max-CSP:          min_X Σ_j F_j
All these tasks are NP-hard; the remedies are to exploit problem structure, identify special cases, and approximate.

Cluster-tree decomposition (general). Let P = <X, D, F, ⊗, ⇓, {Z_i}> be an automated reasoning problem (⊗ and ⇓ denote the combination and marginalization operators). A tree decomposition is <T, χ, ψ> such that T = (V, E) is a tree, χ associates a set of variables χ(v) ⊆ X with each node, and ψ associates a set of functions ψ(v) ⊆ F with each node, such that:
1. For each f_i ∈ F, there is exactly one v such that f_i ∈ ψ(v) and scope(f_i) ⊆ χ(v).
2. For each x ∈ X, the set {v ∈ V | x ∈ χ(v)} induces a connected subtree.
3. For each i, Z_i ⊆ χ(v) for some v ∈ V.

Example: cluster-tree.
χ(1) = {1},       ψ(1) = Ø
χ(2) = {1,2},     ψ(2) = {f_6(1,2)}
χ(3) = {2,3},     ψ(3) = {f_5(2,3)}
χ(4) = {1,4},     ψ(4) = {f_4(1,4)}
χ(5) = {1,2,5},   ψ(5) = {f_3(2,5)}
χ(6) = {1,5,6},   ψ(6) = {f_1(5,6), f_2(1,6)}
Separators: sep(i,j) = χ(i) ∩ χ(j), e.g. sep(5,6) = {1,5}. Tree-width = 3.

Cluster-Tree Elimination (CTE). For a node u with functions ψ(u), variables χ(u), and incoming messages m(v_1,u), m(v_2,u), ..., m(v_i,u), the message to neighbor w is
m(u, w) = ⇓_{sep(u,w)} [ (⊗_j m(v_j, u)) ⊗ ψ(u) ],
combining over all neighbors v_j other than w.

Example: cluster-tree elimination (min-sum messages over the clusters above):
m_{6,5}(1,5) := min_6 { f_1(5,6) + f_2(1,6) }
m_{5,6}(1,5) := min_2 { f_3(2,5) + m_{2,5}(1,2) }
m_{5,2}(1,2) := min_5 { f_3(2,5) + m_{6,5}(1,5) }
m_{3,2}(2)   := min_3 { f_5(2,3) }
m_{2,1}(1)   := min_2 { f_6(1,2) + m_{5,2}(1,2) + m_{3,2}(2) }
m_{2,5}(1,2) := f_6(1,2) + m_{1,2}(1) + m_{3,2}(2)
m_{2,3}(2)   := min_1 { f_6(1,2) + m_{5,2}(1,2) + m_{1,2}(1) }
m_{1,2}(1)   := m_{4,1}(1)
m_{4,1}(1)   := min_4 { f_4(1,4) }
m_{1,4}(1)   := m_{2,1}(1)