Exact Inference Algorithms: Bucket Elimination
COMPSCI 276, Spring 2013, Class 5: Rina Dechter
(Reading: class notes chapter 4, Darwiche chapter 6)
Belief Updating
[Figure: Bayesian network over Smoking, Lung Cancer, Bronchitis, X-ray, Dyspnoea]
Query: $P(\text{lung cancer} = \text{yes} \mid \text{smoking} = \text{no}, \text{dyspnoea} = \text{yes}) = ?$
Probabilistic Inference Tasks
Belief updating: $E \subseteq \{X_1, \ldots, X_n\}$, $Y \subseteq X - E$; $P(Y = y \mid E = e)$, $P(e)$? In particular, $BEL(X_i) = P(X_i = x_i \mid \text{evidence})$
Finding the most probable explanation (MPE): $x^* = \arg\max_x P(x, e)$
Finding the maximum a posteriori hypothesis (MAP): $(a_1^*, \ldots, a_k^*) = \arg\max_a \sum_{X/A} P(x, e)$
Finding the maximum-expected-utility (MEU) decision: $(d_1^*, \ldots, d_k^*) = \arg\max_d \sum_{X/D} P(x, e)\, u(x)$
$A \subseteq X$: hypothesis variables; $D \subseteq X$: decision variables; $u(x)$: utility function
Belief updating is NP-hard
A simple network
Given: [Figure: network over A, B, C, D]
How can we compute $P(D)$? $P(D \mid A=0)$? $P(A \mid D=0)$?
Brute force: $O(k^4)$. Maybe $O(4k^2)$.
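The gap between the two costs can be seen on a small example. The sketch below assumes the figure is the chain A -> B -> C -> D with binary variables; the CPT numbers and the helper names (`brute_force_p_d`, `forward`, `eliminate_p_d`) are hypothetical and chosen only for illustration. It computes P(D) once by summing the full joint and once by pushing each sum inward, so that every step touches only a k-by-k table.

```python
import itertools

k = 2  # domain size of every variable (assumed binary here)

# Hypothetical CPTs for an assumed chain A -> B -> C -> D.
p_a = [0.6, 0.4]                    # P(A)
p_b_a = [[0.7, 0.3], [0.2, 0.8]]    # P(B | A), indexed [a][b]
p_c_b = [[0.9, 0.1], [0.4, 0.6]]    # P(C | B), indexed [b][c]
p_d_c = [[0.5, 0.5], [0.1, 0.9]]    # P(D | C), indexed [c][d]


def brute_force_p_d():
    """Sum the full joint over all k^4 assignments."""
    p_d = [0.0] * k
    for a, b, c, d in itertools.product(range(k), repeat=4):
        p_d[d] += p_a[a] * p_b_a[a][b] * p_c_b[b][c] * p_d_c[c][d]
    return p_d


def forward(message, cpt):
    """Eliminate one variable: new[y] = sum_x message[x] * cpt[x][y], O(k^2)."""
    return [sum(message[x] * cpt[x][y] for x in range(k)) for y in range(k)]


def eliminate_p_d():
    """Push the sums inward: a few O(k^2) steps instead of one O(k^4) sum."""
    p_b = forward(p_a, p_b_a)    # sum out A
    p_c = forward(p_b, p_c_b)    # sum out B
    return forward(p_c, p_d_c)   # sum out C


print(brute_force_p_d())
print(eliminate_p_d())
```

Both calls should print the same distribution up to rounding; only the number of multiplications differs.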
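Belief updating: $P(X \mid \text{evidence}) = ?$
[Figure: Bayesian network over A, B, C, D, E and its moral graph]
$P(a \mid e=0) \propto P(a, e=0) = \sum_{e=0,d,c,b} P(a)\, P(b \mid a)\, P(c \mid a)\, P(d \mid b,a)\, P(e \mid b,c)$
$= P(a) \sum_{e=0} \sum_{d} \sum_{c} P(c \mid a) \sum_{b} P(b \mid a)\, P(d \mid b,a)\, P(e \mid b,c)$
Variable elimination: the innermost sum defines $h^B(a,d,c,e) = \sum_b P(b \mid a)\, P(d \mid b,a)\, P(e \mid b,c)$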
[Figures: graph over A, B, C, D, E, shown on three consecutive slides; the last is captioned "Using a different" (caption truncated in the source)]
A Bayesian network, ordering: C, B, E, D, G
A different ordering
A Bayesian network processed along 2 orderings
Bucket elimination: Algorithm BE-bel (Dechter 1996)
Elimination operator: $\sum_b$
bucket B: $P(b \mid a)$, $P(d \mid b,a)$, $P(e \mid b,c)$
bucket C: $P(c \mid a)$, $h^B(a,d,c,e)$
bucket D: $h^C(a,d,e)$
bucket E: $e=0$, $h^D(a,e)$
bucket A: $P(a)$, $h^E(a)$
Output: $P(a \mid e=0)$
$w^* = 4$: induced width (max clique size)
[Figure: graph over A, B, C, D, E]
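The bucket schedule above can be sketched in a few lines of Python. This is a minimal illustration, not the reference BE-bel implementation: variables are assumed binary, the CPT numbers are hypothetical, and the helpers `tabulate`, `multiply`, `sum_out`, and `be_bel` are names introduced here. The evidence e = 0 is encoded as an indicator factor and summed out, which gives the same result as assigning e = 0 inside bucket E. Collecting, at each step, every remaining factor that mentions the current variable is equivalent to placing each function in the bucket of its latest variable along the ordering.

```python
import itertools


def bern(p1, x):
    """Probability of x under a Bernoulli with P(x = 1) = p1."""
    return p1 if x == 1 else 1.0 - p1


def tabulate(scope, fn):
    """Build a factor (scope, table) by evaluating fn on all binary tuples."""
    return scope, {v: fn(*v) for v in itertools.product((0, 1), repeat=len(scope))}


def multiply(factors):
    """Pointwise product of factors over the union of their scopes."""
    scope = tuple(sorted({x for s, _ in factors for x in s}))
    table = {}
    for vals in itertools.product((0, 1), repeat=len(scope)):
        env = dict(zip(scope, vals))
        p = 1.0
        for s, t in factors:
            p *= t[tuple(env[x] for x in s)]
        table[vals] = p
    return scope, table


def sum_out(factor, var):
    """Eliminate var from a factor by summation."""
    scope, table = factor
    i = scope.index(var)
    out = {}
    for vals, p in table.items():
        key = vals[:i] + vals[i + 1:]
        out[key] = out.get(key, 0.0) + p
    return scope[:i] + scope[i + 1:], out


def be_bel(factors, ordering, query):
    """Process buckets from last to first along ordering; return P(query | evidence)."""
    pool = list(factors)
    for var in reversed(ordering):
        if var == query:
            continue
        bucket = [f for f in pool if var in f[0]]
        pool = [f for f in pool if var not in f[0]]
        if bucket:
            pool.append(sum_out(multiply(bucket), var))  # message h^var
    _, table = multiply(pool)
    z = sum(table.values())  # equals P(evidence)
    return {vals[0]: p / z for vals, p in table.items()}


# Hypothetical CPTs with the structure P(a) P(b|a) P(c|a) P(d|b,a) P(e|b,c).
factors = [
    tabulate(("a",), lambda a: bern(0.3, a)),
    tabulate(("a", "b"), lambda a, b: bern(0.2 + 0.5 * a, b)),
    tabulate(("a", "c"), lambda a, c: bern(0.6 - 0.4 * a, c)),
    tabulate(("a", "b", "d"), lambda a, b, d: bern(0.1 + 0.3 * a + 0.4 * b, d)),
    tabulate(("b", "c", "e"), lambda b, c, e: bern(0.05 + 0.4 * b + 0.5 * c, e)),
    tabulate(("e",), lambda e: 1.0 if e == 0 else 0.0),  # evidence e = 0
]

posterior = be_bel(factors, ("a", "e", "d", "c", "b"), "a")
print(posterior)
```

With the ordering (a, e, d, c, b), bucket B is processed first and produces a message over (a, c, d, e), which is the four-variable table that the induced width of 4 refers to.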
BE-BEL
Student Network example
[Figure: Bayesian network over Difficulty, Intelligence, Grade, SAT, Letter, Apply, Job]
Query: $P(J)$?
[Figure: ordered graphs over A, B, C, D, E]
Complexity of elimination
$O(n \exp(w^*(d)))$, where $w^*(d)$ is the induced width of the moral graph along ordering $d$
The effect of the ordering:
[Figure: moral graph over A, B, C, D, E and its ordered graphs along $d_1$ and $d_2$]
$w^*(d_1) = 4$, $w^*(d_2) = 2$
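The induced width along an ordering can be computed by simulating elimination on the graph alone, without any probability tables. The sketch below is an illustration under assumptions: the function name `induced_width` is introduced here, the edge list is the moral graph implied by the CPTs P(a), P(b|a), P(c|a), P(d|b,a), P(e|b,c), and the two orderings are choices made for this example that happen to yield widths 4 and 2; they are not necessarily the d1 and d2 drawn on the slide.

```python
def induced_width(edges, ordering):
    """Induced width of an undirected graph along an ordering.

    Nodes are processed from last to first. When a node is processed, its
    neighbours that appear earlier in the ordering are connected to each
    other (fill-in edges), and the width of the node is their count.
    """
    adj = {v: set() for v in ordering}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    position = {v: i for i, v in enumerate(ordering)}
    width = 0
    for v in reversed(ordering):
        earlier = {u for u in adj[v] if position[u] < position[v]}
        width = max(width, len(earlier))
        for u in earlier:
            adj[u] |= earlier - {u}  # connect the earlier neighbours
    return width


# Moral graph of the five-variable network: families {A,B}, {A,C}, {A,B,D}, {B,C,E}.
moral_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"),
               ("B", "D"), ("B", "E"), ("C", "E")]

w1 = induced_width(moral_edges, ["A", "E", "D", "C", "B"])  # B processed first
w2 = induced_width(moral_edges, ["A", "B", "C", "D", "E"])  # E processed first
print(w1, w2)
```

Processing B first connects all four of its neighbours, while processing E first adds no edges at all, which is why the same graph costs so much less along the second ordering.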
BE-BEL
More accurately: $O(r \exp(w^*(d)))$, where $r$ is the number of CPTs. For Bayesian networks $r = n$. For Markov networks?
The impact of observations
[Figure: four panels (a) to (d) over nodes A, B, C, D, F, G, labelled: ordered graph, induced graph, ordered conditioned graph]
BE-BEL: use the ancestral graph only
[Figure: moral graph over A, B, C, D, E]
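One reading of this slide is that variables which are neither query nor evidence nodes, nor ancestors of them, can be dropped before elimination, because their CPTs sum to 1 when summed out. The sketch below computes that ancestral set for a DAG given as a child-to-parents map. The function name `ancestral_set` and the parent map (matching the five-variable example) are assumptions made for illustration.

```python
def ancestral_set(parents, targets):
    """Return the targets together with all of their ancestors.

    parents maps each node to the list of its parents in the DAG.
    """
    seen = set()
    stack = list(targets)
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(parents.get(v, ()))
    return seen


# Structure of P(a) P(b|a) P(c|a) P(d|b,a) P(e|b,c).
parents = {"A": [], "B": ["A"], "C": ["A"], "D": ["A", "B"], "E": ["B", "C"]}

# For the query P(C) with no evidence, only A and C are relevant.
print(sorted(ancestral_set(parents, {"C"})))
```

Running BE-bel on the subnetwork induced by this set gives the same answer as running it on the full network, usually with a smaller induced width.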
Probabilistic Inference Tasks
Belief updating: $BEL(X_i) = P(X_i = x_i \mid \text{evidence})$
Finding the most probable explanation (MPE): $x^* = \arg\max_x P(x, e)$
Finding the maximum a posteriori hypothesis (MAP): $(a_1^*, \ldots, a_k^*) = \arg\max_a \sum_{X/A} P(x, e)$
$A \subseteq X$: hypothesis variables
Finding $MPE = \max_x P(x)$
Algorithm BE-mpe: $\sum$ is replaced by $\max$:
$MPE = \max_{a,e,d,c,b} P(a)\, P(c \mid a)\, P(b \mid a)\, P(d \mid a,b)\, P(e \mid b,c)$
[Figure: graph over A, B, C, D, E]
Finding $MPE = \max_x P(x)$: Algorithm elim-mpe (Dechter 1996)
$\sum$ is replaced by $\max$: $MPE = \max_{a,e,d,c,b} P(a)\, P(c \mid a)\, P(b \mid a)\, P(d \mid a,b)\, P(e \mid b,c)$
Elimination operator: $\max_b$
bucket B: $P(b \mid a)$, $P(d \mid b,a)$, $P(e \mid b,c)$
bucket C: $P(c \mid a)$, $h^B(a,d,c,e)$
bucket D: $h^C(a,d,e)$
bucket E: $e=0$, $h^D(a,e)$
bucket A: $P(a)$, $h^E(a)$
Output: MPE
$w^* = 4$: induced width (max clique size)
[Figure: graph over A, B, C, D, E]
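Generating the MPE-tuple
Buckets after the backward (max) phase:
B: $P(b \mid a)$, $P(d \mid b,a)$, $P(e \mid b,c)$
C: $P(c \mid a)$, $h^B(a,d,c,e)$
D: $h^C(a,d,e)$
E: $e=0$, $h^D(a,e)$
A: $P(a)$, $h^E(a)$
Assign values from bucket A up to bucket B:
1. $a' = \arg\max_a P(a)\, h^E(a)$
2. $e' = 0$
3. $d' = \arg\max_d h^C(a', d, e')$
4. $c' = \arg\max_c P(c \mid a')\, h^B(a', d', c, e')$
5. $b' = \arg\max_b P(b \mid a')\, P(d' \mid b, a')\, P(e' \mid b, c')$
Return $(a', b', c', d', e')$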
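The two phases, max-elimination from the last bucket to the first and value assignment from the first bucket to the last, can be sketched as follows. This is an illustrative sketch rather than the original elim-mpe code: variables are assumed binary, the CPT numbers are hypothetical, and `tabulate`, `value`, and `be_mpe` are names introduced here. Evidence e = 0 is again encoded as an indicator factor, so the maximization selects e = 0 on its own.

```python
import itertools
from math import prod


def bern(p1, x):
    """Probability of x under a Bernoulli with P(x = 1) = p1."""
    return p1 if x == 1 else 1.0 - p1


def tabulate(scope, fn):
    """Build a factor (scope, table) by evaluating fn on all binary tuples."""
    return scope, {v: fn(*v) for v in itertools.product((0, 1), repeat=len(scope))}


def value(factor, env):
    """Look up a factor under an assignment that covers its scope."""
    scope, table = factor
    return table[tuple(env[x] for x in scope)]


def be_mpe(factors, ordering):
    """Return (MPE value, maximizing assignment) by bucket elimination with max."""
    pool = list(factors)
    buckets = {}
    # Backward phase: process buckets from last to first, eliminating by max.
    for var in reversed(ordering):
        bucket = [f for f in pool if var in f[0]]
        pool = [f for f in pool if var not in f[0]]
        buckets[var] = bucket
        scope = tuple(sorted({x for s, _ in bucket for x in s} - {var}))
        table = {}
        for vals in itertools.product((0, 1), repeat=len(scope)):
            env = dict(zip(scope, vals))
            table[vals] = max(
                prod(value(f, {**env, var: x}) for f in bucket) for x in (0, 1))
        pool.append((scope, table))  # message h^var
    # Only constant factors (empty scope) remain; their product is the MPE value.
    mpe_value = prod(t[()] for _, t in pool)
    # Forward phase: assign each variable the value that maximizes its bucket.
    assignment = {}
    for var in ordering:
        assignment[var] = max((0, 1), key=lambda x: prod(
            value(f, {**assignment, var: x}) for f in buckets[var]))
    return mpe_value, assignment


# Hypothetical CPTs with the structure P(a) P(b|a) P(c|a) P(d|b,a) P(e|b,c).
factors = [
    tabulate(("a",), lambda a: bern(0.3, a)),
    tabulate(("a", "b"), lambda a, b: bern(0.2 + 0.5 * a, b)),
    tabulate(("a", "c"), lambda a, c: bern(0.6 - 0.4 * a, c)),
    tabulate(("a", "b", "d"), lambda a, b, d: bern(0.1 + 0.3 * a + 0.4 * b, d)),
    tabulate(("b", "c", "e"), lambda b, c, e: bern(0.05 + 0.4 * b + 0.5 * c, e)),
    tabulate(("e",), lambda e: 1.0 if e == 0 else 0.0),  # evidence e = 0
]

mpe_value, mpe_tuple = be_mpe(factors, ("a", "e", "d", "c", "b"))
print(mpe_value, mpe_tuple)
```

The forward phase never searches: each bucket mentions only its own variable and variables that come earlier in the ordering, which have already been assigned.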
Finding MAP
Algorithm BE-map ($\sum$ and $\max$):
$MAP = \max_{a,c} \sum_{e,d,b} P(a)\, P(c \mid a)\, P(b \mid a)\, P(d \mid a,b)\, P(e \mid b,c)$
[Figure: graph over A, B, C, D, E]
Algorithm BE-MAP
Variable ordering is restricted: max buckets should be processed after sum buckets.
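The restriction exists because max and sum do not commute in general. A tiny numeric check, with a hypothetical two-variable function `f` chosen only for illustration, shows that maximizing before summing gives a different (never smaller) number than the MAP quantity.

```python
# Hypothetical nonnegative function f(a, b), indexed as f[a][b].
f = [[0.4, 0.1],
     [0.1, 0.4]]

# MAP-style quantity: sum out b first, then maximize over a.
max_of_sum = max(sum(f[a][b] for b in (0, 1)) for a in (0, 1))

# Wrong order: maximize over a inside the sum over b.
sum_of_max = sum(max(f[a][b] for a in (0, 1)) for b in (0, 1))

print(max_of_sum, sum_of_max)
```

Processing a max bucket before a sum bucket therefore yields only an upper bound on the MAP value, which is why the hypothesis variables must come first in the ordering, so that their buckets are processed last.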
BE for Markov networks queries
Time and space complexity: $O(n \exp(w^* + 1))$ and $O(n \exp(w^*))$, respectively.
More accurately: $O(r \exp(w^*(d)))$, where $r$ is the number of CPTs. For Bayesian networks $r = n$. For Markov networks?
Finding small induced-width
Finding a minimum induced-width ordering is NP-complete.
A tree has induced-width of ?
Greedy algorithms: min-width, min-induced-width, max-cardinality, fill-in (thought to be the best).
See anytime min-width (Gogate and Dechter).
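As an illustration of the fill-in heuristic, the sketch below repeatedly eliminates the node whose elimination would add the fewest fill-in edges, connects its neighbours, and records the resulting ordering and induced width. The function name `min_fill_ordering`, the alphabetical tie-breaking, and the example edge list (the moral graph of the five-variable network used earlier in the lecture) are assumptions made for this example.

```python
def min_fill_ordering(edges):
    """Greedy min-fill: return (ordering, induced width along that ordering).

    The ordering lists the first-eliminated node last, matching the convention
    that buckets are processed from last to first.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    def fill_count(v):
        """Number of edges that eliminating v would add among its neighbours."""
        nbrs = sorted(adj[v])
        return sum(1 for i, a in enumerate(nbrs)
                   for b in nbrs[i + 1:] if b not in adj[a])

    elimination = []
    width = 0
    while adj:
        v = min(sorted(adj), key=fill_count)  # ties broken alphabetically
        nbrs = adj.pop(v)
        width = max(width, len(nbrs))
        for a in nbrs:
            adj[a].discard(v)
            adj[a] |= nbrs - {a}  # add fill-in edges
        elimination.append(v)
    return elimination[::-1], width


moral_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"),
               ("B", "D"), ("B", "E"), ("C", "E")]
ordering, width = min_fill_ordering(moral_edges)
print(ordering, width)
```

The heuristic is greedy, so it gives no optimality guarantee in general; on this small graph it reaches width 2, which matches the better of the two orderings discussed for the complexity of elimination.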
Type of graphs
The induced width
Min-width ordering
Proposition: Algorithm min-width finds a min-width ordering of a graph.
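The slide's algorithm box did not survive extraction, so the following is a sketch of the usual min-width procedure under that assumption: repeatedly pick a node of minimum degree in the remaining graph, place it last among the unplaced positions, and delete it without adding any edges. The function name `min_width_ordering` and the tie-breaking rule are choices made here.

```python
def min_width_ordering(edges):
    """Return (ordering, width) using the greedy min-degree removal rule.

    No fill-in edges are added, so the result is the width of the ordering,
    not its induced width.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    removed = []
    width = 0
    while adj:
        v = min(sorted(adj), key=lambda u: len(adj[u]))  # min degree, ties alphabetical
        # Neighbours still present are exactly v's earlier neighbours in the ordering.
        width = max(width, len(adj[v]))
        for u in adj.pop(v):
            adj[u].discard(v)
        removed.append(v)
    return removed[::-1], width


moral_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"),
               ("B", "D"), ("B", "E"), ("C", "E")]
print(min_width_ordering(moral_edges))

path_edges = [("1", "2"), ("2", "3"), ("3", "4")]
print(min_width_ordering(path_edges))
```

The path example has width 1, as expected for a tree, while the moral graph contains a triangle and so cannot have width below 2.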
Greedy ordering heuristics
Theorem: A graph is a tree iff it has both width and induced-width of 1.
Different induced graphs
Induced-width for chordal graphs
Definition: A graph is chordal if every cycle of length at least 4 has a chord.
Finding $w^*$ over a chordal graph is easy using the max-cardinality ordering: order vertices from 1 to n, always assigning the next number to the node connected to the largest set of previously numbered nodes. Let $d$ be such an ordering.
A graph along a max-cardinality ordering has no fill-in edges iff it is chordal.
On chordal graphs, width = induced-width along $d$.
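The chordality test described above can be sketched directly: build a max-cardinality ordering, then process nodes from last to first and collect any fill-in edges. The function names `max_cardinality_ordering` and `fill_in_edges`, the alphabetical tie-breaking, and the two example graphs (a 4-cycle with and without a chord) are assumptions made for illustration.

```python
def max_cardinality_ordering(adj):
    """Number nodes one at a time, always choosing the unnumbered node
    adjacent to the most already-numbered nodes (ties broken alphabetically)."""
    ordering = []
    numbered = set()
    while len(ordering) < len(adj):
        v = max((u for u in sorted(adj) if u not in numbered),
                key=lambda u: len(adj[u] & numbered))
        ordering.append(v)
        numbered.add(v)
    return ordering


def fill_in_edges(adj, ordering):
    """Edges added when nodes are processed from last to first along ordering."""
    g = {v: set(n) for v, n in adj.items()}
    pos = {v: i for i, v in enumerate(ordering)}
    fill = set()
    for v in reversed(ordering):
        earlier = sorted(u for u in g[v] if pos[u] < pos[v])
        for i, a in enumerate(earlier):
            for b in earlier[i + 1:]:
                if b not in g[a]:
                    g[a].add(b)
                    g[b].add(a)
                    fill.add((a, b))
    return fill


def is_chordal(adj):
    """Chordal iff the max-cardinality ordering produces no fill-in edges."""
    return not fill_in_edges(adj, max_cardinality_ordering(adj))


cycle4 = {"A": {"B", "D"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"A", "C"}}
with_chord = {"A": {"B", "C", "D"}, "B": {"A", "C"},
              "C": {"A", "B", "D"}, "D": {"A", "C"}}

print(is_chordal(cycle4), is_chordal(with_chord))
```

On the plain 4-cycle the ordering A, B, C, D forces the fill-in edge (A, C) when D is processed; once that chord is present in the graph, no fill-in is needed.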
Max-cardinality ordering
What is the complexity of min-fill? Of min-induced-width?
K-trees
Which greedy algorithm is best?
Recent work in my group
Vibhav Gogate and Rina Dechter. "A Complete Anytime Algorithm for Treewidth". In Proceedings of UAI 2004.
Andrew E. Gelfand, Kalev Kask, and Rina Dechter. "Stopping Rules for Randomized Greedy Triangulation Schemes". In Proceedings of AAAI 2011.
Potential project