Exact Inference Algorithms for Probabilistic Reasoning; COMPSCI 276 Fall 2007
Belief Updating
[Figure: belief network over Smoking, lung Cancer, Bronchitis, X-ray, Dyspnoea]
$P(\text{lung cancer} = \text{yes} \mid \text{smoking} = \text{no}, \text{dyspnoea} = \text{yes}) = ?$
Probabilistic Inference Tasks
Belief updating: $BEL(X_i) = P(X_i = x_i \mid \text{evidence})$
Finding most probable explanation (MPE): $x^* = \arg\max_x P(x, e)$
Finding maximum a-posteriori hypothesis (MAP): $(a_1^*, \ldots, a_k^*) = \arg\max_a \sum_{X/A} P(x, e)$, where $A \subseteq X$ are the hypothesis variables
Finding maximum-expected-utility (MEU) decision: $(d_1^*, \ldots, d_k^*) = \arg\max_d \sum_{X/D} P(x, e)\, U(x)$, where $D \subseteq X$ are the decision variables and $U(x)$ is the utility function
Belief updating is NP-hard
Each SAT formula can be mapped to a Bayesian network query.
Example: is $(u \lor \lnot v \lor w) \land (\lnot u \lor \lnot w \lor y)$ satisfiable?
Motivation
Given a chain, show how it works.
[Figure: chain A, B, C, D]
How can we compute $P(D)$? $P(D \mid A=0)$? $P(A \mid D=0)$?
Brute force: $O(k^4)$
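The contrast between brute force and elimination on the chain can be sketched as follows. This is a minimal illustration, not the course's code: the CPT values, the domain size k = 2, and the helper names `brute_force_p_d`, `push` and `eliminate_p_d` are all assumptions made for the example.

```python
from itertools import product

k = 2  # domain size (assumed for this sketch)

# Illustrative CPTs for the chain A -> B -> C -> D (values are made up).
p_a = [0.6, 0.4]
p_b_a = [[0.7, 0.3], [0.2, 0.8]]   # p_b_a[a][b] = P(B=b | A=a)
p_c_b = [[0.9, 0.1], [0.5, 0.5]]   # p_c_b[b][c] = P(C=c | B=b)
p_d_c = [[0.3, 0.7], [0.6, 0.4]]   # p_d_c[c][d] = P(D=d | C=c)

def brute_force_p_d():
    """Sum the full joint over A, B, C for every d: k^4 terms."""
    out = [0.0] * k
    for a, b, c, d in product(range(k), repeat=4):
        out[d] += p_a[a] * p_b_a[a][b] * p_c_b[b][c] * p_d_c[c][d]
    return out

def push(vec, cpt):
    """Eliminate the parent: out[y] = sum_x vec[x] * cpt[x][y], k^2 terms."""
    return [sum(vec[x] * cpt[x][y] for x in range(k)) for y in range(k)]

def eliminate_p_d():
    """Eliminate A, then B, then C: three steps of k^2 terms each."""
    return push(push(push(p_a, p_b_a), p_c_b), p_d_c)
```

Both functions return the same distribution over D; only the number of multiplications differs, which is the point of pushing the sums inward.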
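Belief updating: $P(X \mid \text{evidence}) = ?$
[Figure: moral graph over A, B, C, D, E]
$P(a \mid e=0) \propto P(a, e=0) = \sum_{e=0,d,c,b} P(a) P(b \mid a) P(c \mid a) P(d \mid b,a) P(e \mid b,c)$
$= P(a) \sum_{e=0} \sum_d \sum_c P(c \mid a) \sum_b P(b \mid a) P(d \mid b,a) P(e \mid b,c)$
Variable Elimination: the innermost sum defines $h^B(a, d, c, e)$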
Bucket elimination
Algorithm elim-bel (Dechter 1996)
Elimination operator: $\sum_b$
bucket B: $P(b \mid a)$, $P(d \mid b,a)$, $P(e \mid b,c)$
bucket C: $P(c \mid a)$, $h^B(a, d, c, e)$
bucket D: $h^C(a, d, e)$
bucket E: $e=0$, $h^D(a, e)$
bucket A: $P(a)$, $h^E(a)$
Result: $P(a \mid e=0)$
$w^* = 4$: "induced width" (max clique size)
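The bucket schedule above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not Dechter's implementation: variables are binary, the CPT numbers are invented, factors are stored as `(scope, table)` pairs, and the names `cpt`, `combine_and_sum` and `elim_bel` are hypothetical helpers.

```python
from itertools import product

DOM = (0, 1)  # binary domains (assumed)

def cpt(child, parents, p1):
    """Factor (scope, table); p1 maps parent values to P(child=1 | parents)."""
    scope = tuple(parents) + (child,)
    table = {}
    for pv in product(DOM, repeat=len(parents)):
        table[pv + (1,)] = p1[pv]
        table[pv + (0,)] = 1.0 - p1[pv]
    return scope, table

def combine_and_sum(factors, var):
    """Elimination operator: multiply the bucket's factors, then sum out var."""
    scope = sorted({v for s, _ in factors for v in s})
    keep = tuple(v for v in scope if v != var)
    out = {}
    for vals in product(DOM, repeat=len(scope)):
        asg = dict(zip(scope, vals))
        w = 1.0
        for s, t in factors:
            w *= t[tuple(asg[v] for v in s)]
        key = tuple(asg[v] for v in keep)
        out[key] = out.get(key, 0.0) + w
    return keep, out

def elim_bel(factors, order, query):
    """Process buckets in order; a bucket holds every factor mentioning its variable."""
    pool = list(factors)
    for var in order:
        bucket = [f for f in pool if var in f[0]]
        pool = [f for f in pool if var not in f[0]]
        if bucket:
            pool.append(combine_and_sum(bucket, var))
    # What is left mentions only the query variable (or is a constant).
    unnorm = []
    for q in DOM:
        w = 1.0
        for s, t in pool:
            w *= t[(q,)] if s == (query,) else t[()]
        unnorm.append(w)
    z = sum(unnorm)
    return [u / z for u in unnorm]

# Made-up CPTs for P(a), P(b|a), P(c|a), P(d|b,a), P(e|b,c), plus evidence e = 0.
factors = [
    cpt("a", (), {(): 0.3}),
    cpt("b", ("a",), {(0,): 0.2, (1,): 0.7}),
    cpt("c", ("a",), {(0,): 0.4, (1,): 0.9}),
    cpt("d", ("b", "a"), {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.6, (1, 1): 0.8}),
    cpt("e", ("b", "c"), {(0, 0): 0.2, (0, 1): 0.3, (1, 0): 0.5, (1, 1): 0.9}),
    (("e",), {(0,): 1.0, (1,): 0.0}),  # indicator factor for the evidence
]
posterior = elim_bel(factors, ["b", "c", "d", "e"], "a")  # P(a | e=0)
```

Eliminating b first creates a function over (a, c, d, e), which is where the width of 4 for this ordering shows up in the size of the intermediate table.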
Complexity of elimination
$O(n \cdot \exp(w^*(d)))$
$w^*(d)$: the induced width of the moral graph along ordering $d$
The effect of the ordering:
[Figure: moral graph and the induced graphs along two orderings $d_1$ and $d_2$]
$w^*(d_1) = 4$, $w^*(d_2) = 2$
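The induced width along an ordering can be computed by simulating elimination on the moral graph. The sketch below assumes the moral graph of the five-variable example (edges read off the CPT scopes) and takes the order in which variables are eliminated as input; which of the two orders corresponds to $d_1$ and which to $d_2$ in the figure is an assumption. The function name `induced_width` is hypothetical.

```python
def induced_width(edges, elim_order):
    """Largest number of neighbours a variable has at the moment it is eliminated."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    width = 0
    for var in elim_order:
        nbrs = adj.pop(var, set())
        width = max(width, len(nbrs))
        for u in nbrs:
            adj[u].discard(var)
            adj[u] |= nbrs - {u}  # connect the remaining neighbours (fill edges)
    return width

# Moral graph of P(a), P(b|a), P(c|a), P(d|b,a), P(e|b,c).
moral = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "D"),
         ("B", "C"), ("B", "E"), ("C", "E")]

w_first = induced_width(moral, ["B", "C", "D", "E", "A"])   # eliminate B first
w_second = induced_width(moral, ["E", "D", "C", "B", "A"])  # eliminate E first
```

Eliminating B first yields width 4, while eliminating E first yields width 2, matching the two values quoted on the slide.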
Finding small induced-width
NP-complete
A tree has induced-width of ?
Greedy algorithms:
  Min width
  Min induced-width
  Max-cardinality
  Fill-in (thought as the best)
See anytime min-width (Gogate and Dechter)
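A greedy fill-in heuristic can be sketched as follows: at each step, eliminate the variable whose neighbours need the fewest new edges to become a clique. This is a minimal sketch; the alphabetical tie-breaking and the name `min_fill_order` are assumptions of the example, and real implementations usually add smarter tie-breaking or randomization.

```python
from itertools import combinations

def min_fill_order(edges):
    """Greedy min-fill: returns (elimination order, induced width along it)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    order, width = [], 0
    while adj:
        def fill(v):
            # Number of missing edges among the current neighbours of v.
            return sum(1 for x, y in combinations(sorted(adj[v]), 2)
                       if y not in adj[x])
        var = min(sorted(adj), key=fill)  # ties broken alphabetically (assumed)
        nbrs = adj.pop(var)
        width = max(width, len(nbrs))
        for u in nbrs:
            adj[u].discard(var)
            adj[u] |= nbrs - {u}
        order.append(var)
    return order, width

moral = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "D"),
         ("B", "C"), ("B", "E"), ("C", "E")]
order, width = min_fill_order(moral)

tree = [("r", "x"), ("r", "y"), ("y", "z")]
tree_order, tree_width = min_fill_order(tree)
```

On the five-variable moral graph the heuristic finds an ordering of width 2, and on the small tree it finds width 1, since a leaf can always be eliminated without adding edges.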
Different Induced graphs
[Figure: panels (a) to (d), induced graphs of the same network along different orderings]
The impact of observations
[Figure: panels (a) to (d)]
Use the ancestral graph only
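From Bucket elimination to bucket-tree elimination
Bucket G: $P(G \mid F)$
Bucket F: $P(F \mid B,C)$, $\lambda_G^F(F)$
Bucket D: $P(D \mid A,B)$
Bucket C: $P(C \mid A)$, $\lambda_F^C(B,C)$
Bucket B: $P(B \mid A)$, $\lambda_D^B(A,B)$, $\lambda_C^B(A,B)$
Bucket A: $P(A)$, $\lambda_B^A(A)$
[Figure: the corresponding bucket tree, with each $\lambda$ message labelling the edge it is sent along]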
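Bucket-tree propagation
[Figure: node $u$ with neighbours $x_1, x_2, \ldots, x_n$ and $v$; message $h(u,v)$ on the edge from $u$ to $v$]
$cluster(u) = \psi(u) \cup \{h(x_1,u), h(x_2,u), \ldots, h(x_n,u), h(v,u)\}$
Compute the message:
$h(u,v) = \sum_{elim(u,v)} \prod_{f \in cluster(u) - \{h(v,u)\}} f$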
BTE: full execution
Bucket G: $P(G \mid F)$
Bucket F: $P(F \mid B,C)$
Bucket D: $P(D \mid A,B)$
Bucket C: $P(C \mid A)$
Bucket B: $P(B \mid A)$
Bucket A: $P(A)$
[Figure: the bucket tree with a $\lambda$ message and a $\Pi$ message on every edge, one in each direction]
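From buckets to superbuckets to clusters
[Figure: three trees over variables A, B, C, D, F, G, showing the bucket tree, the tree after merging buckets into superbuckets, and the resulting cluster tree]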
Cluster Tree Decomposition and elimination (CTE)
A cluster-tree decomposition is a set of subsets of variables (clusters) connected by a tree structure:
1. Every function (CPT) has at least one cluster that contains its scope. The function is assigned to one such cluster.
2. The cluster-tree obeys the running intersection property.
Proposition: If T is a cluster-tree decomposition, then any tree obtained by merging adjacent clusters (the variable sets and the functions) is also a tree-decomposition.
Join-Tree: a tree-decomposition where all clusters are maximal.
Tree Decomposition for belief updating
[Figure: belief network over A, B, C, D, E, F, G]
$p(a)$, $p(b \mid a)$, $p(c \mid a,b)$, $p(d \mid b)$, $p(e \mid b,f)$, $p(f \mid c,d)$, $p(g \mid e,f)$
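Tree Decomposition
[Figure: the belief network with CPTs $p(a)$, $p(b \mid a)$, $p(c \mid a,b)$, $p(d \mid b)$, $p(e \mid b,f)$, $p(f \mid c,d)$, $p(g \mid e,f)$, next to its tree decomposition]
Cluster ABC: $p(a)$, $p(b \mid a)$, $p(c \mid a,b)$
  separator BC
Cluster BCDF: $p(d \mid b)$, $p(f \mid c,d)$
  separator BF
Cluster BEF: $p(e \mid b,f)$
  separator EF
Cluster EFG: $p(g \mid e,f)$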
Tree decompositions (more formal)
A tree decomposition for a belief network $BN = \langle X, D, G, P \rangle$ is a triple $\langle T, \chi, \psi \rangle$, where $T = (V, E)$ is a tree and $\chi$ and $\psi$ are labeling functions, associating with each vertex $v \in V$ two sets, $\chi(v) \subseteq X$ and $\psi(v) \subseteq P$, satisfying:
1. For each function $p_i \in P$ there is exactly one vertex $v$ such that $p_i \in \psi(v)$ and $scope(p_i) \subseteq \chi(v)$.
2. For each variable $X_i \in X$ the set $\{v \in V \mid X_i \in \chi(v)\}$ forms a connected subtree (running intersection property).
[Figure: the belief network over A to G and its tree decomposition with clusters ABC, BCDF, BEF, EFG]
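The two conditions can be checked mechanically. The sketch below assumes the tree edges really form a tree (this is not verified) and uses hypothetical names: `clusters` plays the role of $\chi$, `assigned` the role of $\psi$, and `scopes` maps each function name to its scope. The example data is the four-cluster decomposition of the network over A to G.

```python
def is_tree_decomposition(clusters, tree_edges, assigned, scopes):
    """clusters: node -> set of variables; assigned: node -> list of function names;
    scopes: function name -> set of variables. tree_edges is assumed to be a tree."""
    # Condition 1: each function sits in exactly one cluster that covers its scope.
    for f, scope in scopes.items():
        homes = [v for v, fs in assigned.items() if f in fs]
        if len(homes) != 1 or not scope <= clusters[homes[0]]:
            return False
    # Condition 2: the clusters containing a variable form a connected subtree.
    for x in set().union(*clusters.values()):
        nodes = {v for v, chi in clusters.items() if x in chi}
        start = next(iter(nodes))
        seen, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for a, b in tree_edges:
                for s, t in ((a, b), (b, a)):
                    if s == u and t in nodes and t not in seen:
                        seen.add(t)
                        stack.append(t)
        if seen != nodes:
            return False
    return True

clusters = {1: set("ABC"), 2: set("BCDF"), 3: set("BEF"), 4: set("EFG")}
tree_edges = [(1, 2), (2, 3), (3, 4)]
assigned = {1: ["pa", "pb", "pc"], 2: ["pd", "pf"], 3: ["pe"], 4: ["pg"]}
scopes = {"pa": set("A"), "pb": set("AB"), "pc": set("ABC"), "pd": set("BD"),
          "pe": set("BEF"), "pf": set("CDF"), "pg": set("EFG")}

valid = is_tree_decomposition(clusters, tree_edges, assigned, scopes)

# Adding A to cluster 3 breaks running intersection: A is in clusters 1 and 3 but not 2.
broken = dict(clusters)
broken[3] = set("ABEF")
invalid = is_tree_decomposition(broken, tree_edges, assigned, scopes)
```

The second call illustrates why condition 2 matters: every function is still covered, but the clusters that mention A are no longer connected in the tree.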
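Same Message Passing
[Figure: node $u$ with neighbours $x_1, x_2, \ldots, x_n$ and $v$; message $h(u,v)$ on the edge from $u$ to $v$]
$cluster(u) = \psi(u) \cup \{h(x_1,u), h(x_2,u), \ldots, h(x_n,u), h(v,u)\}$
Compute the message:
$h(u,v) = \sum_{elim(u,v)} \prod_{f \in cluster(u) - \{h(v,u)\}} f$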
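Cluster Tree Elimination
Cluster 1 (ABC): $p(a)$, $p(b \mid a)$, $p(c \mid a,b)$
Cluster 2 (BCDF): $p(d \mid b)$, $p(f \mid c,d)$, $h_{(1,2)}(b,c)$
Cluster 3 (BEF): $p(e \mid b,f)$
Cluster 4 (EFG): $p(g \mid e,f)$
$h_{(1,2)}(b,c) = \sum_a p(a)\, p(b \mid a)\, p(c \mid a,b)$
$h_{(2,3)}(b,f) = \sum_{c,d} p(d \mid b)\, h_{(1,2)}(b,c)\, p(f \mid c,d)$
$sep(2,3) = \{B, F\}$
$elim(2,3) = \{C, D\}$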
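CTE: Cluster Tree Elimination
[Figure: cluster tree 1 (ABC), 2 (BCDF), 3 (BEF), 4 (EFG) with separators BC, BF, EF]
$h_{(1,2)}(b,c) = \sum_a p(a)\, p(b \mid a)\, p(c \mid a,b)$
$h_{(2,1)}(b,c) = \sum_{d,f} p(d \mid b)\, p(f \mid c,d)\, h_{(3,2)}(b,f)$
$h_{(2,3)}(b,f) = \sum_{c,d} p(d \mid b)\, p(f \mid c,d)\, h_{(1,2)}(b,c)$
$h_{(3,2)}(b,f) = \sum_e p(e \mid b,f)\, h_{(4,3)}(e,f)$
$h_{(3,4)}(e,f) = \sum_b p(e \mid b,f)\, h_{(2,3)}(b,f)$
$h_{(4,3)}(e,f) = p(G = g_e \mid e,f)$
Time: $O(\exp(w+1))$
Space: $O(\exp(sep))$
For each cluster $P(X \mid e)$ is computed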
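The six messages can be reproduced on a toy instance to see that every cluster ends up with the same information. This is a sketch under assumptions: binary variables, CPT values drawn from a seeded random generator purely for illustration, evidence G = 1 standing in for $G = g_e$, and hypothetical helpers `cpt` and `message`.

```python
import random
from itertools import product

DOM = (0, 1)             # binary domains (assumed)
rng = random.Random(0)   # arbitrary illustrative CPT values

def cpt(child, parents):
    """Random CPT as a factor (scope, table) with scope = parents + (child,)."""
    scope = tuple(parents) + (child,)
    table = {}
    for pv in product(DOM, repeat=len(parents)):
        p = rng.uniform(0.1, 0.9)
        table[pv + (1,)] = p
        table[pv + (0,)] = 1.0 - p
    return scope, table

def message(factors, sep):
    """Multiply the factors and sum out every variable that is not in sep."""
    scope = sorted({v for s, _ in factors for v in s})
    keep = tuple(v for v in scope if v in sep)
    out = {}
    for vals in product(DOM, repeat=len(scope)):
        asg = dict(zip(scope, vals))
        w = 1.0
        for s, t in factors:
            w *= t[tuple(asg[v] for v in s)]
        key = tuple(asg[v] for v in keep)
        out[key] = out.get(key, 0.0) + w
    return keep, out

pa, pb, pc = cpt("a", ()), cpt("b", "a"), cpt("c", "ab")
pd, pf = cpt("d", "b"), cpt("f", "cd")
pe, pg = cpt("e", "bf"), cpt("g", "ef")
ev = (("g",), {(0,): 0.0, (1,): 1.0})   # evidence G = 1 (assumed)

# Messages along the chain of clusters 1 (ABC), 2 (BCDF), 3 (BEF), 4 (EFG).
h12 = message([pa, pb, pc], {"b", "c"})
h23 = message([pd, pf, h12], {"b", "f"})
h34 = message([pe, h23], {"e", "f"})
h43 = message([pg, ev], {"e", "f"})
h32 = message([pe, h43], {"b", "f"})
h21 = message([pd, pf, h32], {"b", "c"})

# Unnormalized beliefs: P(a, G=1) from cluster 1 and P(e, G=1) from cluster 4.
bel1 = message([pa, pb, pc, h21], {"a"})
bel4 = message([pg, ev, h34], {"e"})
```

Each message only ever touches the variables of one cluster, so the largest table built has as many entries as the largest cluster, while the stored messages are tables over separators.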
Tree-Width & Separator
Finding tree-decompositions
Good join-trees are found using triangulation.
Tree-decompositions can be generated using induced-width ordering heuristics.
CTE - properties
Correctness and completeness: Algorithm CTE is correct, i.e. it computes the exact joint probability of a single variable and the evidence.
Time complexity: $O(deg \cdot (n + N) \cdot d^{w^*+1})$
Space complexity: $O(N \cdot d^{sep})$
where
  deg = the maximum degree of a node
  n = number of variables (= number of CPTs)
  N = number of nodes in the tree decomposition
  d = the maximum domain size of a variable
  w* = the induced width
  sep = the separator size
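Inference on trees is easy and distributed
[Figure: tree with root X, children Y and Z; Y has children T and R, Z has children L and M. CPTs $P(X)$, $P(Y \mid X)$, $P(Z \mid X)$, $P(T \mid Y)$, $P(R \mid Y)$, $P(L \mid Z)$, $P(M \mid Z)$. Each edge carries two messages, e.g. $m_{XY}(X)$ and $m_{YX}(X)$, $m_{ZL}(Z)$ and $m_{LZ}(Z)$.]
$m_{MZ}(Z) = \sum_M P(M \mid Z)$
$m_{LZ}(Z) = \sum_L P(L \mid Z)$
$m_{ZX}(X) = \sum_Z P(Z \mid X)\, m_{MZ}(Z)\, m_{LZ}(Z)$
$m_{XZ}(X) = P(X)\, m_{YX}(X)$
$m_{ZL}(Z) = \sum_X P(Z \mid X)\, m_{XZ}(X)\, m_{MZ}(Z)$
$m_{ZM}(Z) = \sum_X P(Z \mid X)\, m_{XZ}(X)\, m_{LZ}(Z)$
Belief updating = sum-prod
Inference is time and space linear on trees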
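Pearl's Belief Propagation
A polytree: a tree with larger families
A polytree decomposition
[Figure: polytree with roots $U_1, U_2, U_3$, their children $Z_1, Z_2, Z_3$, common child $X_1$ and its child $Y_1$; CPTs $P(u_1)$, $P(u_2)$, $P(u_3)$, $P(z_1 \mid u_1)$, $P(z_2 \mid u_2)$, $P(z_3 \mid u_3)$, $P(x_1 \mid u_1,u_2,u_3)$, $P(y_1 \mid x_1)$; messages $\lambda_{Z_1}(u_1)$, $\lambda_{Z_2}(u_2)$, $\lambda_{Z_3}(u_3)$]
Running CTE = running Pearl's BP over the dual graph
Dual-graph: nodes are CPTs, arcs connect non-empty intersections.
Time and space linear propagation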
Iterative Belief Propagation (IBP)
Loopy belief propagation
Belief propagation is exact for poly-trees
IBP: applying BP iteratively to cyclic networks
One step: update $BEL(U_1)$
[Figure: node $U_1$ with neighbours $U_2$, $U_3$, $X_1$, $X_2$ and messages $\lambda_{X_1}(u_1)$, $\lambda_{X_2}(u_1)$, $\pi_{U_2}(x_1)$, $\pi_{U_3}(x_1)$]
No guarantees for convergence
Works well for many coding networks
Finding MPE
Algorithm elim-mpe (Dechter 1996)
$\sum$ is replaced by $\max$:
$MPE = \max_x P(x) = \max_{a,e,d,c,b} P(a)\, P(c \mid a)\, P(b \mid a)\, P(d \mid a,b)\, P(e \mid b,c)$
Elimination operator: $\max_b$
bucket B: $P(b \mid a)$, $P(d \mid b,a)$, $P(e \mid b,c)$
bucket C: $P(c \mid a)$, $h^B(a, d, c, e)$
bucket D: $h^C(a, d, e)$
bucket E: $e=0$, $h^D(a, e)$
bucket A: $P(a)$, $h^E(a)$
Result: MPE
$w^* = 4$: "induced width" (max clique size)
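Generating the MPE-tuple
1. $a' = \arg\max_a P(a)\, h^E(a)$   (bucket A: $P(a)$, $h^E(a)$)
2. $e' = 0$   (bucket E: $e=0$, $h^D(a, e)$)
3. $d' = \arg\max_d h^C(a', d, e')$   (bucket D: $h^C(a, d, e)$)
4. $c' = \arg\max_c P(c \mid a')\, h^B(a', d', c, e')$   (bucket C: $P(c \mid a)$, $h^B(a, d, c, e)$)
5. $b' = \arg\max_b P(b \mid a')\, P(d' \mid b, a')\, P(e' \mid b, c')$   (bucket B: $P(b \mid a)$, $P(d \mid b,a)$, $P(e \mid b,c)$)
Return $(a', b', c', d', e')$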
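The forward max-elimination pass and the backward pass that generates the MPE tuple can be sketched together. As before this is an illustration under assumptions, not the original algorithm listing: binary variables, invented CPT values, and the hypothetical helpers `cpt` and `elim_mpe`. Each bucket records, for every assignment of the remaining variables, which value of its own variable achieved the maximum.

```python
from itertools import product

DOM = (0, 1)  # binary domains (assumed)

def cpt(child, parents, p1):
    """Factor (scope, table); p1 maps parent values to P(child=1 | parents)."""
    scope = tuple(parents) + (child,)
    table = {}
    for pv in product(DOM, repeat=len(parents)):
        table[pv + (1,)] = p1[pv]
        table[pv + (0,)] = 1.0 - p1[pv]
    return scope, table

def elim_mpe(factors, order):
    """Max-product bucket elimination followed by backward argmax assignment."""
    pool, trace = list(factors), []
    for var in order:
        bucket = [f for f in pool if var in f[0]]
        pool = [f for f in pool if var not in f[0]]
        scope = sorted({v for s, _ in bucket for v in s} - {var})
        h, best = {}, {}
        for vals in product(DOM, repeat=len(scope)):
            asg = dict(zip(scope, vals))
            scores = []
            for x in DOM:
                asg[var] = x
                w = 1.0
                for s, t in bucket:
                    w *= t[tuple(asg[v] for v in s)]
                scores.append(w)
            h[vals] = max(scores)
            best[vals] = scores.index(h[vals])  # DOM is (0, 1), so index == value
        pool.append((tuple(scope), h))
        trace.append((var, tuple(scope), best))
    value = 1.0
    for _, t in pool:          # only constants remain
        value *= t[()]
    assignment = {}
    for var, scope, best in reversed(trace):   # last bucket first
        assignment[var] = best[tuple(assignment[v] for v in scope)]
    return value, assignment

factors = [
    cpt("a", (), {(): 0.3}),
    cpt("b", ("a",), {(0,): 0.2, (1,): 0.7}),
    cpt("c", ("a",), {(0,): 0.4, (1,): 0.9}),
    cpt("d", ("b", "a"), {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.6, (1, 1): 0.8}),
    cpt("e", ("b", "c"), {(0, 0): 0.2, (0, 1): 0.3, (1, 0): 0.5, (1, 1): 0.9}),
    (("e",), {(0,): 1.0, (1,): 0.0}),  # evidence e = 0
]
value, mpe = elim_mpe(factors, ["b", "c", "d", "e", "a"])
```

The backward loop visits buckets in the order A, E, D, C, B, mirroring steps 1 to 5: each variable is assigned after all variables in its recorded scope.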
Conditioning generates the probability tree
$P(a, e=0) = P(a) \sum_b P(b \mid a) \sum_c P(c \mid a) \sum_d P(d \mid a,b) \sum_{e=0} P(e \mid b,c)$
Complexity of conditioning: exponential time, linear space
Conditioning+elimination
$P(a, e=0) = P(a) \sum_b P(b \mid a) \sum_c P(c \mid a) \sum_d P(d \mid a,b) \sum_{e=0} P(e \mid b,c)$
Idea: conditioning until $w^*$ of a (sub)problem gets small
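Loop-cutset decomposition
You condition until you get a polytree
[Figure: the network split into conditioned copies, one per cutset assignment (A=0, A=1)]
$P(B \mid F=0) = P(B, A=0 \mid F=0) + P(B, A=1 \mid F=0)$
Loop-cutset method is time exponential in the loop-cutset size and linear space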
W-cutset algorithms
Elim-cond-bel:
  Identify a w-cutset, $C_w$, of the network
  For each assignment to the cutset, solve the conditioned sub-problem by CTE
  Aggregate the solutions over all cutset assignments
Time complexity: $\exp(|C_w| + w)$
Space complexity: $\exp(w)$
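The condition-then-eliminate scheme can be sketched on a small loopy network. Everything here is an assumption made for illustration: a four-variable network A -> B, A -> C, (B, C) -> D with invented CPTs, the cutset {A}, and the helper names `solve_conditioned` and `cutset_conditioning`. Fixing A removes the loop, and each conditioned subproblem is then solved by eliminating B and C.

```python
from itertools import product

DOM = (0, 1)  # binary domains (assumed)

# Made-up CPTs for the loop A -> B, A -> C, (B, C) -> D.
p_a = [0.3, 0.7]
p_b_a = [[0.8, 0.2], [0.4, 0.6]]   # p_b_a[a][b] = P(B=b | A=a)
p_c_a = [[0.5, 0.5], [0.1, 0.9]]   # p_c_a[a][c] = P(C=c | A=a)
p_d_bc = {(0, 0): [0.9, 0.1], (0, 1): [0.6, 0.4],
          (1, 0): [0.3, 0.7], (1, 1): [0.2, 0.8]}   # p_d_bc[(b, c)][d]

def solve_conditioned(a):
    """With A = a fixed the network is singly connected; eliminate B, then C.
    Returns [P(A=a, D=0), P(A=a, D=1)]."""
    # h_b[c][d] = sum_b P(b | a) P(d | b, c)
    h_b = [[sum(p_b_a[a][b] * p_d_bc[(b, c)][d] for b in DOM) for d in DOM]
           for c in DOM]
    # Sum out C and scale by the weight of this cutset assignment.
    return [p_a[a] * sum(p_c_a[a][c] * h_b[c][d] for c in DOM) for d in DOM]

def cutset_conditioning():
    """Aggregate the conditioned solutions over all cutset assignments: P(D)."""
    total = [0.0, 0.0]
    for a in DOM:
        total = [t + s for t, s in zip(total, solve_conditioned(a))]
    return total

p_d = cutset_conditioning()
```

Only one conditioned subproblem is held in memory at a time, which is where the space saving comes from; the price is one elimination run per cutset assignment.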
All algorithms generalize to any graphical model
Through general operations of combination and marginalization
General BE, BTE, CTE, BP
Applicable to Markov networks, to constraint optimization, to counting the number of solutions in a SAT formula, etc.
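Tree-solving
Belief updating (sum-prod)
CSP consistency (projection-join)
MPE (max-prod)
#CSP (sum-prod)
[Figure: tree with root X, children Y and Z; Y has children T and R, Z has children L and M. CPTs $P(X)$, $P(Y \mid X)$, $P(Z \mid X)$, $P(T \mid Y)$, $P(R \mid Y)$, $P(L \mid Z)$, $P(M \mid Z)$; each edge carries a message in both directions, e.g. $m_{XY}(X)$ and $m_{YX}(X)$.]
Inference is time and space linear on trees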
Graphical Models
A graphical model (X, D, F):
  $X = \{X_1, \ldots, X_n\}$ variables
  $D = \{D_1, \ldots, D_n\}$ domains
  $F = \{f_1, \ldots, f_r\}$ functions (constraints, CPTs, CNFs)
Operators:
  combination
  elimination (projection)

Conditional Probability Table (CPT)
A  C  F  P(F|A,C)
0  0  0  0.14
0  0  1  0.96
0  1  0  0.40
0  1  1  0.60
1  0  0  0.35
1  0  1  0.65
1  1  0  0.72
1  1  1  0.68

Relation
A      C      F
red    green  blue
blue   red    red
blue   blue   green
green  red    blue

$f_i := (F = A + C)$
[Figure: primal graph (interaction graph) over A, B, C, D, E, F]

Tasks:
  Belief updating: $\sum_{X - y} \prod_j P_j$
  MPE: $\max_X \prod_j P_j$
  CSP: $\pi_X \bowtie_j C_j$
  Max-CSP: $\min_X \sum_j F_j$
All these tasks are NP-hard:
  exploit problem structure
  identify special cases
  approximate
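Cluster-tree Decomposition
Let $P = \langle X, D, F, \otimes, \Downarrow, \{Z_i\} \rangle$ be an automated reasoning problem. A tree decomposition is $\langle T, \chi, \psi \rangle$, such that
  $T = (V, E)$ is a tree
  $\chi$ associates a set of variables $\chi(v) \subseteq X$ with each node
  $\psi$ associates a set of functions $\psi(v) \subseteq F$ with each node
such that
  $\forall f_i \in F$, there is exactly one $v$ such that $f_i \in \psi(v)$ and $scope(f_i) \subseteq \chi(v)$.
  $\forall x \in X$, the set $\{v \in V \mid x \in \chi(v)\}$ induces a connected subtree.
  $\forall i$, $Z_i \subseteq \chi(v)$ for some $v \in V$.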
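Example: Cluster-tree
[Figure: panels (a) to (c); a graph over variables 1 to 6 with a pairwise function on each edge $(X_i, X_j)$, and the cluster tree built from it]
$\chi(6) = \{1,5,6\}$, $\psi(6) = \{f_1(5,6), f_2(1,6)\}$
$\chi(5) = \{1,2,5\}$, $\psi(5) = \{f_3(2,5)\}$
$\chi(3) = \{2,3\}$, $\psi(3) = \{f_5(2,3)\}$
$\chi(2) = \{1,2\}$, $\psi(2) = \{f_6(1,2)\}$
$\chi(4) = \{1,4\}$, $\psi(4) = \{f_4(1,4)\}$
$\chi(1) = \{1\}$, $\psi(1) = \emptyset$
Tree-width = 3
$sep(5,6) = \{1, 5\}$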
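Cluster-tree Elimination (CTE)
[Figure: node $u$ with $\chi(u)$, $\psi(u)$ and incoming messages $m_{(v_1,u)}, m_{(v_2,u)}, \ldots, m_{(v_i,u)}$]
$m_{(u,w)} = \Downarrow_{sep(u,w)} \left[ \otimes_j \{m_{(v_j,u)}\} \otimes \psi(u) \right]$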
Example: Cluster-tree Elimination
Clusters:
$\chi(6) = \{1,5,6\}$, $\psi(6) = \{f_1(5,6), f_2(1,6)\}$
$\chi(5) = \{1,2,5\}$, $\psi(5) = \{f_3(2,5)\}$
$\chi(3) = \{2,3\}$, $\psi(3) = \{f_5(2,3)\}$
$\chi(2) = \{1,2\}$, $\psi(2) = \{f_6(1,2)\}$
$\chi(4) = \{1,4\}$, $\psi(4) = \{f_4(1,4)\}$
$\chi(1) = \{1\}$, $\psi(1) = \emptyset$
Messages:
$m_{(6,5)}(1,5) := \min_6 \{ f_1(5,6) + f_2(1,6) \}$
$m_{(5,6)}(1,5) := \min_2 \{ f_3(2,5) + m_{(2,5)}(1,2) \}$
$m_{(5,2)}(1,2) := \min_5 \{ f_3(2,5) + m_{(6,5)}(1,5) \}$
$m_{(2,5)}(1,2) := f_6(1,2) + m_{(1,2)}(1) + m_{(3,2)}(2)$
$m_{(3,2)}(2) := \min_3 \{ f_5(2,3) \}$
$m_{(2,3)}(2) := \min_1 \{ f_6(1,2) + m_{(5,2)}(1,2) + m_{(1,2)}(1) \}$
$m_{(2,1)}(1) := \min_2 \{ f_6(1,2) + m_{(5,2)}(1,2) + m_{(3,2)}(2) \}$
$m_{(1,2)}(1) := m_{(4,1)}(1)$
$m_{(4,1)}(1) := \min_4 \{ f_4(1,4) \}$
$m_{(1,4)}(1) := m_{(2,1)}(1)$