Symbolic Methods for Probabilistic Inference, Optimization, and Decision-making Scott Sanner


1 Symbolic Methods for Probabilistic Inference, Optimization, and Decision-making Scott Sanner With much thanks to research collaborators: Zahra Zamani, Ehsan Abbasnejad, Karina Valdivia Delgado, Leliane Nunes de Barros, Luis Gustavo Rocha Vianna, Cheng (Simon) Fang

2 Graphical Models are Pervasive in AI Medical Pathfinder: Expert System BUGS: Epidemiology Text LDA and 1,000 Extensions Vision Ising Model! Robotics Dynamics and sensor models

3 Graphical Models + Symbolic Methods Specify a graphical model for problem at hand Text, vision, robotics, etc. Goal: efficient inference and optimization in this model Be it discrete or continuous Symbolic methods (e.g., decision diagrams) facilitate this! Useful as building block in any inference algorithm Exploit structure for compactness, efficient computation Automagically! Exploit more structure than graphical model alone Still partially a dream, but many recent advances

4 Tutorial Outline Part I: Symbolic Methods for Discrete Inference Graphical Models and Influence Diagrams Symbolic Inference with Decision Diagrams Part II: Extensions to Continuous Inference Case Calculus Extended ADD (XADD) Part III: Applications Graphical Model Inference Sequential Decision-making Constrained Optimization

5 Part I: Symbolic Methods for Discrete Inference and Optimization

6 Directed Graphical Models Bayesian Network: compact (factored) specification of a joint probability. E.g., with binary variables B, F, A, H, P: Bubonic Plague (B), Stomach Flu (F), Appendicitis (A), Severe Headache (H), Abdominal Pain (P). The joint factorizes according to model structure: P(B,F,A,H,P) = P(H|B,F) P(P|F,A) P(B) P(F) P(A)

7 Undirected Graphical Models Markov Random Fields (MRFs): variables V1, V2, V3, V4 take on possible configurations; factors denote compatibility of configurations: P(V1,V2,V3,V4) = (1/Z) F(V1,V2) F(V2,V4) F(V1,V3) F(V3,V4). Conditional MRFs (CRFs): P(V1,V2|V3,V4) = (1/Z(V3,V4)) F(V1,V2) F(V2,V4) F(V1,V3) F(V3,V4). How to compute the normalizer Z? Note: the representation above is a factor graph (FG), which works for directed models as well. For FGs of directed models, what conditions additionally hold?
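The normalizer question on this slide can be answered by brute force for small models. A minimal sketch, assuming a hypothetical pairwise compatibility factor F(u,v) = 2 when the endpoints agree and 1 otherwise (the slide does not give the actual factor values):

```python
from itertools import product

# Hypothetical compatibility factor over binary variables:
# favors configurations where the two endpoints agree.
def F(u, v):
    return 2.0 if u == v else 1.0

# Brute-force normalizer for the 4-variable MRF on the slide:
# P(V1..V4) = (1/Z) F(V1,V2) F(V2,V4) F(V1,V3) F(V3,V4)
def partition_function():
    return sum(F(v1, v2) * F(v2, v4) * F(v1, v3) * F(v3, v4)
               for v1, v2, v3, v4 in product([0, 1], repeat=4))

Z = partition_function()

# Sanity check: dividing by Z makes the probabilities sum to 1.
total = sum(F(v1, v2) * F(v2, v4) * F(v1, v3) * F(v3, v4) / Z
            for v1, v2, v3, v4 in product([0, 1], repeat=4))
```

Enumeration is exponential in the number of variables, which is exactly why the factored inference methods in the rest of this part matter.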

8 Dynamical Models & Influence Diagrams Dynamical models: represent times t and t+1 over state variables X1, X2; assume a stationary distribution. Influence diagrams: action nodes [squares] are not random variables but controlled variables; utility nodes <diamonds> give a utility conditioned on state, e.g. U(X1,X2) = if (X1 = X2) then 10 else 0

9 Discrete Inference & Optimization Probabilistic queries in Bayesian networks: want P(Query|Evidence), e.g. P(E|X) = P(E,X) / P(X), where P(E,X) = Σ_A Σ_B P(A,B,E,X) = Σ_A Σ_B P(A|B,E) P(X|B) P(E) P(B). Maximizing expected utility in influence diagrams: want the optimal action a* = argmax_a E[U | A=a, ...], e.g. a* = argmax_a E[U | A=a, X=x] = argmax_a Σ_X' U(X') P(X' | A=a, X=x)

10 Manipulating Discrete Distributions Marginalization: Σ_b P(A, b) = P(A), i.e., sum B out of the table for P(A,B) to obtain the table for P(A)

11 Manipulating Discrete Distributions Maximization: max_b P(A, b), i.e., maximize B out of the table for P(A,B), keeping for each A the best value over B (e.g., 0.3 at B=1, 1.4 at B=0)

12 Manipulating Discrete Distributions Binary multiplication: P(A|B) ⊗ P(B|C) = P(A,B|C), a pointwise product of tables over the union of their variables. The same principle holds for all binary ops: +, -, /, max, etc.

13 Discrete Inference & Optimization Observation 1: all discrete functions can be represented as tables, e.g. P(A,B). Observation 2: all needed operations are computable in closed form: f1 ⊕ f2, f1 ⊗ f2, max(f1, f2), min(f1, f2), Σ_x f(x), (arg)max_x f(x), (arg)min_x f(x). Now we can do exact inference and optimization in discrete graphical models and influence diagrams!
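As a concrete illustration of Observations 1 and 2, here is a minimal tabular-factor sketch; the helper names and the probability values are hypothetical, not the tutorial's code:

```python
from itertools import product

# A factor over named binary variables, stored as a fully enumerated table.
def make_factor(vars_, fn):
    return {assignment: fn(dict(zip(vars_, assignment)))
            for assignment in product([0, 1], repeat=len(vars_))}

def multiply(vars1, f1, vars2, f2):
    """Pointwise product f1 (x) f2 over the union of their variables."""
    vars_out = list(dict.fromkeys(vars1 + vars2))
    def fn(a):
        return (f1[tuple(a[v] for v in vars1)] *
                f2[tuple(a[v] for v in vars2)])
    return vars_out, make_factor(vars_out, fn)

def marginalize(vars_, f, var):
    """Sum out one variable: (sum_x f)(rest)."""
    keep = [v for v in vars_ if v != var]
    out = {}
    for assignment, val in f.items():
        a = dict(zip(vars_, assignment))
        key = tuple(a[v] for v in keep)
        out[key] = out.get(key, 0.0) + val
    return keep, out

# P(A|B) and P(B) with made-up numbers; keys are (A, B) and (B,) tuples.
pA_given_B = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.4, (1, 1): 0.6}
pB = {(0,): 0.7, (1,): 0.3}

vars_ab, joint = multiply(['A', 'B'], pA_given_B, ['B'], pB)  # P(A,B)
vars_a, pA = marginalize(vars_ab, joint, 'B')                 # P(A)
```

Under these assumed numbers, P(A=0) = 0.9·0.7 + 0.4·0.3 = 0.75. Every inference step in the discrete part of this tutorial is some composition of these two operations plus max/min.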

14 Discrete Inference & Optimization Probabilistic queries in Bayesian networks: P(E|X) = P(E,X) / P(X), where P(E,X) = Σ_A Σ_B P(A,B,E,X) = Σ_A Σ_B P(A|B,E) P(X|B) P(E) P(B). Just products and marginals. Maximizing expected utility in influence diagrams: a* = argmax_a E[U | A=a, X=x] = argmax_a Σ_X' U(X') P(X' | A=a, X=x). Product, max, and marginals.

15 Where are we? We can specify discrete models We know operations needed for inference Can we optimize the order of operations?

16 Variable Elimination When marginalizing over y, try to factor out all probabilities independent of y:
P(X1) = Σ_{x2,...,xn,y} P(y|x1,...,xn) P(X1) P(x2) ... P(xn)
= P(X1) [O(1)] · Σ_{x2} P(x2) ... Σ_{xn} P(xn) [O(n)] · Σ_y P(y|x1,...,xn) [O(1)]
Bracketed terms show the number of FLOPs in the tabular case.

17 VE: Variable Order Matters Sums commute, so we can change the order of elimination. Original query, changed variable order:
P(X1) = Σ_{y,xn,...,x2} P(y|x1,...,xn) P(X1) P(x2) ... P(xn)
= Σ_{y,xn,...,x3} P(X1) P(x3) ... P(xn) · Σ_{x2} P(x2) P(y|x1,...,xn) [O(2^{n+1})]
With a different variable order: O(n) becomes O(2^{n+1}). A good variable order minimizes the number of variables in the largest intermediate factor, a.k.a. (roughly) the tree width (TW); here it is n+1. Graphical model inference is ~O(2^TW). Actually 2^{TW+1}, but the point is: exponential in TW.
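The effect of elimination order can be simulated without any numeric tables by tracking only factor scopes. A sketch on a hypothetical star-shaped network (class variable c with leaves e0..e9, querying e0; this is an illustrative model, not the slide's exact one):

```python
# Track only variable scopes during elimination; report the largest
# intermediate table, measured as 2^(#variables) for binary variables.
def max_factor_size(factor_scopes, order):
    scopes = [frozenset(s) for s in factor_scopes]
    largest = 0
    for var in order:
        touching = [s for s in scopes if var in s]
        scopes = [s for s in scopes if var not in s]
        merged = frozenset().union(*touching)   # scope of the new factor
        largest = max(largest, 2 ** len(merged))
        scopes.append(merged - {var})           # var has been summed out
    return largest

n = 10
# P(c) and P(e_i | c) for i = 0..n-1 (scopes only, no numbers needed).
factors = [{"c"}] + [{"c", f"e{i}"} for i in range(n)]

good = [f"e{i}" for i in range(1, n)] + ["c"]   # hub variable c last
bad = ["c"] + [f"e{i}" for i in range(1, n)]    # hub variable c first

size_good = max_factor_size(factors, good)
size_bad = max_factor_size(factors, bad)
```

Eliminating the hub last keeps every intermediate factor at two variables (size 4); eliminating it first builds a factor over all n+1 variables (size 2^11 = 2048), mirroring the slide's O(n) vs. O(2^{n+1}) gap.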

18 Recap Graphical Models and Influence Diagrams why should you care? Versus non-factored models GMs and IDs allow exponential space savings in representation exponential time savings in inference exploit factored structure in variable elimination exponential data reduction for learning smaller models = fewer parameters = less data

19 Where are we? We can specify discrete models We know operations needed for inference We know how to optimize order of operations Is this it? Is there more structure to exploit?

20 Symbolic Inference with Decision Diagrams For Discrete Models

21 DD Definition Decision diagrams (DDs): DAG variant of a decision tree, with ordered decision tests. Used to represent: f: B^n → B (boolean: BDD; sets of subsets, e.g. {{a,b},{a}}: ZDD), f: B^n → Z (integer: MTBDD / ADD), f: B^n → R (real: ADD). We'll focus on ADDs in this tutorial.

22 What's the Big Deal? More than compactness: ordered decision tests in DDs support efficient operations. ADD: -f, f ⊕ g, f ⊗ g, max(f, g). BDD: ¬f, f ∧ g, f ∨ g. ZDD: f \ g, f ∪ g, f ∩ g. Efficient operations are key to inference.

23 Function Representation (Tables) How to represent functions B^n → R? How about a fully enumerated table of F(a,b,c) over all assignments to a, b, c? OK, but can we be more compact?

24 Function Representation (Trees) How about a tree over a, b, c? Sure, and we can simplify it: context-specific independence!

25 Function Representation (ADDs) Why not a directed acyclic graph (DAG)? The Algebraic Decision Diagram (ADD) exploits context-specific independence (CSI) and shared substructure.


27 Trees vs. ADDs AND, OR, XOR over x1, ..., xn: trees can compactly represent AND / OR, but not XOR (linear as an ADD, exponential as a tree). Why? Trees must represent every path.

28 Binary Operations (ADDs) Why do we order variable tests? Ordering enables efficient binary operations. Result: ADD operations can avoid state enumeration.

29 Summary We need B n B / Z / R We need compact representations We need efficient operations DDs are a promising candidate Great, tell me all about DDs OK Not claiming DDs solve all problems but often better than tabular approach

30 Decision Diagrams: Reduce (how to build canonical DDs)

31 How to Reduce an Ordered Tree to an ADD? Recursively build bottom-up. Hash terminal nodes: value → ID (leaf cache). Hash non-terminal functions: (v, ID_high, ID_low) → ID; special case: (v, ID, ID) → ID; all others: keep in the (reduce) cache.

32 GetNode Removes redundant branches; maintains a cache of internal nodes.
Algorithm: GetNode(v, F_h, F_l) → F_r
input: v, F_h, F_l: var and node ids for high/low branches
output: F_r: canonical node id
begin
// If branches redundant, return child
if (F_l = F_h) then return F_l;
// Make new node if not in cache
if (⟨v, F_h, F_l⟩ → id is not in node cache) then
id := currently unallocated id;
insert ⟨v, F_h, F_l⟩ → id in cache;
// Return the cached, canonical node
return id;
end

33 Reduce Algorithm
Algorithm: Reduce(F) → F_r
input: F: node id
output: F_r: canonical node id for reduced ADD
begin
// Check for terminal node
if (F is a terminal node) then return canonical terminal node for the value of F;
// Check reduce cache
if (F → F_r is not in reduce cache) then
// Not in cache, so recurse
F_h := Reduce(F_h); F_l := Reduce(F_l);
// Retrieve canonical form
F_r := GetNode(F_var, F_h, F_l);
// Put in cache
insert F → F_r in reduce cache;
// Return canonical reduced node
return F_r;
end
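The GetNode/Reduce pair on the last two slides can be sketched in a few lines. This is an illustrative toy with its own data layout, not the tutorial's implementation:

```python
node_cache = {}   # (var, high_id, low_id) -> id
leaf_cache = {}   # terminal value -> id
nodes = {}        # id -> ('leaf', value) or ('node', var, high_id, low_id)

def get_leaf(value):
    if value not in leaf_cache:
        leaf_cache[value] = len(nodes)
        nodes[leaf_cache[value]] = ('leaf', value)
    return leaf_cache[value]

def get_node(var, high, low):
    # Redundant test: both branches lead to the same node, so skip it.
    if high == low:
        return high
    key = (var, high, low)
    if key not in node_cache:
        node_cache[key] = len(nodes)
        nodes[node_cache[key]] = ('node', var, high, low)
    return node_cache[key]

def build(tree):
    """Reduce an ordered decision tree, given as nested tuples
    ('var', high_subtree, low_subtree) or a constant leaf."""
    if not isinstance(tree, tuple):
        return get_leaf(tree)
    var, high, low = tree
    return get_node(var, build(high), build(low))

# f(a,b) = if a then (if b then 1 else 0) else (if b then 1 else 0):
# a is irrelevant, so the reduced ADD is just the test on b.
f = build(('a', ('b', 1, 0), ('b', 1, 0)))
g = build(('b', 1, 0))
```

Because the two `('b', 1, 0)` subtrees hash to the same node and `get_node` drops the redundant test on a, `f` and `g` come out as the same canonical id.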

34 Reduce Complexity Linear in the size of the input, which can be a tree or a DAG. Because of caching, Reduce explores each node once and does not need to explore all branches.

35 Canonicity of ADDs via Reduce Claim: if two functions are identical, Reduce will assign both functions the same ID. Proof by induction on var order. Base case: canonical for 0 vars (terminal nodes). Inductive case: assume canonical for k-1 vars; the GetNode result is canonical for the k-th var (only one way to represent it), yielding a unique ID.

36 Impact of Variable Orderings Good orders can matter (compare original var labels vs. relabeled vars). Good orders typically keep related vars together, e.g., a low tree-width order in the transition graphical model. Graph-Based Algorithms for Boolean Function Manipulation, Randal E. Bryant; IEEE Transactions on Computers 1986.

37 In-diagram Reordering Rudell's sifting algorithm: global reordering as pairwise swapping, e.g., Swap(a,b). Only need to redirect arcs; it helps to use pointers, since then we don't need to redirect parents. Can also reorder using Apply later.

38 Beyond Binary Variables Multivalued (MV-)DDs: a DD with multivalued variables is a straightforward k-branch extension, e.g., k=6 with branches R=1, R=2, ..., R=6. Works for ADD extensions as well.

39 Decision Diagrams: Apply (how to do efficient operations on DDs)

40 Recap Recall the Apply recursion: ADD operations can avoid state enumeration. We need to handle the recursive cases and the base cases.

41 Apply Recursion Need to compute F1 op F2, e.g., op ∈ {⊕, ⊖, ⊗, ⊘}. Case 1: F1 & F2 test matching vars: F_h = Apply(F_1,h, F_2,h, op); F_l = Apply(F_1,l, F_2,l, op); F_r = GetNode(var(F_1), F_h, F_l).

42 Apply Recursion Need to compute F1 op F2, e.g., op ∈ {⊕, ⊖, ⊗, ⊘}. Case 2: non-matching vars, with v2 ordered before v1: F_h = Apply(F_1, F_2,h, op); F_l = Apply(F_1, F_2,l, op); F_r = GetNode(var(F_2), F_h, F_l).

43 Apply Base Case: ComputeResult Constant (terminal) nodes and some other cases can be computed without recursion: ComputeResult(F1, F2, op) → F_r.
Table 1: Input and output summaries of ComputeResult (operation and conditions → return value):
F1 op F2, F1 = C1, F2 = C2 → C1 op C2
F1 ⊕ F2, F2 = 0 → F1
F1 ⊕ F2, F1 = 0 → F2
F1 ⊖ F2, F2 = 0 → F1
F1 ⊗ F2, F2 = 1 → F1
F1 ⊗ F2, F1 = 1 → F2
F1 ⊘ F2, F2 = 1 → F1
min(F1, F2), max(F1) ≤ min(F2) → F1
min(F1, F2), max(F2) ≤ min(F1) → F2
similarly for max
otherwise → null

44 Apply Algorithm Note: Apply works for any binary operation! Why?
Algorithm: Apply(F1, F2, op) → F_r
input: F1, F2, op: ADD nodes and op
output: F_r: ADD result node to return
begin
// Check if result can be immediately computed
if (ComputeResult(F1, F2, op) → F_r is not null) then return F_r;
// Check if result already in apply cache
if (⟨F1, F2, op⟩ → F_r is not in apply cache) then
// Not terminal, so recurse
var := GetEarliestVar(var(F1), var(F2));
// Set up nodes for recursion
if (F1 is non-terminal ∧ var = var(F1)) then F_l^{v1} := F_1,l; F_h^{v1} := F_1,h; else F_{l/h}^{v1} := F_1;
if (F2 is non-terminal ∧ var = var(F2)) then F_l^{v2} := F_2,l; F_h^{v2} := F_2,h; else F_{l/h}^{v2} := F_2;
// Recurse and get cached result
F_l := Apply(F_l^{v1}, F_l^{v2}, op); F_h := Apply(F_h^{v1}, F_h^{v2}, op);
F_r := GetNode(var, F_h, F_l);
// Put result in apply cache and return
insert ⟨F1, F2, op⟩ → F_r into apply cache;
return F_r;
end
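The Apply recursion above can be sketched as a self-contained toy. It assumes an alphabetical variable order and takes `op` as any Python callable; the data layout and names are illustrative, not the tutorial's implementation:

```python
import operator

nodes = {}        # id -> ('leaf', value) or ('node', var, high_id, low_id)
leaf_cache, node_cache, apply_cache = {}, {}, {}

def get_leaf(value):
    if value not in leaf_cache:
        leaf_cache[value] = len(nodes)
        nodes[leaf_cache[value]] = ('leaf', value)
    return leaf_cache[value]

def get_node(var, high, low):
    if high == low:               # redundant test
        return high
    key = (var, high, low)
    if key not in node_cache:
        node_cache[key] = len(nodes)
        nodes[node_cache[key]] = ('node', var, high, low)
    return node_cache[key]

def var_of(f):
    return nodes[f][1] if nodes[f][0] == 'node' else None

def branches(f, var):
    """High/low children of f w.r.t. var (f itself if f doesn't test var)."""
    if nodes[f][0] == 'node' and nodes[f][1] == var:
        return nodes[f][2], nodes[f][3]
    return f, f

def apply_op(f1, f2, op):
    # Base case: two terminals compute directly.
    if nodes[f1][0] == 'leaf' and nodes[f2][0] == 'leaf':
        return get_leaf(op(nodes[f1][1], nodes[f2][1]))
    key = (f1, f2, op)
    if key not in apply_cache:
        # Recurse on the earliest variable (alphabetical order here).
        var = min(v for v in (var_of(f1), var_of(f2)) if v is not None)
        f1h, f1l = branches(f1, var)
        f2h, f2l = branches(f2, var)
        apply_cache[key] = get_node(var,
                                    apply_op(f1h, f2h, op),
                                    apply_op(f1l, f2l, op))
    return apply_cache[key]

def evaluate(f, assignment):
    node = nodes[f]
    while node[0] == 'node':
        f = node[2] if assignment[node[1]] else node[3]
        node = nodes[f]
    return node[1]

# f = 2a and g = b as ADDs; f (+) g = 2a + b without enumerating states.
f = get_node('a', get_leaf(2.0), get_leaf(0.0))
g = get_node('b', get_leaf(1.0), get_leaf(0.0))
h = apply_op(f, g, operator.add)
```

Evaluating `h` at each of the four assignments recovers 2a + b, while the apply cache guarantees each pair of nodes is visited at most once.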

45 Apply Properties Apply uses an apply cache: (F1, F2, op) → F_R. Complexity is quadratic: O(|F1| · |F2|), where |F| is measured in node count. Why? The cache implies we touch every pair of nodes at most once! Canonical? Same inductive argument as Reduce.

46 Reduce-Restrict An important operation: have F(x,y,z), want G(x,y) = F|_{z=0}. Trivial when the restricted var is the root node: Restrict(z=0) just takes one branch. In general, the Restrict F|_{v=value} operation performs a Reduce that returns the branch for v=value whenever v is reached. Need a Restrict-Reduce cache for O(|F|) complexity.

47 Marginalization, etc. Use Apply + Reduce-Restrict: Σ_x F(x, ·) = F|_{x=0} ⊕ F|_{x=1}. Likewise for similar operations. ADD: min_x F(x, ·) = min(F|_{x=0}, F|_{x=1}). BDD: ∃x F(x, ·) = F|_{x=0} ∨ F|_{x=1}. BDD: ∀x F(x, ·) = F|_{x=0} ∧ F|_{x=1}.

48 Apply Tricks I Build F(x1, ..., xn) = Σ_{i=1..n} x_i. Don't build a tree and then call Reduce! Just use indicator DDs and Apply to compute x1 ⊕ x2 ⊕ ... ⊕ xn. In general: build any arithmetic expression bottom-up using Apply! E.g., x1 + (x2 + 4x3) * x4 becomes x1 ⊕ ((x2 ⊕ (4 ⊗ x3)) ⊗ x4).

49 Apply Tricks II Build an ordered DD from an unordered DD: when z is out of order, Apply the z-branches back together so that the result has z in order!

50 Affine ADDs

51 ADD Inefficiency Are ADDs enough, or do we need more compactness? Ex. 1: additive reward/utility functions, R(a,b,c) = R(a) + R(b) + R(c) = 4a + 2b + c. Ex. 2: multiplicative value functions, V(a,b,c) = V(a) V(b) V(c) = γ^(4a + 2b + c).

52 (Sanner, McAllester, IJCAI-05) Affine ADD (AADD) Define a new decision diagram: the Affine ADD. Edges are labeled by an offset (c) and a multiplier (b): ⟨c1, b1⟩ on the high branch to F1, ⟨c2, b2⟩ on the low branch to F2. Semantics: if (a) then (c1 + b1·F1) else (c2 + b2·F2)

53 Affine ADD (AADD) Maximize sharing by normalizing nodes to the range [0, 1]. Example: if (a) then (4) else (2). Unnormalized: edges ⟨4, 0⟩ and ⟨2, 0⟩ into the 0 terminal. Normalized: a top-level affine transform ⟨2, 2⟩ recovers the original range, with edges ⟨1, 0⟩ and ⟨0, 0⟩ into the 0 terminal.
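The normalization on this slide amounts to a small affine computation. A sketch for a node whose children are terminals (the function name and tuple layout are hypothetical):

```python
# An AADD node with terminal children represents if(a) then v_high else v_low.
# Each branch is stored as <offset, multiplier> over a [0,1]-normalized
# subgraph (here the 0 terminal), plus a top-level affine transform <c, b>.
def normalize(v_high, v_low):
    lo, hi = min(v_high, v_low), max(v_high, v_low)
    c, b = lo, hi - lo                       # value = c + b * normalized_sub
    if b == 0.0:
        # Constant function: both edges collapse onto the 0 terminal.
        return (c, b), (0.0, 0.0), (0.0, 0.0)
    edge_high = ((v_high - lo) / b, 0.0)     # <offset, multiplier> into terminal 0
    edge_low = ((v_low - lo) / b, 0.0)
    return (c, b), edge_high, edge_low

# The slide's example: if(a) then 4 else 2.
top, edge_high, edge_low = normalize(4.0, 2.0)
```

This reproduces the slide's labels: top-level ⟨2, 2⟩ with branch edges ⟨1, 0⟩ and ⟨0, 0⟩, so the normalized subfunction spans exactly [0, 1] and can be shared with any other node of the same shape.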

54 AADD Reduce Key point: automatically finds additive structure

55 AADD Examples Automatically Constructed! Back to our previous examples. Ex. 1: additive reward/utility functions, R(a,b) = R(a) + R(b) = 2a + b. Ex. 2: multiplicative value functions, V(a,b) = V(a) V(b) = γ^(2a + b), γ < 1.

56 AADD Apply & Normalized Caching Need to normalize Apply cache keys. E.g., given ⟨3 + 4F1⟩ ⊕ ⟨5 + 6F2⟩, before lookup in the Apply cache normalize to ⟨0 + 1F1⟩ ⊕ ⟨0 + 1.5F2⟩. GetNormCacheKey(⟨c1, b1, F1⟩, ⟨c2, b2, F2⟩, op) → ⟨⟨c1', b1'⟩, ⟨c2', b2'⟩⟩ and ModifyResult(⟨c_r, b_r, F_r⟩) → ⟨c_r', b_r', F_r'⟩. Operation and conditions → normalized cache key and computation → result modification:
⟨c1 + b1 F1⟩ ⊕ ⟨c2 + b2 F2⟩, F1 ≠ 0: ⟨c_r + b_r F_r⟩ = ⟨0 + 1 F1⟩ ⊕ ⟨0 + (b2/b1) F2⟩ → ⟨(c1 + c2 + b1 c_r) + b1 b_r F_r⟩
⟨c1 + b1 F1⟩ ⊖ ⟨c2 + b2 F2⟩, F1 ≠ 0: ⟨c_r + b_r F_r⟩ = ⟨0 + 1 F1⟩ ⊖ ⟨0 + (b2/b1) F2⟩ → ⟨(c1 - c2 + b1 c_r) + b1 b_r F_r⟩
⟨c1 + b1 F1⟩ ⊗ ⟨c2 + b2 F2⟩, F1 ≠ 0: ⟨c_r + b_r F_r⟩ = ⟨(c1/b1) + F1⟩ ⊗ ⟨(c2/b2) + F2⟩ → ⟨b1 b2 c_r + b1 b2 b_r F_r⟩
⟨c1 + b1 F1⟩ ⊘ ⟨c2 + b2 F2⟩, F1 ≠ 0: ⟨c_r + b_r F_r⟩ = ⟨(c1/b1) + F1⟩ ⊘ ⟨(c2/b2) + F2⟩ → ⟨(b1/b2) c_r + (b1/b2) b_r F_r⟩
max(⟨c1 + b1 F1⟩, ⟨c2 + b2 F2⟩), F1 ≠ 0: ⟨c_r + b_r F_r⟩ = max(⟨0 + 1 F1⟩, ⟨(c2 - c1)/b1 + (b2/b1) F2⟩) → ⟨(c1 + b1 c_r) + b1 b_r F_r⟩ (same for min)
any ⟨op⟩ not matching above: ⟨c_r + b_r F_r⟩ = ⟨c1 + b1 F1⟩ ⟨op⟩ ⟨c2 + b2 F2⟩ → ⟨c_r + b_r F_r⟩

57 ADDs vs. AADDs Additive functions: Σ_{i=1..n} x_i. Note: no context-specific independence, but subdiagrams are shared: result size O(n^2).

58 ADDs vs. AADDs Additive functions: Σ_i 2^i x_i. Best-case result for the ADD (exponential) vs. the AADD (linear).

59 ADDs vs. AADDs Additive functions: Σ_{i=0..n-1} F(x_i, x_{(i+1) mod n}) over x1, ..., x4. Pairwise factoring is evident in the AADD structure.

60 Main AADD Theorem AADD can yield exponential time/space improvement over ADD and never performs worse! But Apply operations on AADDs can be exponential Why? Reconvergent diagrams possible in AADDs (edge labels), but not ADDs Sometimes Apply explores all paths if no hits in normalized Apply cache

61 Recap: Symbolic Inference with DDs Probabilistic queries in Bayesian networks: P(E|X) = P(E,X) / P(X), where P(E,X) = Σ_A Σ_B P(A,B,E,X) = Σ_A Σ_B P(A|B,E) P(X|B) P(E) P(B), with each factor a DD. Maximizing expected utility in influence diagrams: a* = argmax_a E[U | A=a, X=x] = argmax_a Σ_X' U(X') P(X' | A=a, X=x), with each factor a DD. DDs can be used in any algorithm: VE, loopy BP, junction tree, etc. DDs automatically exploit structure in factors / messages.

62 Approximate Inference Sometimes no DD is compact, but bounded approximation is

63 Problem: Value ADD Too Large Sum: ( i= i x i ) + x 4 -Noise How to approximate?

64 (St-Aubin, Hoey, Boutilier, NIPS-00) Solution: APRICODD Trick Merge leaves and reduce: Error is bounded!
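The APRICODD leaf-merging trick can be sketched independently of the diagram machinery: cluster leaf values into intervals of width at most 2·epsilon and replace each cluster by its midpoint, so no leaf moves by more than epsilon. A minimal illustrative helper (hypothetical, not the APRICODD code):

```python
def merge_leaves(values, epsilon):
    """Map each leaf value to a merged representative.

    Greedily groups sorted values into clusters of width <= 2*epsilon
    and replaces every member by the cluster midpoint, bounding the
    per-leaf error by epsilon."""
    mapping = {}
    cluster = []
    for v in sorted(set(values)):
        if cluster and v - cluster[0] > 2 * epsilon:
            mid = (cluster[0] + cluster[-1]) / 2.0
            for u in cluster:
                mapping[u] = mid
            cluster = []
        cluster.append(v)
    if cluster:
        mid = (cluster[0] + cluster[-1]) / 2.0
        for u in cluster:
            mapping[u] = mid
    return mapping

leaves = [0.0, 0.1, 0.15, 1.0, 1.05, 2.0]
m = merge_leaves(leaves, epsilon=0.1)
max_error = max(abs(v - m[v]) for v in leaves)
```

Fewer distinct leaves means more node sharing after the subsequent Reduce, which is where the size reduction on this slide comes from.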

65 Can use ADD to Maintain Bounds! Change the leaf to represent a range [L,U]; a normal leaf is like [V,V]. When merging leaves, keep track of the min and max values contributing. For operations on ranges, see interval arithmetic.

66 More Compactness? AADDs? Sum: ( i= i x i ) + x 4 -Noise How to approximate? Error-bounded merge

67 (Sanner, Uther, Delgado, AAMAS-10) Solution: MADCAP Trick Merge nodes from bottom up:

68 Key Approximation Message: automatic, efficient methods for finding logical, CSI, additive, and multiplicative structure in bounded approximations!

69 Example Inference Results using Decision Diagrams Do they really work well?

70 Empirical Comparison: Table/ADD/AADD Running time and space of Table, ADD, and AADD representations on Sum: Σ_{i=1..n} 2^i x_i and Prod: Π_{i=1..n} γ^(2^i x_i).

71 Application: Bayes Net Inference Use variable elimination, replacing CPTs with ADDs or AADDs (could do the same for clique/junction-tree algorithms). Exploits: Context-specific independence, where probability has logical structure: P(a|b,c) = if b ? 1 : (if c ? .7 : .3). Additive CPTs, where probability is a discretized linear function: P(a|b1, ..., bn) = c + Σ_i 2^i b_i. Multiplicative CPTs, e.g., noisy-or (multiplicative AADD): P(e|c1, ..., cn) = 1 - Π_i (1 - p_i). If a factor has one of the above compact forms, the AADD exploits it.
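The noisy-or CPT mentioned above is cheap to compute directly; a sketch with hypothetical per-cause activation probabilities p_i:

```python
# Noisy-or: e fires unless every active cause independently fails,
# so P(e=1 | c_1..c_n) = 1 - prod over active causes of (1 - p_i).
def noisy_or(p, causes):
    """p[i]: probability cause i alone triggers e; causes: 0/1 vector."""
    prob_none = 1.0
    for p_i, c_i in zip(p, causes):
        if c_i:
            prob_none *= (1.0 - p_i)
    return 1.0 - prob_none

p = [0.9, 0.5, 0.3]   # made-up activation probabilities
```

The full CPT has 2^n rows, but this product form is exactly the multiplicative structure a noisy-or AADD represents in linear size.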

72 Bayes Net Results: Various Networks Bayes Net Table ADD AADD # Entries Time # Nodes Time # Nodes Time Alarm 1, s s s Barley 470,294 EML * 139,856 EML * 60, m Carpo s s s Hailfinder 9, s 4, s 2, s Insurance 2, s 1, s s Noisy-Or-15 65, s 125, s 1, s Noisy-Max , s 202, s 40, s * EML: Exceeded Memory Limit (1GB)

73 Application: MDP Solving SPUDD: Factored MDP Solver (Hoey et al., 99). Originally uses ADDs; can use AADDs. Implements factored value iteration, with each factor a DD: V^{n+1}(x1, ..., xi) = R(x1, ..., xi) + γ max_a Σ_{x1', ..., xi'} P(x1' | x1, ..., xi, a) ... P(xi' | x1, ..., xi, a) V^n(x1', ..., xi')

74 Application: SysAdmin SysAdmin MDP (GKP, 2001). Network of computers c1, ..., ck with various network topologies. Every computer is running or crashed. At each time step, the status of c_i is affected by its own previous status and the status of incoming connections in the previous state. Reward: +1 for every computer running (additive).

75 Results I: SysAdmin (10% Approx)

76 Results II: SysAdmin

77 Traffic Domain Binary cell transmission model (CTM). Actions: light changes. Objective: maximize empty cells in the network.

78 Results Traffic

79 Application: POMDPs Provided an AADD implementation for Guy Shani s factored POMDP solver Final value function size results: ADD AADD Network Management Rock Sample

80 Inference with Decision Diagrams vs. Compilations (d-dnnf, etc.) Important Distinctions

81 Definitions / diagrams from A Knowledge Compilation Map, Darwiche and Marquis, JAIR 02. BDDs in NNF: can express a BDD as an NNF formula, and can represent NNF diagrammatically.

82 d-DNNF Definitions / diagrams from A Knowledge Compilation Map, Darwiche and Marquis, JAIR 02. Decomposable NNF: the sets of leaf vars of each conjunction's conjuncts are disjoint. Deterministic NNF: the disjuncts of each disjunction have pairwise disjoint models (their conjunction is unsatisfiable).

83 d-DNNF Definitions / diagrams from A Knowledge Compilation Map, Darwiche and Marquis, JAIR 02. d-DNNF is used to compile a single formula; it does not support efficient binary operations (∧, ∨, ...). d-DNNF shares some polytime operations with OBDD / ADD, notably (weighted) model counting (CT), used in many inference tasks; Size(d-DNNF) ≤ Size(OBDD), so these can be more efficient on d-DNNF. A child language is a subset of its parent: children inherit the polytime operations of parents, and sizes of children ≤ parents. (Ordered BDD: in previous slides I call this simply a BDD.)

84 Compilations vs Decision Diagrams Summary: if you can compile the problem into a single formula, then compilation is likely preferable to DDs, provided you only need the ops that the compilation supports. Not all compilations are efficient for all binary operations, e.g., the ops needed for progression / regression approaches; the fixed ordering of DDs helps support these operations, so compilations are typically not a good idea in sequential probabilistic inference or decision-making. Note: other compilations exist (e.g., arithmetic circuits). Great software:

85 Part I Summary: Symbolic Methods for Discrete Inference Graphical Models and Influence Diagrams Representation: products of discrete factors Inference: operations on discrete factors Order of operations matters Symbolic Inference with Decision Diagrams (DDs) DDs more compact than tables or trees Logical (AND, OR, XOR) Context-specific independence (CSI) & shared substructure Additive and multiplicative structure (Affine ADD) DD operations exploit structure DDs support efficient bounded approximations Gogate and Domingos (UAI-13) have recent approx. prob. inference contributions and a good reference list.


More information

Product rule. Chain rule

Product rule. Chain rule Probability Recap CS 188: Artificial Intelligence ayes Nets: Independence Conditional probability Product rule Chain rule, independent if and only if: and are conditionally independent given if and only

More information

Graphical Models and Kernel Methods

Graphical Models and Kernel Methods Graphical Models and Kernel Methods Jerry Zhu Department of Computer Sciences University of Wisconsin Madison, USA MLSS June 17, 2014 1 / 123 Outline Graphical Models Probabilistic Inference Directed vs.

More information

Machine Learning Lecture 14

Machine Learning Lecture 14 Many slides adapted from B. Schiele, S. Roth, Z. Gharahmani Machine Learning Lecture 14 Undirected Graphical Models & Inference 23.06.2015 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de/ leibe@vision.rwth-aachen.de

More information

4.1 Notation and probability review

4.1 Notation and probability review Directed and undirected graphical models Fall 2015 Lecture 4 October 21st Lecturer: Simon Lacoste-Julien Scribe: Jaime Roquero, JieYing Wu 4.1 Notation and probability review 4.1.1 Notations Let us recall

More information

Bayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2016

Bayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2016 Bayesian Networks: Construction, Inference, Learning and Causal Interpretation Volker Tresp Summer 2016 1 Introduction So far we were mostly concerned with supervised learning: we predicted one or several

More information

On the Relationship between Sum-Product Networks and Bayesian Networks

On the Relationship between Sum-Product Networks and Bayesian Networks On the Relationship between Sum-Product Networks and Bayesian Networks by Han Zhao A thesis presented to the University of Waterloo in fulfillment of the thesis requirement for the degree of Master of

More information

Bayesian networks. Chapter Chapter

Bayesian networks. Chapter Chapter Bayesian networks Chapter 14.1 3 Chapter 14.1 3 1 Outline Syntax Semantics Parameterized distributions Chapter 14.1 3 2 Bayesian networks A simple, graphical notation for conditional independence assertions

More information

Sums of Products. Pasi Rastas November 15, 2005

Sums of Products. Pasi Rastas November 15, 2005 Sums of Products Pasi Rastas November 15, 2005 1 Introduction This presentation is mainly based on 1. Bacchus, Dalmao and Pitassi : Algorithms and Complexity results for #SAT and Bayesian inference 2.

More information

PROBABILISTIC REASONING SYSTEMS

PROBABILISTIC REASONING SYSTEMS PROBABILISTIC REASONING SYSTEMS In which we explain how to build reasoning systems that use network models to reason with uncertainty according to the laws of probability theory. Outline Knowledge in uncertain

More information

Markov Networks. l Like Bayes Nets. l Graphical model that describes joint probability distribution using tables (AKA potentials)

Markov Networks. l Like Bayes Nets. l Graphical model that describes joint probability distribution using tables (AKA potentials) Markov Networks l Like Bayes Nets l Graphical model that describes joint probability distribution using tables (AKA potentials) l Nodes are random variables l Labels are outcomes over the variables Markov

More information

Bayes Networks. CS540 Bryan R Gibson University of Wisconsin-Madison. Slides adapted from those used by Prof. Jerry Zhu, CS540-1

Bayes Networks. CS540 Bryan R Gibson University of Wisconsin-Madison. Slides adapted from those used by Prof. Jerry Zhu, CS540-1 Bayes Networks CS540 Bryan R Gibson University of Wisconsin-Madison Slides adapted from those used by Prof. Jerry Zhu, CS540-1 1 / 59 Outline Joint Probability: great for inference, terrible to obtain

More information

Motivation for introducing probabilities

Motivation for introducing probabilities for introducing probabilities Reaching the goals is often not sufficient: it is important that the expected costs do not outweigh the benefit of reaching the goals. 1 Objective: maximize benefits - costs.

More information

CPSC 540: Machine Learning

CPSC 540: Machine Learning CPSC 540: Machine Learning Undirected Graphical Models Mark Schmidt University of British Columbia Winter 2016 Admin Assignment 3: 2 late days to hand it in today, Thursday is final day. Assignment 4:

More information

Outline. CSE 573: Artificial Intelligence Autumn Agent. Partial Observability. Markov Decision Process (MDP) 10/31/2012

Outline. CSE 573: Artificial Intelligence Autumn Agent. Partial Observability. Markov Decision Process (MDP) 10/31/2012 CSE 573: Artificial Intelligence Autumn 2012 Reasoning about Uncertainty & Hidden Markov Models Daniel Weld Many slides adapted from Dan Klein, Stuart Russell, Andrew Moore & Luke Zettlemoyer 1 Outline

More information

Probabilistic Models

Probabilistic Models Bayes Nets 1 Probabilistic Models Models describe how (a portion of) the world works Models are always simplifications May not account for every variable May not account for all interactions between variables

More information

Bayesian networks. Soleymani. CE417: Introduction to Artificial Intelligence Sharif University of Technology Spring 2018

Bayesian networks. Soleymani. CE417: Introduction to Artificial Intelligence Sharif University of Technology Spring 2018 Bayesian networks CE417: Introduction to Artificial Intelligence Sharif University of Technology Spring 2018 Soleymani Slides have been adopted from Klein and Abdeel, CS188, UC Berkeley. Outline Probability

More information

CSE 473: Artificial Intelligence Autumn 2011

CSE 473: Artificial Intelligence Autumn 2011 CSE 473: Artificial Intelligence Autumn 2011 Bayesian Networks Luke Zettlemoyer Many slides over the course adapted from either Dan Klein, Stuart Russell or Andrew Moore 1 Outline Probabilistic models

More information

Markov Networks. l Like Bayes Nets. l Graph model that describes joint probability distribution using tables (AKA potentials)

Markov Networks. l Like Bayes Nets. l Graph model that describes joint probability distribution using tables (AKA potentials) Markov Networks l Like Bayes Nets l Graph model that describes joint probability distribution using tables (AKA potentials) l Nodes are random variables l Labels are outcomes over the variables Markov

More information

Boolean decision diagrams and SAT-based representations

Boolean decision diagrams and SAT-based representations Boolean decision diagrams and SAT-based representations 4th July 200 So far we have seen Kripke Structures 2 Temporal logics (and their semantics over Kripke structures) 3 Model checking of these structures

More information

Intelligent Systems (AI-2)

Intelligent Systems (AI-2) Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 18 Oct, 21, 2015 Slide Sources Raymond J. Mooney University of Texas at Austin D. Koller, Stanford CS - Probabilistic Graphical Models CPSC

More information

Announcements. CS 188: Artificial Intelligence Spring Probability recap. Outline. Bayes Nets: Big Picture. Graphical Model Notation

Announcements. CS 188: Artificial Intelligence Spring Probability recap. Outline. Bayes Nets: Big Picture. Graphical Model Notation CS 188: Artificial Intelligence Spring 2010 Lecture 15: Bayes Nets II Independence 3/9/2010 Pieter Abbeel UC Berkeley Many slides over the course adapted from Dan Klein, Stuart Russell, Andrew Moore Current

More information

Lecture 4 October 18th

Lecture 4 October 18th Directed and undirected graphical models Fall 2017 Lecture 4 October 18th Lecturer: Guillaume Obozinski Scribe: In this lecture, we will assume that all random variables are discrete, to keep notations

More information

Bayesian Networks BY: MOHAMAD ALSABBAGH

Bayesian Networks BY: MOHAMAD ALSABBAGH Bayesian Networks BY: MOHAMAD ALSABBAGH Outlines Introduction Bayes Rule Bayesian Networks (BN) Representation Size of a Bayesian Network Inference via BN BN Learning Dynamic BN Introduction Conditional

More information

CS 2750: Machine Learning. Bayesian Networks. Prof. Adriana Kovashka University of Pittsburgh March 14, 2016

CS 2750: Machine Learning. Bayesian Networks. Prof. Adriana Kovashka University of Pittsburgh March 14, 2016 CS 2750: Machine Learning Bayesian Networks Prof. Adriana Kovashka University of Pittsburgh March 14, 2016 Plan for today and next week Today and next time: Bayesian networks (Bishop Sec. 8.1) Conditional

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear

More information

Reduced Ordered Binary Decision Diagram with Implied Literals: A New knowledge Compilation Approach

Reduced Ordered Binary Decision Diagram with Implied Literals: A New knowledge Compilation Approach Reduced Ordered Binary Decision Diagram with Implied Literals: A New knowledge Compilation Approach Yong Lai, Dayou Liu, Shengsheng Wang College of Computer Science and Technology Jilin University, Changchun

More information

CS 188: Artificial Intelligence. Bayes Nets

CS 188: Artificial Intelligence. Bayes Nets CS 188: Artificial Intelligence Probabilistic Inference: Enumeration, Variable Elimination, Sampling Pieter Abbeel UC Berkeley Many slides over this course adapted from Dan Klein, Stuart Russell, Andrew

More information

CS 188: Artificial Intelligence Spring Announcements

CS 188: Artificial Intelligence Spring Announcements CS 188: Artificial Intelligence Spring 2011 Lecture 14: Bayes Nets II Independence 3/9/2011 Pieter Abbeel UC Berkeley Many slides over the course adapted from Dan Klein, Stuart Russell, Andrew Moore Announcements

More information

Undirected Graphical Models

Undirected Graphical Models Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Introduction 2 Properties Properties 3 Generative vs. Conditional

More information

Factored Markov Decision Process with Imprecise Transition Probabilities

Factored Markov Decision Process with Imprecise Transition Probabilities Factored Markov Decision Process with Imprecise Transition Probabilities Karina V. Delgado EACH-University of Sao Paulo Sao Paulo - Brazil Leliane N. de Barros IME-University of Sao Paulo Sao Paulo - Brazil

More information

BDD Based Upon Shannon Expansion

BDD Based Upon Shannon Expansion Boolean Function Manipulation OBDD and more BDD Based Upon Shannon Expansion Notations f(x, x 2,, x n ) - n-input function, x i = or f xi=b (x,, x n ) = f(x,,x i-,b,x i+,,x n ), b= or Shannon Expansion

More information

Stat 521A Lecture 18 1

Stat 521A Lecture 18 1 Stat 521A Lecture 18 1 Outline Cts and discrete variables (14.1) Gaussian networks (14.2) Conditional Gaussian networks (14.3) Non-linear Gaussian networks (14.4) Sampling (14.5) 2 Hybrid networks A hybrid

More information

Reduced Ordered Binary Decision Diagrams

Reduced Ordered Binary Decision Diagrams Reduced Ordered Binary Decision Diagrams Lecture #12 of Advanced Model Checking Joost-Pieter Katoen Lehrstuhl 2: Software Modeling & Verification E-mail: katoen@cs.rwth-aachen.de December 13, 2016 c JPK

More information

Based on slides by Richard Zemel

Based on slides by Richard Zemel CSC 412/2506 Winter 2018 Probabilistic Learning and Reasoning Lecture 3: Directed Graphical Models and Latent Variables Based on slides by Richard Zemel Learning outcomes What aspects of a model can we

More information

Machine Learning 4771

Machine Learning 4771 Machine Learning 4771 Instructor: Tony Jebara Topic 16 Undirected Graphs Undirected Separation Inferring Marginals & Conditionals Moralization Junction Trees Triangulation Undirected Graphs Separation

More information

Bayesian Networks: Independencies and Inference

Bayesian Networks: Independencies and Inference Bayesian Networks: Independencies and Inference Scott Davies and Andrew Moore Note to other teachers and users of these slides. Andrew and Scott would be delighted if you found this source material useful

More information

Bayesian Networks Inference with Probabilistic Graphical Models

Bayesian Networks Inference with Probabilistic Graphical Models 4190.408 2016-Spring Bayesian Networks Inference with Probabilistic Graphical Models Byoung-Tak Zhang intelligence Lab Seoul National University 4190.408 Artificial (2016-Spring) 1 Machine Learning? Learning

More information

Bayesian networks. Chapter 14, Sections 1 4

Bayesian networks. Chapter 14, Sections 1 4 Bayesian networks Chapter 14, Sections 1 4 Artificial Intelligence, spring 2013, Peter Ljunglöf; based on AIMA Slides c Stuart Russel and Peter Norvig, 2004 Chapter 14, Sections 1 4 1 Bayesian networks

More information

Lecture 6: Graphical Models

Lecture 6: Graphical Models Lecture 6: Graphical Models Kai-Wei Chang CS @ Uniersity of Virginia kw@kwchang.net Some slides are adapted from Viek Skirmar s course on Structured Prediction 1 So far We discussed sequence labeling tasks:

More information

Chris Bishop s PRML Ch. 8: Graphical Models

Chris Bishop s PRML Ch. 8: Graphical Models Chris Bishop s PRML Ch. 8: Graphical Models January 24, 2008 Introduction Visualize the structure of a probabilistic model Design and motivate new models Insights into the model s properties, in particular

More information

Bayesian Machine Learning

Bayesian Machine Learning Bayesian Machine Learning Andrew Gordon Wilson ORIE 6741 Lecture 4 Occam s Razor, Model Construction, and Directed Graphical Models https://people.orie.cornell.edu/andrew/orie6741 Cornell University September

More information

Binary Decision Diagrams

Binary Decision Diagrams Binary Decision Diagrams Binary Decision Diagrams (BDDs) are a class of graphs that can be used as data structure for compactly representing boolean functions. BDDs were introduced by R. Bryant in 1986.

More information

3 Undirected Graphical Models

3 Undirected Graphical Models Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.438 Algorithms For Inference Fall 2014 3 Undirected Graphical Models In this lecture, we discuss undirected

More information

Computer Science CPSC 322. Lecture 23 Planning Under Uncertainty and Decision Networks

Computer Science CPSC 322. Lecture 23 Planning Under Uncertainty and Decision Networks Computer Science CPSC 322 Lecture 23 Planning Under Uncertainty and Decision Networks 1 Announcements Final exam Mon, Dec. 18, 12noon Same general format as midterm Part short questions, part longer problems

More information

Intelligent Systems:

Intelligent Systems: Intelligent Systems: Undirected Graphical models (Factor Graphs) (2 lectures) Carsten Rother 15/01/2015 Intelligent Systems: Probabilistic Inference in DGM and UGM Roadmap for next two lectures Definition

More information

Probabilistic Reasoning. (Mostly using Bayesian Networks)

Probabilistic Reasoning. (Mostly using Bayesian Networks) Probabilistic Reasoning (Mostly using Bayesian Networks) Introduction: Why probabilistic reasoning? The world is not deterministic. (Usually because information is limited.) Ways of coping with uncertainty

More information

CTL Model Checking. Wishnu Prasetya.

CTL Model Checking. Wishnu Prasetya. CTL Model Checking Wishnu Prasetya wishnu@cs.uu.nl www.cs.uu.nl/docs/vakken/pv Background Example: verification of web applications à e.g. to prove existence of a path from page A to page B. Use of CTL

More information

Soft Computing. Lecture Notes on Machine Learning. Matteo Matteucci.

Soft Computing. Lecture Notes on Machine Learning. Matteo Matteucci. Soft Computing Lecture Notes on Machine Learning Matteo Matteucci matteucci@elet.polimi.it Department of Electronics and Information Politecnico di Milano Matteo Matteucci c Lecture Notes on Machine Learning

More information

Objectives. Probabilistic Reasoning Systems. Outline. Independence. Conditional independence. Conditional independence II.

Objectives. Probabilistic Reasoning Systems. Outline. Independence. Conditional independence. Conditional independence II. Copyright Richard J. Povinelli rev 1.0, 10/1//2001 Page 1 Probabilistic Reasoning Systems Dr. Richard J. Povinelli Objectives You should be able to apply belief networks to model a problem with uncertainty.

More information

CSE 473: Artificial Intelligence Probability Review à Markov Models. Outline

CSE 473: Artificial Intelligence Probability Review à Markov Models. Outline CSE 473: Artificial Intelligence Probability Review à Markov Models Daniel Weld University of Washington [These slides were created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley.

More information

Bayesian Networks. Motivation

Bayesian Networks. Motivation Bayesian Networks Computer Sciences 760 Spring 2014 http://pages.cs.wisc.edu/~dpage/cs760/ Motivation Assume we have five Boolean variables,,,, The joint probability is,,,, How many state configurations

More information

Inference in Bayesian Networks

Inference in Bayesian Networks Andrea Passerini passerini@disi.unitn.it Machine Learning Inference in graphical models Description Assume we have evidence e on the state of a subset of variables E in the model (i.e. Bayesian Network)

More information

14 : Theory of Variational Inference: Inner and Outer Approximation

14 : Theory of Variational Inference: Inner and Outer Approximation 10-708: Probabilistic Graphical Models 10-708, Spring 2014 14 : Theory of Variational Inference: Inner and Outer Approximation Lecturer: Eric P. Xing Scribes: Yu-Hsin Kuo, Amos Ng 1 Introduction Last lecture

More information

Introduction to Bayes Nets. CS 486/686: Introduction to Artificial Intelligence Fall 2013

Introduction to Bayes Nets. CS 486/686: Introduction to Artificial Intelligence Fall 2013 Introduction to Bayes Nets CS 486/686: Introduction to Artificial Intelligence Fall 2013 1 Introduction Review probabilistic inference, independence and conditional independence Bayesian Networks - - What

More information

Lecture 10: Introduction to reasoning under uncertainty. Uncertainty

Lecture 10: Introduction to reasoning under uncertainty. Uncertainty Lecture 10: Introduction to reasoning under uncertainty Introduction to reasoning under uncertainty Review of probability Axioms and inference Conditional probability Probability distributions COMP-424,

More information

CS 343: Artificial Intelligence

CS 343: Artificial Intelligence CS 343: Artificial Intelligence Bayes Nets: Sampling Prof. Scott Niekum The University of Texas at Austin [These slides based on those of Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley.

More information

Probabilistic Reasoning Systems

Probabilistic Reasoning Systems Probabilistic Reasoning Systems Dr. Richard J. Povinelli Copyright Richard J. Povinelli rev 1.0, 10/7/2001 Page 1 Objectives You should be able to apply belief networks to model a problem with uncertainty.

More information

CS357: CTL Model Checking (two lectures worth) David Dill

CS357: CTL Model Checking (two lectures worth) David Dill CS357: CTL Model Checking (two lectures worth) David Dill 1 CTL CTL = Computation Tree Logic It is a propositional temporal logic temporal logic extended to properties of events over time. CTL is a branching

More information

Bayes Nets III: Inference

Bayes Nets III: Inference 1 Hal Daumé III (me@hal3.name) Bayes Nets III: Inference Hal Daumé III Computer Science University of Maryland me@hal3.name CS 421: Introduction to Artificial Intelligence 10 Apr 2012 Many slides courtesy

More information

4 : Exact Inference: Variable Elimination

4 : Exact Inference: Variable Elimination 10-708: Probabilistic Graphical Models 10-708, Spring 2014 4 : Exact Inference: Variable Elimination Lecturer: Eric P. ing Scribes: Soumya Batra, Pradeep Dasigi, Manzil Zaheer 1 Probabilistic Inference

More information

Undirected Graphical Models: Markov Random Fields

Undirected Graphical Models: Markov Random Fields Undirected Graphical Models: Markov Random Fields 40-956 Advanced Topics in AI: Probabilistic Graphical Models Sharif University of Technology Soleymani Spring 2015 Markov Random Field Structure: undirected

More information

Bayesian Networks: Representation, Variable Elimination

Bayesian Networks: Representation, Variable Elimination Bayesian Networks: Representation, Variable Elimination CS 6375: Machine Learning Class Notes Instructor: Vibhav Gogate The University of Texas at Dallas We can view a Bayesian network as a compact representation

More information

Probabilistic Graphical Models (I)

Probabilistic Graphical Models (I) Probabilistic Graphical Models (I) Hongxin Zhang zhx@cad.zju.edu.cn State Key Lab of CAD&CG, ZJU 2015-03-31 Probabilistic Graphical Models Modeling many real-world problems => a large number of random

More information

Alternative Parameterizations of Markov Networks. Sargur Srihari

Alternative Parameterizations of Markov Networks. Sargur Srihari Alternative Parameterizations of Markov Networks Sargur srihari@cedar.buffalo.edu 1 Topics Three types of parameterization 1. Gibbs Parameterization 2. Factor Graphs 3. Log-linear Models with Energy functions

More information

Bayesian Networks Representation

Bayesian Networks Representation Bayesian Networks Representation Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University March 19 th, 2007 Handwriting recognition Character recognition, e.g., kernel SVMs a c z rr r r

More information

University of Washington Department of Electrical Engineering EE512 Spring, 2006 Graphical Models

University of Washington Department of Electrical Engineering EE512 Spring, 2006 Graphical Models University of Washington Department of Electrical Engineering EE512 Spring, 2006 Graphical Models Jeff A. Bilmes Lecture 1 Slides March 28 th, 2006 Lec 1: March 28th, 2006 EE512

More information