Treedy: A Heuristic for Counting and Sampling Subsets


Slide 1: Treedy: A Heuristic for Counting and Sampling Subsets. Teppo Niinimäki, Mikko Koivisto. July 12, 2013. University of Helsinki, Department of Computer Science.

Slide 2: Outline
- Problem definition
- Application: Bayesian network learning
- Algorithms
- Experiments


Slide 4: Problem overview. Subset counting problem: compute W(Q) = Σ_{S ⊆ Q} w(S). [Figure: the ground set N, a query set Q ⊆ N, and the collection C.]

Slides 5–7: Problem setting. Given:
- Ground set N of n elements. Example: N = {A, B, C, D} (n = 4).
- Collection C ⊆ P(N), downward closed. Example: C = {∅, A, B, C, D, AB, AC, AD, BC, BD, CD}.
- Weights w(S) ≥ 0 for S ∈ C, and w(S) = 0 otherwise. Example (continued):

  S ∈ C:  ∅   A   B   C   D   AB  AC  AD  BC  BD  CD
  w(S):   80  85  70  13  50  99  60  90  11  14  12

Slides 8–9: Problem definition. Query set Q ⊆ N. Example: Q = {B, C, D}.
Counting problem: compute W(Q) = Σ_{S ⊆ Q} w(S). Example:

  S ⊆ Q:  ∅   B   C   D   BC  BD  CD  BCD
  w(S):   80  70  13  50  11  14  12  0

  W(Q) = 250

Slide 10: Sampling problem: draw S ⊆ Q such that Pr(S) = w(S) / W(Q). Example (Q = {B, C, D}, W(Q) = 250):

  S ⊆ Q:  ∅     B     C     D     BC    BD    CD    BCD
  w(S):   80    70    13    50    11    14    12    0
  Pr(S):  0.320 0.280 0.052 0.200 0.044 0.056 0.048 0.000

Slide 11: Exact computation is too slow, so approximate with a tolerance d ∈ [0, 1]:
- Counting problem: compute W(Q) = Σ_{S ⊆ Q} w(S) within relative error d.
- Sampling problem: draw S ⊆ Q with Pr(S) = w(S) / W(Q) within total variation distance d.
Example: for Q = {B, C, D} and d = 0.2, W(Q) = 250, so any estimate in [200, 300] is acceptable.
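To make the two queries concrete, here is a minimal brute-force sketch (not from the talk) of exact counting and sampling on the slide example; the variable names and the dictionary representation of C are illustrative assumptions.

```python
import random

# Slide example: the downward-closed collection C with its weights;
# sets are frozensets, and w(S) = 0 for any S outside C.
weights = {
    frozenset(): 80, frozenset("A"): 85, frozenset("B"): 70,
    frozenset("C"): 13, frozenset("D"): 50, frozenset("AB"): 99,
    frozenset("AC"): 60, frozenset("AD"): 90, frozenset("BC"): 11,
    frozenset("BD"): 14, frozenset("CD"): 12,
}

def count(weights, Q):
    """Exact W(Q): sum of w(S) over all S in C with S a subset of Q."""
    return sum(w for S, w in weights.items() if S <= Q)

def sample(weights, Q):
    """Draw S subset of Q with probability w(S) / W(Q)."""
    relevant = [(S, w) for S, w in weights.items() if S <= Q]
    r = random.uniform(0, sum(w for _, w in relevant))
    for S, w in relevant:
        r -= w
        if r <= 0:
            return S
    return relevant[-1][0]  # guard against floating-point rounding

print(count(weights, frozenset("BCD")))  # 250, as on slide 9
```

Both operations touch every set in C, which is exactly the cost the approximation algorithms later in the talk try to avoid.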

Slide 12: Outline (next section: Application: Bayesian network learning).

Slide 13: Bayesian network. Structure: a directed acyclic graph, with S_v denoting the parent set of node v. Each node carries conditional probabilities, and the joint distribution factorizes as p(x) = ∏_{v ∈ N} p(x_v | x_{S_v}).
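As a quick illustration of the factorization (a made-up two-node example, not from the slides):

```python
# Toy DAG A -> B: p(a, b) = p(a) * p(b | a); the numbers are invented.
p_A = {0: 0.6, 1: 0.4}
p_B_given_A = {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.8}  # (a, b) -> p

def joint(a, b):
    # p(x) = product over nodes v of p(x_v | x_{S_v}); here S_A = {}, S_B = {A}
    return p_A[a] * p_B_given_A[(a, b)]

assert abs(sum(joint(a, b) for a in (0, 1) for b in (0, 1)) - 1.0) < 1e-12
```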

Slide 14: Structure learning. Task: given data, find the probability of an arc a → b. [Figure: a data matrix of variables × samples and a queried arc.] Exact computation is infeasible for large instances, so approximate with an MCMC sampler.

Slide 15: Order-MCMC (Friedman & Koller, 2003). Sample node orderings: draw v1 ≺ v2 ≺ ... ≺ vn from the (posterior) distribution

  Pr(v1 ≺ v2 ≺ ... ≺ vn) ∝ ∏_{i=1}^{n} W_i({v1, ..., v_{i−1}}),

where W_i(Q) = Σ_{S_i ⊆ Q} w_i(S_i), C_i = { potential parent sets for v_i }, and w_i depends on the data. Evaluating one ordering thus takes n subset counting queries.

Slide 16: Order-MCMC: sample structures. Given an ordering, for every node v_i sample a parent set S_i ⊆ {v1, ..., v_{i−1}} such that Pr(S_i) = w_i(S_i) / W_i({v1, ..., v_{i−1}}). This takes n subset sampling queries.
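A hedged sketch of how one order-MCMC step maps onto these primitives, reusing the hypothetical count and sample helpers from the earlier sketch (a real implementation keeps a separate weight table w_i per node; here one shared table stands in for all of them, purely for illustration):

```python
import math

def log_order_score(order, weights):
    # log of prod_i W_i({v_1, ..., v_{i-1}}): one counting query per node
    return sum(math.log(count(weights, frozenset(order[:i])))
               for i in range(len(order)))

def sample_structure(order, weights):
    # One subset sampling query per node: parents must precede the node,
    # so Pr(S_i) = w_i(S_i) / W_i({v_1, ..., v_{i-1}})
    return {v: sample(weights, frozenset(order[:i]))
            for i, v in enumerate(order)}
```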

Slide 17: Outline (next section: Algorithms).

Slides 18–19: Algorithm outline. A set S is relevant if both S ∈ C and S ⊆ Q; W(Q) is the sum over the relevant sets.
Collector algorithm for counting: visit some sets S ⊆ N in some order, add up the weights of the relevant ones, and stop once the approximation tolerance d is met.
From counting to sampling: sample among the sets visited by the collector algorithm; the resulting distribution is within total variation distance d. A generic sketch of this loop follows below.
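A generic collector loop might look like this sketch (the (S, w, bound) stream interface is a hypothetical framing; the slides only say to stop "once the approximation tolerance d is met", so the stopping rule below is one plausible choice, picked to match the worked examples later):

```python
def collector_count(stream, Q, d):
    """`stream` yields (S, w(S), bound) with `bound` an upper bound on the
    total weight of all sets not yet visited. Returns the collected weight
    and the visited relevant sets, for use by the sampling reduction."""
    W, visited = 0.0, []
    for S, w, bound in stream:
        if S <= Q:                        # relevant: S is in C and S ⊆ Q
            W += w
            visited.append((S, w))
        if bound <= d * (W + bound):      # true W(Q) lies in [W, W + bound]
            break
    return W, visited

# Sampling reduction: draw from `visited` proportionally to w(S).
```

Sorted and Treedy below can be read as two instantiations of this loop that differ in how the candidate stream is ordered and how cheaply the bound is maintained.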

Slides 20–23: Four counting algorithms.
- Exact: an exact baseline algorithm; visits all relevant sets.
- Ideal: an idealized approximation; visits the minimum number of heaviest relevant sets.
- Sorted: an improved version of the heuristic used by Friedman and Koller in order-MCMC.
- Treedy: a novel heuristic based on tree traversal.

Slides 24–25: Sorted.
Preprocessing: sort C by weight and compute the partial sums W_1, W_2, ..., W_|C|.
Query: traverse C starting from the heaviest set, add up the relevant weights into W, and stop once the weight of the remaining sets is small enough.
Example (Q = BCD, d = 0.2), with C sorted by weight and W_i the total weight from position i onward:

  S ∈ C:  AB  AD  A   ∅   B   AC  D   BD  C   CD  BC
  w(S):   99  90  85  80  70  60  50  14  13  12  11
  W_i:    584 485 395 310 230 160 100 50  36  23  11

The relevant sets ∅, B, and D are added (W = 200); after D the remaining weight is at most 50, so the traversal stops.
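A sketch of Sorted under the assumptions above (descending weight order, suffix sums as the remaining-weight bounds, and the same stopping rule as in the collector sketch):

```python
def sorted_preprocess(weights):
    # Sort C by weight (descending); remaining[i] = total weight of items[i:]
    items = sorted(weights.items(), key=lambda sw: -sw[1])
    remaining = [0.0] * (len(items) + 1)
    for i in range(len(items) - 1, -1, -1):
        remaining[i] = remaining[i + 1] + items[i][1]
    return items, remaining

def sorted_count(items, remaining, Q, d):
    W = 0.0
    for i, (S, w) in enumerate(items):
        if S <= Q:
            W += w
        if remaining[i + 1] <= d * (W + remaining[i + 1]):
            break              # true W(Q) lies in [W, W + remaining[i+1]]
    return W
```

On the slide example this stops after visiting D: W = 200 with at most 50 weight outstanding, so the true W(Q) = 250 sits at the top of the guaranteed interval [200, 250].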

Slides 26–38: Treedy.
Preprocessing:
- Start with a lexicographical tree for C.
- Compute weight potentials φ(S): in the example, φ(S) is the total weight of the subtree rooted at S.
- Sort siblings by φ(S) and reorganize the links.
- Compute aggregate potentials ψ(S): in the example, ψ(S) is the sum of φ over S and its later siblings.
Example (weights as before; children listed in sorted order):

  ∅ (w 80, φ 584, ψ 584)
  ├─ A (w 85, φ 334, ψ 504): AB (w 99, φ 99, ψ 249), AD (w 90, φ 90, ψ 150), AC (w 60, φ 60, ψ 60)
  ├─ B (w 70, φ 95, ψ 170): BD (w 14, φ 14, ψ 25), BC (w 11, φ 11, ψ 11)
  ├─ D (w 50, φ 50, ψ 75)
  └─ C (w 13, φ 25, ψ 25): CD (w 12, φ 12, ψ 12)

Query:
- Traverse the tree greedily w.r.t. ψ.
- Add up the relevant weights into W'.
- Ignore and skip irrelevant branches.
- Stop once the weight of the remaining branches is small enough.
A worked query and a code sketch follow below.

Slides 39–43: Treedy query example (Q = BCD, d = 0.2), on the tree above:
- Visit the root ∅: relevant, so W' = 80. Branch A is skipped outright (A ⊈ Q, so nothing in its subtree can be relevant); the pending bound becomes Ψ = ψ(B) = 170.
- Visit B: W' = 150. Now pending are B's children (ψ(BD) = 25) and B's next sibling D (ψ(D) = 75), so Ψ = 100.
- Visit D, the pending branch with the largest ψ: W' = 200. Pending are ψ(BD) = 25 and ψ(C) = 25, so Ψ = 50.
- Stop: Ψ = 50 ≤ d · (W' + Ψ) = 50, so the remaining branches can be left unvisited; the true W(Q) = 250 lies in [W', W' + Ψ] = [200, 250].
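The following sketch implements this traversal under the stated assumptions (parent of S is S minus its lexicographically largest element, ψ computed as above, stopping rule as in the collector sketch). The names are illustrative, and the weights table from the first sketch is reused.

```python
import heapq

class Node:
    def __init__(self, S, w):
        self.S, self.w = S, w
        self.children = []          # will be sorted by phi, descending
        self.next_sibling = None
        self.phi = self.psi = 0.0

def treedy_preprocess(weights):
    """Build the lexicographical tree for C and fill in phi and psi."""
    nodes = {S: Node(S, w) for S, w in weights.items()}
    for S, node in nodes.items():
        if S:  # C is downward closed, so the parent S \ {max S} is in C
            nodes[S - {max(S)}].children.append(node)
    root = nodes[frozenset()]
    def fill(node):
        node.phi = node.w + sum(fill(c) for c in node.children)  # subtree weight
        node.children.sort(key=lambda c: -c.phi)
        run = 0.0
        for c in reversed(node.children):  # psi: phi of self + later siblings
            run += c.phi
            c.psi = run
        for a, b in zip(node.children, node.children[1:]):
            a.next_sibling = b
        return node.phi
    fill(root)
    root.psi = root.phi
    return root

def treedy_count(root, Q, d):
    """Greedy best-first traversal w.r.t. psi."""
    heap, W, bound = [], 0.0, 0.0
    def push(node):
        nonlocal bound
        # Skip irrelevant branch heads: every set in such a branch is a
        # superset of its head, hence also irrelevant.
        while node is not None and not node.S <= Q:
            node = node.next_sibling
        if node is not None:
            bound += node.psi
            heapq.heappush(heap, (-node.psi, id(node), node))
    push(root)
    while heap:
        _, _, node = heapq.heappop(heap)
        bound -= node.psi
        W += node.w
        push(node.children[0] if node.children else None)
        push(node.next_sibling)
        if bound <= d * (W + bound):  # remaining branches weigh little enough
            break
    return W

root = treedy_preprocess(weights)
print(treedy_count(root, frozenset("BCD"), 0.2))  # 200, as in the walkthrough
```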

Slide 44: Example summary (Q = BCD, d = 0.2). [Table: for each S ∈ C, with weights as above, the step at which each of Exact, Ideal, Sorted, and Treedy visits it.]

Slide 45: Outline (next section: Experiments).

Slide 46: Experiment setting: data.
- Artificial instances with different types of random weights: flat, steep, mixture, shuffled.
- Bayesian network learning: data sampled from the ALARM and HAILFINDER networks, plus UCI repository datasets Votes2, Chess, 10xPromoters, and Splice.

Slide 47: Experiment setting: counting queries.
- Artificial instances: random query sets of different sizes.
- Bayesian network learning: run order-MCMC using the different counting algorithms.
- The size of the sets in C is limited to at most k ∈ {3, 4, 5}, and runtime is measured as a function of the tolerance d ∈ [10⁻⁸, 0.5].

Slide 48: [Figure: runtimes of Exact, Ideal, Sorted, and Treedy for the four types of artificial weight functions (flat, steep, mixture, shuffled); n = 60, k = 5.]

Slide 49: Bayesian network learning. [Table of instance statistics: name, n, #samples, k, and #MCMC steps for Alarm, Hailfinder, Votes2, Chess, 10xPromoters, and Splice.]

Slide 50: Bayesian network learning. [Figure: runtime of order-MCMC with Exact, Ideal, Sorted, and Treedy on data generated from the ALARM network with 50, 200, 1000, and 5000 samples; n = 37, k = 5.]

Slide 51: Bayesian network learning. [Figure: runtime of order-MCMC with Exact, Ideal, Sorted, and Treedy on data generated from the HAILFINDER network with 50, 200, 1000, and 5000 samples; n = 56, k = 4.]

Slide 52: Bayesian network learning. [Figure: runtime of order-MCMC with Exact, Ideal, Sorted, and Treedy on the UCI datasets Votes2, Chess, 10xPromoters, and Splice.]

Slide 53: Conclusion.
- Problem: approximate subset counting and sampling queries.
- Application: Bayesian network structure learning.
- Two heuristic methods: Sorted and Treedy.
- Significant speed-ups over the Exact algorithm; however, still room for improvement compared to the Ideal algorithm.
- Implementation (C++): cs.helsinki.fi/u/tzniinim/uai2013


Slide 55 (backup): Bayesian network learning. [Figure: runtime of Exact, Ideal, Sorted, and Treedy on Votes2, Chess, 10xPromoters, and Splice as a function of query set size.]
