Stochastic Enumeration Method for Counting Trees


1 Stochastic Enumeration Method for Counting Trees. Slava Vaisman (joint work with Dirk P. Kroese), University of Queensland, January 11, 2015.

2 Overview.
1. The Tree Counting Problem: Is this hard? Is this interesting? Previous work.
2. Knuth's estimator: the problem with Knuth's estimator, and what we can do about it.
3. From Knuth to Stochastic Enumeration (SE): the algorithm.
4. Analysis, and an almost sure Fully Polynomial Randomized Approximation Scheme for random trees (super-critical branching process).
5. SE in practice: network reliability.
6. What next?

3 The Tree Counting Problem. Consider a rooted tree T = (V, E) with node set V and edge set E. With each node v is associated a cost $c(v) \in \mathbb{R}$ (it is also possible that the cost C(v) is a random variable). The main quantity of interest is the total cost of the tree, $\mathrm{Cost}(T) = \sum_{v \in V} c(v)$, or, for random costs, $\mathrm{Cost}(T) = \mathbb{E}\left(\sum_{v \in V} C(v)\right)$.

4 The Tree Counting Problem (continued). A linear-time solution exists (BFS, DFS). But what if the set V is very large?

5 Is this hard? (1) [Figure: complexity classes.] The general problem of estimating the cost of a tree is at least #P-hard (Valiant, 1979; counting CNF formula solutions). The existence of a computationally efficient approximation algorithm would imply the collapse of the polynomial hierarchy!

6 Is this an interesting problem? From a theoretical point of view: research on complexity classes (#P counting problems); new sampling strategies for stochastic simulation algorithms. In practice: early estimates of the size of backtrack trees (Knuth); efficient evaluation of strategies in Partially Observable Markov Decision Processes; improved sampling strategies for Monte Carlo Tree Search (MCTS) algorithms, i.e. finding large rewards under rare-event settings; network reliability and sensitivity; and many more...

7 POMDP Rock Sample (1). There are 8 rocks (some of them are good and some are bad). The robot has a sensor that can scan the rocks; the sensor results are subject to errors. The robot can move, scan and collect rocks. Collecting a good rock or exiting results in a reward; any movement, and collecting a bad rock, is penalized. Our goal is to develop an optimal plan that maximizes the overall collected reward.

8 POMDP Rock Sample (2). The robot operates in the belief space b (over good and bad rocks), $b = \{b_1, \dots, b_8\}$, where $b_i = \mathbb{P}(\text{rock } i \text{ is good})$ (for example, $b_i = 1/2$ at the beginning maximizes the entropy). Let $\pi : b \to A$ be a mapping from the belief space to the action space, and let $\pi^\ast = \operatorname*{argmax}_{\pi \in \Pi} \mathbb{E}_\pi(\text{reward})$.

9 POMDP Rock Sample (2), continued. Using universal approximators (such as RBFs), one can compactly represent any π.

10 POMDP Rock Sample (2), continued. Crucial observation: as soon as an approximation to π is given, $\mathbb{E}_\pi(\text{reward})$ becomes a tree counting problem.

11 POMDP Rock Sample (2), continued. In order to approximate the optimal plan, all we need to do is optimize the parameters of the RBFs.

12 Previous work. Donald E. Knuth (1975), Estimating the Efficiency of Backtrack Programs, Math. Comp. 29. Paul W. Purdom (1978), Tree Size by Partial Backtracking, SIAM J. Comput. 7(4). Pang C. Chen (1992), Heuristic Sampling: A Method for Predicting the Performance of Tree Searching Programs, SIAM J. Comput. 21(2). A few additional attempts based on Knuth's estimator.

13 Knuth's estimator. Input: a tree T_v of height h, rooted at v. Output: an unbiased estimator C of the total cost of the tree T_v.
1. (Initialization): Set k ← 0, D ← 1, X_0 ← v and C ← c(X_0). Here D is the product of all node degrees encountered along the path.
2. (Compute the successors): Let S(X_k) be the set of all successors of X_k and let D_k be the number of elements of S(X_k). If k = h or S(X_k) is empty, set D_k = 0.
3. (Terminal position?): If D_k = 0, the algorithm stops, returning C as an estimator of Cost(T_v).
4. (Advance): Choose an element X_{k+1} ∈ S(X_k) at random, each element being equally likely (thus, each choice occurs with probability 1/D_k). Set D ← D_k · D, then set C ← C + c(X_{k+1}) · D. Increase k by 1 and return to Step 2.
Example state: k = 0, D = 1, X_0 = v_1, C = 7.
[Figure: the example tree with node costs v_1:7, v_2:1, v_3:5, v_4:3, v_5:1, v_6:9, v_7:4, v_8:2, v_9:1, v_10:14, v_11:10.]
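The four steps above translate almost directly into code. Below is a minimal Python sketch of one run of Knuth's estimator; the function names (knuth_estimate, children, cost) are our own, and the example tree is a plausible reconstruction of the 11-node tree used on these slides, consistent with the sample run shown next.

```python
import random

def knuth_estimate(root, children, cost):
    """One run of Knuth's estimator: an unbiased estimate of sum_v cost(v)."""
    D = 1                          # product of the node degrees seen so far
    X = root                       # X_k, the current node
    C = cost(X)                    # C <- c(X_0)
    while True:
        succ = children(X)         # S(X_k)
        if not succ:               # D_k = 0: terminal position, stop
            return C
        D *= len(succ)             # D <- D_k * D
        X = random.choice(succ)    # advance: uniform choice among the successors
        C += cost(X) * D           # C <- C + c(X_{k+1}) * D

# A plausible reconstruction of the example tree: node -> (cost, children).
TREE = {1: (7, [2, 3]), 2: (1, [4]), 3: (5, [5, 6]), 4: (3, [7, 8, 9]),
        5: (1, [10, 11]), 6: (9, []), 7: (4, []), 8: (2, []), 9: (1, []),
        10: (14, []), 11: (10, [])}
runs = [knuth_estimate(1, lambda v: TREE[v][1], lambda v: TREE[v][0])
        for _ in range(100_000)]
print(sum(runs) / len(runs))       # close to Cost(T) = 57
```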

14-21 Knuth's estimator: a sample run on the example tree.
- S(X_0) = {v_2, v_3}, D_0 = 2.
- k = 1: X_1 = v_3, D = 1 · D_0 = 2, C = 7 + 5 · 2 = 17.
- S(X_1) = {v_5, v_6}, D_1 = 2.
- k = 2: X_2 = v_6, D = 2 · D_1 = 4, C = 17 + 9 · 4 = 53.
- X_2 = v_6, S(X_2) = ∅, D_2 = 0: a terminal node is reached and the algorithm returns C = 53. Note that Cost(T) = 57.

22-24 Does this always work? (Rare events.) Consider the hair-brush tree T and suppose that the costs of all vertices are zero except for v_{n+1}, which has a cost of unity. The expectation and variance of Knuth's estimator are
$\mathbb{E}(C) = \frac{1}{2^n}\cdot 2^n\cdot 1 + \frac{2^n-1}{2^n}\cdot 0 = 1$ and $\mathbb{E}(C^2) = \frac{1}{2^n}\,(2^n\cdot 1)^2 + \frac{2^n-1}{2^n}\cdot 0 = 2^n$, so $\mathrm{Var}(C) = 2^n - 1$ and $\mathrm{CV}^2 = \frac{\mathrm{Var}(C)}{\mathbb{E}(C)^2} = 2^n - 1$.
[Figure: the hair-brush tree.]
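A quick numerical illustration of this blow-up, reusing the knuth_estimate sketch from above. The hair-brush tree is modelled here as a spine v_1, ..., v_{n+1} in which every spine node below the last also has one leaf child; this is an assumption about the exact drawing, but it reproduces the 2^{-n} hitting probability used in the calculation.

```python
def hairbrush(n):
    """Root ('spine', 1); each ('spine', i) with i <= n has a spine child and a leaf child."""
    def children(v):
        kind, i = v
        return [("spine", i + 1), ("leaf", i)] if kind == "spine" and i <= n else []
    def cost(v):
        return 1.0 if v == ("spine", n + 1) else 0.0   # only v_{n+1} has cost 1
    return ("spine", 1), children, cost

root, ch, c = hairbrush(12)
runs = [knuth_estimate(root, ch, c) for _ in range(200_000)]
print(sum(runs) / len(runs))   # unbiased (mean near 1), but each run is 0 or 2**12,
                               # so the relative error is huge, as CV^2 = 2**n - 1 predicts
```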

25-26 What can we do? The problem is the large variance. Variance reduction techniques: common and antithetic random variables, control variates, conditional Monte Carlo, stratified sampling, importance sampling, multilevel splitting.

27-29 To start with: multilevel splitting. Consider (again) the hair-brush tree T. Define some budget B ≥ 1 of parallel random walks and start from the root. The expected number of walks that reach the "good" vertex v_2 is B/2; call them the good trajectories. Split the good trajectories so that there are B of them again and continue to the next tree level. Carefully choosing B (polynomial in n!) allows us to reach the vertex of interest v_{n+1} with reasonably high probability:
$\mathbb{P}(\text{the process reaches the next level}) = 1 - 1/2^B$,
$\mathbb{P}(\text{the process reaches } v_{n+1}) = \left(1 - 1/2^B\right)^n$,
and with $B = \log_2(n)$, $\mathbb{P}(\text{the process reaches } v_{n+1}) \to e^{-1}$ as $n \to \infty$.
[Figure: the hair-brush tree with B parallel walks, of which about B/2 survive each level before splitting.]
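The limiting probability can be checked by evaluating the formula directly (no simulation needed); a small sketch:

```python
import math

# P(the B-walk splitting process survives all n levels) = (1 - 2**(-B)) ** n.
# With B = log2(n) this equals (1 - 1/n)**n, which tends to exp(-1).
for n in (10, 100, 1_000, 10_000):
    B = math.log2(n)
    print(n, (1 - 2.0 ** (-B)) ** n, math.exp(-1))
```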

30 SE: the main idea.
1. Define a budget B ∈ ℕ, and let B be the number of parallel random walks on the tree.
2. Using these B walks, run Knuth's Algorithm in parallel (there are some technical issues!).
3. If some walks die, split the remaining ones to continue with B walks as usual (multilevel splitting).

31-37 SE example with B = 2. [Figures: a step-by-step illustration of an SE run with B = 2 on the example tree with node costs v_1:7, v_2:1, v_3:5, v_4:3, v_5:1, v_6:9, v_7:4, v_8:2, v_9:1, v_10:14, v_11:10.]

38 SE Algorithm with B = 2. Input: a forest T_v of height h rooted at a hyper node v, and a budget B ≥ 1. Output: an unbiased estimator |v| · C_SE of the total cost of the forest T_v.
1. (Initialization): Set k ← 0, D ← 1, X_0 ← v and C_SE ← c(X_0)/|X_0|.
2. (Compute the successors): Let S(X_k) be the set of all successors of X_k.
3. (Terminal position?): If |S(X_k)| = 0, the algorithm stops, returning |v| · C_SE as an estimator of Cost(T_v).
4. (Advance): Choose a hyper node X_{k+1} ∈ H(X_k) at random, each choice being equally likely (thus, each choice occurs with probability 1/|H(X_k)|). Set D_k = |S(X_k)|/|X_k| and D ← D_k · D, then set C_SE ← C_SE + (c(X_{k+1})/|X_{k+1}|) · D. Increase k by 1 and return to Step 2.
Example state: k = 0, D = 1, X_0 = {v_1}, C_SE = 7.
[Figure: the example tree.]
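A minimal Python sketch of the estimator above. The slides do not spell out how the hyper children H(X_k) are formed ("there are some technical issues"); in this sketch the next hyper node is simply a uniformly random subset of S(X_k) of size min(B, |S(X_k)|), which keeps the Knuth-style unbiasedness argument intact but is an assumption rather than the talk's exact construction. The function name se_estimate and the TREE dictionary (the same reconstructed example tree as before) are ours.

```python
import random

def se_estimate(root, children, cost, B):
    """One run of a Stochastic Enumeration style estimator (sketch, single root)."""
    X = [root]                                       # hyper node X_0
    D = 1.0
    C = cost(root) / len(X)                          # c(X_0) / |X_0|
    while True:
        succ = [w for v in X for w in children(v)]   # S(X_k)
        if not succ:
            return C                                 # |X_0| = 1, so C estimates Cost(T_v)
        D *= len(succ) / len(X)                      # D <- (|S(X_k)| / |X_k|) * D
        X = random.sample(succ, min(B, len(succ)))   # next hyper node X_{k+1}
        C += sum(cost(v) for v in X) / len(X) * D    # + (c(X_{k+1}) / |X_{k+1}|) * D

TREE = {1: (7, [2, 3]), 2: (1, [4]), 3: (5, [5, 6]), 4: (3, [7, 8, 9]),
        5: (1, [10, 11]), 6: (9, []), 7: (4, []), 8: (2, []), 9: (1, []),
        10: (14, []), 11: (10, [])}
# With B = 2, the walk {v_1} -> {v_2, v_3} -> {v_4, v_6} -> {v_8, v_9}
# reproduces the 37.75 of the slides; averaged over runs the estimate is 57.
est = [se_estimate(1, lambda v: TREE[v][1], lambda v: TREE[v][0], B=2)
       for _ in range(100_000)]
print(sum(est) / len(est))
```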

39-49 SE Algorithm with B = 2: a sample run on the example tree.
- S(X_0) = {v_2, v_3}.
- k = 1: X_1 = {v_2, v_3}, D = 2 · 1 = 2, C_SE = 7 + ((1 + 5)/2) · 2 = 13.
- S(X_1) = {v_4, v_5, v_6}.
- k = 2: X_2 = {v_4, v_6}, D = (3/2) · 2 = 3, C_SE = 13 + ((3 + 9)/2) · 3 = 31.
- S(X_2) = {v_7, v_8, v_9}.
- k = 3: X_3 = {v_8, v_9}, D = (3/2) · 3 = 4.5, C_SE = 31 + ((2 + 1)/2) · 4.5 = 37.75.
- S(X_3) = ∅: the algorithm stops and returns C_SE = 37.75 (recall that Cost(T) = 57).

50-56 SE Algorithm: variance for the hair-brush tree with B = 2. [Figure: the hair-brush tree, explored level by level by two parallel walks.] The expectation and variance of the SE estimator are
$\mathbb{E}(C_{SE}) = \underbrace{1}_{\mathbb{P}(\text{visit } v_{n+1})} \cdot \underbrace{2}_{D} \cdot \underbrace{\tfrac{1}{2}}_{c(X_n)/|X_n|} = 1$ and $\mathbb{E}(C_{SE}^2) = \left(1 \cdot 2 \cdot \tfrac{1}{2}\right)^2 = 1$, so $\mathrm{Var}(C_{SE}) = 0$ and $\mathrm{CV}^2 = \frac{\mathrm{Var}(C_{SE})}{\mathbb{E}(C_{SE})^2} = 0$.
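Continuing the sketches from above (se_estimate and hairbrush): with B = 2 both children stay alive at every level of the hair-brush tree, so every single run returns exactly 1, in line with Var(C_SE) = 0.

```python
root, ch, c = hairbrush(12)
values = {se_estimate(root, ch, c, B=2) for _ in range(1_000)}
print(values)   # {1.0}: the estimate is deterministic on this tree
```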

57-58 Analysis. Radislav Vaisman and Dirk P. Kroese (2014), Stochastic Enumeration Method for Counting Trees. se-tree-jacm.pdf
Theorem (Unbiased Estimator). Let T_v be a tree rooted at v. Then $\mathbb{E}(C_{SE}(T_v)) = \mathrm{Cost}(T_v)$.

59 Analysis: SE's variance. Theorem (Stochastic Enumeration Algorithm Variance). Let v be a hyper node and let H(S(v)) = {w_1, ..., w_d} be its set of hyper children. Then
$\mathrm{Var}(C_{SE}(T_v)) = \left(\frac{|S(v)|}{|v|\,d}\right)^2 d \sum_{1 \le j \le d} \mathrm{Var}\!\left(C_{SE}(T_{w_j})\right) + \left(\frac{|S(v)|}{|v|\,d}\right)^2 \sum_{1 \le i < j \le d} \left(\frac{\mathrm{Cost}(T_{w_i})}{|w_i|} - \frac{\mathrm{Cost}(T_{w_j})}{|w_j|}\right)^2.$

60 Analysis: upper bound on SE's variance (1). [Figure: a hyper node v with hyper children w_1, w_2, w_3 and subtree costs Cost(T_{w_1}), Cost(T_{w_2}), ..., Cost(T_{w_d}).] Suppose that Cost(T_{w_1}) ≈ Cost(T_{w_2}) ≈ Cost(T_{w_3}). Then the SE Algorithm can be very efficient!

61 Analysis: upper bound on SE's variance (2). Theorem. Suppose without loss of generality that H(S^{(m)}(v)) = {w_1, ..., w_d} and that there exists a constant a such that
$\frac{\mathrm{Cost}(T_{w_1})}{|w_1|} \ge \frac{\mathrm{Cost}(T_{w_2})}{|w_2|} \ge \cdots \ge \frac{\mathrm{Cost}(T_{w_d})}{|w_d|} \ge a\,\frac{\mathrm{Cost}(T_{w_1})}{|w_1|}.$
Then the variance of the SE estimator satisfies
$\mathrm{Var}(C_{SE}(T_v)) \le \left[\frac{\mathrm{Cost}(T_v)}{|v|}\right]^2 \left(\beta^h - 1\right)$, where $\beta = \frac{a^2 + 2a + 1}{4a}$. That is, $\mathrm{CV} \le \sqrt{\beta^h - 1}$.

62-63 Analysis: upper bound on SE's variance (3). $\mathrm{CV} \le \sqrt{\beta^h - 1}$. Is this good enough? Unfortunately, for the majority of applications β > 1...

64 Some numerical results (1). Consider the following, very structured, tree of height h. We define c(v) = 1 for all v ∈ V. The root has 3 children; the leftmost child becomes the root of a full binary tree, and the rest of the children continue the root's behavior recursively. [Figure: the first levels of the recursive tree.]

65 Some numerical results (2). For Knuth's algorithm the following holds: $\mathrm{CV}^2 \ge \frac{1.4^h - 1}{16(h+1)^2}$. Nevertheless, SE performance with B = h is quite satisfactory. [Figure: the performance of Knuth's Algorithm and the SE Algorithm (numerical and analytical CV versus h) on counting recursive trees of different heights. Left panel: Knuth. Right panel: SE.]

66-67 Random trees. Definition (Family of random trees). Consider a probability vector p = (p_0, ..., p_k) that gives the probability of a vertex having 0, ..., k successors, respectively. Define the family of random trees F_p^h as all possible trees of height at most h that are generated using p up to level h. The family F_p^h is fully characterized by the probability vector p and the parameter h; the tree generation corresponds to a branching process.
Objective. Let T = (V, E) be a random tree from F_p^h. By assigning the cost c(v) = 1 for all v ∈ V, the cost of the tree Cost(T) is equal to |V|. Our objective is to analyse the behavior of Knuth's and SE's estimators in this setting.

68 Super-critical branching process. Consider a random tree rooted at v_0, let R_m be the total number of children (the population size) at level (generation) m, and denote by M_m the total progeny at generation m. Define $\mu = \mathbb{E}(R_1) = \sum_{0 \le j \le k} j\,p_j$ and $\sigma^2 = \mathrm{Var}(R_1) = \sum_{0 \le j \le k} j^2 p_j - \mu^2$. From [Pakes 1971],
$\nu_m = \mathbb{E}(M_m) = \mathbb{E}\left(\sum_{t=0}^{m} R_t\right) = \frac{1 - \mu^{m+1}}{1 - \mu}$, and
$\zeta_m^2 = \mathrm{Var}(M_m) = \frac{\sigma^2}{(1-\mu)^2}\left[\frac{1 - \mu^{2m+1}}{1 - \mu} - (2m+1)\,\mu^m\right].$
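A small sketch (our own function names) that draws random trees from F_p^h as a branching process and checks the empirical mean number of nodes against ν_h from Pakes' formula; ζ_h^2 is computed from the same formula.

```python
import random

def branching_moments(p, m):
    """mu, sigma^2 of the offspring law p, plus Pakes' nu_m and zeta_m^2."""
    mu = sum(j * pj for j, pj in enumerate(p))
    sigma2 = sum(j * j * pj for j, pj in enumerate(p)) - mu ** 2
    nu = (1 - mu ** (m + 1)) / (1 - mu)
    zeta2 = sigma2 / (1 - mu) ** 2 * ((1 - mu ** (2 * m + 1)) / (1 - mu)
                                      - (2 * m + 1) * mu ** m)
    return mu, sigma2, nu, zeta2

def random_tree_size(p, h):
    """Total number of nodes of a random tree from F_p^h (branching process cut at height h)."""
    degrees = list(range(len(p)))
    total = current = 1                               # the root
    for _ in range(h):
        current = sum(random.choices(degrees, weights=p)[0] for _ in range(current))
        total += current
        if current == 0:
            break
    return total

p, h = (0.3, 0.4, 0.1, 0.2), 20
mu, sigma2, nu, zeta2 = branching_moments(p, h)
sizes = [random_tree_size(p, h) for _ in range(20_000)]
print(sum(sizes) / len(sizes), nu)                    # empirical mean vs. nu_h
```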

69 Random trees: expected performance. Theorem (Knuth vs. SE). For a random tree T^{(h)} the following holds.
1. The lower bound on Knuth's expected variance satisfies
$\mathbb{E}\!\left(\mathrm{Var}\!\left(C(T^{(h)}) \mid T^{(h)}\right)\right) \ge \frac{(\sigma^2+\mu^2)\left(1-(\sigma^2+\mu^2)^h\right)}{\mu\left(1-(\sigma^2+\mu^2)\right)}.$
2. For a sufficiently large budget B, polynomial in h (a threshold of order the maximum of a term in $h k^2 \ln\!\big(2h(\sigma^2+\mu^2)\big)$, $\sigma^2$, $\mu$ and $(\mu-1)^{-2}$, and of $h\sigma^2/\mu^2$), the upper bound on SE's expected variance satisfies
$\mathbb{E}\!\left(\mathrm{Var}\!\left(C_{SE}(T^{(h)}) \mid T^{(h)}\right)\right) \lesssim \frac{h\,e\,\mu^{2h}\,\sigma^2}{B\,\mu\,(\mu-1)}.$
3. The SE Algorithm thus introduces an expected variance reduction that is approximately equal to $\left(1 + \frac{\sigma^2}{\mu^2}\right)^h$.

70 How about the performance in practice? We expect the variance reduction to be governed by the $\left(1 + \sigma^2/\mu^2\right)^h$ term.

71 For Model 1 we choose p = (0.3, 0.4, 0.1, 0.2) and h = 60; here µ = 1.2 and σ² = 1.16. The true number of nodes is …; Knuth's performance is very bad. [Tables: per-run estimates Ĉ and relative errors (RE) for Knuth's Algorithm and for the SE Algorithm.]
72 For Model 2 we choose p = (0.5, 0.1, 0.2, 0.2, 0.1) and h = 30; here µ = 1.5 and σ² = 2.05. The true number of nodes is 551. Knuth's performance is very bad. [Tables: per-run estimates Ĉ and relative errors (RE) for Knuth's Algorithm and for the SE Algorithm.]
73 For Model 3 we choose p = (0.0, 0.7, 0.2, 0.1) and h = 30; here µ = 1.4 and σ² = 0.44. The true number of nodes is …; Knuth's performance is good. [Tables: per-run estimates Ĉ and relative errors (RE) for Knuth's Algorithm and for the SE Algorithm.]
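The predicted variance-reduction factor can be computed directly from the three probability vectors and heights above; it is astronomically large for Models 1 and 2 and modest for Model 3, which matches where Knuth's estimator struggles. A small sketch (the model data are copied from the slides, the helper name is ours):

```python
def offspring_stats(p):
    mu = sum(j * pj for j, pj in enumerate(p))
    sigma2 = sum(j * j * pj for j, pj in enumerate(p)) - mu ** 2
    return mu, sigma2

models = {"Model 1": ((0.3, 0.4, 0.1, 0.2), 60),
          "Model 2": ((0.5, 0.1, 0.2, 0.2, 0.1), 30),
          "Model 3": ((0.0, 0.7, 0.2, 0.1), 30)}
for name, (p, h) in models.items():
    mu, sigma2 = offspring_stats(p)
    print(name, mu, round(sigma2, 3), (1 + sigma2 / mu ** 2) ** h)
```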

74 Fully Polynomial Randomized Approximation Scheme. A randomized approximation scheme for Cost(T) is a non-deterministic algorithm which, when given an input tree T and a real number ε ∈ (0, 1), outputs a random variable K such that
$\mathbb{P}\big((1-\varepsilon)\,\mathrm{Cost}(T) \le K \le (1+\varepsilon)\,\mathrm{Cost}(T)\big) \ge \tfrac{3}{4}.$
Such a scheme is said to be fully polynomial if its execution time is bounded by some polynomial in the tree height and in ε^{-1}. If these conditions hold, the algorithm is called a fully polynomial randomized approximation scheme, or FPRAS.

75 Random trees: FPRAS. Theorem (Almost sure FPRAS). Let F_p^h be a family of random trees such that for T ∈ F_p^h,
$\lim_{h \to \infty} \mathbb{P}\!\left(\mathrm{Cost}(T) < \frac{1}{P(h)}\,\nu_h\right) = 0,$
where P(h) > 0 is some polynomial function in h and $\nu_h = \frac{1 - \mu^{h+1}}{1 - \mu}$ is the expected number of nodes. In other words, for most instances (almost surely) the actual number of nodes is not much smaller than its expectation. Then, under the above condition, and provided that µ ≥ 1 + ε for some ε > 0, the SE algorithm is an FPRAS for most of the instances T ∈ F_p^h; that is,
$\mathrm{CV}^2 = \frac{\mathrm{Var}(C_{SE}(T) \mid T)}{\big(\mathbb{E}(C_{SE}(T) \mid T)\big)^2}$
is bounded by a polynomial in h with high probability.

76 SE in practice: Network Reliability and Sensitivity. Terminal network reliability problems appear in many real-life applications, such as transportation grids, social and computer networks, communication systems, etc. This problem belongs to the #P complexity class. [Figure: an s-t network.]

77-78 The Spectra. Definition (Spectra, not very formal). The probability F(k) of finding a failure set of size k (0 ≤ k ≤ number of edges) is called the Spectra. How many failure sets of size 2 are there in the pictured network? Here $F(2) = 2 / \binom{10}{2}$. [Figure: an s-t network with 10 edges.]

79 The Spectra: why do we care? As soon as the Spectra is available we get the following benefits. Reliability: calculating the network reliability Ψ(p) in linear time,
$\Psi(p) = \sum_{k=0}^{|E|} \binom{|E|}{k}\, F(k)\, p^k (1-p)^{|E|-k}.$
Sensitivity: the Birnbaum Importance Measure, $\mathrm{BIM}_j = \frac{\partial \Psi(p)}{\partial p_j}$. Sensitivity: the Joint Reliability Importance, $\mathrm{JRI}(i,j) = \frac{\partial^2 \Psi(p)}{\partial p_i\, \partial p_j}$.
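Once an estimate of the Spectra F(0), ..., F(|E|) is available, evaluating the reliability polynomial above is a single pass over k. A minimal sketch; the function name and the toy Spectra values (other than the F(2) = 2/C(10,2) quoted above) are made up for illustration.

```python
from math import comb

def reliability_from_spectra(F, p):
    """Psi(p) = sum_k C(|E|, k) * F(k) * p**k * (1 - p)**(|E| - k), with |E| = len(F) - 1."""
    E = len(F) - 1
    return sum(comb(E, k) * F[k] * p ** k * (1 - p) ** (E - k) for k in range(E + 1))

# Toy Spectra for a 10-edge network; only the F(2) entry comes from the slide.
F = [0.0, 0.0, 2 / comb(10, 2), 0.1, 0.3, 0.6, 0.85, 0.95, 1.0, 1.0, 1.0]
print(reliability_from_spectra(F, p=0.05))
```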

80 Estimating the Spectra. Unfortunately, the Spectra is rarely available analytically. Crude Monte Carlo is not applicable: this is a rare-events problem. The state-of-the-art Permutation Monte Carlo (PMC) is better but still fails under rare-event settings. Our suggestion: the SE algorithm (a quite straightforward extension of PMC). Radislav Vaisman, Dirk P. Kroese and Ilya B. Gertsbakh (2014), Improved Sampling Plans for Combinatorial Invariants of Coherent Systems, IEEE Transactions on Reliability, submitted, minor revision. papers/se-spectra-ieee.pdf

81 Estimating the Spectra with SE: an example (1). The hypercube graph H_n is a regular graph with 2^n vertices and n·2^{n-1} edges. To construct a hypercube graph, label the 2^n vertices with n-bit binary numbers and connect two vertices by an edge whenever the Hamming distance of their labels is 1.
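The construction described above is a few lines of code; a sketch building H_n as an adjacency list keyed by integer labels (two labels are adjacent when they differ in exactly one bit):

```python
def hypercube(n):
    """Adjacency list of the hypercube graph H_n: 2**n vertices, n * 2**(n-1) edges."""
    return {v: [v ^ (1 << b) for b in range(n)] for v in range(2 ** n)}

H5 = hypercube(5)
print(len(H5), sum(len(nbrs) for nbrs in H5.values()) // 2)  # 32 vertices, 80 edges
print(H5[0], H5[24])  # neighbours of the two terminals 0 (00000) and 24 (11000)
```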

82-85 Estimating the Spectra with SE: an example (2). We consider H_5 with two terminals K = {0, 24}, that is, (00000, 11000) in binary representation. Using a full enumeration procedure we found that the first non-zero value is F(4), equal to … . For this relatively small graph the state-of-the-art Permutation Monte Carlo (PMC) algorithm needs a huge sample size: using N = 10^9 samples takes about 25 hours on a Core i5 laptop, and the related error is about 60%. Why? The minimal value that must be estimated by PMC is …, a rare event! The SE algorithm delivers very reliable estimates in 28 seconds with budget B = 10 and N = …; the related error is about 1%.

86 What next? (Hard) Finding more classes of trees that can be handled efficiently by SE, that is, showing proven performance guarantees as in the random-tree case. (Not very hard) Adapting SE to the estimation of a general expectation E(S(x)). (Easy) Extending various Sequential Monte Carlo algorithms with the SE mechanism (splitting). (???) Adapting SE for optimization. (???) Introducing importance sampling into the SE estimator.

87 Thank you.


More information

14 Random Variables and Simulation

14 Random Variables and Simulation 14 Random Variables and Simulation In this lecture note we consider the relationship between random variables and simulation models. Random variables play two important roles in simulation models. We assume

More information

Adaptive Crowdsourcing via EM with Prior

Adaptive Crowdsourcing via EM with Prior Adaptive Crowdsourcing via EM with Prior Peter Maginnis and Tanmay Gupta May, 205 In this work, we make two primary contributions: derivation of the EM update for the shifted and rescaled beta prior and

More information

Lecture 2: Randomized Algorithms

Lecture 2: Randomized Algorithms Lecture 2: Randomized Algorithms Independence & Conditional Probability Random Variables Expectation & Conditional Expectation Law of Total Probability Law of Total Expectation Derandomization Using Conditional

More information

via Tandem Mass Spectrometry and Propositional Satisfiability De Novo Peptide Sequencing Renato Bruni University of Perugia

via Tandem Mass Spectrometry and Propositional Satisfiability De Novo Peptide Sequencing Renato Bruni University of Perugia De Novo Peptide Sequencing via Tandem Mass Spectrometry and Propositional Satisfiability Renato Bruni bruni@diei.unipg.it or bruni@dis.uniroma1.it University of Perugia I FIMA International Conference

More information

BDD Based Upon Shannon Expansion

BDD Based Upon Shannon Expansion Boolean Function Manipulation OBDD and more BDD Based Upon Shannon Expansion Notations f(x, x 2,, x n ) - n-input function, x i = or f xi=b (x,, x n ) = f(x,,x i-,b,x i+,,x n ), b= or Shannon Expansion

More information

Random Walks on Graphs. One Concrete Example of a random walk Motivation applications

Random Walks on Graphs. One Concrete Example of a random walk Motivation applications Random Walks on Graphs Outline One Concrete Example of a random walk Motivation applications shuffling cards universal traverse sequence self stabilizing token management scheme random sampling enumeration

More information

Markov Decision Processes

Markov Decision Processes Markov Decision Processes Noel Welsh 11 November 2010 Noel Welsh () Markov Decision Processes 11 November 2010 1 / 30 Annoucements Applicant visitor day seeks robot demonstrators for exciting half hour

More information

MARKOV CHAINS A finite state Markov chain is a sequence of discrete cv s from a finite alphabet where is a pmf on and for

MARKOV CHAINS A finite state Markov chain is a sequence of discrete cv s from a finite alphabet where is a pmf on and for MARKOV CHAINS A finite state Markov chain is a sequence S 0,S 1,... of discrete cv s from a finite alphabet S where q 0 (s) is a pmf on S 0 and for n 1, Q(s s ) = Pr(S n =s S n 1 =s ) = Pr(S n =s S n 1

More information

k-protected VERTICES IN BINARY SEARCH TREES

k-protected VERTICES IN BINARY SEARCH TREES k-protected VERTICES IN BINARY SEARCH TREES MIKLÓS BÓNA Abstract. We show that for every k, the probability that a randomly selected vertex of a random binary search tree on n nodes is at distance k from

More information

A walk over the shortest path: Dijkstra s Algorithm viewed as fixed-point computation

A walk over the shortest path: Dijkstra s Algorithm viewed as fixed-point computation A walk over the shortest path: Dijkstra s Algorithm viewed as fixed-point computation Jayadev Misra 1 Department of Computer Sciences, University of Texas at Austin, Austin, Texas 78712-1188, USA Abstract

More information

Algorithms: COMP3121/3821/9101/9801

Algorithms: COMP3121/3821/9101/9801 NEW SOUTH WALES Algorithms: COMP3121/3821/9101/9801 Aleks Ignjatović School of Computer Science and Engineering University of New South Wales LECTURE 9: INTRACTABILITY COMP3121/3821/9101/9801 1 / 29 Feasibility

More information

The non-backtracking operator

The non-backtracking operator The non-backtracking operator Florent Krzakala LPS, Ecole Normale Supérieure in collaboration with Paris: L. Zdeborova, A. Saade Rome: A. Decelle Würzburg: J. Reichardt Santa Fe: C. Moore, P. Zhang Berkeley:

More information

Pattern Recognition Approaches to Solving Combinatorial Problems in Free Groups

Pattern Recognition Approaches to Solving Combinatorial Problems in Free Groups Contemporary Mathematics Pattern Recognition Approaches to Solving Combinatorial Problems in Free Groups Robert M. Haralick, Alex D. Miasnikov, and Alexei G. Myasnikov Abstract. We review some basic methodologies

More information

Robust Network Codes for Unicast Connections: A Case Study

Robust Network Codes for Unicast Connections: A Case Study Robust Network Codes for Unicast Connections: A Case Study Salim Y. El Rouayheb, Alex Sprintson, and Costas Georghiades Department of Electrical and Computer Engineering Texas A&M University College Station,

More information

Mobile Robot Localization

Mobile Robot Localization Mobile Robot Localization 1 The Problem of Robot Localization Given a map of the environment, how can a robot determine its pose (planar coordinates + orientation)? Two sources of uncertainty: - observations

More information

Lecture 5: Counting independent sets up to the tree threshold

Lecture 5: Counting independent sets up to the tree threshold CS 7535: Markov Chain Monte Carlo Algorithms Fall 2014 Lecture 5: Counting independent sets up to the tree threshold Lecturer: Richard Brooks and Rakshit Trivedi Date: December 02 In this lecture, we will

More information

Bias and No Free Lunch in Formal Measures of Intelligence

Bias and No Free Lunch in Formal Measures of Intelligence Journal of Artificial General Intelligence 1 (2009) 54-61 Submitted 2009-03-14; Revised 2009-09-25 Bias and No Free Lunch in Formal Measures of Intelligence Bill Hibbard University of Wisconsin Madison

More information

Asymptotic redundancy and prolixity

Asymptotic redundancy and prolixity Asymptotic redundancy and prolixity Yuval Dagan, Yuval Filmus, and Shay Moran April 6, 2017 Abstract Gallager (1978) considered the worst-case redundancy of Huffman codes as the maximum probability tends

More information

Partially Observable Markov Decision Processes (POMDPs)

Partially Observable Markov Decision Processes (POMDPs) Partially Observable Markov Decision Processes (POMDPs) Geoff Hollinger Sequential Decision Making in Robotics Spring, 2011 *Some media from Reid Simmons, Trey Smith, Tony Cassandra, Michael Littman, and

More information

Q = Set of states, IE661: Scheduling Theory (Fall 2003) Primer to Complexity Theory Satyaki Ghosh Dastidar

Q = Set of states, IE661: Scheduling Theory (Fall 2003) Primer to Complexity Theory Satyaki Ghosh Dastidar IE661: Scheduling Theory (Fall 2003) Primer to Complexity Theory Satyaki Ghosh Dastidar Turing Machine A Turing machine is an abstract representation of a computing device. It consists of a read/write

More information

Monte Carlo Methods. Handbook of. University ofqueensland. Thomas Taimre. Zdravko I. Botev. Dirk P. Kroese. Universite de Montreal

Monte Carlo Methods. Handbook of. University ofqueensland. Thomas Taimre. Zdravko I. Botev. Dirk P. Kroese. Universite de Montreal Handbook of Monte Carlo Methods Dirk P. Kroese University ofqueensland Thomas Taimre University ofqueensland Zdravko I. Botev Universite de Montreal A JOHN WILEY & SONS, INC., PUBLICATION Preface Acknowledgments

More information

Markov Decision Processes

Markov Decision Processes Markov Decision Processes Lecture notes for the course Games on Graphs B. Srivathsan Chennai Mathematical Institute, India 1 Markov Chains We will define Markov chains in a manner that will be useful to

More information

Algorithms and Data Structures 2016 Week 5 solutions (Tues 9th - Fri 12th February)

Algorithms and Data Structures 2016 Week 5 solutions (Tues 9th - Fri 12th February) Algorithms and Data Structures 016 Week 5 solutions (Tues 9th - Fri 1th February) 1. Draw the decision tree (under the assumption of all-distinct inputs) Quicksort for n = 3. answer: (of course you should

More information

Stratified Splitting for Efficient Monte Carlo Integration

Stratified Splitting for Efficient Monte Carlo Integration Stratified Splitting for Efficient Monte Carlo Integration Radislav Vaisman, Robert Salomone, and Dirk P. Kroese School of Mathematics and Physics The University of Queensland, Brisbane, Australia E-mail:

More information

A An Overview of Complexity Theory for the Algorithm Designer

A An Overview of Complexity Theory for the Algorithm Designer A An Overview of Complexity Theory for the Algorithm Designer A.1 Certificates and the class NP A decision problem is one whose answer is either yes or no. Two examples are: SAT: Given a Boolean formula

More information

Lecture 1 : Data Compression and Entropy

Lecture 1 : Data Compression and Entropy CPS290: Algorithmic Foundations of Data Science January 8, 207 Lecture : Data Compression and Entropy Lecturer: Kamesh Munagala Scribe: Kamesh Munagala In this lecture, we will study a simple model for

More information

THIS paper is aimed at designing efficient decoding algorithms

THIS paper is aimed at designing efficient decoding algorithms IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 45, NO. 7, NOVEMBER 1999 2333 Sort-and-Match Algorithm for Soft-Decision Decoding Ilya Dumer, Member, IEEE Abstract Let a q-ary linear (n; k)-code C be used

More information

Streaming Algorithms for Optimal Generation of Random Bits

Streaming Algorithms for Optimal Generation of Random Bits Streaming Algorithms for Optimal Generation of Random Bits ongchao Zhou Electrical Engineering Department California Institute of echnology Pasadena, CA 925 Email: hzhou@caltech.edu Jehoshua Bruck Electrical

More information

A NEW BASIS SELECTION PARADIGM FOR WAVELET PACKET IMAGE CODING

A NEW BASIS SELECTION PARADIGM FOR WAVELET PACKET IMAGE CODING A NEW BASIS SELECTION PARADIGM FOR WAVELET PACKET IMAGE CODING Nasir M. Rajpoot, Roland G. Wilson, François G. Meyer, Ronald R. Coifman Corresponding Author: nasir@dcs.warwick.ac.uk ABSTRACT In this paper,

More information

Coupling. 2/3/2010 and 2/5/2010

Coupling. 2/3/2010 and 2/5/2010 Coupling 2/3/2010 and 2/5/2010 1 Introduction Consider the move to middle shuffle where a card from the top is placed uniformly at random at a position in the deck. It is easy to see that this Markov Chain

More information

COMP3702/7702 Artificial Intelligence Lecture 11: Introduction to Machine Learning and Reinforcement Learning. Hanna Kurniawati

COMP3702/7702 Artificial Intelligence Lecture 11: Introduction to Machine Learning and Reinforcement Learning. Hanna Kurniawati COMP3702/7702 Artificial Intelligence Lecture 11: Introduction to Machine Learning and Reinforcement Learning Hanna Kurniawati Today } What is machine learning? } Where is it used? } Types of machine learning

More information

König s Lemma and Kleene Tree

König s Lemma and Kleene Tree König s Lemma and Kleene Tree Andrej Bauer May 3, 2006 Abstract I present a basic result about Cantor space in the context of computability theory: the computable Cantor space is computably non-compact.

More information

Final exam of ECE 457 Applied Artificial Intelligence for the Fall term 2007.

Final exam of ECE 457 Applied Artificial Intelligence for the Fall term 2007. Fall 2007 / Page 1 Final exam of ECE 457 Applied Artificial Intelligence for the Fall term 2007. Don t panic. Be sure to write your name and student ID number on every page of the exam. The only materials

More information

Computational Logic. Davide Martinenghi. Spring Free University of Bozen-Bolzano. Computational Logic Davide Martinenghi (1/30)

Computational Logic. Davide Martinenghi. Spring Free University of Bozen-Bolzano. Computational Logic Davide Martinenghi (1/30) Computational Logic Davide Martinenghi Free University of Bozen-Bolzano Spring 2010 Computational Logic Davide Martinenghi (1/30) Propositional Logic - sequent calculus To overcome the problems of natural

More information

ACO Comprehensive Exam October 14 and 15, 2013

ACO Comprehensive Exam October 14 and 15, 2013 1. Computability, Complexity and Algorithms (a) Let G be the complete graph on n vertices, and let c : V (G) V (G) [0, ) be a symmetric cost function. Consider the following closest point heuristic for

More information

Final exam of ECE 457 Applied Artificial Intelligence for the Spring term 2007.

Final exam of ECE 457 Applied Artificial Intelligence for the Spring term 2007. Spring 2007 / Page 1 Final exam of ECE 457 Applied Artificial Intelligence for the Spring term 2007. Don t panic. Be sure to write your name and student ID number on every page of the exam. The only materials

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 7 Approximate

More information

The Particle Filter. PD Dr. Rudolph Triebel Computer Vision Group. Machine Learning for Computer Vision

The Particle Filter. PD Dr. Rudolph Triebel Computer Vision Group. Machine Learning for Computer Vision The Particle Filter Non-parametric implementation of Bayes filter Represents the belief (posterior) random state samples. by a set of This representation is approximate. Can represent distributions that

More information

Covering Linear Orders with Posets

Covering Linear Orders with Posets Covering Linear Orders with Posets Proceso L. Fernandez, Lenwood S. Heath, Naren Ramakrishnan, and John Paul C. Vergara Department of Information Systems and Computer Science, Ateneo de Manila University,

More information

CS 4100 // artificial intelligence. Recap/midterm review!

CS 4100 // artificial intelligence. Recap/midterm review! CS 4100 // artificial intelligence instructor: byron wallace Recap/midterm review! Attribution: many of these slides are modified versions of those distributed with the UC Berkeley CS188 materials Thanks

More information

Data Mining Classification: Basic Concepts and Techniques. Lecture Notes for Chapter 3. Introduction to Data Mining, 2nd Edition

Data Mining Classification: Basic Concepts and Techniques. Lecture Notes for Chapter 3. Introduction to Data Mining, 2nd Edition Data Mining Classification: Basic Concepts and Techniques Lecture Notes for Chapter 3 by Tan, Steinbach, Karpatne, Kumar 1 Classification: Definition Given a collection of records (training set ) Each

More information

CSL302/612 Artificial Intelligence End-Semester Exam 120 Minutes

CSL302/612 Artificial Intelligence End-Semester Exam 120 Minutes CSL302/612 Artificial Intelligence End-Semester Exam 120 Minutes Name: Roll Number: Please read the following instructions carefully Ø Calculators are allowed. However, laptops or mobile phones are not

More information

Finding Consensus Strings With Small Length Difference Between Input and Solution Strings

Finding Consensus Strings With Small Length Difference Between Input and Solution Strings Finding Consensus Strings With Small Length Difference Between Input and Solution Strings Markus L. Schmid Trier University, Fachbereich IV Abteilung Informatikwissenschaften, D-54286 Trier, Germany, MSchmid@uni-trier.de

More information

Data Mining. CS57300 Purdue University. Bruno Ribeiro. February 8, 2018

Data Mining. CS57300 Purdue University. Bruno Ribeiro. February 8, 2018 Data Mining CS57300 Purdue University Bruno Ribeiro February 8, 2018 Decision trees Why Trees? interpretable/intuitive, popular in medical applications because they mimic the way a doctor thinks model

More information

Performance Guarantees for Information Theoretic Active Inference

Performance Guarantees for Information Theoretic Active Inference Performance Guarantees for Information Theoretic Active Inference Jason L. Williams, John W. Fisher III and Alan S. Willsky Laboratory for Information and Decision Systems and Computer Science and Artificial

More information

Lecture 1 : Probabilistic Method

Lecture 1 : Probabilistic Method IITM-CS6845: Theory Jan 04, 01 Lecturer: N.S.Narayanaswamy Lecture 1 : Probabilistic Method Scribe: R.Krithika The probabilistic method is a technique to deal with combinatorial problems by introducing

More information

Phylogenetics: Parsimony and Likelihood. COMP Spring 2016 Luay Nakhleh, Rice University

Phylogenetics: Parsimony and Likelihood. COMP Spring 2016 Luay Nakhleh, Rice University Phylogenetics: Parsimony and Likelihood COMP 571 - Spring 2016 Luay Nakhleh, Rice University The Problem Input: Multiple alignment of a set S of sequences Output: Tree T leaf-labeled with S Assumptions

More information

Description Logics: an Introductory Course on a Nice Family of Logics. Day 2: Tableau Algorithms. Uli Sattler

Description Logics: an Introductory Course on a Nice Family of Logics. Day 2: Tableau Algorithms. Uli Sattler Description Logics: an Introductory Course on a Nice Family of Logics Day 2: Tableau Algorithms Uli Sattler 1 Warm up Which of the following subsumptions hold? r some (A and B) is subsumed by r some A

More information