
Simple Learning Algorithms for Decision Trees and Multivariate Polynomials

Nader H. Bshouty, Department of Computer Science, University of Calgary, Calgary, Alberta, Canada
Yishay Mansour, Department of Computer Science, Tel-Aviv University, Tel-Aviv, Israel

Abstract

In this paper we develop a new approach for learning decision trees and multivariate polynomials via interpolation of multivariate polynomials. This new approach yields simple learning algorithms for multivariate polynomials and decision trees over finite fields under any constant bounded product distribution. The output hypothesis is a (single) multivariate polynomial that is an $\epsilon$-approximation of the target under any constant bounded product distribution. The new approach demonstrates the learnability of many classes under any constant bounded product distribution using membership queries, such as j-disjoint DNF and multivariate polynomials with bounded degree over any field. The technique shows how to interpolate multivariate polynomials with bounded term size from membership queries only. This in particular gives a learning algorithm for O(log n)-depth decision trees from membership queries only, and a new learning algorithm for any multivariate polynomial over sufficiently large fields from membership queries only. We show that our results for learning from membership queries only are the best possible.

Introduction

Two techniques were used in the literature for PAC-learning decision trees with membership queries: the Fourier transform technique and the Lattice based technique. Kushilevitz and Mansour [KM93] gave a technique for learning decision trees under the uniform distribution via the Fourier Spectrum. Jackson [J94] extended the result to learning DNF under the uniform distribution; the output hypothesis is a majority of parities. Jackson [J95] generalized his DNF learning algorithm from the uniform distribution to any fixed constant bounded product distribution.

Definition 1. A product distribution is fixed constant bounded if there is a constant $0 < c < 1/2$, independent of the number of variables n, such that for any variable $x_i$, $c \le \Pr[x_i = 1] \le 1 - c$.

Bshouty [Bs93] gave a technique for learning decision trees under any distribution via the Monotone Theory. Schapire and Sellie [SS93] gave a Lattice based algorithm for learning multivariate polynomials over the binary field under any distribution. In the former, the output hypothesis for a decision tree is a depth-3 formula. Both techniques, the Fourier Spectrum and the Lattice based algorithms, also give learnability of many other classes, such as learning decision trees over parities (nodes contain parities) under constant bounded product distributions, and learning CDNF (poly size DNF that has poly size CNF) under any distribution.

In this paper we develop a new approach to learning decision trees and multivariate polynomials via interpolation of multivariate polynomials over GF(2). This new approach leads to simple learning algorithms for decision trees over the uniform and constant bounded product distributions, where the output hypothesis is a multivariate polynomial (a parity of monotone terms). The algorithm we develop gives a single hypothesis that approximates the target with respect to any constant bounded product distribution. In fact the hypothesis is a good hypothesis under any distribution that supports small terms.

Definition 2. A distribution D supports small terms of a class of terms $\mathcal{T}$ if every $T \in \mathcal{T}$ of size $\omega(\log n)$ satisfies $\Pr_D[T = 1] = 1/\omega(\mathrm{poly}(n))$, where n is the number of variables.
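For intuition (a quick check of our own, not part of the paper), a fixed constant bounded product distribution in the sense of Definition 1 supports small terms in the sense of Definition 2: every literal of a term T is satisfied with probability at most $1-c$, and the variables are independent, so for $|T| = \omega(\log n)$

$$\Pr_D[T = 1] \;\le\; (1-c)^{|T|} \;=\; 2^{-\omega(\log n)\,\log\frac{1}{1-c}} \;=\; n^{-\omega(1)}.$$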

The new approach also solves the learnability of other problems, some of which were not known to be learnable. These problems include:

(1) PAC-learning with membership queries of multivariate polynomials over the binary field with nonmonotone terms, under any distribution that supports small terms. In particular, disjoint DNF (the conjunction of every two terms is 0), j-disjoint DNF (the conjunction of any j terms is 0) for constant j, decision trees, and any polynomial number of Xors of them are PAC-learnable with membership queries under any distribution that supports small terms. The output hypothesis of the learning algorithm is a multivariate polynomial. Learning multivariate polynomials (with monotone terms) with membership and equivalence queries is shown in [SS93]; thus multivariate polynomials are PAC-learnable under any distribution. Our contribution is to show the learnability when the terms are not monotone. It is also known that any DNF is PAC-learnable with membership queries under constant bounded product distributions [J95], where the output hypothesis is a majority of parities. Our contribution for j-disjoint DNF is to use an output hypothesis that is a parity of terms, and to show that the output hypothesis is an approximation of the target against any constant bounded distribution. Our technique demonstrates the learnability of a Xor of these classes, which was not achieved using previous approaches.

We also have an extension of (1) to any field, showing:

(2) PAC-learning with membership queries of decision trees with leaves from some field F, and any polynomial sum of them, under any distribution that supports small terms.

We also study the learnability of multivariate polynomials from membership queries only. We show:

(3) Learning of multivariate polynomials over n variables with maximal degree $d \le c|F|$ for each variable, where $c < 1$ is a constant, and with terms of size $k = O\!\left(\frac{|F|}{d}(\log n + \log d)\right)$, from membership queries only. This result implies learning of decision trees of depth O(log n) with leaves from a field F from membership queries only. We also show that the above term size is tight, i.e., there is no membership query algorithm that learns multivariate polynomials over n variables with maximal degree d that contain terms of size greater than k. The first result in (3) is a generalization of the result in [B95b]; in [B95b] the learning algorithm uses membership and equivalence queries. The second result is a generalization of the result in [KM93] for learning boolean decision trees from membership queries.

Result (3) also gives:

(4) An algorithm for learning any multivariate polynomial over fields of size $q = \Omega\!\left(\frac{nd}{\log n + \log d}\right)$ from membership queries only. We also show that the above field size q is tight, i.e., if $nd/(\log n + \log d) = \omega(q)$ then no polynomial time learning algorithm exists. This result is a generalization of the results in [BT88, CDG+91, Z90] for learning multivariate polynomials over any field. Previous algorithms for learning multivariate polynomials over finite fields F require asking membership queries with assignments in some extension of the field F [CDG+91]. In [CDG+91] it is shown that an extension of degree n of the field is sufficient to interpolate any multivariate polynomial (when membership queries with assignments from an extension field are allowed). Our result in (4) improves this extension bound to $O(\log n)$.

As in many other interpolation problems, testing for zero polynomials plays a crucial role in our algorithms.
We show that:

(5) The following problems for multivariate polynomials are equivalent: (1) distinguishing a multivariate polynomial from zero; (2) distinguishing any two multivariate polynomials; (3) deciding whether a multivariate polynomial depends on some variable; (4) learning a multivariate polynomial.

2 Simple Algorithm for the Boolean Domain

Let MUL(n, t, k) be the set of all multivariate polynomials over the binary field over n variables with t terms, where each term is of size at most k; we will assume that $k = O(\log n)$. It is not hard to see that a decision tree of depth d can be represented in $\mathrm{MUL}(n, 2^d, d)$, and that a j-disjoint d-DNF can be represented in $\mathrm{MUL}(n, n^j, d)$; so for constant j and $d = O(\log n)$ the number of terms is polynomial.
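For concreteness, the GF(2) expansion behind the decision tree claim can be sketched in a few lines of Python (a hypothetical toy encoding of trees, not code from the paper): a node labeled $x_i$ with subtrees $f_0, f_1$ computes $(1 + x_i)f_0 + x_i f_1$ over GF(2), so a depth-d tree yields monotone terms of size at most d.

    from itertools import product

    # A tree is a leaf 0 or 1, or a tuple (i, left, right): follow left when x_i = 0
    # and right when x_i = 1.  Over GF(2) such a node computes
    # (1 + x_i)*left + x_i*right = left + x_i*(left + right).

    def mul_by_var(p, i):
        """Multiply a GF(2) polynomial (a set of monomials, each a frozenset of
        variable indices) by x_i, letting equal monomials cancel."""
        out = set()
        for mono in p:
            out ^= {mono | {i}}
        return out

    def tree_to_poly(tree):
        """Expand a decision tree into its GF(2) multivariate polynomial."""
        if tree in (0, 1):
            return {frozenset()} if tree == 1 else set()
        i, left, right = tree
        l, r = tree_to_poly(left), tree_to_poly(right)
        return l ^ mul_by_var(l ^ r, i)      # (1 + x_i)*l + x_i*r over GF(2)

    def eval_poly(p, a):
        return sum(all(a[i] for i in mono) for mono in p) % 2

    def eval_tree(tree, a):
        while tree not in (0, 1):
            i, left, right = tree
            tree = right if a[i] else left
        return tree

    # Depth-2 example: f = (1 + x0)*x1 + x0*(1 + x2) expands into the four monotone
    # terms x1, x0*x1, x0, x0*x2, each of size at most 2, as the claim predicts.
    tree = (0, (1, 0, 1), (2, 1, 0))
    poly = tree_to_poly(tree)
    assert all(eval_poly(poly, a) == eval_tree(tree, a) for a in product([0, 1], repeat=3))
    print(sorted(tuple(sorted(m)) for m in poly))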

We first show how to zero-test elements of MUL(n, t, k). Let $f \in \mathrm{MUL}(n,t,k)$. Choose a term $T = x_{i_1} \cdots x_{i_k}$ of maximal size in f. Randomly and uniformly choose values from $\{0,1\}$ for the variables not in T. The projection will not be the zero function, because the term T stays alive in the projection. Since the projection is a nonzero function over $k = O(\log n)$ variables, there is at least one assignment for $x_{i_1}, \ldots, x_{i_k}$ that gives the value 1 for the function. This shows that for a random and uniform assignment a, $f(a) = 1$ with probability at least $1/2^k = 1/\mathrm{poly}(n)$ for $k = O(\log n)$. So to zero-test a function $f \in \mathrm{MUL}(n,t,k)$, randomly and uniformly choose a polynomial number of assignments $a_i$. If $f(a_i)$ is zero for all the assignments, then with high probability we have $f \equiv 0$.

Claim 3. For $f \in \mathrm{MUL}(n,t,O(\log n))$, there is a polynomial time probabilistic zero-testing algorithm that succeeds with high probability.

We now show how to reduce zero-testing to learning. Let $f_0 = f$. Since we can zero-test, we can find the minimal $i_1$ such that $f_0|_{x_1 \leftarrow 0, \ldots, x_{i_1} \leftarrow 0} \equiv 0$. This implies that $f|_{x_1 \leftarrow 0, \ldots, x_{i_1-1} \leftarrow 0} = x_{i_1} f_1(x_{i_1+1}, \ldots, x_n)$ for some multivariate polynomial $f_1$. We continue recursively with $f_1 = f|_{x_1 \leftarrow 0, \ldots, x_{i_1-1} \leftarrow 0,\, x_{i_1} \leftarrow 1}$ until $f_k \equiv 1$; in this case $x_{i_1} \cdots x_{i_k}$ is a term in f. Now define $\hat{f} = f + x_{i_1} \cdots x_{i_k}$. This removes a term from f, and thus $\hat{f} \in \mathrm{MUL}(n, t-1, k)$. We continue recursively with $\hat{f}$ until we recover all the terms of f.

Claim 4. For $f \in \mathrm{MUL}(n,t,O(\log n))$, there is a polynomial time probabilistic interpolation algorithm that succeeds with high probability.

Now let MUL_NEG(n, t) be the set of all boolean multivariate polynomials with t nonmonotone terms. (This class includes the class of decision trees and j-disjoint DNF.) Let $f \in \mathrm{MUL\_NEG}(n,t)$. To PAC-learn f we randomly choose an assignment a and define $f'(x) = f(x + a)$. A term in f of size k will have on average $k/2$ positive literals in $f'$, and terms with $k = \Omega(\log n)$ variables will with high probability have $\Omega(k)$ positive literals. We perform a zero-restriction, i.e., for each i, with probability 1/2 we substitute $x_i \leftarrow 0$ in $f'$. This ensures that with high probability the projection $f''$ is in $\mathrm{MUL}(n, \mathrm{poly}(n), O(\log n))$, i.e., a polynomial with poly(n) monotone terms. We can use the previous algorithm to learn $f''$. Since we performed a zero-restriction, we only deleted monotone terms from $f'$; therefore the terms of $f''$ are terms of $f'$. We continue to take zero-restrictions and collect terms of $f'$ until the set of terms we have defines a polynomial which is a good approximation of $f'$. We get a good approximation of $f'$ since we collect all the small (i.e. $O(\log n)$ size) terms. A complete analysis of this algorithm is given in Section 6.

Claim 5. For $f \in \mathrm{MUL\_NEG}(n,t)$, there is a polynomial time probabilistic learning algorithm that succeeds with high probability.
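The boolean-domain procedures behind Claims 3 and 4 are simple enough to sketch directly (our own illustration; the membership oracle and the trial counts are placeholders — for term size k the zero test should use on the order of $\mathrm{poly}(n)\cdot 2^k$ samples):

    import random

    def zero_test(query, fixed, n, trials=2000):
        """Claim 3: probabilistic zero test of f restricted by `fixed`
        (a dict variable -> 0/1); unrestricted variables are sampled uniformly."""
        for _ in range(trials):
            a = [random.randint(0, 1) for _ in range(n)]
            for i, v in fixed.items():
                a[i] = v
            if query(a):
                return False
        return True

    def find_one_term(query, n):
        """Peel off one monotone term of a nonzero f (the Claim 4 step): if fixing
        x_i = 0 kills the restricted function, then x_i is in the term; fix it to 1."""
        fixed, term = {}, []
        for i in range(n):
            fixed[i] = 0
            if zero_test(query, fixed, n):
                fixed[i] = 1
                term.append(i)
        return frozenset(term)

    def interpolate(query, n):
        """Claim 4: recover all terms of f in MUL(n, t, O(log n)) from membership
        queries, repeatedly finding a term and adding it back (mod 2) to f."""
        terms = set()
        def residual(a):
            return query(a) ^ (sum(all(a[i] for i in t) for t in terms) % 2)
        while not zero_test(residual, {}, n):
            terms.add(find_one_term(residual, n))
        return terms

    # Demo on a hypothetical target with terms x0*x2, x1 and the constant term 1.
    target = {frozenset({0, 2}), frozenset({1}), frozenset()}
    oracle = lambda a: sum(all(a[i] for i in t) for t in target) % 2
    print(interpolate(oracle, 6) == target)   # True with high probability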
3 Multivariate Interpolation

Let $f = \sum_{\alpha \in I} a_\alpha x_1^{\alpha_1} \cdots x_n^{\alpha_n}$ be a multivariate polynomial over the field F, where $a_\alpha \in F$ and $\alpha_1, \ldots, \alpha_n$ are integers. We will denote the class of all multivariate polynomials over the field F and over the variables $x_1, \ldots, x_n$ by $F[x_1, \ldots, x_n]$. The number of terms of f is denoted by $|f|$. We have $|f| = |I|$ when all the $a_\alpha$ are nonzero. When $f = 0$ then $|f| = 0$, and when $f = c \in F \setminus \{0\}$ then $|f| = 1$. Let d be the maximal degree of the variables in f, i.e., $I \subseteq [d]^n$ where $[d] = \{0, 1, \ldots, d\}$. Suppose $F_0 = \{\beta_0, \ldots, \beta_d\} \subseteq F$ are $d+1$ distinct field constants, where $\beta_0 = 0$ is the zero of the field.

A univariate polynomial $f(x_1) \in F[x_1]$ over the field F of degree at most d can be interpolated from membership queries as follows. Suppose

$$f(x_1) = \sigma^{(d)}(f)\, x_1^d + \cdots + \sigma^{(1)}(f)\, x_1 + \sigma^{(0)}(f),$$

where $\sigma^{(i)}(f)$ is the coefficient of $x_1^i$ in the polynomial representation of f. Then

$$\begin{cases} f(\beta_0) = \sigma^{(d)}(f)\beta_0^d + \cdots + \sigma^{(1)}(f)\beta_0 + \sigma^{(0)}(f) \\ f(\beta_1) = \sigma^{(d)}(f)\beta_1^d + \cdots + \sigma^{(1)}(f)\beta_1 + \sigma^{(0)}(f) \\ \qquad\vdots \\ f(\beta_d) = \sigma^{(d)}(f)\beta_d^d + \cdots + \sigma^{(1)}(f)\beta_d + \sigma^{(0)}(f). \end{cases}$$

This is a linear system of equations and can be solved for the $\sigma^{(i)}(f)$ (by Cramer's rule) as follows:

$$\sigma^{(i)}(f) \;=\; \frac{\det\begin{pmatrix} \beta_0^d & \cdots & \beta_0^{i+1} & f(\beta_0) & \beta_0^{i-1} & \cdots & 1 \\ \beta_1^d & \cdots & \beta_1^{i+1} & f(\beta_1) & \beta_1^{i-1} & \cdots & 1 \\ \vdots & & & \vdots & & & \vdots \\ \beta_d^d & \cdots & \beta_d^{i+1} & f(\beta_d) & \beta_d^{i-1} & \cdots & 1 \end{pmatrix}}{\det V(\beta_0, \ldots, \beta_d)} \qquad (1)$$

where $V(\beta_0, \ldots, \beta_d)$ is the Vandermonde matrix.
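As a concrete instance (not spelled out in the paper), take $d = 1$ and evaluation points $\beta_0 = 0$ and $\beta_1 \ne 0$. The system collapses to

$$f(0) = \sigma^{(0)}(f), \qquad f(\beta_1) = \beta_1\,\sigma^{(1)}(f) + \sigma^{(0)}(f), \qquad\text{so}\qquad \sigma^{(1)}(f) = \beta_1^{-1}\bigl(f(\beta_1) - f(0)\bigr),$$

so one value of $\sigma^{(1)}(f)$ costs $d+1 = 2$ membership queries to f; the same formula applies coordinate-wise when f also depends on $x_2, \ldots, x_n$, as used below.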

If f is a multivariate polynomial then f can be written as

$$f(x_1, \ldots, x_n) = \sigma^{(d)}(f)\, x_1^d + \cdots + \sigma^{(1)}(f)\, x_1 + \sigma^{(0)}(f),$$

where $\sigma^{(i)}(f)$ is a multivariate polynomial over the variables $x_2, \ldots, x_n$. We can still use (1) to find $\sigma^{(i)}(f)$: just replace each $f(\beta_i)$ with $f(\beta_i, x_2, \ldots, x_n)$. Notice that from the first equation in the system, since $\beta_0 = 0$, we have

$$\sigma^{(0)}(f) = f(0, x_2, \ldots, x_n). \qquad (2)$$

From (1), a membership query for $\sigma^{(i)}$ can be simulated using $d+1$ membership queries to f. From (2), a membership query to $\sigma^{(0)}$ can be simulated using one membership query to f.

We now extend the operators as follows: for $\mathbf{i} = (i_1, \ldots, i_k) \in [d]^k$,

$$\sigma_{\mathbf{i}} = \sigma^{(i_k)} \sigma^{(i_{k-1})} \cdots \sigma^{(i_1)}.$$

Here $\sigma$ always operates on the variable with the smallest index. So $\sigma^{(i_1)}$ operates on $x_1$ in f to give a function $f'$ that depends on $x_2, \ldots, x_n$; then $\sigma^{(i_2)}$ operates on $x_2$ in $f'$, and so on. We will also write $x^{\mathbf{i}}$ for the term $x_1^{i_1} x_2^{i_2} \cdots x_k^{i_k}$. The weight of $\mathbf{i}$, denoted by $wt(\mathbf{i})$, is the number of nonzero entries in $\mathbf{i}$. The operator $\sigma^{(i)}(f)$ gives the coefficient of $x_1^i$ when f is represented in $F[x_2, \ldots, x_n][x_1]$; the operator $\sigma_{\mathbf{i}}(f)$ gives the coefficient of $x^{\mathbf{i}}$ when f is represented in $F[x_{k+1}, \ldots, x_n][x_1, \ldots, x_k]$.

Suppose $I \subseteq [d]^k$ is such that $\sigma_{\mathbf{i}} f \ne 0$ for all $\mathbf{i} \in I$ and $\sigma_{\mathbf{i}} f = 0$ for all $\mathbf{i} \notin I$; that is, the $x^{\mathbf{i}}$ for $\mathbf{i} \in I$ are the k-suffixes of all terms of f. Here the k-suffix of a term $x_1^{i_1} \cdots x_n^{i_n}$ is $x_1^{i_1} \cdots x_k^{i_k}$. Since $\mathbf{i} \in I$ if and only if $x^{\mathbf{i}}$ is a k-suffix of some term in f, it is clear that $|I| \le |f|$, and we must have

$$f = \sum_{\mathbf{i} \in I} (\sigma_{\mathbf{i}} f)\, x^{\mathbf{i}}.$$

We now show how to simulate membership queries for $(\sigma_{\mathbf{i}} f)(x_{k+1}, \ldots, x_n)$, $\mathbf{i} \in I$, using a polynomial number (in n and $|f|$) of membership queries to f. Suppose we want to find $(\sigma_{\mathbf{i}} f)(c)$ for some $c \in F^{n-k}$ using membership queries to f. We take r assignments $\hat{\alpha}_1, \ldots, \hat{\alpha}_r \in F^k$ and ask membership queries for $(\hat{\alpha}_i, c)$ for all $i = 1, \ldots, r$. If $f(\hat{\alpha}_i, c) = \omega_i$ then

$$\begin{cases} \sum_{\mathbf{i} \in I} (\sigma_{\mathbf{i}} f)(c)\, \hat{\alpha}_1^{\mathbf{i}} = \omega_1 \\ \qquad\vdots \\ \sum_{\mathbf{i} \in I} (\sigma_{\mathbf{i}} f)(c)\, \hat{\alpha}_r^{\mathbf{i}} = \omega_r. \end{cases}$$

Now if $I = \{\mathbf{i}_1, \ldots, \mathbf{i}_r\}$ and $\det M[\hat{\alpha}_j, \mathbf{i}_l] \ne 0$ for

$$M[\hat{\alpha}_j, \mathbf{i}_l] = \begin{pmatrix} \hat{\alpha}_1^{\mathbf{i}_1} & \cdots & \hat{\alpha}_1^{\mathbf{i}_r} \\ \vdots & & \vdots \\ \hat{\alpha}_r^{\mathbf{i}_1} & \cdots & \hat{\alpha}_r^{\mathbf{i}_r} \end{pmatrix},$$

then the above linear system of equations can be solved in time $\mathrm{poly}(r) = \mathrm{poly}(|I|) \le \mathrm{poly}(|f|)$. The solution gives $(\sigma_{\mathbf{i}} f)(c)$. The existence of $\hat{\alpha}_i$ for which the above determinant is not zero will be proven in the next section.

4 From Zero-testing to Learning for any Field

In this section we show how to use the results from the previous section to learn multivariate polynomials. Let $\mathrm{MUL}_F(n, k, t, d)$ be the set of all multivariate polynomials over the field F over n variables with t terms, where each term is of size at most k and the maximal degree of each variable is at most d. We would like to answer the following questions for $f \in \mathrm{MUL}_F(n, k, t, d)$.

1. Is there a polynomial time algorithm that uses membership queries to f and decides whether $f \equiv 0$?

2. Given $1 \le i \le n$, is there a polynomial time algorithm that uses membership queries to f and decides whether f depends on $x_i$?

3. Given $\{\mathbf{i}_1, \ldots, \mathbf{i}_r\} \subseteq [d]^n$ where $wt(\mathbf{i}_j) \le k$ for all j and $r \le t$, is there an algorithm that runs in polynomial time and finds $\hat{\alpha}_1, \ldots, \hat{\alpha}_r \in F^n$ such that

$$\det\begin{pmatrix} \hat{\alpha}_1^{\mathbf{i}_1} & \cdots & \hat{\alpha}_1^{\mathbf{i}_r} \\ \vdots & & \vdots \\ \hat{\alpha}_r^{\mathbf{i}_1} & \cdots & \hat{\alpha}_r^{\mathbf{i}_r} \end{pmatrix} \ne 0\,?$$

4. Is there a polynomial time algorithm that uses membership queries to f and identifies f?

When we say polynomial time we usually mean polynomial time in n, k, t and d, but all the results of this section are also true for any time complexity T, except that to solve 4 we get a blow up of poly(n, t) in the complexity. We show that 1, 2 and 4 are equivalent and that 1 implies 3. Obviously $2 \Rightarrow 1$, $4 \Rightarrow 1$ and $4 \Rightarrow 2$. We will show $1 \Rightarrow 2$, $1 \Rightarrow 3$, and $1 \Rightarrow 4$.
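The membership-query simulation above can be sketched in Python over a prime field (a minimal sketch with hypothetical names; it assumes the suffix set I and points for which the matrix M is invertible are already known — their existence is exactly what question 3 below provides):

    def solve_mod_p(M, b, p):
        """Solve M x = b over GF(p) by Gauss-Jordan elimination (M assumed invertible)."""
        n = len(M)
        A = [row[:] + [bi] for row, bi in zip(M, b)]
        for col in range(n):
            piv = next(r for r in range(col, n) if A[r][col] % p)
            A[col], A[piv] = A[piv], A[col]
            inv = pow(A[col][col], p - 2, p)
            A[col] = [v * inv % p for v in A[col]]
            for r in range(n):
                if r != col and A[r][col]:
                    A[r] = [(a - A[r][col] * c) % p for a, c in zip(A[r], A[col])]
        return [A[r][n] for r in range(n)]

    def monomial(alpha, i, p):
        """alpha^i = prod_j alpha_j^{i_j} mod p."""
        out = 1
        for a_j, i_j in zip(alpha, i):
            out = out * pow(a_j, i_j, p) % p
        return out

    def sigma_query(f, I, alphas, c, p):
        """Simulate membership queries to (sigma_i f)(c) for every k-suffix exponent
        vector i in I, using |I| membership queries to f at the points (alpha_j, c)."""
        omega = [f(list(a) + list(c)) % p for a in alphas]
        M = [[monomial(a, i, p) for i in I] for a in alphas]
        return dict(zip(I, solve_mod_p(M, omega, p)))

    # Demo over GF(7): f = 3*x1^2*x2*x3 + x2 + 5; its 2-suffixes are x1^2*x2, x2 and 1,
    # with (sigma_i f) equal to 3*x3, 1 and 5 respectively.
    p = 7
    f = lambda x: (3 * x[0] ** 2 * x[1] * x[2] + x[1] + 5) % p
    I = [(2, 1), (0, 1), (0, 0)]
    alphas = [(1, 1), (2, 1), (3, 2)]          # chosen so the matrix M is invertible mod 7
    print(sigma_query(f, I, alphas, c=[4], p=p))   # {(2, 1): 5, (0, 1): 1, (0, 0): 5}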

To prove $1 \Rightarrow 2$, notice that $f \in \mathrm{MUL}_F(n,k,t,d)$ is independent of $x_i$ if and only if $g = f - f|_{x_i \leftarrow 0} \equiv 0$. Since g consists of the terms of f that contain $x_i$, we have $g \in \mathrm{MUL}_F(n,k,t,d)$. Therefore we can zero-test g in polynomial time.

To prove $1 \Rightarrow 3$, let $\hat{\alpha}_1, \ldots, \hat{\alpha}_s$ be a zero-test for functions in $\mathrm{MUL}_F(n,k,t,d)$; that is, run the zero-testing algorithm on the input 0 and take all the membership queries $\hat{\alpha}_1, \ldots, \hat{\alpha}_s$ asked by the algorithm. We now have that $f \in \mathrm{MUL}_F(n,k,t,d)$ is 0 if and only if $f(\hat{\alpha}_i) = 0$ for all $i = 1, \ldots, s$. Consider the $s \times r$ matrix with rows $[\hat{\alpha}_j^{\mathbf{i}_1}, \ldots, \hat{\alpha}_j^{\mathbf{i}_r}]$. If this matrix has rank r then we choose r linearly independent rows. If the rank is less than r then its columns are dependent, and therefore there are constants $c_i$, $i = 1, \ldots, r$, such that

$$\sum_{i=1}^{r} c_i\, \hat{\alpha}_j^{\mathbf{i}_i} = 0 \quad \text{for } j = 1, \ldots, s.$$

This shows that the multivariate polynomial $\sum_{i=1}^{r} c_i x^{\mathbf{i}_i}$ is 0 on all of $\hat{\alpha}_1, \ldots, \hat{\alpha}_s$. Since $\sum_{i=1}^{r} c_i x^{\mathbf{i}_i} \in \mathrm{MUL}_F(n,k,t,d)$, we get a contradiction.

Now we show that $1 \Rightarrow 4$. This will use results from the previous section. The algorithm first checks whether f depends on $x_1$; if yes, it generates a tree whose root is labeled with $x_1$ and has $d+1$ children, where the i-th child is the tree for $\sigma^{(i)}(f)$. If the function is independent of $x_1$, it builds a tree with one child for the root; the child is $\sigma^{(0)}(f)$. We then recursively build the trees for the children. The previous section shows how to simulate membership queries at each level in polynomial time. This algorithm obviously works; its correctness follows immediately from the previous section and (1)-(3). The complexity of the algorithm is the size of the tree times the cost of the membership query simulation. Since the number of nodes at each level is bounded by the number of terms in f, and since the depth of the tree is bounded by n, the tree has at most O(nt) nonzero nodes. The total number of nodes is at most a factor of d larger than the number of nonzero nodes. Thus the algorithm has the same complexity as zero testing, with a blow up of poly(n, t, d) in queries and time.

Now that we have reduced the problem to zero testing, we will investigate in the next section the complexity of zero testing of $\mathrm{MUL}_F(n,k,t,d)$.

5 Zero-test of $\mathrm{MUL}_F(n,k,t,d)$

In this section we study the zero testing of $\mathrm{MUL}_F(n,k,*,d)$ when the number of terms is unknown and might be exponentially large. The time complexity of the zero testing should be polynomial in n and d (we have $k \le n$, so it is also polynomial in k). We will show the following theorem.

Theorem 1. The class $\mathrm{MUL}_F(n,k,*,d)$, where $d \le c|F|$ for some constant $c < 1$, is zero-testable in randomized polynomial time in n and d if and only if $k = O\!\left(\frac{|F|}{d}(\log n + \log d)\right)$.

The algorithm for the zero testing is simply to randomly and uniformly choose poly(n, d) points $a_i$ from $F^n$ and find $f(a_i)$. If f is zero on all the points $a_i$, then with high probability $f \equiv 0$. The condition $d \le c|F|$ for some constant $c < 1$ is a necessary condition for efficient learning: if d is close to $|F|$ (say $d = |F| - 1$) then $k = O(\log n / \log|F|)$, which is less than 1 for $|F| = \omega(n)$. This bound is also tight. This theorem implies:

Theorem 2. The class $\mathrm{MUL}_F(n,k,t,d)$, where $d \le c|F|$ for some constant $c < 1$, is learnable in randomized polynomial time (in n, d and t) from membership queries only if and only if $k = O\!\left(\frac{|F|}{d}(\log n + \log d)\right)$.

The proofs are given in the Appendix.
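The zero test itself is a few lines (a toy version of our own over a prime field, with a placeholder trial count; the Appendix analysis, as we read it, is what fixes the actual polynomial number of trials):

    import random

    def zero_test_field(f, n, p, d, trials=2000):
        """Evaluate f at uniformly random points of GF(p)^n.  A nonzero f whose
        terms have size at most k is nonzero at a random point with probability at
        least (1 - d/p)^k, so poly(n, d) trials suffice when
        k = O((p/d)(log n + log d))."""
        for _ in range(trials):
            a = [random.randrange(p) for _ in range(n)]
            if f(a) % p:
                return False        # witness found: f is not the zero polynomial
        return True                 # every evaluation was 0: declare f == 0

    # Example over GF(11): f = (x0 - 3)(x1 - 5)^2 is nonzero, so this prints False w.h.p.
    print(zero_test_field(lambda x: (x[0] - 3) * (x[1] - 5) ** 2, n=2, p=11, d=2))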
6 PAC-learning multivariate polynomials with membership queries

In this section we give an algorithm that PAC-learns with membership queries any multivariate polynomial with nonmonotone terms, under distributions that support small terms. We will first consider boolean multivariate polynomials, and later in this section show how to generalize the algorithm to any field. For the analysis of the correctness of the algorithm we first need to formalize the notion of distributions that support small terms. The following is one way to define this notion.

Definition. Let $\mathcal{D}_{c,t,\epsilon}$ be the set of distributions that satisfy the following: for every $D \in \mathcal{D}_{c,t,\epsilon}$ and any DNF h with t terms, each of size greater than $c\log(t/\epsilon)$, we have $\Pr_D[h = 1] \le \epsilon$.

Notice that all the constant bounded product distributions D with $c_0 \le \Pr_D[x_i = 1] \le 1 - c_0$ for all i are in $\mathcal{D}_{1/\log(1/c_0),\,t,\,\epsilon}$.
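A sketch of the calculation behind this remark (ours; the exact constant in the subscript depends on how the per-literal bound is stated): under such a product distribution every literal is satisfied with probability at most $1-c_0$, so a term $T_i$ of size $s > c\log(t/\epsilon)$ with $c = 1/\log\frac{1}{1-c_0}$ has $\Pr_D[T_i = 1] \le (1-c_0)^s < \epsilon/t$, and by a union bound over the t terms of h,

$$\Pr_D[h = 1] \;\le\; \sum_{i=1}^{t} \Pr_D[T_i = 1] \;<\; t \cdot \frac{\epsilon}{t} \;=\; \epsilon .$$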

A very rough analysis is given here to show that a polynomial time algorithm exists; in the full paper a more careful analysis will be done to get the best possible constants. Let $f = T_1 + \cdots + T_t$ be a multivariate polynomial and suppose $|T_1| \le |T_2| \le \cdots \le |T_t|$. Our algorithm starts by choosing a random assignment a and defining $f'(x) = f(x + a)$. All terms of size s (in $f'$) will contain on average $s/2$ positive literals. Therefore, by the Chernoff bound, with high probability all the terms of size more than $64c\log(t/\epsilon)$ will contain at least $16c\log(t/\epsilon)$ positive literals, and all terms of size at least $4c\log(t/\epsilon)$ will with high probability contain at least $c\log(t/\epsilon)$ positive literals.

Now we split the function $f'$ into three functions $f_1$, $f_2$ and $f_3$. The function $f_1 = T_1 + \cdots + T_{t_1}$ contains all terms of size at most $4c\log(t/\epsilon)$; the function $f_2 = T_{t_1+1} + \cdots + T_{t_2}$ contains all terms of size between $4c\log(t/\epsilon)$ and $64c\log(t/\epsilon)$; and the function $f_3 = T_{t_2+1} + \cdots + T_t$ contains all terms of size more than $64c\log(t/\epsilon)$. Now change $f_1$ to a multivariate polynomial with monotone terms. Since the size of each term in $f_1$ is at most $4c\log(t/\epsilon)$, the number of monotone terms in $f_1$ will be at most $t(t/\epsilon)^{4c}$. Doing the same for $f_2$ gives a multivariate polynomial with at most $t(t/\epsilon)^{64c}$ terms. Our algorithm will find all the terms in $f_1$, some of the terms in $f_2$, and none of the terms in $f_3$. Therefore we will need the following claim.

Claim. Let $g = f_1 + h$ where h is a multivariate polynomial that contains some of the terms in $f_2$. Then for any $D \in \mathcal{D}_{c,t,\epsilon}$ we have $\Pr_D[g \ne f] \le \epsilon$.

Proof. The error is

$$\Pr_D[(f_1 + h) + f = 1] = \Pr_D[h + f_2 + f_3 = 1].$$

Let

$$f_2 = \hat{T}_{t_1+1}\tilde{T}_{t_1+1} + \cdots + \hat{T}_{t_2}\tilde{T}_{t_2},$$

where $T_i = \hat{T}_i \tilde{T}_i$, $\hat{T}_i$ is the part of the term that contains the positive literals and $\tilde{T}_i$ is the part that contains the negative literals. When we change the terms of $f_2$ to monotone terms, every monotone term in $f_2$ will contain one of the terms $\hat{T}_i$, $t_1+1 \le i \le t_2$. Therefore we can write $f_2 = \hat{T}_{t_1+1} f_{2,1} + \cdots + \hat{T}_{t_2} f_{2,t_2-t_1}$, where the $f_{2,i}$ are multivariate polynomials with monotone terms. Since h is a multivariate polynomial that contains some of the terms in $f_2$, we have $f_2 + h = \hat{T}_{t_1+1} h_{2,1} + \cdots + \hat{T}_{t_2} h_{2,t_2-t_1}$. Since $|\hat{T}_i| \ge c\log(t/\epsilon)$ for $t_1+1 \le i \le t_2$ and $|T_i| \ge c\log(t/\epsilon)$ for $i \ge t_2+1$, by the definition of $\mathcal{D}_{c,t,\epsilon}$ we have

$$\Pr_D[(h + f_2) + f_3 = 1] \;\le\; \Pr_D[\hat{T}_{t_1+1} \vee \cdots \vee \hat{T}_{t_2} \vee T_{t_2+1} \vee \cdots \vee T_t = 1] \;\le\; \epsilon.$$

The algorithm proceeds as follows. We randomly and uniformly choose a zero-restriction p of $f'$; that is, we substitute $x_i \leftarrow 0$ with probability 1/2 for each $x_i$. This will on average leave $n/2$ variables alive in $f'$. Since terms in $f_3$ have at least $16c\log(t/\epsilon)$ positive literals, with high probability we will have $f_3(p) = 0$. In $f_1 + f_2$ some of the terms will vanish and some will stay alive. Since the projection $f'(p)$ is in $\mathrm{MUL}(n, (t/\epsilon)^{64c+1}, 64c\log(t/\epsilon))$, we can use the algorithm from the previous sections to learn its terms from membership queries only in time $(t/\epsilon)^{O(c)}$. After we learn $g_1 = f'(p)$ we define the function $f'' = f' + g_1$. This function has fewer of the terms of $f_1$. Now we do the same for $f''$; that is, we take another zero-restriction and collect the terms $g_2$ that stay alive after the projection. We continue until $f' + g_1 + g_2 + \cdots + g_r$ is $\epsilon$-close to 0. Notice that if $M_i$ is a term in $f_1$, then a zero-restriction keeps $M_i$ alive with probability at least $1/(t/\epsilon)^{4c}$; so on average we need $(t/\epsilon)^{4c}$ projections to catch this term, $t(t/\epsilon)^{8c}$ projections to catch all the $t(t/\epsilon)^{4c}$ terms in $f_1$, and $(t/\epsilon)^{10c}$ projections to catch them with high probability.
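The outer loop of this learner can be sketched as follows (our own reading of it, not the paper's code; it reuses the `interpolate` routine from the Claim 4 sketch above, `rounds` is a placeholder for the $(t/\epsilon)^{O(c)}$ projections the analysis calls for, and the stopping rule "until the residual is $\epsilon$-close to 0" is omitted for brevity):

    import random

    def learn_nonmonotone(query, n, rounds=200):
        a = [random.randint(0, 1) for _ in range(n)]
        shifted = lambda x: query([xi ^ ai for xi, ai in zip(x, a)])   # f'(x) = f(x + a)
        collected = set()
        def residual(x):            # f' plus the terms collected so far, over GF(2)
            return shifted(x) ^ (sum(all(x[i] for i in t) for t in collected) % 2)
        for _ in range(rounds):
            # zero-restriction: force each variable to 0 independently with probability 1/2
            forced = {i for i in range(n) if random.random() < 0.5}
            restricted = lambda x, forced=forced: residual(
                [0 if i in forced else xi for i, xi in enumerate(x)])
            # terms that survive the restriction are monotone terms of f'
            collected |= interpolate(restricted, n)
        def hypothesis(x):          # undo the shift, so h(x) approximates f(x)
            y = [xi ^ ai for xi, ai in zip(x, a)]
            return sum(all(y[i] for i in t) for t in collected) % 2
        return hypothesis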
Notice also that since each term in $f_3$ has at least $16c\log(t/\epsilon)$ positive literals, each projection makes $f_3 = 0$ with probability at least $1 - t/(t/\epsilon)^{16c}$, and therefore the probability that $f_3 = 0$ in all the projections is at least $1 - 1/(t/\epsilon)^{5c}$. Therefore we can make the probability of success for collecting all the terms of $f_1$ greater than $3/4$. This completes the description and the correctness of the algorithm.

The above algorithm and analysis can also be used to learn functions of the form $f = \lambda_1 T_1 + \cdots + \lambda_t T_t$, where $\lambda_i \in F$, the $T_i$ are boolean terms, and $+$ is the addition of a field F. This gives the learnability of decision trees with leaves that contain elements from the field F.

References

[A88] D. Angluin. Queries and concept learning. Machine Learning, 2(4):319-342, 1988.

[BT88] M. Ben-Or, P. Tiwari. A deterministic algorithm for sparse multivariate polynomial interpolation. In Proceedings of the 20th Annual ACM Symposium on Theory of Computing, pages 301-309, May 1988.

[Bl92] A. Blum. Learning boolean functions in an infinite attribute space. Machine Learning, 9(4):373-386, 1992.

[Bs93] N. H. Bshouty. Exact learning via the monotone theory. In Proceedings of the 34th Symposium on Foundations of Computer Science, pages 302-311, November 1993.

[B95a] N. H. Bshouty. Simple learning algorithms using divide and conquer. In Proceedings of the Annual ACM Workshop on Computational Learning Theory, 1995.

[B95b] N. H. Bshouty. A note on learning multivariate polynomials under the uniform distribution. In Proceedings of the Annual ACM Workshop on Computational Learning Theory, 1995.

[CDG+91] M. Clausen, A. Dress, J. Grabmeier, M. Karpinski. On zero-testing and interpolation of k-sparse multivariate polynomials over finite fields. Theoretical Computer Science, 84:151-164, 1991.

[GKS90] D. Yu. Grigoriev, M. Karpinski, M. F. Singer. Fast parallel algorithms for sparse multivariate polynomial interpolation over finite fields. SIAM J. of Computing, 19(6):1059-1063, 1990.

[J94] J. Jackson. An efficient membership-query algorithm for learning DNF with respect to the uniform distribution. In Proceedings of the 35th Annual Symposium on Foundations of Computer Science, 1994.

[J95] J. Jackson. On learning DNF and related circuit classes from helpful and not-so-helpful teachers. Ph.D. thesis, CMU, 1995.

[KM93] E. Kushilevitz and Y. Mansour. Learning decision trees using the Fourier spectrum. SIAM J. Computing, 22(6):1331-1348, 1993.

[Ma92] Y. Mansour. Randomized interpolation and approximation of sparse polynomials. In Automata, Languages and Programming: 19th International Colloquium, pages 261-272, July 1992. (Also: SIAM J. on Computing, vol. 24, num. 2, 1995.)

[RB89] R. M. Roth and G. M. Benedek. Interpolation and approximation of sparse multivariate polynomials over GF(2). SIAM J. Computing, 20(2):291-314, 1991.

[SS93] R. E. Schapire, L. M. Sellie. Learning sparse multivariate polynomials over a field with queries and counterexamples. In Proceedings of the Sixth Annual ACM Workshop on Computational Learning Theory, July 1993.

[Val84] L. Valiant. A theory of the learnable. Communications of the ACM, 27(11):1134-1142, November 1984.

[Z90] R. Zippel. Interpolating polynomials from their values. Journal of Symbolic Computation, 9:375-403, 1990.

Appendix

Proof of Theorem 1, Upper Bound. Let $\beta_1, \ldots, \beta_d \in F$ and define the function

$$f_{k,d} = \prod_{j=1}^{k} \prod_{i=1}^{d} (x_j - \beta_i).$$

Denote by $Z(k,d)$ the number of zeros of $f_{k,d}$. It is easy to see that Z satisfies the recursive equation

$$Z(k,d) = d\,|F|^{k-1} + (|F| - d)\,Z(k-1,d), \qquad Z(1,d) = d,$$

and it is also easy to see that

$$Z(k,d) = |F|^k - (|F| - d)^k.$$

Now let $\Phi(n,k,d)$ be the maximal possible number of roots of a multivariate polynomial in $\mathrm{MUL}_F(n,k,*,d)$. We will show the following facts:

1. $\Phi(n,k,d) \le |F|^{n-k}\,\Phi(k,k,d)$.

2. $\Phi(k,k,d) \le Z(k,d)$.

Both facts imply that if $f \not\equiv 0$ and we randomly and uniformly choose an assignment $a \in F^n$, we get

$$\Pr_a[f(a) \ne 0] \;\ge\; 1 - \frac{\Phi(n,k,d)}{|F|^n} \;\ge\; 1 - \frac{\Phi(k,k,d)}{|F|^k} \;\ge\; 1 - \frac{Z(k,d)}{|F|^k} \;=\; \left(1 - \frac{d}{|F|}\right)^{k} \;\ge\; e^{-O(dk/|F|)} \;\ge\; \frac{1}{\mathrm{poly}(n,d)},$$

using $d \le c|F|$ and $k = O\!\left(\frac{|F|}{d}(\log n + \log d)\right)$.
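A small brute-force check of the closed form for $Z(k,d)$ (ours; F is taken as the integers modulo q and $\beta_1, \ldots, \beta_d$ as $0, \ldots, d-1$ for concreteness):

    from itertools import product

    def brute_zeros(q, k, d):
        """Count the zeros in F^k of f_{k,d} = prod_j prod_i (x_j - beta_i):
        a point is a zero exactly when some coordinate hits one of the d betas."""
        betas = set(range(d))
        return sum(any(x in betas for x in point) for point in product(range(q), repeat=k))

    for q, k, d in [(5, 3, 2), (7, 2, 4), (3, 4, 1)]:
        assert brute_zeros(q, k, d) == q ** k - (q - d) ** k
    print("Z(k, d) = q^k - (q - d)^k confirmed on small cases")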

Therefore the expected running time to detect that f is not 0 is poly(n, d). It remains to prove facts (1) and (2).

To prove (1), let $f \in \mathrm{MUL}_F(n,k,*,d)$ with a maximal number of roots. Let m be a term in f with a maximal number of variables, and suppose, without loss of generality, that the variables of m are $x_1, \ldots, x_k$. For any substitution $a_{k+1}, \ldots, a_n$ for the variables $x_{k+1}, \ldots, x_n$, the term m stays alive in the projection $g = f|_{x_i \leftarrow a_i,\, i=k+1,\ldots,n}$ because it is maximal in f. Since each such g has at most $\Phi(k,k,d)$ roots, fact (1) follows.

To prove (2), let $f \in \mathrm{MUL}_F(k,k,*,d)$. Write f as a polynomial in $F[x_2, \ldots, x_k][x_1]$,

$$f = f_d x_1^d + f_{d-1} x_1^{d-1} + \cdots + f_0.$$

Let t be the number of roots of $f_d$. Since $f_d \in \mathrm{MUL}_F(k-1,k-1,*,d)$ we have $t \le \Phi(k-1,k-1,d)$. For $|F|^{k-1} - t$ assignments a for $x_2, \ldots, x_k$ we have $f_d(a) \ne 0$; for those assignments we get a polynomial in $x_1$ of degree d, which has at most d roots for $x_1$. For the remaining t assignments a for $x_2, \ldots, x_k$, $f_d$ is zero, and then the possible values of $x_1$ (to get a root of f) are bounded by $|F|$. This implies

$$\Phi(k,k,d) \;\le\; d(|F|^{k-1} - t) + |F|\,t \;=\; d\,|F|^{k-1} + (|F| - d)\,t \;\le\; d\,|F|^{k-1} + (|F| - d)\,\Phi(k-1,k-1,d).$$

Now the result follows by induction on k.

Proof of Theorem 2, Lower Bound. Let A be a randomized algorithm that zero-tests $f \in \mathrm{MUL}_F(n,k,*,d)$. Algorithm A asks membership queries to f, and if $f \not\equiv 0$ it returns the answer "NO" with probability at least 2/3. If all the membership queries in the algorithm return 0, the algorithm returns the answer "YES", indicating that $f \equiv 0$. We run the algorithm for $f \equiv 0$. Let $D_1, \ldots, D_l$ be the distributions according to which the membership assignments $a_1, \ldots, a_l$ are chosen to zero-test f, where l, the number of queries, is polynomial in n and d. Notice that, since all membership answers are 0, running the algorithm again for $f \equiv 0$ will again choose membership queries according to the distributions $D_1, \ldots, D_l$.

Now randomly and uniformly choose $\beta_{i,j} \in F$, $i = 1, \ldots, p$, $j = 1, \ldots, d$, and define

$$f^* = \prod_{i=1}^{p} \prod_{j=1}^{d} (x_i - \beta_{i,j}),$$

where $p = \frac{2|F|}{d}(\ln n + \ln d)$. Let $I[f^*(a_i)] = 1$ if $f^*(a_i) \ne 0$ and 0 otherwise. Then

$$\mathop{\mathbb{E}}_{f^*}\ \mathop{\mathbb{E}}_{(\forall i)\, a_i \sim D_i}\left[\bigvee_i I[f^*(a_i)]\right] \;\le\; l \cdot \mathop{\mathbb{E}}_{f^*}\ \mathop{\mathbb{E}}_{a_{i_0} \sim D_{i_0}}\bigl[I[f^*(a_{i_0})]\bigr] \;=\; l \cdot \mathop{\mathbb{E}}_{a_{i_0} \sim D_{i_0}}\ \mathop{\mathbb{E}}_{f^*}\bigl[I[f^*(a_{i_0})]\bigr] \;=\; l\left(1 - \frac{1}{|F|}\right)^{pd} \;\le\; \frac{1}{3}.$$

This shows that there exists an $f^* \not\equiv 0$ such that, running algorithm A on $f^*$, it gives the wrong answer "YES" with probability more than 2/3. This is a contradiction.


More information

arxiv: v2 [cs.ds] 3 Oct 2017

arxiv: v2 [cs.ds] 3 Oct 2017 Orthogonal Vectors Indexing Isaac Goldstein 1, Moshe Lewenstein 1, and Ely Porat 1 1 Bar-Ilan University, Ramat Gan, Israel {goldshi,moshe,porately}@cs.biu.ac.il arxiv:1710.00586v2 [cs.ds] 3 Oct 2017 Abstract

More information

2-LOCAL RANDOM REDUCTIONS TO 3-VALUED FUNCTIONS

2-LOCAL RANDOM REDUCTIONS TO 3-VALUED FUNCTIONS 2-LOCAL RANDOM REDUCTIONS TO 3-VALUED FUNCTIONS A. Pavan and N. V. Vinodchandran Abstract. Yao (in a lecture at DIMACS Workshop on structural complexity and cryptography, 1990) showed that if a language

More information

Lecture Introduction. 2 Formal Definition. CS CTT Current Topics in Theoretical CS Oct 30, 2012

Lecture Introduction. 2 Formal Definition. CS CTT Current Topics in Theoretical CS Oct 30, 2012 CS 59000 CTT Current Topics in Theoretical CS Oct 30, 0 Lecturer: Elena Grigorescu Lecture 9 Scribe: Vivek Patel Introduction In this lecture we study locally decodable codes. Locally decodable codes are

More information

On the Gap Between ess(f) and cnf size(f) (Extended Abstract)

On the Gap Between ess(f) and cnf size(f) (Extended Abstract) On the Gap Between and (Extended Abstract) Lisa Hellerstein and Devorah Kletenik Polytechnic Institute of NYU, 6 Metrotech Center, Brooklyn, N.Y., 11201 Abstract Given a Boolean function f, denotes the

More information

Online Learning versus Offline Learning*

Online Learning versus Offline Learning* Machine Learning, 29, 45 63 (1997) c 1997 Kluwer Academic Publishers. Manufactured in The Netherlands. Online Learning versus Offline Learning* SHAI BEN-DAVID Computer Science Dept., Technion, Israel.

More information

Lecture 3: Error Correcting Codes

Lecture 3: Error Correcting Codes CS 880: Pseudorandomness and Derandomization 1/30/2013 Lecture 3: Error Correcting Codes Instructors: Holger Dell and Dieter van Melkebeek Scribe: Xi Wu In this lecture we review some background on error

More information

Computational Learning Theory. Definitions

Computational Learning Theory. Definitions Computational Learning Theory Computational learning theory is interested in theoretical analyses of the following issues. What is needed to learn effectively? Sample complexity. How many examples? Computational

More information

Unconditional Lower Bounds for Learning Intersections of Halfspaces

Unconditional Lower Bounds for Learning Intersections of Halfspaces Unconditional Lower Bounds for Learning Intersections of Halfspaces Adam R. Klivans Alexander A. Sherstov The University of Texas at Austin Department of Computer Sciences Austin, TX 78712 USA {klivans,sherstov}@cs.utexas.edu

More information

A Tutorial on Computational Learning Theory Presented at Genetic Programming 1997 Stanford University, July 1997

A Tutorial on Computational Learning Theory Presented at Genetic Programming 1997 Stanford University, July 1997 A Tutorial on Computational Learning Theory Presented at Genetic Programming 1997 Stanford University, July 1997 Vasant Honavar Artificial Intelligence Research Laboratory Department of Computer Science

More information

Mansour s Conjecture is True for Random DNF Formulas

Mansour s Conjecture is True for Random DNF Formulas Mansour s Conjecture is True for Random DNF Formulas Adam Klivans University of Texas at Austin klivans@cs.utexas.edu Homin K. Lee University of Texas at Austin homin@cs.utexas.edu March 9, 2010 Andrew

More information

Web-Mining Agents Computational Learning Theory

Web-Mining Agents Computational Learning Theory Web-Mining Agents Computational Learning Theory Prof. Dr. Ralf Möller Dr. Özgür Özcep Universität zu Lübeck Institut für Informationssysteme Tanya Braun (Exercise Lab) Computational Learning Theory (Adapted)

More information

Being Taught can be Faster than Asking Questions

Being Taught can be Faster than Asking Questions Being Taught can be Faster than Asking Questions Ronald L. Rivest Yiqun Lisa Yin Abstract We explore the power of teaching by studying two on-line learning models: teacher-directed learning and self-directed

More information

Online Learning, Mistake Bounds, Perceptron Algorithm

Online Learning, Mistake Bounds, Perceptron Algorithm Online Learning, Mistake Bounds, Perceptron Algorithm 1 Online Learning So far the focus of the course has been on batch learning, where algorithms are presented with a sample of training data, from which

More information

EQUIVALENCES AND SEPARATIONS BETWEEN QUANTUM AND CLASSICAL LEARNABILITY

EQUIVALENCES AND SEPARATIONS BETWEEN QUANTUM AND CLASSICAL LEARNABILITY EQUIVALENCES AND SEPARATIONS BETWEEN QUANTUM AND CLASSICAL LEARNABILITY ROCCO A. SERVEDIO AND STEVEN J. GORTLER Abstract. We consider quantum versions of two well-studied models of learning Boolean functions:

More information